created 17/06/09            last update 17/08/09   author: Claude Baumann
If 10th grade students are asked about their appreciation of trigonometry, they generally express themselves in terms of dislike and incomprehension. Subjectively, trigonometry is qualified as “difficult”, even for mathematically gifted students. While higher grades show certain ability with terminology, symbols and formulas, having acquired some faculty in rearranging and solving equations and problems, they rarely change their initial appreciation. Mathematics teachers deplore how superficially trigonometry knowledge may be anchored, making new learning based on presupposed skills almost impossible without intense, time-costly and frustrating revisions. Future study of complex numbers, periodic functions, calculus and transformations may therefore be seriously compromised. This, of course, sharply contrasts with some of the optimistic expectations of standards for school mathematics as expressed by the US National Council of Teachers of Mathematics for instance. Nonetheless neither the students, nor the teachers are to blame for the simple reason of trigonometry’s intrinsic character of formalism and the obvious need of repetitive exercising.

This page wants to arouse student interest for trigonometry and starts from the practical problem that embedded programmers encounter, when they attempt to implement trigonometric functions into a microcontroller-system that hasn't any built-in higher mathematical kernel and that suffers from limited resources. In the following we will try to answer the questions:

  • How would you tell a computer to calculate those trig functions?
  • How do you think HP solved the limited resource problem in the world's first scientific calculator HP35 that appeared in 1972?

Before exploring more deeply the famous CORDIC algorithm -that was the historic reply to the second question- we will present an easy-to-implement algorithm that is directly derived from elementary trig formulas. Some useful preliminary information can be found on the excellent wiki page.

 

1. Simple algorithm for the computation of sin, cos

Let us suppose that the first quadrant angle x is expressed in radians. The relevant identities that students meet rather early are: 

The first equations are very interesting for computers, because they may be recursively used with the division by 2 that normally is equivalent to simple binary right-shifting. Also, from the square-root page we know that there is an easy way to manually or digitally compute the square-root of a number. So, if we start from known values like , then we should be able to fill a look-up-table (LUT) that may serve as the base for our algorithm:

i

 

 

 

1

2

0.0000000

1.0000000

2

4

0.7071068

0.7071068

3

8

0.9238796

0.3826834

4

16

0.9807853

0.1950903

5

32

0.9951847

0.0980171

6

64

0.9987955

0.0490677

7

128

0.9996988

0.0245412

8

256

0.9999247

0.0122715

9

512

0.9999812

0.0061359

10

1024

0.9999953

0.0030680

11

2048

0.9999988

0.0015340

12

4096

0.9999997

0.0007670

13

8192

0.9999999

0.0003835

14

16384

1.0000000

0.0001917

...

     
Using this table and the second set of formulae, we are now able to compute any other value. The main idea of the algorithm is to approximate the given angle x by successively summing up the values as long as the sum ai+1=ai+bi is smaller than x, then re-iterating with the next value i. We will expose the method through the practical calculation of cos(1) and sin(1):
  • Start with
  • We see that , so we don't add to and
  • We have , so consider  and the first approximations for the sine and the cosine are twice and  add
  • We get and we skip the summation and
  • Now the test returns: and we may compute the next approximations (as we see, we just pick the new sine and cosine from the table and the old from the previous iteration):

    and

  • We continue like that until we have reached our desired precision and we will see that the sine and the cosine rapidly converge to the real values without ever reaching them.
  • The whole procedure may be summarized as a converging rotation of the unitary vector starting from (1,0)T as can be seen on (Fig. 1).
  • We also conclude that the sine and the cosine of the angle are computed at the same time.
  • The procedure can be summarized as a function, where the sine and cosine values are picked from the look-up table:

Fig. 1 : The procedure realizes positive rotations that rapidly converge to the argument angle.

The following LabVIEW diagram (Fig. 2) will show the algorithm more clearly (Notes: We suppose that the table has been pre-computed. The table starts with i=2, leading to some minor changes in the procedure.).

             

Fig. 2 : This LabVIEW program calculates the sine and cosine values according to the described algorithm..

 

2. Applying the same algorithm to compute the inverse functions arcsin and arccos

If the values s and c are initialized with the coordinates of the vector, of which the abscissa represents the sine of the unknown angle. The second coordinate is extracted with the fundamental trigonometric equation , then the present algorithm can be used to compute the inverse functions.. In that case the angle variable is accumulated with the next item in the angle base list, if the actual sine value is greater than the lookup-table sine value. Notice that the formulas are used with the signs exchanged. Also the rotation starts from the angle vector and ends at (1,0)T (cf. Fig. 3). If the inputs of the loop are inverted, the algorithm computes the arccos.

Fig. 3 : The clockwise rotations converges towards angle zero.

    

Fig. 4 : Changing the signs in the main equations and operating the clockwise rotation yields the angle.

Important note: These algorithms may as well be employed on the tangent and arctangent functions, either by dividing the sine by the cosine or by using a single LUT that is filled with tangent values. In that case, the following formulas can be applied:

 

3. Excellent accuracy, but bad time efficiency of the algorithm

We have chosen 21 loops in the programs above, because this assures a precision less than 1E-6. (It must be underlined that the algorithm could be continued indefinitely and provide infinite accuracy.) It is clear that the probability for a certain base angle to be part of the composition of the computed angle is 50%. It can be concluded that the average amount of operations is 21/2*(4 multiplications + 3 additions/subtractions + 1 division by two + 1 comparison), which certainly may be qualified as badly efficient. Only modern FPGAs or DSP-modules like the Microchip dsPIC33 can operate multiplications at comparable speed to additions. Other microcontrollers would be desperately loosing time in the calculation of one single trig-value. Therefore it is not astonishing that programmers prefer other approximations that do not necessarily fulfill the "infinite precision" criterion. One excellent possibility, that also is very easy to implement, is the application of the Taylors or MacLaurin series for the sine and cosine and polynomial or Chebychev approximations for the arctangent function. Although these methods are straightforward in terms of practical algorithms, their understanding is not evident due to the calculus origin.

(Fig. 5) shows the result (absolute error) of a LabVIEW simulation comparing one million different sine values that are computed in single precision with the explained algorithm and those calculated in extended precision with the original LabVIEW trig kernel function.

Fig. 5 : Excellent accuracy in single precision representation.

But, if we still have in mind the "infinite precision" criterion, we might ask, whether it is possible to significantly speed up the algorithm. The answer obviously is: yes, we can, ... if we are able to reduce the number of multiplications. If we have a look at the cosine-table above, we can see that in the case of single precision floating point data, with a precision less than 1E-7, we can stop multiplying with the cosines after the 13th iteration, because they all equal 1. This already is a tiny improvement of the algorithm. Also -as already pointed out- the division by 2 can be replaced by a digital right shift. But that's all we can do without drastically altering the algorithm. The algorithm presents another weak point, since the computing time is not constant for all the values of the domain. We run a highest priority LabVIEW program that computed 100000 different values on an elderly Intel Pentium4 2.6GHz PC under XP (cf. Fig. 6).

Fig. 6 : This algorithm doesn't present any constant computing speed.

 

3. CORDIC sin cos functions

In order to significantly improve the described algorithm, we should:

  • Avoid multiplications
  • Have nearby the same amount of operations during each iteration

That's exactly what J. Volder did, when he developed the CORDIC algorithm in 1959 (See: J. E. Volder, The CORDIC Triginometric Computing Technique, IRE Transactions on Electronic Computers, volume C-8, No. 3, September 1959, pp 330-334). We now will expose this extraordinary method through the transformation of our inefficient algorithm.

First let us consider the trigonometric addition formulas a bit differently:

There is no visible gain, because the number of operations doesn't change. But, if we considered angles b that would fulfill the following condition , then the second multiplication in each equation could be replaced with a simple digital bit-shifting method. However the new formulas would need another table:

(EQ. A)

i

 

0

1

0.78539816 [rad] = 45°

1

2

0.46364761 [rad] = 26.56°

2

4

0.24497866 [rad] = 14.035

3

8

0.12435499

4

16

0.06241881

5

32

0.03123983

6

64

0.01562373

7

128

0.00781234

8

256

0.00390623

9

512

0.00195312

10

1024

0.00097656

11

2048

0.00048828

12

4096

0.00024414

13

8192

0.00012207

14

16384

6.1035E-05

15

32768

3.0518E-05

16

65536

1.5259E-05

 

 

We discover a few particularities:
  • In the first algorithm, the infinite sum of all possible base-angles tends to Pi/2. Now, we get an angle that is greater than Pi/2. (We must put attention to this, because we may pass into the second quadrant):

(Note: the second property is easy to verify, but rather complicated to prove.)

  • The constant cosine values may be written differently:

Now let us initialize the unitary vector (please don't confuse the coordinate x with the trig function argument that we were using above.):

We may reconsider the equations (EQ. A) that actually describe a determined rotation of this vector.

As you can see, since the factor is missing in the equations, compared to (EQ. A), the new coordinates of the vector are larger by the inverse of this factor than the new sine or cosine of (EQ. A), and the magnitude of vector has grown by factor . Because this operation is not isometric, we must call it a pseudo-rotation rather than an ordinary rotation. If we repeat the procedure for all significant values of i, then the vector will grow each time by the related factor. So, the factors accumulate as an infinite product. In fact, we can easily prove that this product has a limit:

This number is called the CORDIC gain. (Note that this important constant is slightly smaller than the square-root of Euler's number e.)

Imagine that our initial angle is the maximum 99.88°. This angle can only be reached by rotation, if every base value is positively present. (Fig. 7) shows how the vector changes through the repeated pseudo-rotations.

Fig. 7 : At each iteration the magnitude of the vector grows a small portion.

We can prove that the values form an infinite non-linear base that can represent by combination every angle of the domain [0, 99.88°], if we allow two possible factors for each base-element: [-1,1]. This is a very important property, because we can reach every angle by pseudo-rotation. We only need to check, if the phase of the pseudo-rotated vector is greater than the target angle. Therefore, we must introduce another variable s, that indicates the sense of the actual rotation. In other words, if the vector-phase is greater than the argument angle, the next rotation must change into the negative direction. If it is smaller, the next sense of rotation must be positive. The side-effect of this extraordinary feature is that each iteration runs at almost identical speed. So, we rewrite the equations and develop the procedure for the functions on the argument :

 

Fig. 8 : The angle converges by oscillation.

The sequence of the angles Ai represents a Cauchy sequence that converges by oscillation (Fig. 8). In the paper How do calculators calculate trigonometric functions, J. M. Underwood and B. H. Edwards reproduce Ch. W. Schelin's  proof of the convergence of this sequence. (See: Ch. W. Schelin, Calculator Function Approximation, American Mathematical Monthly 90 (5), 1983, pp 317-325.)

Important note: If the vector is initialized with

,

the procedure will end with the exact sine and cosine values and the final division can be omitted (cf. Fig. 9).

Fig. 8 : Starting with a correctly scaled vector yields the sine and cosine.

 

4. Hack into the IEEE single precision floating point standard

It is often insisted that the CORDIC algorithm is best implemented into hardware. In order to efficiently run the algorithm in software, we should profit from the important gain in speed that is the result of the replacement of a couple of multiplications with digital bit-shifting methods. If we want to use the floating point representation instead of fixed point data, it is essential to hack into the floating point bit code that nowadays usually obeys the IEEE 754 standard. We want to visualize with LabVIEW, how this can be done for single precision data. (The graphical code can easily be transcripted into any other language and data size.)

In the floating point representation, a division by two is very easy to realize. The mantissa of the number remains unchanged, only the exponent is decreased by 1 (Fig. 9). You'll notice that the program diagram doesn't subtract 1 but the HEX value 0x800000, equivalent to the binary b'100000000000000000000000'. This is nothing but a digital "one" shifted to the correct place, where the exponent is located in the single precision floating point bit code.

Fig. 9 : The LabVIEW flatten-to-string / unflatten-from-string functions tell the computer to consider the input number as a raw integer number.

The sign of the variable can easily be changed by setting the most significant bit (msb). Most higher languages have difficulties in giving access to very basic functions like bit setting. LabVIEW does not make any difference here and the hidden resulting object code must certainly be longer than actually needed. By contrast, in Assembler, setting a single bit would take a single line only. The IEEE standard also requires safety in operations. In this case, we must assure that zero divided by two still is zero and the division of the smallest possible number doesn't produce any invalid number -denoted NaN (Not-a-Number)-, but returns zero in such a case. Although this wrapping around looks bulky, we must underline that any valid floating point operation needs such precautions, which normally are completely invisible to the programmer. (Fig. 10) shows the diagram for an advanced fast division by powers of two. Note that this time the value "i" is shifted into the exponent place holder and then is subtracted there, necessarily producing the division by powers of two. This function also operates the multiplying with the sign variable using the binary XOR-function. (The sign is set by placing a logical "1" to the correct place in the single precision floating point bit code.)

Fig. 10 : Fast division by powers of two

Finally the sine/cosine function is represented in (Fig. 11). All the floating-point multiplications and divisions have disappeared in the code. The program needs a few more iterations than the first algorithm, because the arctangent base list converges a bit more slowly than the first look-up table. Note that the algorithm can be converted into the arctangent function with little effort. In that case, the variable vector must be the initialized with the sine and cosines that are derived from the tangent, and the decisions must be taken on the sign of the second coordinate y of the vector. Also notice that the algorithm then also yields the magnitude of the original vector scaled by the CORDIC gain.

Fig. 11 : CORDIC algorithm applied to floating point sine and cosine calculation.

 

5. Excellent time efficiency, troubles with the accuracy

(Fig. 12) shows that the error and the error variation of the CORDIC sine values directly depend on the argument angle. This issue can rapidly be overcome with an error-correction.

Fig. 12 : The CORDIC algorithm sine values compared to those computed with the regular extended precision LabVIEW function.

As can be concluded from the timing graph (Fig. 13), the computing duration, besides of being much shorter, has narrower variation, which normally is more desirable.

Fig. 13 : The timing is much more stable and has narrow variation.

 

6. Improving both the accuracy and the computing speed

Not every microcontroller has an implemented floating point math kernel. Some people even are opposed to this data representation for various reasons. They prefer working in fixed point arithmetic or just use scaled integers. While generally running faster, these approaches always suffer, when there are attempts of calculating at the edges of the domain. For example, how would you represent the arctangent of an angle very close to 90°? Or what happens to a 16bit integer variable X=10000 that is squared? Operations like that would require some precautions, where you protect the data from running into overflow, which means that either you trap and report errors like the division by zero or data overflow, abort the program, or write tons of code for exception handling. Modern microcontrollers, especially DSP-types, generate interrupts or even allow the processor to run into saturation, with the result that there is no overflow anymore. By contrast the IEEE floating point standard describes how the variables and functions should behave in extreme cases. There even exists the +/-inifinity "value". Trying to take the square-root of a negative number will generate a "NaN" message that is part of the floating point number representation.

If you can make sure that your data will always stay within the domain, then the scaled integer or fixed point math is the best approach. We now will improve the floating point CORDIC algorithm by changing to the scaled integer representation. Since the first quadrant sine and cosine values always stay within the interval [0,1], we should be able to control the data. However, the CORDIC algorithm has the particularity that small angles and angles around pi/2 rotate the vector into the second or the forth quadrant during the iterations. Anyway, the values still are bound to [-1,1] and [-1.78...rad,1.78...rad].

In the prevous LabVIEW program (Fig. 11), we were using 23 iterations that already gave us satisfying precision and speed. But, in fact, we can easily verify that if we continue to 30 iterations, the calculations will improve. (This depends on the fact, whether guard bits are being used in the floating point kernel.) In order to produce the best precision, we will scale all the floating point values with the inverse of the small number arctan(2^-30)=1.8265E-9. We get a scaled integer constant array that will serve for purely integer operations. Only at the end, we will turn back and rescale the results (Fig. 14). Compared to the previous approach, the accuracy is increased, because we tuned the rescaling constant that also corrects the drift of (Fig. 12). The new procedure runs more than ten times faster, despite 7 more iterations. The main reason is that all the hidden floaring point wrapping has disappeared in this new version.

Fig. 14 : The CORDIC algorithm is now executed with signed integers. Multiplications are replaced by bit-shifting and logical XOR function calls.

Fig. 15 : Tuning the back-scaling constant produces best precision. (Comparison as indicate above.)

Fig. 16 : The gain in speed is remarkable.

 

7. The HP35

HP applied the CORDIC algorithm in their famous slidekiller HP35 calculator. With lot's of guts D. S. Cochran succeeded in packing the total calculator code into less than 800 lines of microcode. As a particularity, the CORDIC algorithm was not applied in base two, but in base ten and the numbers were directly BCD-encoded.

 

Bibliography:

  • J. E. Volder, The CORDIC Trigonometric Computing Technique, IRE Transactions on Electronic Computers, volume C-8, No. 3, September 1959, pp 330-334
  • Ch. W. Schelin, Calculator Function Approximation, American Mathematical Monthly 90 (5), 1983, pp 317-325
  • J. S. Walther, A unified algorithm for elementary functions, AFIPS Joint Computer Conferences, Proceedings of the May 18-20, 1971, spring joint computer conference, Atlantic City, New Jersey, US, Session: Topics in computer arithmetic and artificial intelligence, pp 379-385
  • R. Andraka, A survey of CORDIC algorithms for FPGA based computers, Proceedings of the 1998 ACM/SIGDA sixth International Symposium of Field Programmable Gate Arrays, Monterey, California, US, pp 191-200

 

Relevant links: