|
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| created 17/06/09 last update 17/08/09 | author: Claude Baumann | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| If
10th grade students are asked about their
appreciation of trigonometry, they generally express themselves in terms
of dislike and incomprehension. Subjectively, trigonometry is qualified as
“difficult”, even for mathematically gifted students. While higher
grades show certain ability with terminology, symbols and formulas, having
acquired some faculty in rearranging and solving equations and problems,
they rarely change their initial appreciation. Mathematics teachers
deplore how superficially trigonometry knowledge may be anchored, making
new learning based on presupposed skills almost impossible without
intense, time-costly and frustrating revisions. Future study of complex
numbers, periodic functions, calculus and transformations may therefore be
seriously compromised. This, of course, sharply contrasts with some of the
optimistic expectations of standards for school mathematics as expressed
by the US National
Council of Teachers of Mathematics for instance. Nonetheless neither
the students, nor the teachers are to blame for the simple reason of
trigonometry’s intrinsic character of formalism and the obvious need of
repetitive exercising.
This page wants to arouse student interest for trigonometry and starts from the practical problem that embedded programmers encounter, when they attempt to implement trigonometric functions into a microcontroller-system that hasn't any built-in higher mathematical kernel and that suffers from limited resources. In the following we will try to answer the questions:
Before exploring more deeply the famous CORDIC algorithm -that was the historic reply to the second question- we will present an easy-to-implement algorithm that is directly derived from elementary trig formulas. Some useful preliminary information can be found on the excellent wiki page. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 1. Simple algorithm for the computation of sin, cos Let us suppose that the first quadrant angle x is expressed in radians. The relevant identities that students meet rather early are:
The first equations are very interesting
for computers, because they may be recursively used with the division by 2 that normally is equivalent
to simple binary right-shifting. Also, from the
square-root page we know that there is an easy way to manually or
digitally compute the square-root of a number. So, if we start from known
values like |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Using this table and the second set of
formulae, we are now able to compute any other value. The main idea of the
algorithm is to approximate the given angle x by
successively summing up the values
Fig. 1 : The procedure realizes positive rotations that rapidly converge to the argument angle. The following LabVIEW diagram (Fig. 2) will show the algorithm more clearly (Notes: We suppose that the table has been pre-computed. The table starts with i=2, leading to some minor changes in the procedure.).
Fig. 2 : This LabVIEW program calculates the sine and cosine values according to the described algorithm.. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
2. Applying the same algorithm to compute the inverse functions arcsin and arccos If the values s and c
are initialized with the coordinates of the vector, of which the abscissa
represents the sine of the unknown angle. The second coordinate is
extracted with the fundamental trigonometric equation
Fig. 3 : The clockwise rotations converges towards angle zero.
Fig. 4 : Changing the signs in the main equations and operating the clockwise rotation yields the angle. Important note: These algorithms may as well be employed on the tangent and arctangent functions, either by dividing the sine by the cosine or by using a single LUT that is filled with tangent values. In that case, the following formulas can be applied:
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
3. Excellent accuracy, but bad time efficiency of the algorithm We have chosen 21 loops in the programs
above, because this assures a precision less than 1E-6. (It must be
underlined that the algorithm could be continued indefinitely and provide
infinite accuracy.) It is clear that the probability for a certain base
angle
(Fig. 5) shows the result (absolute error) of a LabVIEW simulation comparing one million different sine values that are computed in single precision with the explained algorithm and those calculated in extended precision with the original LabVIEW trig kernel function.
Fig. 5 : Excellent accuracy in single precision representation. But, if we still have in mind the "infinite precision" criterion, we might ask, whether it is possible to significantly speed up the algorithm. The answer obviously is: yes, we can, ... if we are able to reduce the number of multiplications. If we have a look at the cosine-table above, we can see that in the case of single precision floating point data, with a precision less than 1E-7, we can stop multiplying with the cosines after the 13th iteration, because they all equal 1. This already is a tiny improvement of the algorithm. Also -as already pointed out- the division by 2 can be replaced by a digital right shift. But that's all we can do without drastically altering the algorithm. The algorithm presents another weak point, since the computing time is not constant for all the values of the domain. We run a highest priority LabVIEW program that computed 100000 different values on an elderly Intel Pentium4 2.6GHz PC under XP (cf. Fig. 6).
Fig. 6 : This algorithm doesn't present any constant computing speed. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
3. CORDIC sin cos functions In order to significantly improve the described algorithm, we should:
That's exactly what J. Volder did, when he developed the CORDIC algorithm in 1959 (See: J. E. Volder, The CORDIC Triginometric Computing Technique, IRE Transactions on Electronic Computers, volume C-8, No. 3, September 1959, pp 330-334). We now will expose this extraordinary method through the transformation of our inefficient algorithm. First let us consider the trigonometric addition formulas a bit differently:
There is no visible gain, because the
number of operations doesn't change. But,
if we considered angles b that would fulfill the following
condition |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
(EQ.
A)
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
We discover a few particularities:
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Now let us initialize the unitary vector
(please don't confuse the coordinate x with the trig function argument
that we were using above.):
We may reconsider the equations (EQ. A) that actually describe a determined rotation of this vector.
As
you can see, since the factor
This number is called the CORDIC gain. (Note that this important constant is slightly smaller than the square-root of Euler's number e.) Imagine that our initial
angle is the maximum 99.88°. This angle can only be reached by rotation,
if every base value
Fig. 7 : At each iteration the magnitude of the vector grows a small portion. We
can prove that the values
Fig. 8 : The angle converges by oscillation. The sequence of the angles Ai represents a Cauchy sequence that converges by oscillation (Fig. 8). In the paper How do calculators calculate trigonometric functions, J. M. Underwood and B. H. Edwards reproduce Ch. W. Schelin's proof of the convergence of this sequence. (See: Ch. W. Schelin, Calculator Function Approximation, American Mathematical Monthly 90 (5), 1983, pp 317-325.) Important note: If the vector is initialized with
the procedure will end with the exact sine and cosine values and the final division can be omitted (cf. Fig. 9).
Fig. 8 : Starting with a correctly scaled vector yields the sine and cosine. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
4. Hack into the IEEE single precision floating point standard It is often insisted that the CORDIC algorithm is best implemented into hardware. In order to efficiently run the algorithm in software, we should profit from the important gain in speed that is the result of the replacement of a couple of multiplications with digital bit-shifting methods. If we want to use the floating point representation instead of fixed point data, it is essential to hack into the floating point bit code that nowadays usually obeys the IEEE 754 standard. We want to visualize with LabVIEW, how this can be done for single precision data. (The graphical code can easily be transcripted into any other language and data size.) In the floating point representation, a division by two is very easy to realize. The mantissa of the number remains unchanged, only the exponent is decreased by 1 (Fig. 9). You'll notice that the program diagram doesn't subtract 1 but the HEX value 0x800000, equivalent to the binary b'100000000000000000000000'. This is nothing but a digital "one" shifted to the correct place, where the exponent is located in the single precision floating point bit code.
Fig. 9 : The LabVIEW flatten-to-string / unflatten-from-string functions tell the computer to consider the input number as a raw integer number. The sign of the variable can easily be changed by setting the most significant bit (msb). Most higher languages have difficulties in giving access to very basic functions like bit setting. LabVIEW does not make any difference here and the hidden resulting object code must certainly be longer than actually needed. By contrast, in Assembler, setting a single bit would take a single line only. The IEEE standard also requires safety in operations. In this case, we must assure that zero divided by two still is zero and the division of the smallest possible number doesn't produce any invalid number -denoted NaN (Not-a-Number)-, but returns zero in such a case. Although this wrapping around looks bulky, we must underline that any valid floating point operation needs such precautions, which normally are completely invisible to the programmer. (Fig. 10) shows the diagram for an advanced fast division by powers of two. Note that this time the value "i" is shifted into the exponent place holder and then is subtracted there, necessarily producing the division by powers of two. This function also operates the multiplying with the sign variable using the binary XOR-function. (The sign is set by placing a logical "1" to the correct place in the single precision floating point bit code.)
Fig. 10 : Fast division by powers of two Finally the sine/cosine function is represented in (Fig. 11). All the floating-point multiplications and divisions have disappeared in the code. The program needs a few more iterations than the first algorithm, because the arctangent base list converges a bit more slowly than the first look-up table. Note that the algorithm can be converted into the arctangent function with little effort. In that case, the variable vector must be the initialized with the sine and cosines that are derived from the tangent, and the decisions must be taken on the sign of the second coordinate y of the vector. Also notice that the algorithm then also yields the magnitude of the original vector scaled by the CORDIC gain.
Fig. 11 : CORDIC algorithm applied to floating point sine and cosine calculation. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
5. Excellent time efficiency, troubles with the accuracy (Fig. 12) shows that the error and the error variation of the CORDIC sine values directly depend on the argument angle. This issue can rapidly be overcome with an error-correction.
Fig. 12 : The CORDIC algorithm sine values compared to those computed with the regular extended precision LabVIEW function. As can be concluded from the timing graph (Fig. 13), the computing duration, besides of being much shorter, has narrower variation, which normally is more desirable.
Fig. 13 : The timing is much more stable and has narrow variation. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
6. Improving both the accuracy and the computing speed Not every microcontroller has an implemented floating point math kernel. Some people even are opposed to this data representation for various reasons. They prefer working in fixed point arithmetic or just use scaled integers. While generally running faster, these approaches always suffer, when there are attempts of calculating at the edges of the domain. For example, how would you represent the arctangent of an angle very close to 90°? Or what happens to a 16bit integer variable X=10000 that is squared? Operations like that would require some precautions, where you protect the data from running into overflow, which means that either you trap and report errors like the division by zero or data overflow, abort the program, or write tons of code for exception handling. Modern microcontrollers, especially DSP-types, generate interrupts or even allow the processor to run into saturation, with the result that there is no overflow anymore. By contrast the IEEE floating point standard describes how the variables and functions should behave in extreme cases. There even exists the +/-inifinity "value". Trying to take the square-root of a negative number will generate a "NaN" message that is part of the floating point number representation. If you can make sure that your data will always stay within the domain, then the scaled integer or fixed point math is the best approach. We now will improve the floating point CORDIC algorithm by changing to the scaled integer representation. Since the first quadrant sine and cosine values always stay within the interval [0,1], we should be able to control the data. However, the CORDIC algorithm has the particularity that small angles and angles around pi/2 rotate the vector into the second or the forth quadrant during the iterations. Anyway, the values still are bound to [-1,1] and [-1.78...rad,1.78...rad]. In the prevous LabVIEW program (Fig. 11), we were using 23 iterations that already gave us satisfying precision and speed. But, in fact, we can easily verify that if we continue to 30 iterations, the calculations will improve. (This depends on the fact, whether guard bits are being used in the floating point kernel.) In order to produce the best precision, we will scale all the floating point values with the inverse of the small number arctan(2^-30)=1.8265E-9. We get a scaled integer constant array that will serve for purely integer operations. Only at the end, we will turn back and rescale the results (Fig. 14). Compared to the previous approach, the accuracy is increased, because we tuned the rescaling constant that also corrects the drift of (Fig. 12). The new procedure runs more than ten times faster, despite 7 more iterations. The main reason is that all the hidden floaring point wrapping has disappeared in this new version.
Fig. 14 : The CORDIC algorithm is now executed with signed integers. Multiplications are replaced by bit-shifting and logical XOR function calls.
Fig. 15 : Tuning the back-scaling constant produces best precision. (Comparison as indicate above.)
Fig. 16 : The gain in speed is remarkable. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
7. The HP35 HP applied the CORDIC algorithm in their famous slidekiller HP35 calculator. With lot's of guts D. S. Cochran succeeded in packing the total calculator code into less than 800 lines of microcode. As a particularity, the CORDIC algorithm was not applied in base two, but in base ten and the numbers were directly BCD-encoded. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Bibliography:
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Relevant links:
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||