Data-reduction using the Least square polynomial
Sometimes you have to operate on a more or less large set of data. Manipulating such sets is generally difficult.
Suppose your result should be a straight line in a plane area, but of course your data only moves around. You guess that there is a linear tendency, but the computer doesn't. Here the example of a 25 data-set:
x | y | x2 | xy |
1 | 0,66496424 | 1 | 0,66496424 |
2 | 3,94075718 | 4 | 7,88151436 |
3 | 3,35247829 | 9 | 10,0574349 |
4 | 3,86552513 | 16 | 15,4621005 |
5 | 4,49854689 | 25 | 22,4927345 |
6 | 5,80525828 | 36 | 34,8315497 |
7 | 5,81779177 | 49 | 40,7245424 |
8 | 6,10924665 | 64 | 48,8739732 |
9 | 6,48756951 | 81 | 58,3881256 |
10 | 5,59638624 | 100 | 55,9638624 |
11 | 7,28547502 | 121 | 80,1402252 |
12 | 9,0264179 | 144 | 108,317015 |
13 | 8,02644055 | 169 | 104,343727 |
14 | 10,0489198 | 196 | 140,684877 |
15 | 9,21802309 | 225 | 138,270346 |
16 | 10,9809859 | 256 | 175,695774 |
17 | 8,63910735 | 289 | 146,864825 |
18 | 9,44954277 | 324 | 170,09177 |
19 | 12,3358053 | 361 | 234,380301 |
20 | 11,5203612 | 400 | 230,407224 |
21 | 11,4433213 | 441 | 240,309747 |
22 | 12,0350019 | 484 | 264,770042 |
23 | 13,729675 | 529 | 315,782525 |
24 | 12,7141765 | 576 | 305,140236 |
25 | 12,7032674 | 625 | 317,581685 |
325 | 205,295045 | 5525 | 3268,12112 |
Suppose you have N data points (xi, yi). The equation for the straight line is y = c1 + c2x, where c1 and c2 are unknown.
The error of the real yi-value compared to the straight line is declared as ei = yi - (c1 + c2x)
To make sure big errors have a more important impact on the computation, we consider the sum of the squared errors:
As we want to find the values c1 and c2 that will make sure S will be minimized, we set the partial derivates of S respective to c1 and c2 equal to zero:
Rearranging these equations we get:
Again rearranging we have:
To extract c1 and c2 we must only solve the system of two simultaneous equations.
Our example:
25c1 + 325c2 = 205.295045
325c1 + 5525c2 = 3268.12112
Using the determinant-procedure we get:
This technic may be generalyzed as:
y = c1 + c2x + c2x2 + ... + cm+1xm+1
if m=1 we fit the data in a straight line, if m=2 it is a parabola...
The coefficients are determined by solving the m+1 simultaneous equations:
Of course solving more simutaneous equations requires an adequate algorithm such as the Gauss-Jordan Elimination.