The Mathematics Behind: Polynomial Curve Fitting (MATLAB)

In the series “The Mathematics Behind” I explain the mathematical concepts behind commonly used technologies. In this post, I will explain the mathematics behind polynomial curve fitting in MATLAB.

First of all, what is polynomial curve fitting and what is it used for? Suppose we are trading on a stock market. The stock price goes up and down (see the figure) and we want to discover patterns in the price changes, if any exist. Polynomial curve fitting tries to fit a model (here: a polynomial) to the given datapoints as well as possible.

Curve Fitting MATLAB: The Hidden Function

I will tell you a little secret: the “hidden” curve was generated using the function $latex y = \sin(2 \pi x) + 20$, with a little noise added, but our algorithm will not know this secret. The goal of the algorithm is to recover this hidden structure. And that is what curve fitting is all about.
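As an illustration of the setup (the numbers below are assumptions, not the post's original values), such a noisy dataset could be generated in MATLAB like this:

    % Sample the hidden function y = sin(2*pi*x) + 20 and add Gaussian noise.
    % N, the interval [0, 1] and the noise level 0.3 are assumptions.
    N = 10;
    x = linspace(0, 1, N)';                    % column vector of inputs
    t = sin(2*pi*x) + 20 + 0.3*randn(N, 1);    % noisy targets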

Model approximation

The model we will use to approximate this hidden function is a polynomial of degree 3. That means that the polynomial is of the form $latex w_0 + w_1 x + w_2 x^2 + w_3 x^3$. Notice that we have 4 degrees of freedom here since we can change the 4 weights in the model $latex w_0, w_1, w_2, w_3$.

Notice that the polynomial can be expressed compactly as $latex y(x,\textbf{w})=\sum\limits_{m=0}^3 w_m x^m$. Our goal is to make this function as close as possible to the true datapoints. Suppose we have $latex N$ datapoints, where the $latex n^{th}$ datapoint is a pair $latex (x_n, t_n)$. There are many ways to define an error metric. One possibility is the sum-of-squares error, defined as follows: $latex E(\textbf{w}) = \frac{1}{2} \sum\limits_{n=1}^N (y(x_n,\textbf{w})-t_n)^2$ (this is the Mean Squared Error up to a constant factor, so minimizing one also minimizes the other).
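As a small illustration (an assumed setup, not code from the original post), this error can be computed in MATLAB in a few lines, given the datapoints in column vectors x and t and the weights in a column vector w of length 4:

    % Sum-of-squares error E(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2.
    % X(n, m+1) = x_n^m, so X*w evaluates the polynomial at every datapoint.
    M = 3;                          % degree of the polynomial
    X = x.^(0:M);                   % uses implicit expansion (R2016b+)
    E = 0.5 * sum((X*w - t).^2);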

Mathematical solution

In order to make the distance between the true datapoints and the polynomial (measured by $latex E(\textbf{w})$) as small as possible, we have to differentiate $latex E(\textbf{w})$. Note that $latex \frac{\partial E(\textbf{w})}{\partial w_i} = \sum\limits_{n=1}^N x_n^i \left( \sum\limits_{j=0}^3 w_j x_n^j - t_n \right) = \sum\limits_{n=1}^N \left( \sum\limits_{j=0}^3 w_j x_n^{i+j} - x_n^i t_n \right)$. Now we define two quantities. They may seem strange at first sight, but they will eventually solve the problem. Define: $latex A_{ij} = \sum\limits_{n=1}^N (x_n)^{i+j}$ and $latex T_i = \sum\limits_{n=1}^{N} (x_n)^i t_n$.

Now that we have all the mathematical tools, we can solve the problem! First note that in order to minimize the error, all partial derivatives must be zero: $latex \frac{\partial E(\textbf{w})}{\partial w_i} = 0$. From this we derive $latex \sum\limits_{n=1}^N \left( \sum\limits_{j=0}^3 w_j x_n^{i+j} - x_n^i t_n \right) = 0$, which is the same as $latex \sum\limits_{j=0}^3 \left( \sum\limits_{n=1}^N x_n^{i+j} \right) w_j - \sum\limits_{n=1}^N x_n^i t_n = 0$. Here is where our strange quantities are needed, since this is equivalent to $latex \sum\limits_{j=0}^3 A_{ij} w_j - T_i = 0$, that is, $latex \sum\limits_{j=0}^3 A_{ij} w_j = T_i$. In matrix form this reads $latex \textbf{A} \textbf{w} = \textbf{T}$, which can be solved by left-multiplying by $latex \textbf{A}^{-1}$ (assuming $latex \textbf{A}$ is invertible): $latex \textbf{w} = \textbf{A}^{-1} \textbf{T}$.
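As a side note, these quantities have a compact matrix interpretation: with the $latex N \times 4$ matrix $latex X$ defined by $latex X_{nm} = (x_n)^m$, we have $latex \textbf{A} = X^T X$ and $latex \textbf{T} = X^T \textbf{t}$, so $latex \textbf{A} \textbf{w} = \textbf{T}$ is exactly the classical normal equations. A minimal MATLAB sketch of this view (again assuming column vectors x and t):

    % Normal equations: A = X'*X and T = X'*t, so A*w = T is X'*X*w = X'*t.
    % The backslash operator solves the system without explicitly forming inv(A).
    M = 3;
    X = x.^(0:M);          % X(n, m+1) = x_n^m
    w = (X'*X) \ (X'*t);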

Algorithm implementation

By implementing these quantities in MATLAB, the following weights are found: $latex \textbf{w} = (19.85, 11.22, -32.70, 21.72)$. In other words, the “hidden” structure is $latex y(x) = 19.85 + 11.22 x - 32.70 x^2 + 21.72 x^3$, which is shown in the following image (and the result is surprisingly close to the function we want to approximate!).

How can we bring the theory into practice? First, we need to calculate the weights $latex \textbf{w}$, which can be done with a piece of MATLAB code along the following lines.
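A minimal sketch (assuming the datapoints are stored in column vectors x and t; all indices are shifted by one because MATLAB arrays start at index 1):

    % Build A(i+1, j+1) = sum_n x_n^(i+j) and T(i+1) = sum_n x_n^i * t_n,
    % then solve the linear system A*w = T for the weights.
    M = 3;                 % degree of the polynomial
    A = zeros(M+1);        % (M+1)-by-(M+1) matrix
    T = zeros(M+1, 1);
    for i = 0:M
        for j = 0:M
            A(i+1, j+1) = sum(x.^(i+j));
        end
        T(i+1) = sum(x.^i .* t);
    end
    w = A \ T;             % w(m+1) corresponds to the weight w_m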

With the found weights, the complete fitted structure is known: the polynomial $latex y(x, \textbf{w})$ can now be evaluated at any point, for example to plot the fitted curve against the data.
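A short sketch of this evaluation and plot, reusing x, t, M and w from the snippet above:

    % Evaluate the fitted polynomial y(x, w) = sum_m w_m * x^m on a fine grid.
    xs = linspace(min(x), max(x), 200)';
    ys = (xs.^(0:M)) * w;           % same Vandermonde-style trick as before
    plot(x, t, 'o', xs, ys, '-');
    legend('datapoints', 'fitted polynomial');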

Disadvantages

Are there disadvantages to the previously discussed method? Yes. The weights tend to grow very large as the dimensionality (here: the degree of the polynomial) grows, a symptom of overfitting. This is addressed by adding the size of the weight vector as an additional error term, $latex \tilde{E}(\textbf{w}) = E(\textbf{w}) + \frac{\lambda}{2} \sum\limits_{m=0}^3 w_m^2$; this technique is called shrinkage. Another problem is that the dimensionality has to be specified explicitly. There are other methods which do not take the dimensionality as a parameter.
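A minimal sketch of shrinkage, reusing A and T from above (the value of lambda is an assumption; in practice it would be tuned, for example on held-out data):

    % Shrinkage (ridge regression): minimizing E(w) + lambda/2 * ||w||^2
    % leads to the modified linear system (A + lambda*I) * w = T.
    lambda = 1e-3;         % regularization strength (assumption)
    w_ridge = (A + lambda * eye(M+1)) \ T;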

Applications

What are the applications, now that we have found a fitting curve? For example, we can estimate where the curve attains its minimum and maximum. We can also find out where the function is steep (and where it is not), and many more questions can be answered; a small sketch follows below. Although some mathematics was required to find the answer, the method is general and works for other datasets as well.
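For instance, here is a small sketch that locates the extrema of the fitted cubic by finding the roots of its derivative (reusing the weight vector w from above):

    % MATLAB's polynomial functions expect coefficients in descending order,
    % so flip w = [w0; w1; w2; w3] into [w3 w2 w1 w0] first.
    p = flip(w');                   % row vector, highest degree first
    extrema = roots(polyder(p));    % x-values where y'(x) = 0

Good luck with your curve fitting MATLAB projects! And if you have any questions, feel free to contact me.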

Exercises

As an exercise for you: suppose we have two datapoints, $latex (0, 0)$ and $latex (2, 1)$. Find a polynomial of degree 1 ($latex w_0 + w_1 x$) which fits the data as well as possible. Implement it in your favorite programming language. (Hint: the answer is $latex 0 + \frac{1}{2} x$.) If this was too easy for you: generate $latex N$ datapoints with a function you know and add some noise. Then find a polynomial of degree $latex m$ that fits your data.

Bonus Material: Videos on Curve Fitting

If the concept is still not clear, please check out the following videos.