In regression, we want to find the best function that approximates our data points.
We do this by choosing the function that minimizes the sum of squared errors between its predictions and the data points.
For example, we can find the linear function that best fits the actual data by minimizing the sum of squared errors.
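
As a minimal sketch of the linear case, the snippet below fits a line with the standard closed-form least-squares formulas. The data values and the helper names `fit_line` and `sse` are made up for illustration and are not from these notes.

```python
import numpy as np

# Hypothetical data, invented for illustration only.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

def fit_line(x, y):
    """Return (slope, intercept) of the line minimizing the sum of squared errors."""
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return float(slope), float(intercept)

def sse(x, y, slope, intercept):
    """Sum of squared errors between the line's predictions and the data."""
    predictions = slope * x + intercept
    return float(np.sum((y - predictions) ** 2))

slope, intercept = fit_line(x, y)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"SSE of fitted line = {sse(x, y, slope, intercept):.4f}")
```

Nudging `slope` or `intercept` away from the fitted values and calling `sse` again should never give a smaller error, which is what "best fit" means here.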

The green line is the best-fit line: it minimizes the squared errors between the function and the data.
The best constant function can be found by differentiation: at the minimum error, the rate of change of the error is 0.
Varying the degree $k$ lets us use higher-order functions that can fit the data better.
However, if the order is too high, the predictions become unrealistic.
For this dataset, we see that the best degree is cubic as the predictions remain realistic while minimizing the errors.
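
To see how the error changes with the degree, here is a rough sketch using NumPy's `np.polyfit` and `np.polyval`. The dataset is synthetic (a cubic trend plus noise) and is an assumption for illustration, not the dataset from these notes. The helper `sse_for_degree` is likewise hypothetical.

```python
import numpy as np

# Synthetic data: a cubic trend plus noise (assumed, not the notes' dataset).
rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 12)
y = x**3 - x + rng.normal(scale=0.3, size=x.size)

def sse_for_degree(x, y, k):
    """Fit a degree-k polynomial by least squares and return its sum of squared errors."""
    coeffs = np.polyfit(x, y, deg=k)
    residuals = y - np.polyval(coeffs, x)
    return float(np.sum(residuals ** 2))

# Compare the fitting error for degrees 0 (constant) through 5.
errors = {k: sse_for_degree(x, y, k) for k in range(6)}
for k, e in errors.items():
    print(f"degree {k}: SSE = {e:.3f}")
```

On the data used for fitting, the error does not increase as the degree goes up, so the fitting error alone cannot pick the degree. The useful signal is where the error stops dropping noticeably while the curve still gives realistic predictions.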
Given a 2D relationship between $x$ and $y$, we measure the error with the sum of squared residuals because it can be differentiated.
Hence, to minimize the error $E(c)=\sum^n_{i=1}(y_i-c)^2$ of a constant function $f(x)=c$, we set its rate of change to 0: $\frac{dE}{dc}=-2\sum^n_{i=1}(y_i-c)=0$.
Thus, we get the best constant as $c=\frac{1}{n}\sum^n_{i=1}y_i$ (the mean of the $y$ values).
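
The result for the best constant can be checked numerically. The sketch below compares the mean against a brute-force search over candidate constants. The `y` values and the helper `sse_constant` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical y values, invented for illustration only.
y = np.array([2.0, 4.0, 4.0, 5.0, 10.0])

def sse_constant(y, c):
    """Sum of squared errors when every prediction is the constant c."""
    return float(np.sum((y - c) ** 2))

# Closed-form answer: the mean of the y values.
c_best = float(y.mean())

# Brute-force check: try evenly spaced constants between min(y) and max(y).
candidates = np.linspace(y.min(), y.max(), 201)
grid_errors = [sse_constant(y, c) for c in candidates]
c_grid = float(candidates[int(np.argmin(grid_errors))])

print(f"mean of y          = {c_best:.3f}")
print(f"best grid constant = {c_grid:.3f}")
```

The grid search should land on (or right next to) the mean, up to the spacing of the grid.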
The general form of a function: $f(x) = c_0+c_1x+c_2x^2+\dots+c_kx^k$


<aside> 📌 SUMMARY: We want to find the degree that gives our model a low error against the data while keeping its predictions realistic
</aside>