Date: May 19, 2024

Topic: Introduction to Regression

Recall

In regression, we want to find the best function that approximates our data points.

This is done by choosing the function that minimizes the sum of squared errors between the approximation and the data points.
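A minimal sketch of the quantity being minimized. It assumes the candidate function is passed in as a Python callable. `sum_squared_errors` and the sample points are hypothetical illustrations, not part of the original notes.

```python
def sum_squared_errors(f, xs, ys):
    """Sum of squared residuals (y - f(x))^2 over all data points."""
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data points and a candidate function f(x) = x + 1
xs = [0, 1, 2, 3]
ys = [1, 3, 2, 5]
print(sum_squared_errors(lambda x: x + 1, xs, ys))  # residuals 0, 1, -1, 1 -> 3
```

Regression compares candidate functions by this number and keeps the one with the smallest value.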

Notes

Linear Regression


We find the linear function that best fits the observed data by minimizing the sum of squared errors.

The green line is the best-fit line, which minimizes the squared errors between the data and the function.

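For a single input variable, the least-squares line has a well-known closed form: slope $m = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (x_i-\bar{x})^2}$ and intercept $b = \bar{y} - m\bar{x}$. The sketch below uses that formula with only the standard library. It is one way to get the best-fit line and not necessarily how it was computed for the plot. `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line y = m*x + b via the closed-form solution."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    # The best-fit line always passes through the point (x_mean, y_mean)
    b = y_mean - m * x_mean
    return m, b


# Hypothetical data points
m, b = fit_line([0, 1, 2, 3], [1, 3, 2, 5])
print(m, b)  # both approximately 1.1
```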


The best constant function can be found using differentiation: at the minimum error, the derivative of the error with respect to the constant is 0.

Varying the polynomial degree $k$ lets us use higher-order functions that can fit the data more closely.

However, if the degree is too high, the curve overfits the data and its predictions become unrealistic.

For this dataset, the best degree is cubic ($k=3$): its predictions remain realistic while keeping the errors low.
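To see how the degree trades off against the error, here is a stdlib-only sketch. It fits the coefficients $[c_0, \dots, c_k]$ of $c_0 + c_1x + \dots + c_kx^k$ by solving the least-squares normal equations, then reports the squared error on the same points. `fit_polynomial`, `evaluate`, and the sample data are hypothetical. The normal equations are only one way to solve this and become numerically fragile at high degree, so a library solver would be the safer choice in practice.

```python
def fit_polynomial(xs, ys, k):
    """Least-squares coefficients [c0, c1, ..., ck] for a degree-k polynomial."""
    size = k + 1
    # Normal equations A c = r, with A[i][j] = sum(x^(i+j)) and r[i] = sum(y * x^i)
    a = [[float(sum(x ** (i + j) for x in xs)) for j in range(size)] for i in range(size)]
    r = [float(sum(y * x ** i for x, y in zip(xs, ys))) for i in range(size)]
    # Gaussian elimination with partial pivoting
    for col in range(size):
        pivot = max(range(col, size), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        r[col], r[pivot] = r[pivot], r[col]
        for row in range(col + 1, size):
            factor = a[row][col] / a[col][col]
            for j in range(col, size):
                a[row][j] -= factor * a[col][j]
            r[row] -= factor * r[col]
    # Back substitution, from the highest coefficient down to c0
    c = [0.0] * size
    for i in reversed(range(size)):
        c[i] = (r[i] - sum(a[i][j] * c[j] for j in range(i + 1, size))) / a[i][i]
    return c


def evaluate(c, x):
    """Value of c0 + c1*x + ... + ck*x^k."""
    return sum(cj * x ** j for j, cj in enumerate(c))


# Hypothetical data points
xs = [0, 1, 2, 3, 4, 5]
ys = [1.0, 2.2, 2.9, 4.1, 6.8, 11.0]
for k in range(4):
    c = fit_polynomial(xs, ys, k)
    error = sum((y - evaluate(c, x)) ** 2 for x, y in zip(xs, ys))
    print(k, round(error, 3))
```

Because each higher-degree fit contains the lower-degree ones as special cases, the error on the fitted points can only decrease or stay the same as $k$ grows. This number alone therefore cannot rule out a degree that is too high. Whether the predictions stay realistic has to be judged separately.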

Finding the best constant function ($c$)

Given a 2D relationship between $x$ and $y$, we measure the error of a constant $c$ with the sum of squared residuals, $E(c)=\sum_{i=1}^n (y_i - c)^2$, because it is differentiable.

Hence, to minimize the error, we set its derivative to 0: $\frac{dE}{dc} = -2\sum_{i=1}^n (y_i - c) = 0$

Thus, the best constant is $c=\sum_{i=1}^n\frac{y_i}{n}$ (the mean of the $y$ values)
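A small numerical check of this derivation, using hypothetical data. The mean makes the derivative $-2\sum (y_i - c)$ vanish, and moving $c$ away from the mean increases the error. The function names are illustrative only.

```python
def best_constant(ys):
    """Constant c minimizing sum((y - c)^2): the mean of the y values."""
    return sum(ys) / len(ys)


def error(c, ys):
    """Sum of squared residuals for the constant function f(x) = c."""
    return sum((y - c) ** 2 for y in ys)


def error_derivative(c, ys):
    """dE/dc = -2 * sum(y - c)."""
    return -2 * sum(y - c for y in ys)


ys = [1, 3, 2, 5]  # hypothetical y values
c = best_constant(ys)
print(c)                                  # 2.75
print(error(c, ys), error(c + 0.5, ys))   # 8.75 9.75
print(error_derivative(c, ys))            # zero at the minimum
```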

Order of Polynomial

The general form of a polynomial of degree $k$: $f(x) = c_0+c_1x+c_2x^2+\dots+c_kx^k$
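One way to evaluate this general form is Horner's method, which rewrites it as $c_0 + x(c_1 + x(c_2 + \dots + x\,c_k))$ so that no explicit powers are needed. The sketch assumes the coefficients are stored in a list ordered $[c_0, c_1, \dots, c_k]$. `evaluate_polynomial` is a hypothetical name.

```python
def evaluate_polynomial(coeffs, x):
    """Evaluate c0 + c1*x + ... + ck*x^k using Horner's method.

    coeffs is ordered [c0, c1, ..., ck].
    """
    result = 0
    # Start from the highest coefficient and fold in one power of x per step
    for c in reversed(coeffs):
        result = result * x + c
    return result


print(evaluate_polynomial([1, 2, 3], 2))  # 1 + 2*2 + 3*2^2 = 17
```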

Comparing Graphs


Comparing degrees with data



<aside> 📌 SUMMARY: We want to find the degree that gives our model a low error against the data while keeping its predictions realistic

</aside>