Date: January 30, 2024

Topic: Evaluating KNNs and Linear Regression

Recall

Increasing $K$ in a KNN model reduces overfitting. However, if $K$ is too high, the model underfits and its predictions become inaccurate.

KNNs also cannot extrapolate.

Notes

Evaluating KNNs

Predictions of $y$ as $x$ increases

Plotting the predicted $y$ values against $x$ for $K=3$ shows that the predictions closely follow the data.
However, regions with no data cannot be predicted properly: the prediction flattens into a horizontal line before the first and after the last data point.
- Hence, KNNs cannot extrapolate.
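The flat prediction outside the data range can be reproduced with a minimal sketch. The helper `knn_predict` and the toy data ($y = 2x$ on $x = 0, \dots, 9$) are made up for illustration and are not from the lecture; the sketch assumes a plain KNN regressor that averages the $K$ nearest $y$ values.

```python
def knn_predict(x_train, y_train, x_query, k=3):
    """Predict y at x_query as the mean y of the k nearest training points."""
    # Sort training points by distance to the query and keep the k closest.
    nearest = sorted(zip(x_train, y_train), key=lambda p: abs(p[0] - x_query))[:k]
    return sum(y for _, y in nearest) / k


# Hypothetical training data on x in [0, 9] with a rising trend (y = 2x).
x_train = list(range(10))
y_train = [2.0 * x for x in x_train]

inside = knn_predict(x_train, y_train, 4.0)      # query within the data range
beyond_a = knn_predict(x_train, y_train, 15.0)   # query past the last point
beyond_b = knn_predict(x_train, y_train, 100.0)  # query far past the last point

print(inside, beyond_a, beyond_b)
```

Both queries beyond $x = 9$ use the same three neighbours (the last three training points), so they return the same value even though the underlying trend keeps rising.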

Varying $K$

As $K$ increases, the predictions deviate more from the data, since each prediction averages over more neighbours. Hence a high $K$ leads to underfitting.
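A rough way to see this is to measure how far the predictions sit from the training data for a few values of $K$. The data and the helper names `knn_predict` and `in_sample_rmse` below are hypothetical, and the error measure is the in-sample root mean square error.

```python
import math


def knn_predict(x_train, y_train, x_query, k):
    """Mean y of the k training points closest to x_query."""
    nearest = sorted(zip(x_train, y_train), key=lambda p: abs(p[0] - x_query))[:k]
    return sum(y for _, y in nearest) / k


def in_sample_rmse(x_train, y_train, k):
    """RMSE of the KNN predictions evaluated on the training points themselves."""
    errors = [knn_predict(x_train, y_train, x, k) - y for x, y in zip(x_train, y_train)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Made-up noisy upward trend.
x_train = list(range(10))
y_train = [0.0, 3.0, 1.0, 4.0, 2.0, 6.0, 4.0, 8.0, 5.0, 9.0]

rmse_by_k = {k: in_sample_rmse(x_train, y_train, k) for k in (1, 3, 10)}
for k, err in rmse_by_k.items():
    print(f"K={k}: in-sample RMSE = {err:.3f}")
```

With $K=1$ every training point is its own nearest neighbour, so the in-sample error is zero (the overfit extreme). With $K$ equal to the number of points, every prediction is the overall mean of $y$ (the underfit extreme).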

Increasing the polynomial degree $d$ leads to more overfitting.

Linear regression models are able to extrapolate.

Evaluating Linear Regression

As the polynomial degree $d$ increases, the prediction line fits the training data more closely.
Hence increasing the degree leads to overfitting.
However, linear regression models are able to extrapolate: once the best-fit coefficients are found, predictions use only those coefficients and no longer rely on the training data points.
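The two points above can be sketched with NumPy's `polyfit` and `polyval`. The data is made up (a line $y = 2x$ plus fixed offsets) and `fit_and_score` is a hypothetical helper, so treat this as an illustration rather than the lecture's own setup.

```python
import numpy as np

# Made-up data: a straight-line trend plus fixed "noise" offsets.
x_train = np.arange(8, dtype=float)
y_train = 2.0 * x_train + np.array([0.5, -0.4, 0.3, -0.6, 0.4, -0.2, 0.6, -0.5])


def fit_and_score(degree):
    """Fit a degree-d polynomial by least squares; return coefficients and in-sample RMSE."""
    coeffs = np.polyfit(x_train, y_train, degree)
    preds = np.polyval(coeffs, x_train)
    rmse = float(np.sqrt(np.mean((y_train - preds) ** 2)))
    return coeffs, rmse


coeffs_1, rmse_1 = fit_and_score(1)
coeffs_5, rmse_5 = fit_and_score(5)
print(f"d=1 in-sample RMSE: {rmse_1:.3f}")
print(f"d=5 in-sample RMSE: {rmse_5:.3f}")

# Extrapolation: predicting beyond the training range needs only the coefficients.
x_new = 12.0
y_new_linear = float(np.polyval(coeffs_1, x_new))
print(f"d=1 prediction at x={x_new}: {y_new_linear:.2f}")
```

The degree-5 fit has the lower in-sample error because it bends towards the individual offsets, which is the overfitting described above. The degree-1 model still returns a value on the trend at $x = 12$, well outside the training range.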

<aside> 📌 SUMMARY: For KNNs, increasing $K$ leads to less overfitting. For linear regressions, increasing $d$ leads to more overfitting. KNNs cannot extrapolate while linear regressions can.

</aside>

Date: January 30, 2024

Topic: RMS Error

Recall

RMS error is obtained by squaring the differences between the predicted and actual $y$-values, averaging them, and taking the square root.

Notes

Root Mean Square Error (RMSE)

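For each training point, we take the difference between the actual value $y$-train and the value predicted by the fitted line, $y$-predict (in-sample).
We square these errors, sum them, divide by the number of points $n$, and take the square root to get the RMSE: $\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$, where $y_i$ is the actual value and $\hat{y}_i$ is the prediction.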
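The calculation can be written as a short function. The name `rmse` and the sample values are illustrative only.

```python
import math


def rmse(y_actual, y_predicted):
    """Root mean square error between actual and predicted values."""
    if len(y_actual) != len(y_predicted) or not y_actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Square each error, average the squares, then take the square root.
    squared_errors = [(a - p) ** 2 for a, p in zip(y_actual, y_predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values: three exact predictions and one that is off by 2.
y_train = [1.0, 2.0, 3.0, 4.0]
y_predict = [1.0, 2.0, 3.0, 6.0]
print(rmse(y_train, y_predict))  # squared errors 0, 0, 0, 4 -> mean 1 -> RMSE 1.0
```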

In-sample vs Out-of-sample

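We take another partition of the data (the test set) and run the same RMSE calculation on it (out-of-sample).
This time, the predictions are compared against $y$-test instead of $y$-train.
We expect the out-of-sample error to be larger than the in-sample error, since the model was fitted to the training data only.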
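A small sketch of the comparison, assuming a straight-line model fitted by least squares. The train and test values are made up, and `fit_line` and `rmse` are hypothetical helpers.

```python
import math


def fit_line(x, y):
    """Least-squares slope and intercept for a straight line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    return slope, mean_y - slope * mean_x


def rmse(y_actual, y_predicted):
    """Root mean square error between actual and predicted values."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(y_actual, y_predicted)) / len(y_actual)
    )


# Made-up data: the model is fitted on the train partition only.
x_train, y_train = [0, 1, 2, 3, 4, 5], [0.2, 0.9, 2.3, 2.8, 4.1, 5.2]
x_test, y_test = [6, 7, 8], [5.5, 7.6, 7.4]

slope, intercept = fit_line(x_train, y_train)

rmse_in = rmse(y_train, [slope * x + intercept for x in x_train])  # in-sample
rmse_out = rmse(y_test, [slope * x + intercept for x in x_test])   # out-of-sample
print(f"in-sample RMSE: {rmse_in:.3f}, out-of-sample RMSE: {rmse_out:.3f}")
```

On this data the out-of-sample error comes out larger than the in-sample error. That is the usual pattern, though it is not guaranteed for every split.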

Cross validation allows us to train and test on the same dataset in several different ways.

As financial data is time-dependent, we split the data so that the test period always comes after the training period.
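One way to build such splits is sketched below. It assumes the rows are already sorted by time and uses an expanding training window; the notes do not say whether the window expands or slides, so that choice and the helper name `roll_forward_splits` are assumptions.

```python
def roll_forward_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices) where every test index follows every train index."""
    fold_size = n_samples // (n_folds + 1)
    if fold_size < 1:
        raise ValueError("not enough samples for the requested number of folds")
    for fold in range(1, n_folds + 1):
        train_end = fold * fold_size
        test_end = train_end + fold_size
        # Train on everything up to train_end, test on the block right after it.
        yield list(range(train_end)), list(range(train_end, test_end))


# Twelve time-ordered rows (indices only) split into three folds.
splits = list(roll_forward_splits(n_samples=12, n_folds=3))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Each fold tests on a later block than it trains on, so the model never sees data from after the period it is evaluated on.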

<aside> 📌 SUMMARY: Cross validation is a method to expose our model to a wider variety of data while maintaining the same dataset

</aside>

Date: January 30, 2024

Topic: Correlation