Date: January 30, 2024
Topic: Evaluating KNNs and Linear Regression
Recall
Increasing $K$ for KNNs leads to less overfitting. However, if $K$ is too high, the model underfits and no longer predicts accurately.
KNNs also cannot extrapolate.
Notes
Evaluating KNNs
Predictions of $y$ as $x$ increases

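- The plot above shows the predicted $y$ values as $x$ increases for $K=3$
- While the predictions closely follow the data, regions with no training data cannot be predicted meaningfully: before the first and after the last training point the prediction is just a flat line, because the same $K$ neighbors are always the closest
- Hence, KNNs cannot extrapolate beyond the range of the training data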
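A minimal NumPy sketch of 1-D KNN regression (prediction = mean $y$ of the $K$ nearest training points) makes the flat-line behavior concrete. The helper `knn_predict` and the training data $y = x^2$ on $[0, 5]$ are hypothetical, chosen only for illustration.

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k=3):
    """Predict y at each query point as the mean y of its k nearest training points (1-D x)."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    preds = []
    for xq in np.atleast_1d(x_query):
        # Indices of the k training points closest to xq
        nearest = np.argsort(np.abs(x_train - xq))[:k]
        preds.append(y_train[nearest].mean())
    return np.array(preds)

# Hypothetical training data: y = x**2 sampled at 0, 0.5, ..., 5
x_train = np.linspace(0, 5, 11)
y_train = x_train ** 2

# Query points outside the training range on both sides
left = knn_predict(x_train, y_train, [-3.0, -10.0], k=3)
right = knn_predict(x_train, y_train, [8.0, 20.0], k=3)
print(left, right)  # each pair is identical: the prediction has gone flat
```

Far to the left the three nearest points are always $x = 0, 0.5, 1$, and far to the right they are always $x = 4, 4.5, 5$, so the prediction stops changing even though the true curve keeps growing.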
Varying $K$

- As $K$ increases, each prediction averages over more points, so the predictions become smoother and deviate further from the data. Hence a high $K$ leads to underfitting
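The effect of $K$ can be checked with a similar sketch on hypothetical noisy data (a sine curve plus Gaussian noise, both assumptions). With $K=1$ each training point is its own nearest neighbor, so the in-sample error is zero; with $K$ equal to the number of points, every prediction is the overall mean of $y$.

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k):
    """Mean y of the k nearest training points for each query point (1-D x)."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    return np.array([
        y_train[np.argsort(np.abs(x_train - xq))[:k]].mean()
        for xq in np.atleast_1d(x_query)
    ])

# Hypothetical noisy data: 30 points of sin(x) plus noise
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 30))
y = np.sin(x) + rng.normal(0, 0.2, x.size)

# In-sample RMSE for small, medium and maximal K
train_rmse = {}
for k in (1, 5, 30):
    fitted = knn_predict(x, y, x, k)
    train_rmse[k] = np.sqrt(np.mean((y - fitted) ** 2))
print(train_rmse)
```

$K=1$ reproduces the training data exactly (overfitting), while $K=30$ ignores the shape of the data entirely (underfitting).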
Increasing the polynomial degree $d$ in linear regression leads to more overfitting.
Linear regression models are able to extrapolate.
Evaluating Linear Regression

- As the polynomial degree $d$ increases, the prediction curve fits the training data more closely
- Hence increasing the degree leads to overfitting
- However, linear regression models can extrapolate: once fitted, a prediction depends only on the learned coefficients of the best-fit curve, not on nearby training points
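A sketch of polynomial least-squares fits with NumPy's `np.polyfit` and `np.polyval` (polynomial regression is still linear regression, since the model is linear in its coefficients). The near-linear data and the degrees 1, 3 and 9 are assumptions for illustration: the in-sample RMSE can only stay the same or fall as $d$ grows, and the fitted line can be evaluated at $x = 3$, well outside the training range $[0, 1]$.

```python
import numpy as np

# Hypothetical near-linear data on [0, 1]
rng = np.random.default_rng(1)
x_train = np.linspace(0, 1, 20)
y_train = 2 * x_train + 1 + rng.normal(0, 0.1, x_train.size)

# In-sample RMSE for increasing polynomial degree
rmse_by_degree = {}
for d in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, deg=d)  # least-squares fit of degree d
    fitted = np.polyval(coeffs, x_train)
    rmse_by_degree[d] = np.sqrt(np.mean((y_train - fitted) ** 2))
print(rmse_by_degree)

# Extrapolation: the degree-1 fit only needs its two coefficients
line = np.polyfit(x_train, y_train, deg=1)
print(np.polyval(line, 3.0))  # close to the underlying 2*3 + 1 = 7
```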
<aside>
📌 SUMMARY: For KNNs, increasing $K$ leads to less overfitting. For linear regression, increasing $d$ leads to more overfitting. KNNs cannot extrapolate, while linear regression models can.
</aside>
Date: January 30, 2024
Topic: RMS Error
Recall
RMS error measures the typical size of the difference between predicted and actual $y$-values: square each difference, average the squares, then take the square root
Notes
Root Mean Square Error (RMSE)

- For each training point, we subtract the model's prediction ($y$-predict) from the actual value ($y$-train); these are the in-sample errors
- Square each error, sum the squares and divide by the number of points, then take the square root to get the RMSE
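Written as a formula, with $\hat{y}_i$ the prediction for the $i$-th of $n$ points, $\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$. A direct NumPy translation follows; the name `rmse` is just an illustrative helper, not a library function.

```python
import numpy as np

def rmse(y_actual, y_predicted):
    """Root mean square error: square root of the mean of the squared prediction errors."""
    errors = np.asarray(y_actual, dtype=float) - np.asarray(y_predicted, dtype=float)
    return np.sqrt(np.mean(errors ** 2))

# Worked example: errors 1, 0, -2 -> squares 1, 0, 4 -> mean 5/3 -> square root ≈ 1.291
print(rmse([3, 5, 7], [2, 5, 9]))
```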
In-sample vs Out-of-sample

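- We take a separate partition of the data (the test set) and run the same RMSE calculation on it (out-of-sample)
- This time, the predictions are subtracted from $y$-test instead of $y$-train
- We expect the out-of-sample error to be larger than the in-sample error, because the model was fitted to the training data and has not seen the test data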
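The in-sample versus out-of-sample comparison can be sketched on a tiny hypothetical dataset: a best-fit line (via `np.polyfit` with degree 1) is fitted on the first four points only, and the same RMSE calculation is then applied to the training and test partitions.

```python
import numpy as np

def rmse(y_actual, y_predicted):
    """Root mean square error between actual and predicted values."""
    return np.sqrt(np.mean((np.asarray(y_actual) - np.asarray(y_predicted)) ** 2))

# Tiny hypothetical dataset, already ordered by x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 1.0, 2.0, 3.0, 5.0, 5.0])

# First four points form the training set, the last two the test set
x_train, y_train = x[:4], y[:4]
x_test, y_test = x[4:], y[4:]

coeffs = np.polyfit(x_train, y_train, deg=1)  # best-fit line through training data only
in_sample = rmse(y_train, np.polyval(coeffs, x_train))
out_of_sample = rmse(y_test, np.polyval(coeffs, x_test))
print(in_sample, out_of_sample)
```

The training points lie exactly on $y = x$, so the in-sample RMSE is essentially zero, while the test errors of $1$ and $0$ give an out-of-sample RMSE of $\sqrt{0.5} \approx 0.707$.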
Cross-validation lets us split the same dataset into training and test sets in several different ways, so the model is trained and tested on more combinations of the data.
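One common form is $k$-fold cross-validation: the data is cut into $k$ folds, and each fold takes one turn as the test set while the remaining folds form the training set. The index-only sketch below uses a hypothetical function name; real implementations usually shuffle the indices first, which this version skips for readability.

```python
import numpy as np

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample lands in exactly one test fold."""
    folds = np.array_split(np.arange(n_samples), k)
    for i, test_idx in enumerate(folds):
        # Training set = every fold except the current test fold
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, test_idx

for train_idx, test_idx in k_fold_indices(10, k=5):
    print("train:", train_idx, "test:", test_idx)
```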
Because financial data is time-dependent, we split it so that the test set always comes after the training set in time; otherwise the model would be trained on information from the future.
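For time-ordered data, a walk-forward (expanding-window) split keeps every test window strictly after its training window. The sketch below assumes the samples are already sorted oldest to newest; the function and its parameters are illustrative, though scikit-learn's `TimeSeriesSplit` in `sklearn.model_selection` implements a similar scheme.

```python
import numpy as np

def walk_forward_splits(n_samples, n_splits, test_size):
    """Expanding-window splits where each test window follows its training window in time.

    Assumes samples are sorted chronologically (index 0 = oldest).
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for i in range(n_splits):
        test_start = first_test_start + i * test_size
        train_idx = np.arange(0, test_start)                      # everything before the test window
        test_idx = np.arange(test_start, test_start + test_size)  # the next block in time
        yield train_idx, test_idx

for train_idx, test_idx in walk_forward_splits(n_samples=12, n_splits=3, test_size=2):
    print("train:", train_idx, "test:", test_idx)
```

Each split trains on a longer history and tests on the block immediately after it, so no split ever uses future data to predict the past.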
<aside>
📌 SUMMARY: Cross-validation exposes our model to a wider variety of training and test data while reusing the same dataset
</aside>
Date: January 30, 2024
Topic: Correlation