Date: January 17, 2024
Topic: Why data goes missing
Recall
Data might go missing in stocks for many reasons
Notes
Missing data in stocks

- The same ticker may have been used by different companies over time (Sun Microsystems and Mr. Coffee)
- Companies may have been acquired (Sun Microsystems by Oracle)
- If stocks are thinly traded, there might be gaps
- We need to know how to handle missing data (NaN values)
- Fill forward missing data
- Fill backwards if no previous data available
- Don’t interpolate: interpolation uses later prices, so it peeks into the future
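A minimal sketch of how gaps show up as NaN in practice, assuming a thinly traded stock is aligned to a full trading calendar. The ticker `THIN`, the dates, and the prices are all made up for illustration.

```python
import pandas as pd

# Hypothetical trading calendar: six consecutive business days.
calendar = pd.bdate_range("2024-01-01", periods=6)

# Hypothetical thinly traded stock: it only traded on three of those days.
thin = pd.Series(
    [10.0, 10.5, 11.0],
    index=[calendar[1], calendar[2], calendar[4]],
    name="THIN",
)

# Aligning to the full calendar exposes the days without trades as NaN.
aligned = thin.reindex(calendar)
missing_days = int(aligned.isna().sum())

print(aligned)
print("missing days:", missing_days)
```

The NaN rows are the days that need filling before the series can be compared with other stocks.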
Handling missing data
<aside>
💡 Don’t do interpolation as we cannot look into the future
</aside>
Fill forward all the time

- Carry the last known price forward across the gap until data is available again
- Don’t do interpolation!!
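To see why interpolation counts as peeking, the sketch below compares forward fill with pandas' default linear interpolation on a made-up price series (the values are illustrative only).

```python
import numpy as np
import pandas as pd

# Hypothetical prices with a two-day gap in the middle.
prices = pd.Series([10.0, np.nan, np.nan, 16.0])

# Forward fill: the gap only sees the last known price (10.0).
filled = prices.ffill()

# Linear interpolation: the gap is filled using the later price (16.0),
# which would not have been known on those days.
interpolated = prices.interpolate()

print(pd.DataFrame({"ffill": filled, "interpolate": interpolated}))
```

The forward-filled values depend only on the past, while the interpolated values already drift toward a price that has not happened yet.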
Fill backwards only if there is no earlier data

- For data missing before the stock’s first recorded price, there is nothing to carry forward, so we fill backwards from that first price
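A small sketch of the leading-gap case, using made-up prices: forward fill cannot fill rows that come before the first price, so backward fill handles them.

```python
import numpy as np
import pandas as pd

# Hypothetical stock whose data only starts on the third row.
prices = pd.Series([np.nan, np.nan, 20.0, 21.0])

# Forward fill has no earlier value to carry, so the leading NaN remain.
after_ffill = prices.ffill()

# Backward fill copies the first known price (20.0) into the leading gap.
after_bfill = after_ffill.bfill()

print(pd.DataFrame({"ffill only": after_ffill, "ffill then bfill": after_bfill}))
```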
<aside>
📌 SUMMARY: Data can be missing for many reasons. We mainly use forward fill for gaps in the middle of the data, and backward fill only for data missing before the first available price.
</aside>
Date: January 17, 2024
Topic: pandas.fillna()
Recall
Fill missing data with either:
ffill
bfill
df.fillna(method=<METHOD>) (deprecated since pandas 2.1; use df.ffill() / df.bfill() instead)
Notes
Filling nan values
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.fillna.html

Forward fill and backward fill have been applied to FAKE2
Forward and Backward Fill
<aside>
💡 Perform forward fill, then backward fill
</aside>
# Forward fill (fillna(method=...) is deprecated since pandas 2.1)
df.ffill(inplace=True) # inplace=True modifies df directly instead of returning a copy
# Backward fill, only reaches gaps before the first known value
df.bfill(inplace=True)
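The order of the two fills matters. The sketch below wraps them in a hypothetical helper, `fill_missing` (not part of pandas), and compares it with the reversed order on made-up data. It uses `DataFrame.ffill()` and `DataFrame.bfill()`, which newer pandas versions recommend over `fillna(method=...)`.

```python
import numpy as np
import pandas as pd


def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: forward fill first, then backward fill."""
    return df.ffill().bfill()


# Made-up column with a leading gap, a middle gap, and a trailing gap.
prices = pd.DataFrame({"DEMO": [np.nan, 5.0, np.nan, 7.0, np.nan]})

correct = fill_missing(prices)

# Reversed order: the middle gap is filled from the later price (7.0),
# which leaks future information into the past.
wrong_order = prices.bfill().ffill()

print(correct["DEMO"].tolist())
print(wrong_order["DEMO"].tolist())
```

Both results are free of NaN, but only the forward-then-backward version keeps the middle gap at the last known price.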
<aside>
📌 SUMMARY: Fill nan values with forward fill first, then backward fill
</aside>