Date: January 16, 2024

Topic: Working with multiple stocks

Recall

Notes


Common operations with .csv files include reading, renaming, and dropping

Reading from .csv with pandas

https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

Reading

To read a typical stock market CSV file, keeping only the dates and adjusted closing prices:

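import pandas as pd

file_path = '<FILE_PATH>'  # placeholder: path to the stock's CSV file
df_temp = pd.read_csv(
    file_path,
    index_col='Date',               # use the Date column as the index
    parse_dates=True,               # parse the index into datetimes
    usecols=['Date', 'Adj Close'],  # columns to read
    na_values=['nan'],              # extra strings to treat as NaN
)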
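A self-contained check of the same call, reading from an in-memory string via io.StringIO instead of a file. The sample rows and prices are made up for illustration. Note that pandas already treats the string 'nan' as missing by default, so na_values=['nan'] mainly documents intent.

```python
import io

import pandas as pd

# Made-up sample data in a common OHLC + Adj Close + Volume layout
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2024-01-02,100.0,101.0,99.0,100.5,100.5,1000
2024-01-03,100.5,102.0,100.0,nan,nan,1200
2024-01-04,101.0,103.0,100.5,102.5,102.5,900
"""

df_temp = pd.read_csv(
    io.StringIO(csv_text),          # a file-like object works in place of a path
    index_col="Date",
    parse_dates=True,
    usecols=["Date", "Adj Close"],
    na_values=["nan"],
)

print(df_temp)
```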

Renaming

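We may want to rename a column to the stock's ticker symbol, so that each stock's Adj Close column stays distinct once several stocks are joined into one DataFrame. Do this with df.rename: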

df_temp = df_temp.rename(columns={'Adj Close': symbol})  # symbol is the ticker, e.g. 'SPY'
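A minimal sketch of renaming, using a small made-up frame in place of the read_csv result; `symbol` is a hypothetical ticker. rename returns a new DataFrame (unless inplace=True), which is why the result is reassigned.

```python
import pandas as pd

# Made-up single-stock frame with the same shape as the read_csv output
dates = pd.to_datetime(["2024-01-02", "2024-01-03"])
df_temp = pd.DataFrame({"Adj Close": [100.5, 101.2]}, index=dates)

symbol = "SPY"  # hypothetical ticker
# The mapping is {old_name: new_name}; unlisted columns are left unchanged
df_temp = df_temp.rename(columns={"Adj Close": symbol})
print(df_temp.columns.tolist())  # ['SPY']
```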

Dropping

We can also drop rows that have missing values in a specific column with df.dropna(subset=...):

df = df.dropna(subset=["SPY"])  # Drop all rows where SPY is NaN
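A small made-up example showing that subset limits which columns are checked: a NaN in GOOG alone does not cause its row to be dropped.

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"])
df = pd.DataFrame(
    {"SPY": [470.0, np.nan, 471.0, 472.0],
     "GOOG": [140.0, 141.0, np.nan, 142.0]},
    index=dates,
)

# Only SPY is checked: the Jan 3 row is dropped, the Jan 4 row (GOOG NaN) is kept
df = df.dropna(subset=["SPY"])
print(df)
```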

DataFrame.join supports left, right, outer, inner, and cross joins. The default is left

Types of joins in dataframes

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.join.html

Use joins with df.join(other, how='<JOIN>'), where other is the DataFrame to join; by default the two frames are aligned on their indexes
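A sketch comparing the `how` options on two made-up frames whose date indexes only partly overlap. The column names differ, so no lsuffix/rsuffix is needed.

```python
import pandas as pd

left = pd.DataFrame(
    {"SPY": [470.0, 471.0, 472.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)
right = pd.DataFrame(
    {"GLD": [190.0, 191.0]},
    index=pd.to_datetime(["2024-01-03", "2024-01-05"]),
)

left_join = left.join(right)                # default how="left": every row of `left`
inner_join = left.join(right, how="inner")  # only index labels present in both
outer_join = left.join(right, how="outer")  # union of both indexes
cross_join = left.join(right, how="cross")  # every pairing of rows (3 x 2)

print(inner_join)
```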


Quickly creating a date_range

Creating a date_range

https://pandas.pydata.org/docs/reference/api/pandas.date_range.html
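A sketch of one way to combine the steps for multiple stocks, assuming made-up SPY prices: build a calendar date range, wrap it in an empty DataFrame, then inner-join the stock's frame so only dates that have data remain.

```python
import pandas as pd

# Calendar days; both endpoints are included and the default freq is "D"
dates = pd.date_range("2024-01-01", "2024-01-07")
df = pd.DataFrame(index=dates)  # empty frame that only carries the date index

# Made-up SPY prices for trading days only
spy = pd.DataFrame(
    {"SPY": [470.0, 471.0, 472.0, 473.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]),
)

# Inner join keeps only the dates that have SPY data
df = df.join(spy, how="inner")
print(len(dates), len(df))
```

For more symbols, the read, rename, and join steps can be repeated per symbol, followed by dropna(subset=["SPY"]) if SPY is used as the reference for trading days.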


Slice with df.loc (labels) or df.iloc (integer positions); slicing can be done on rows, columns, or both

Slicing dataframes

If we have a table indexed by date, with one column per symbol (e.g. GOOG and GLD):

# Row and column slicing
df.loc[start_date:end_date, ['GOOG', 'GLD']]  # rows from start_date to end_date (inclusive), columns GOOG and GLD

# Row slicing only
df.loc[start_date:end_date]

# Column slicing only
df.loc[:, ['GOOG', 'GLD']]

# Use `loc` for labels and `iloc` for integer positions
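A sketch contrasting loc and iloc on a made-up five-day frame. One difference worth remembering: label slices with loc include the end label, while integer slices with iloc exclude the stop position, like ordinary Python slicing.

```python
import pandas as pd

dates = pd.date_range("2024-01-01", periods=5, freq="D")
df = pd.DataFrame(
    {"SPY": [1.0, 2.0, 3.0, 4.0, 5.0],
     "GOOG": [10.0, 20.0, 30.0, 40.0, 50.0],
     "GLD": [100.0, 200.0, 300.0, 400.0, 500.0]},
    index=dates,
)

# Label-based: both endpoints included -> Jan 2, 3 and 4; columns GOOG and GLD
by_label = df.loc["2024-01-02":"2024-01-04", ["GOOG", "GLD"]]

# Position-based: stop excluded -> rows 1 and 2; columns at positions 1 and 2
by_position = df.iloc[1:3, [1, 2]]

print(by_label)
print(by_position)
```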


<aside> 📌 SUMMARY:

</aside>