Date: January 16, 2024

Topic: Working with multiple stocks

Recall

Notes

Common operations with .csv files include reading, renaming, and dropping

Reading from `.csv` with pandas

https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

Reading

To read from a typical stock market csv file:

import pandas as pd

df = pd.read_csv(
   <FILE_PATH>, 
   index_col='Date', # Use the Date column as the index
   parse_dates=True, # Parse the index as dates
   usecols=['Date', 'Adj Close'], # Columns to read
   na_values=['nan'] # Extra strings to treat as missing (NaN)
)
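
A minimal runnable sketch of the call above. The CSV text and its prices are made up for illustration, and `io.StringIO` stands in for a real file path so the example needs no file on disk:

```python
import io

import pandas as pd

# Made-up CSV contents standing in for a real stock price file.
csv_text = """Date,Open,Adj Close
2024-01-02,99.0,nan
2024-01-03,100.0,101.5
2024-01-04,101.0,102.5
"""

df = pd.read_csv(
    io.StringIO(csv_text),          # a file path would normally go here
    index_col="Date",               # use the Date column as the index
    parse_dates=True,               # parse the index as dates
    usecols=["Date", "Adj Close"],  # "Open" is skipped
    na_values=["nan"],              # strings to treat as missing
)

print(df)
```

Only `Adj Close` survives as a column, the index is a `DatetimeIndex`, and the `nan` entry is read as a missing value.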

Renaming

We may want to rename certain columns to reflect the stock name. Do this with df.rename

df_temp = df_temp.rename(columns = {'Adj Close': symbol})
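
A small sketch of the rename on a made-up frame. The symbol `GOOG` and the prices are placeholders, not real data:

```python
import pandas as pd

symbol = "GOOG"  # placeholder ticker for illustration

# Made-up prices under the generic "Adj Close" column name.
df_temp = pd.DataFrame(
    {"Adj Close": [10.0, 11.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03"]),
)

# rename returns a new dataframe, so assign the result back.
df_temp = df_temp.rename(columns={"Adj Close": symbol})

print(df_temp.columns.tolist())
```

Renaming each frame to its symbol before combining keeps the column names from colliding when several stocks share the `Adj Close` label.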

Dropping

Furthermore, we can drop rows that have missing values in a given column with df.dropna

df = df.dropna(subset=["SPY"]) # Drop all rows where SPY is NaN
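
A short sketch showing that `subset` only looks at the listed columns. The values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Made-up prices: SPY is missing on one date, GOOG on another.
df = pd.DataFrame(
    {"SPY": [100.0, np.nan, 102.0], "GOOG": [np.nan, 51.0, 52.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)

# Only the row where SPY is NaN is removed.
df = df.dropna(subset=["SPY"])

print(df)
```

The row for 2024-01-02 is kept even though GOOG is missing there, because only SPY is checked.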

left, right, outer, inner and cross joins. Default join is left

Types of `joins` in dataframes

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.join.html

left (default):
- use calling frame’s index (or column if on is specified)
right:
- use other’s index.
outer:
- form union of calling frame’s index (or column if on is specified) with other’s index, and sort it lexicographically.
inner:
- form intersection of calling frame’s index (or column if on is specified) with other’s index, preserving the order of the calling’s one.
cross:
- creates the cartesian product from both frames, preserves the order of the left keys.

Choose the join type with df.join(other, how='<JOIN>')
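
To see how the join types differ, here is a sketch with two made-up frames whose date indexes only partly overlap. The frame names `spy` and `goog` and the values are illustrative assumptions:

```python
import pandas as pd

# Two made-up frames with partially overlapping dates.
spy = pd.DataFrame(
    {"SPY": [100.0, 101.0, 102.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)
goog = pd.DataFrame(
    {"GOOG": [51.0, 52.0, 53.0]},
    index=pd.to_datetime(["2024-01-03", "2024-01-04", "2024-01-05"]),
)

# Join on the index with each join type and compare the resulting rows.
joined = {how: spy.join(goog, how=how) for how in ("left", "right", "inner", "outer")}

for how, result in joined.items():
    print(how, len(result))
```

Here `left` and `right` keep the three dates of their own side (filling the other column with NaN where there is no match), `inner` keeps the two shared dates, and `outer` keeps all four.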

Quickly creating a date_range

Finding a `date_range`

https://pandas.pydata.org/docs/reference/api/pandas.date_range.html

Returns a range of equally spaced time points (daily by default)
pd.date_range(start_date, end_date)
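
A quick sketch with placeholder dates. Using the range as the index of an empty dataframe is one possible use, shown here as an assumption about how it might be combined with the joins above:

```python
import pandas as pd

# Placeholder start and end dates; both endpoints are included.
dates = pd.date_range("2024-01-01", "2024-01-05")

# An empty dataframe indexed by those dates, ready to be joined with price data.
df = pd.DataFrame(index=dates)

print(dates)
```

With no `freq` argument the spacing is one calendar day, so weekends are included as well.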

Slice with df.loc (labels) or df.iloc (integer positions); slicing can be done on rows, columns or both

Slicing dataframes

Suppose we have a dataframe indexed by date, with one column per stock symbol.

To get only GOOG and GLD for a specific date range:

# Row and Col slicing
df.loc[start_date:end_date, ['GOOG', 'GLD']] # the slice before the comma selects rows, the list after it selects columns

# Row slicing only
df.loc[start_date:end_date]

# Col slicing only
df.loc[:, ['GOOG', 'GLD']]

# We use `loc` for labels and `iloc` for integer positions
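
A runnable sketch comparing the two on a made-up dataframe. The values are placeholders, and the column order is an assumption needed for the `iloc` positions:

```python
import pandas as pd

# Made-up prices for three symbols over five days.
dates = pd.date_range("2024-01-01", "2024-01-05")
df = pd.DataFrame(
    {
        "SPY": [1.0, 2.0, 3.0, 4.0, 5.0],
        "GOOG": [10.0, 20.0, 30.0, 40.0, 50.0],
        "GLD": [100.0, 200.0, 300.0, 400.0, 500.0],
    },
    index=dates,
)

# Label-based: the end label IS included in the slice.
by_label = df.loc["2024-01-02":"2024-01-04", ["GOOG", "GLD"]]

# Position-based: the end position is NOT included, so 1:4 covers rows 1, 2, 3.
by_position = df.iloc[1:4, [1, 2]]

print(by_label.equals(by_position))
```

Both selections return the same three rows and two columns. The practical difference is that `loc` slices include the end label while `iloc` slices stop before the end position.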

<aside> 📌 SUMMARY:

</aside>