Common operations with .csv files include reading, renaming, and dropping
.csv with pandas: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
To read from a typical stock market csv file:
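import pandas as pd

df_temp = pd.read_csv(
    <FILE_PATH>,
    index_col='Date',               # Use the Date column as the row index
    parse_dates=True,               # Parse the index as dates
    usecols=['Date', 'Adj Close'],  # Columns to read
    na_values=['nan']               # Strings to treat as missing values (NaN)
)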
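As a minimal sketch of the call above, the following reads a small in-memory CSV instead of a file on disk. The sample rows, the `csv_text` variable, and the use of `io.StringIO` are assumptions made for illustration only; a real price file would be passed by path.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a stock price file
csv_text = """Date,Open,Adj Close
2020-01-02,10.0,10.5
2020-01-03,10.6,nan
2020-01-06,10.4,10.8
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    index_col='Date',               # Date becomes the row index
    parse_dates=True,               # the index is parsed as dates
    usecols=['Date', 'Adj Close'],  # the Open column is skipped
    na_values=['nan'],              # 'nan' strings are read as NaN
)

print(list(df.columns))              # ['Adj Close']
print(df['Adj Close'].isna().sum())  # 1
```

Only the requested columns are loaded, and the row containing `nan` ends up as a missing value rather than a string.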
We may want to rename certain columns to reflect the stock name. Do this with df.rename:
df_temp = df_temp.rename(columns = {'Adj Close': symbol})
We can also drop rows that have missing values in a given column with df.dropna:
df = df.dropna(subset=["SPY"]) # Drop all rows where SPY is NaN
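The rename and dropna steps can be sketched on toy data as follows. The prices and the `symbol` value are made up for illustration; only `rename(columns=...)` and `dropna(subset=...)` come from the notes above.

```python
import numpy as np
import pandas as pd

# Hypothetical single-symbol frame, as it might look right after read_csv
symbol = 'GOOG'
df_temp = pd.DataFrame({'Adj Close': [100.0, 101.5]})
df_temp = df_temp.rename(columns={'Adj Close': symbol})

# Hypothetical combined frame with gaps in both columns
df = pd.DataFrame({
    'SPY': [300.0, np.nan, 302.0],
    'GOOG': [100.0, 101.5, np.nan],
})
df = df.dropna(subset=['SPY'])  # removes only the row where SPY is NaN

print(list(df_temp.columns))  # ['GOOG']
print(len(df))                # 2
```

Note that the row where only GOOG is missing survives, because `subset` restricts the NaN check to the SPY column.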
DataFrames support left, right, outer, inner, and cross joins. The default join is left
Joins in DataFrames: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.join.html
Use joins with df.join(other, how='<JOIN>')
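To make the difference between join types concrete, here is a small sketch comparing the default left join with an inner join. The frames `left` and `right` and their values are hypothetical.

```python
import pandas as pd

# Hypothetical frames indexed by date; `right` is missing the last date
dates = pd.to_datetime(['2020-01-02', '2020-01-03', '2020-01-06'])
left = pd.DataFrame({'SPY': [300.0, 301.0, 302.0]}, index=dates)
right = pd.DataFrame({'GOOG': [100.0, 101.5]}, index=dates[:2])

left_joined = left.join(right)                # default how='left': keeps all rows of `left`
inner_joined = left.join(right, how='inner')  # keeps only dates present in both frames

print(left_joined.shape)   # (3, 2)
print(inner_joined.shape)  # (2, 2)
```

With the left join, the date missing from `right` is kept and its GOOG value is NaN; with the inner join, that date is dropped.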
Quickly creating a date_range
date_range: https://pandas.pydata.org/docs/reference/api/pandas.date_range.html
pd.date_range(start_date, end_date)
Slice with df.loc or df.iloc. Slicing can be done on rows, columns, or both
If we have a table indexed by date, with one column per symbol (for example GOOG and GLD):
# Row and Col slicing
df.loc[start_date:end_date, ['GOOG', 'GLD']] # the slice before the comma selects rows; the list after it selects columns
# Row slicing only
df.loc[start_date:end_date]
# Col slicing only
df.loc[:, ['GOOG', 'GLD']]
# We use `loc` for label-based selection and `iloc` for integer-position selection
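The slicing patterns above can be tried on a small frame built with `pd.date_range`. This is a sketch on made-up values; the dates and prices are assumptions, not data from the notes.

```python
import pandas as pd

# Hypothetical prices on five consecutive calendar days
dates = pd.date_range('2020-01-01', '2020-01-05')
df = pd.DataFrame(
    {
        'SPY': [1, 2, 3, 4, 5],
        'GOOG': [10, 20, 30, 40, 50],
        'GLD': [5, 6, 7, 8, 9],
    },
    index=dates,
)

# Label-based: the row slice includes both end dates
by_label = df.loc['2020-01-02':'2020-01-03', ['GOOG', 'GLD']]
# Position-based: the end position of each slice is excluded
by_position = df.iloc[1:3, 1:3]

print(by_label.equals(by_position))  # True
```

Both selections return the same two rows and two columns; the difference is that `loc` slices are inclusive of the end label while `iloc` slices follow Python's half-open convention.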