Data project #1: Stockmarket analysis

June 29, 2019

This tutorial covers basic techniques for financial data analysis. Specifically, we will use the pandas library to import stock data and manipulate it to identify an investment thesis. The tutorial is aimed at complete beginners. Keep in mind that, because of their simplicity, these techniques won't make you any money; they are meant to inspire you to investigate further.

Step 1: Installing prerequisites

Assuming you're running a recent version of Python 3, we will need to install a few basic packages:

pip3 install numpy
pip3 install matplotlib
pip3 install pandas
pip3 install pandas-datareader
pip3 install beautifulsoup4
pip3 install scikit-learn
pip3 install quandl

If you would like to learn more about Matplotlib, see our tutorial on the subject. Likewise, we have a tutorial on data analysis with pandas.

Step 2: Imports

To begin with, we need to import the modules we will use throughout the tutorial.

import datetime as dt
import matplotlib.pyplot as plt
from matplotlib import style
import pandas as pd
import pandas_datareader.data as web

We'll use datetime to work with dates, matplotlib to create graphs, and pandas to manipulate the data. The pandas_datareader module fetches the stock data from a remote source.

Step 3: Setup

Now let's prepare some variables and fetch the data we will manipulate later on.

style.use('ggplot')
start = dt.datetime(2018, 1, 1)
end = dt.datetime.now()

qkey = 'YOUR API KEY HERE'

df = web.DataReader('WIKI/GOOGL', 'quandl', start, end, access_key=qkey)

First things first: our graphs should look good. style.use lets us pick a plotting theme, and ggplot is one of the better-looking built-in themes.

Next, we create start and end variables so we can easily reuse the same date range in multiple contexts.
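As an illustration of reusing the same date range, here is a minimal sketch that slices a dataframe with start and end. The prices are made up, the end date is fixed so the result is predictable (the tutorial uses dt.datetime.now()), and the index is assumed to be sorted in ascending date order.

```python
import datetime as dt

import pandas as pd

start = dt.datetime(2018, 1, 1)
end = dt.datetime(2018, 1, 31)  # fixed for this sketch only

# Made-up daily values from late December 2017 to mid February 2018.
dates = pd.date_range("2017-12-25", "2018-02-10", freq="D")
prices = pd.DataFrame({"AdjClose": range(len(dates))}, index=dates)

# Label-based slicing on a DatetimeIndex includes both endpoints.
january = prices.loc[start:end]
first_day = january.index.min()
last_day = january.index.max()
```

The same start and end objects can be passed to the data request and then reused for slicing like this, so the range only has to be changed in one place.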

To get data from the Quandl API, you first have to register on their website and obtain an API key. Once registered, confirm your email address, copy your API key from 'account settings', and place it inside the qkey variable.
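If you would rather not paste the key into the script itself, one option is to read it from an environment variable. The sketch below is only an illustration: the helper load_api_key and the variable name QUANDL_API_KEY are assumptions made for this example, not part of the original setup.

```python
import os


def load_api_key(env_var="QUANDL_API_KEY"):
    """Return the API key stored in the given environment variable.

    The default variable name is an assumption for this sketch; use
    whatever name you exported in your shell.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

With the variable exported in your shell, qkey = load_api_key() would take the place of the hard-coded string.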

Finally, we use the pandas DataReader to request stockmarket data from the Quandl API. In this case we are requesting Google trading data from January 1, 2018 (start) until the present (end).

After running the above, we should look at the first few rows of the dataframe to confirm that everything is working correctly. So let's run:

df.head()

This should output a table with the first 5 rows of data, which is the default for head().
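For reference, head() accepts an optional row count, and tail() does the same from the other end of the dataframe. A small sketch on made-up numbers (the demo dataframe is a stand-in, not the downloaded data):

```python
import pandas as pd

# Made-up closing prices, purely for illustration.
demo = pd.DataFrame({"Close": [10.0 + i for i in range(20)]})

default_rows = demo.head()   # no argument: first 5 rows
ten_rows = demo.head(10)     # first 10 rows
last_rows = demo.tail(3)     # last 3 rows
```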

Step 4: Looking at the data

If you're new to the stockmarket and trading, this data may not be the easiest to understand. Below is a quick primer on the information each column of the dataframe contains:

  1. Open - The price of a share at the time when the stockmarket opens for morning trading.

  2. High - The highest value recorded during the trading day.

  3. Low - The lowest value recorded during the trading day.

  4. Close - The final price when the stockmarket closed for the day.

  5. Volume - The number of shares traded during the day.

  6. Adj. Open / Adj. High / Adj. Low / Adj. Close - Unlike the raw open, high, low and close values, the adjusted versions account for any stock splits in the ticker's history. A company can choose to split its stock, declaring for example that every share is now worth two shares, which halves the price of each share. Adjusted values put the prices from before and after a split on the same scale, which is why you should mostly use them.
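To make the effect of a split concrete, here is a minimal sketch on made-up prices around a hypothetical 2-for-1 split. It handles the split only; data providers may apply further adjustments (for dividends, for instance), so treat it as an illustration of the idea rather than the exact method behind the Adj. columns.

```python
import pandas as pd

# Made-up closing prices around a hypothetical 2-for-1 split.
close = pd.Series(
    [100.0, 102.0, 51.0, 52.0],
    index=pd.to_datetime(
        ["2018-03-01", "2018-03-02", "2018-03-05", "2018-03-06"]
    ),
)
split_date = pd.Timestamp("2018-03-05")  # first trading day after the split
split_ratio = 2.0                        # each old share becomes two new ones

# Divide every pre-split price by the ratio so the whole series
# is expressed in post-split terms.
adj_close = close.copy()
before_split = adj_close.index < split_date
adj_close.loc[before_split] = adj_close.loc[before_split] / split_ratio

# Day-over-day percentage changes for both series.
raw_change = close.pct_change()
adj_change = adj_close.pct_change()
```

On the split date the raw series shows a 50% drop that never happened to shareholders, while the adjusted series shows no change at all.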

Step 5: Graphing the data

In order to start graphing the data all we need to do is run:

df.plot()
plt.show()

Unfortunately, since the Volume data is on a much larger scale than the price columns, the price lines are flattened against the bottom of the chart and all we can make out is the volume graph.
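One possible way around the scale problem is to give each column its own panel, so that each gets its own y-axis. The sketch below uses a small made-up dataframe named demo in place of the downloaded data, and selects the Agg backend so that it runs without opening a window.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; remove this line for a window

import matplotlib.pyplot as plt
import pandas as pd

# Made-up prices and volumes standing in for the downloaded dataframe.
dates = pd.date_range("2018-01-01", periods=5, freq="D")
demo = pd.DataFrame(
    {
        "AdjClose": [1050.0, 1060.0, 1055.0, 1070.0, 1080.0],
        "Volume": [1500000, 1800000, 1200000, 2100000, 1700000],
    },
    index=dates,
)

# subplots=True draws one panel per column, each with its own y-axis.
axes = demo.plot(subplots=True)
price_top = axes[0].get_ylim()[1]
volume_top = axes[1].get_ylim()[1]
```

In an interactive session you would drop the matplotlib.use("Agg") line and call plt.show() afterwards, as in the rest of the tutorial.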

But we can reference specific columns in the data frame by running:

df['AdjClose'].plot()
plt.show()

Now the result is much clearer, and no legend is displayed since we are only plotting one column. We can also graph multiple columns at once by passing a list of column names:

df[['AdjOpen','AdjClose']].plot()
plt.show()

Conclusion

That's it for the first part of our tutorial on financial data with Python. Naturally, there is much more we could do here, but this should give you a solid foundation for the next tutorials in our series. Let me know in the comments what you thought of the tutorial and whether you have any requests.

Appendix #1: Saving to CSV

We can also easily save the data to a CSV file, so that we don't have to keep making API requests.

df.to_csv('google.csv')

Now, instead of getting the data from the API every time, we can load it from the .csv file. Passing index_col=0 and parse_dates=True restores the date column as a date index.

df = pd.read_csv('google.csv', parse_dates=True, index_col=0)
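The two steps can be combined into a small caching helper: read the file if it exists, otherwise fetch the data and save it. This is a sketch under stated assumptions: load_cached and fake_fetch are hypothetical names, fake_fetch returns made-up numbers in place of the web.DataReader call, and a temporary directory is used so the example does not touch your working folder.

```python
import os
import tempfile

import pandas as pd


def load_cached(path, fetch):
    """Read `path` if it exists; otherwise call `fetch()` and save the result."""
    if os.path.exists(path):
        return pd.read_csv(path, parse_dates=True, index_col=0)
    df = fetch()
    df.to_csv(path)
    return df


def fake_fetch():
    # Stand-in for the web.DataReader call; the numbers are made up.
    dates = pd.date_range("2018-01-01", periods=3, freq="D", name="Date")
    return pd.DataFrame({"AdjClose": [1050.0, 1060.0, 1055.0]}, index=dates)


csv_path = os.path.join(tempfile.mkdtemp(), "google.csv")
first = load_cached(csv_path, fake_fetch)   # calls fake_fetch, writes the file
second = load_cached(csv_path, fake_fetch)  # reads the file, no fetch needed
```

In the tutorial's setting, fetch would be a function wrapping the DataReader request, and path would simply be 'google.csv'.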
