AI For Trading: Exercise regression with two stocks (22)

Regression

Install packages

import sys
!{sys.executable} -m pip install -r requirements.txt
Successfully installed numpy-1.14.5 pandas-0.21.1 plotly-2.2.3 statsmodels-0.9.0
import pandas as pd
import numpy as np
import os
import helper
import quiz_tests
import matplotlib.pyplot as plt
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (14, 8)

Simulate two stock prices

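# set the seed so the simulated random numbers are reproducible
np.random.seed(100)
# build each price series as the cumulative sum of simulated returns;
# "drift" here is a constant offset that sets the starting price level
drift = 100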
r0 = pd.Series(np.random.normal(0, 1, 1000))

s0 = pd.Series(np.cumsum(r0), name='s0') + drift

noise1 = np.random.normal(0, 0.4, 1000)
drift1 = 50
r1 = r0 + noise1
s1 = pd.Series(np.cumsum(r1), name='s1') + drift1

noise2 = np.random.normal(0, 0.4, 1000)
drift2 = 60
r2 = r0 + noise2
s2 = pd.Series(np.cumsum(r2), name='s2') + drift2

pd.concat([s1, s2], axis=1).plot(figsize=(15,6))
plt.show()

## Plot data with scatterplot
sc = plt.scatter(s2, s1, s=30, edgecolor='b', alpha=0.7)
plt.xlabel('s2')
plt.ylabel('s1');

from sklearn.linear_model import LinearRegression

Quiz: Linear Regression

Note that LinearRegression().fit() expects 2D numpy arrays. Since s1 and s2 are pandas Series, we can use Series.values to get the values as a numpy array. Since these are 1D arrays, we can call ndarray.reshape(-1, 1) on them to turn each into a two-dimensional array with 1000 rows and 1 column.
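As a quick check of the reshaping step, here is a minimal sketch that uses a short made-up array in place of Series.values (the variable names and numbers are illustrative, not taken from the exercise):

```python
import numpy as np

# Stand-in for Series.values: a 1D array of four made-up prices
prices = np.array([50.0, 50.5, 49.8, 51.2])

# -1 lets numpy infer the number of rows; 1 fixes a single column
column = prices.reshape(-1, 1)

print(prices.shape)  # (4,)
print(column.shape)  # (4, 1)
```

With the 1000-point series from the simulation, the same call would give an array of shape (1000, 1).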

The slope \(\beta\) and the intercept of the regression line
\(y = \beta \times x + intercept\)
can be obtained after fitting the model to the data. Use LinearRegression.coef_ for the slope (beta coefficients) and LinearRegression.intercept_ for the intercept. You may want to practice accessing these outside of the function definition to see whether you need additional brackets [] to access the values.
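To make explicit what the fit returns, here is a minimal sketch of the closed-form least-squares solution for a single x variable: the slope is the covariance of x and y divided by the variance of x, and the intercept is mean(y) minus the slope times mean(x). The function name ols_slope_and_intercept and the toy data are made up for illustration; this is not how the exercise or scikit-learn computes the fit internally.

```python
from statistics import mean

def ols_slope_and_intercept(xs, ys):
    """Closed-form simple linear regression (hypothetical helper)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx
    intercept = y_bar - beta * x_bar
    return beta, intercept

# Toy data lying exactly on the line y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

beta, intercept = ols_slope_and_intercept(xs, ys)
print(f"slope {beta:.2f} and intercept {intercept:.2f}")  # slope 2.00 and intercept 1.00
```

On the question of brackets: when y is passed to fit() as a 2D array with one column, LinearRegression.coef_ has shape (1, 1) and LinearRegression.intercept_ has shape (1,), so extracting plain numbers takes two indices for the slope and one for the intercept.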

def regression_slope_and_intercept(xSeries, ySeries):
    """
    xSeries: pandas series, x variable
    ySeries: pandas series, y variable
    """
    lr = LinearRegression()
    #TODO: get the values from each series, reshape to be 2 dimensional
    #set s1 to the x variable, s2 to the y variable
    xVar = xSeries.values.reshape(-1,1)
    yVar = ySeries.values.reshape(-1,1)

    #TODO: call LinearRegression.fit().  Pass in the x variable then y variable
    lr.fit(xVar,yVar);

    #TODO: obtain the slope and intercept
    slope = lr.coef_[0][0]
    intercept = lr.intercept_[0]

    return (slope, intercept)

quiz_tests.test_regression_slope_and_intercept(regression_slope_and_intercept);
Tests Passed
slope, intercept = regression_slope_and_intercept(s1,s2);
print(f"slope {slope:.2f} and intercept {intercept:.2f}")
slope 0.74 and intercept 30.62
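
To read these numbers: the function was called with s1 as the x variable and s2 as the y variable, so the fitted line predicts s2 from s1. A small sketch using the rounded values printed above and a hypothetical s1 price of 50 (chosen only for illustration):

```python
# Rounded coefficients copied from the printed output above
slope_rounded = 0.74
intercept_rounded = 30.62

s1_price = 50.0  # hypothetical input, not taken from the simulated data
s2_predicted = slope_rounded * s1_price + intercept_rounded

print(f"predicted s2 {s2_predicted:.2f}")  # predicted s2 67.62
```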

Plot the fitted regression line over the scatterplot

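# scatter with s2 on the horizontal axis and s1 on the vertical axis
plt.scatter(s2, s1, s=30, edgecolor='b', alpha=0.5);
# the fit used s1 as the x variable and s2 as the y variable,
# so evaluate the line over the range of s1 to get predicted s2 values
x = np.linspace(s1.min()-5, s1.max()+5, 2)
yPred = slope * x + intercept
# predicted s2 goes on the horizontal axis to match the scatterplot
plt.plot(yPred,x, alpha=0.2, lw=3, color='r')
plt.xlabel('s2')
plt.ylabel('s1');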

Those who act often succeed; those who keep walking often arrive.