AI For Trading: Exercise regression with two stocks (22)
Regression
Install packages
import sys
!{sys.executable} -m pip install -r requirements.txt
Successfully installed numpy-1.14.5 pandas-0.21.1 plotly-2.2.3 statsmodels-0.9.0
import pandas as pd
import numpy as np
import os
import helper
import quiz_tests
import matplotlib.pyplot as plt
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (14, 8)
Simulate two stock prices
# set the seed so the simulated series are reproducible
np.random.seed(100)
# use returns to create a price series: cumulatively sum the returns,
# then add a constant starting level (named 'drift' here)
drift = 100
r0 = pd.Series(np.random.normal(0, 1, 1000))
s0 = pd.Series(np.cumsum(r0), name='s0') + drift
# s1 and s2 share the returns r0 plus independent noise, so they move together
noise1 = np.random.normal(0, 0.4, 1000)
drift1 = 50
r1 = r0 + noise1
s1 = pd.Series(np.cumsum(r1), name='s1') + drift1
noise2 = np.random.normal(0, 0.4, 1000)
drift2 = 60
r2 = r0 + noise2
s2 = pd.Series(np.cumsum(r2), name='s2') + drift2
pd.concat([s1, s2], axis=1).plot(figsize=(15,6))
plt.show()

Plot data with a scatterplot
sc = plt.scatter(s2, s1, s=30, edgecolor='b', alpha=0.7)
plt.xlabel('s2')
plt.ylabel('s1');

from sklearn.linear_model import LinearRegression
Quiz: Linear Regression
Note that LinearRegression().fit() expects the x variable as a 2D numpy array, with one row per sample and one column per feature. Since s1 and s2 are pandas Series, we can use Series.values to get their values as 1D numpy arrays, then call reshape(-1, 1) on each to get a 2D array with 1000 rows and 1 column. In this exercise the y variable is reshaped the same way.
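As a small illustration of the reshaping step, the sketch below uses a made-up five-element Series (the name `prices` and its values are hypothetical, not taken from the exercise) to show how `.values` and `reshape(-1, 1)` change the array shape.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for s1 or s2.
prices = pd.Series([50.0, 50.5, 49.8, 51.2, 52.0], name='prices')

flat = prices.values            # 1D numpy array, shape (5,)
column = flat.reshape(-1, 1)    # 2D array; -1 lets numpy infer the row count

print(flat.shape)    # (5,)
print(column.shape)  # (5, 1)
```

The same two calls applied to s1 or s2 would give arrays of shape (1000, 1).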
After fitting to the data, we can obtain the slope \(\beta\) and the intercept of the regression line:
\(y = \beta \times x + intercept\)
Use LinearRegression.coef_ for the slope (beta coefficients) and LinearRegression.intercept_ for the intercept. Both come back as arrays when y is 2D, so you may want to practice accessing them outside of the function definition to see whether you need additional brackets [] to get at the values.
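To see why the extra brackets are needed, this sketch fits a tiny made-up data set that lies exactly on the line \(y = 2 \times x + 1\) and prints the shapes of coef_ and intercept_ when y is passed as a 2D column. The names `toy_x`, `toy_y` and `toy_lr` and the data values are hypothetical, chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data lying exactly on y = 2x + 1.
toy_x = np.array([0.0, 1.0, 2.0, 3.0]).reshape(-1, 1)
toy_y = 2.0 * toy_x + 1.0   # already 2D, shape (4, 1)

toy_lr = LinearRegression()
toy_lr.fit(toy_x, toy_y)

# With a 2D y, coef_ is 2D (targets x features) and intercept_ is 1D.
print(toy_lr.coef_.shape)       # (1, 1)
print(toy_lr.intercept_.shape)  # (1,)

# So two levels of indexing for the slope, one for the intercept.
toy_slope = toy_lr.coef_[0][0]
toy_intercept = toy_lr.intercept_[0]
print(np.isclose(toy_slope, 2.0), np.isclose(toy_intercept, 1.0))
```

If y were passed as a 1D array instead, coef_ would be 1D and intercept_ a scalar, so one fewer level of indexing would be needed.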
def regression_slope_and_intercept(xSeries, ySeries):
    """
    xSeries: pandas series, x variable
    ySeries: pandas series, y variable
    """
    lr = LinearRegression()
    #TODO: get the values from each series, reshape to be 2 dimensional
    #set s1 to the x variable, s2 to the y variable
    xVar = xSeries.values.reshape(-1,1)
    yVar = ySeries.values.reshape(-1,1)

    #TODO: call LinearRegression.fit(). Pass in the x variable then y variable
    lr.fit(xVar, yVar)

    #TODO: obtain the slope and intercept
    slope = lr.coef_[0][0]
    intercept = lr.intercept_[0]

    return (slope, intercept)
quiz_tests.test_regression_slope_and_intercept(regression_slope_and_intercept);
Tests Passed
slope, intercept = regression_slope_and_intercept(s1,s2);
print(f"slope {slope:.2f} and intercept {intercept:.2f}")
slope 0.74 and intercept 30.62
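For a single x variable, the least-squares slope and intercept also have a closed form: \(\beta = cov(x, y) / var(x)\) and \(intercept = \bar{y} - \beta \times \bar{x}\). The sketch below is a cross-check on made-up data rather than part of the exercise; the helper name `closed_form_fit` and the sample values are hypothetical.

```python
import numpy as np

def closed_form_fit(x_values, y_values):
    """Least-squares slope and intercept for a single x variable."""
    x_arr = np.asarray(x_values, dtype=float)
    y_arr = np.asarray(y_values, dtype=float)
    x_centered = x_arr - x_arr.mean()
    y_centered = y_arr - y_arr.mean()
    # slope = cov(x, y) / var(x); the 1/n factors cancel
    fit_slope = (x_centered * y_centered).sum() / (x_centered ** 2).sum()
    fit_intercept = y_arr.mean() - fit_slope * x_arr.mean()
    return fit_slope, fit_intercept

# Hypothetical toy data, not the simulated s1 and s2.
x_toy = np.array([40.0, 44.0, 48.0, 52.0, 56.0])
y_toy = np.array([60.2, 62.9, 66.1, 68.8, 72.0])

cf_slope, cf_intercept = closed_form_fit(x_toy, y_toy)

# np.polyfit with degree 1 solves the same least-squares problem.
poly_slope, poly_intercept = np.polyfit(x_toy, y_toy, 1)
print(np.isclose(cf_slope, poly_slope), np.isclose(cf_intercept, poly_intercept))
# True True
```

Passing s1 and s2 to the same helper should agree with the values returned by regression_slope_and_intercept(s1, s2), up to floating-point error.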
Plot the fitted regression line over the scatterplot
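# The model was fit with s1 as x and s2 as y, so it predicts s2 from s1.
plt.scatter(s2, s1, s=30, edgecolor='b', alpha=0.5)
x = np.linspace(s1.min() - 5, s1.max() + 5, 2)  # two s1 values are enough to draw a line
yPred = slope * x + intercept  # predicted s2 at those s1 values
# s2 is on the horizontal axis, so the predictions are passed first to plt.plot.
plt.plot(yPred, x, alpha=0.2, lw=3, color='r')
plt.xlabel('s2')
plt.ylabel('s1');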

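Those who act often succeed; those who keep walking often arrive.
Free to repost: non-commercial, no derivatives, attribution required (Creative Commons 3.0 license)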