AI For Trading: Exercise-Normality (17)
Testing if a Distribution is Normal
imports
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import quiz_tests
# Set plotting options
%matplotlib inline
plt.rc('figure', figsize=(16, 9))
Create normal and non-normal distributions
# Sample A: Normal distribution
sample_a = stats.norm.rvs(loc=0.0, scale=1.0, size=(1000,))
# Sample B: Non-normal distribution
sample_b = stats.lognorm.rvs(s=0.5, loc=0.0, scale=1.0, size=(1000,))
Box-Whisker Plot and Histogram
We can visually check whether a sample looks normally distributed. Recall that a box-whisker plot lets us check for symmetry around the median. A histogram lets us see the overall shape. A Q-Q plot lets us compare the quantiles of our data with those of a normal distribution (or any other theoretical "ideal" distribution).
# Sample A: Normal distribution
sample_a = stats.norm.rvs(loc=0.0, scale=1.0, size=(1000,))
fig, axes = plt.subplots(2, 1, figsize=(16, 9), sharex=True)
axes[0].boxplot(sample_a, vert=False)
axes[1].hist(sample_a, bins=50)
axes[0].set_title("Boxplot of a Normal Distribution");

# Sample B: Non-normal distribution
sample_b = stats.lognorm.rvs(s=0.5, loc=0.0, scale=1.0, size=(1000,))
fig, axes = plt.subplots(2, 1, figsize=(16, 9), sharex=True)
axes[0].boxplot(sample_b, vert=False)
axes[1].hist(sample_b, bins=50)
axes[0].set_title("Boxplot of a Lognormal Distribution");
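
The symmetry that the boxplots show can also be read off numerically. The following is a minimal sketch, not part of the original exercise: the helper name quartile_asymmetry and the fixed random_state values are assumptions made here for illustration. It compares the distance from the median to the upper quartile with the distance from the median to the lower quartile, which are the two halves of the box.

```python
import numpy as np
import scipy.stats as stats


def quartile_asymmetry(sample):
    """Return (Q3 - median) - (median - Q1); near 0 for a symmetric sample.

    Hypothetical helper for illustration only.
    """
    q1, median, q3 = np.percentile(sample, [25, 50, 75])
    return (q3 - median) - (median - q1)


# Same distributions as in the exercise; random_state is fixed here
# (an assumption) so that the sketch is reproducible.
sample_a = stats.norm.rvs(loc=0.0, scale=1.0, size=1000, random_state=0)
sample_b = stats.lognorm.rvs(s=0.5, loc=0.0, scale=1.0, size=1000, random_state=0)

print("Normal sample asymmetry:   ", quartile_asymmetry(sample_a))
print("Lognormal sample asymmetry:", quartile_asymmetry(sample_b))
```

A value close to zero means the two halves of the box are about the same width. A clearly positive value means the upper half is wider, as expected for a right-skewed lognormal sample.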

# Q-Q plot of normally-distributed sample
plt.figure(figsize=(10, 10)); plt.axis('equal')
stats.probplot(sample_a, dist='norm', plot=plt);

# Q-Q plot of non-normally-distributed sample
plt.figure(figsize=(10, 10)); plt.axis('equal')
stats.probplot(sample_b, dist='norm', plot=plt);
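
When stats.probplot is called without a plot argument, it returns the plotted points and a least-squares fit, including the correlation coefficient r between the ordered sample values and the theoretical quantiles. The sketch below uses that to put a number on how straight each Q-Q plot is. The helper name qq_correlation and the fixed random_state values are assumptions for illustration, not part of the original exercise.

```python
import scipy.stats as stats


def qq_correlation(sample):
    """Correlation between ordered sample values and normal quantiles.

    Hypothetical helper: values close to 1 mean the Q-Q points lie
    close to a straight line.
    """
    (theoretical_q, ordered_values), (slope, intercept, r) = stats.probplot(
        sample, dist='norm'
    )
    return r


# Same distributions as in the exercise, with a fixed seed (an assumption).
sample_a = stats.norm.rvs(loc=0.0, scale=1.0, size=1000, random_state=42)
sample_b = stats.lognorm.rvs(s=0.5, loc=0.0, scale=1.0, size=1000, random_state=42)

r_a = qq_correlation(sample_a)
r_b = qq_correlation(sample_b)
print("Q-Q correlation, normal sample:   ", r_a)
print("Q-Q correlation, lognormal sample:", r_b)
```

The normal sample should give an r very close to 1, while the lognormal sample should give a lower value, because its Q-Q points bend away from the fitted line.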

Testing for Normality
Shapiro-Wilk
The Shapiro-Wilk test is available in the scipy library as scipy.stats.shapiro. The null hypothesis is that the data are drawn from a normal distribution. If the p-value is greater than the chosen significance level (0.05 here), we fail to reject the null hypothesis and treat the sample as likely normal. Otherwise we reject the null hypothesis and treat the sample as not normal. https://docs.scipy.org/doc/scipy-0.19.0/reference/generated/scipy.stats.shapiro.html
def is_normal(sample, test=stats.shapiro, p_level=0.05, **kwargs):
    """Apply a normality test to check if sample is normally distributed."""
    t_stat, p_value = test(sample, **kwargs)
    print("Test statistic: {}, p-value: {}".format(t_stat, p_value))
    print("Is the distribution Likely Normal? {}".format(p_value > p_level))
    return p_value > p_level
# Using Shapiro-Wilk test (default)
print("Sample A:-"); is_normal(sample_a);
print("Sample B:-"); is_normal(sample_b);
Output:
Sample A:-
Test statistic: 0.9989532232284546, p-value: 0.8480865359306335
Is the distribution Likely Normal? True
Sample B:-
Test statistic: 0.8881349563598633, p-value: 2.8097752492750973e-26
Is the distribution Likely Normal? False
Kolmogorov-Smirnov
The Kolmogorov-Smirnov (K-S) test is available in the scipy.stats library. The K-S test compares the data distribution with a theoretical distribution. We'll choose the 'norm' (normal) distribution as the theoretical distribution, and we also need to specify the mean and standard deviation of this theoretical distribution. We'll set the mean and standard deviation of the theoretical normal distribution to the mean and standard deviation of the sample.
https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.kstest.html
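
To make the comparison concrete, the two-sided K-S statistic is the largest vertical gap between the empirical CDF of the sample and the CDF of the theoretical distribution. The following is a minimal sketch under stated assumptions: the variable names and the fixed random_state are chosen here for illustration. It computes that gap by hand and checks it against stats.kstest.

```python
import numpy as np
import scipy.stats as stats

# A normal sample with a fixed seed (an assumption, for reproducibility).
sample = stats.norm.rvs(loc=0.0, scale=1.0, size=1000, random_state=0)

# Parameters of the theoretical normal, estimated from the sample itself.
mu, sigma = np.mean(sample), np.std(sample)

# Theoretical CDF evaluated at the sorted sample points.
x = np.sort(sample)
n = len(x)
cdf = stats.norm.cdf(x, loc=mu, scale=sigma)

# The empirical CDF jumps from (i - 1) / n to i / n at the i-th sorted point,
# so the gap is checked on both sides of each jump.
d_plus = np.max(np.arange(1, n + 1) / n - cdf)
d_minus = np.max(cdf - np.arange(0, n) / n)
d_manual = max(d_plus, d_minus)

d_scipy, p_value = stats.kstest(sample, 'norm', args=(mu, sigma))
print("Manual statistic:", d_manual)
print("scipy statistic: ", d_scipy)
```

The two statistics should agree up to floating-point error. A small statistic means the empirical CDF stays close to the theoretical CDF everywhere.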
Quiz
To use the Kolmogorov-Smirnov test, complete the function is_normal_ks.
To set the variable normal_args, create a tuple with two values. An example of a tuple is ("apple", "banana"). The first value is the mean of the sample. The second value is the standard deviation of the sample.
Hint: NumPy has the functions np.mean() and np.std().
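
As a small illustration of the hint, the sketch below builds such a tuple for a toy array. The toy values are an assumption chosen so the arithmetic is easy to check by hand. Note that np.std() uses ddof=0 (the population standard deviation) by default; passing ddof=1 gives the sample standard deviation instead.

```python
import numpy as np

# Toy data, chosen only so the results are easy to verify by hand.
sample = np.array([1.0, 2.0, 3.0, 4.0])

# A tuple of (mean, standard deviation).
normal_args = (np.mean(sample), np.std(sample))
mean, std = normal_args

# np.std divides by n by default; ddof=1 divides by n - 1.
std_ddof1 = np.std(sample, ddof=1)

print("normal_args:", normal_args)
print("std with ddof=1:", std_ddof1)
```

For samples of 1000 points, as used in this exercise, the difference between the two standard deviations is very small.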
def is_normal_ks(sample, test=stats.kstest, p_level=0.05, **kwargs):
    """
    sample: a sample distribution
    test: a function that tests for normality
    p_level: if the test returns a p-value greater than p_level, assume normality
    return: True if distribution is normal, False otherwise
    """
    # Mean and standard deviation of the sample, used as the
    # parameters of the theoretical normal distribution
    normal_args = (np.mean(sample), np.std(sample))
    t_stat, p_value = test(sample, 'norm', normal_args, **kwargs)
    print("Test statistic: {}, p-value: {}".format(t_stat, p_value))
    print("Is the distribution Likely Normal? {}".format(p_value > p_level))
    return p_value > p_level
quiz_tests.test_is_normal_ks(is_normal_ks)
Output:
Test statistic: 0.025791255135429514, p-value: 0.51899666645066
Is the distribution Likely Normal? True
Test statistic: 0.10156342517501948, p-value: 1.9646047011434575e-09
Is the distribution Likely Normal? False
Tests Passed
# Using Kolmogorov-Smirnov test
print("Sample A:-"); is_normal_ks(sample_a);
print("Sample B:-"); is_normal_ks(sample_b);
Output:
Sample A:-
Test statistic: 0.013154199845478609, p-value: 0.9951742206115806
Is the distribution Likely Normal? True
Sample B:-
Test statistic: 0.10646539641835073, p-value: 2.523465880699405e-10
Is the distribution Likely Normal? False
Free to repost, non-commercial, no derivatives, attribution required (Creative Commons 3.0 license)