AI For Trading: Alpha Factor Research Methods (75)
Case Studies intro
In this lesson, we'll take a deeper look at four academic papers and see how we can code up alpha factors from them. You'll practice how to:
 Read academic research
 Propose, implement, evaluate alpha factors
These skills and experiences will serve you well in preparing you for roles in quantitative finance.
The research papers we'll cover
Note that you'll have to create a free SSRN account in order to download these papers.
Overnight Returns and Firm-Specific Investor Sentiment
The Formation Process of Winners and Losers in Momentum Investing
Expected Skewness and Momentum
Arbitrage Asymmetry and the Idiosyncratic Volatility Puzzle
install libraries
Install python libraries for all code exercises in this lesson
Hi friends! The code exercises use zipline, and it takes 8 to 9 minutes to install the packages the first time you get to the first jupyter notebook. To help limit your wait on the installation during your first code exercise, please run the installation now in a separate tab. So, open this page in a new tab on your browser, and run the cell in the notebook below:
import sys
!{sys.executable} -m pip install -r requirements.txt
Continue on with the next video while it's installing. By the time you get to the first coding exercise, if the installation has finished, you'll be able to work on the exercise without waiting for the installation to complete. You should only have to wait once for the installation for all the coding exercises in this lesson. After that, your workspace should have all the libraries installed as you get to the later code exercises.
requirements.txt
alphalens==0.3.2
colour==0.1.5
cvxpy==1.0.3
cycler==0.10.0
numpy==1.14.5
pandas==0.18.1
plotly==2.2.3
pyparsing==2.2.0
python-dateutil==2.6.1
pytz==2017.3
requests==2.18.4
scipy==1.0.0
scikit-learn==0.19.1
six==1.11.0
tqdm==4.19.5
zipline===1.2.0
graphviz==0.9
Overnight Returns Abstract
The main parts of a paper:
Major sections
 Introduction
 Methodology
 Results
 Conclusion
Title: Overnight Returns and Firm-Specific Investor Sentiment
Abstract:
We examine the suitability of using overnight returns to measure firm-specific investor sentiment by analyzing whether they possess characteristics expected of a sentiment measure. We document short-term overnight return persistence, consistent with existing evidence of short-term persistence in share demand of sentiment-influenced investors. We find that short-term persistence is stronger for harder-to-value firms, consistent with existing evidence that sentiment plays a larger role for such firms. We show that stocks with high (low) overnight returns underperform (outperform) over the longer term, consistent with prior evidence of temporary sentiment-driven mispricing. Overall, our evidence supports using overnight returns to measure firm-specific sentiment.
Keywords: investor sentiment, firm-specific, earnings announcements, overnight returns, close-to-open returns
Notes
p 2, I: The recent work of Berkman, Koch, Tuttle, and Zhang (2012) suggests that a stock’s overnight (close-to-open) return can serve as a measure of firm-level sentiment.
p 3, I: Specifically, Berkman et al. (2012) find that attention-generating events (high absolute returns or strong net buying by retail investors) on one day lead to higher demand by individual investors, concentrated near the open of the next trading day...This creates temporary price pressure at the open, resulting in elevated overnight returns that are reversed during the trading day.
p 3, I: We conduct three sets of analyses. In the first we test for short-run persistence in overnight returns. The basis for expecting this from a measure of sentiment is the evidence in Barber et al. (2009) that the order imbalances of retail investors, who are the investors most likely to exhibit sentiment, persist for periods extending over several weeks...In the third analysis we examine whether stocks with high overnight returns underperform those with low overnight returns over the long term.
Overnight Returns Possible Alpha Factors
We see that the authors refer to investor sentiment as the positive or negative views of investors, especially individuals who tend to cluster their orders around a market open.
The authors also define overnight returns as close-to-open returns and describe the following hypothesis. Individual investors may notice attention-getting events and then choose to trade on them. Since many of them have day jobs, they may place trades after the market closes, and those trades are executed the following morning.
The hypothesis further assumes that the price impact of these overnight trades mean-reverts by the middle of that second day.
Based on this, you may start thinking about how this can be useful as an alpha factor.
Let's say, a stock's overnight returns are high. So, that morning, you see an increase in a stock's open price relative to the previous day's close.
According to this hypothesis, the stock is overbought, and we may expect a correction in the form of a price drop. So, if you calculate the stock's overnight return, you can short the stock that same morning when the overnight return is high.
However, that's not the effect that we will look to capture here. There are ideas later in the paper that we will use to construct an alpha factor.
Also, in the introduction, the authors state that they test for short-run persistence in overnight returns. This indicates some momentum in overnight returns over a short-term window.
If we continue reading in section two, titled "Sample, Variable Definitions, Descriptive Statistics", we can see that they're interested in a window of one week, or five trading days.
They calculate the average daily close-to-open return over a week as a measure of the persistence, or momentum, in sentiment. Again, based on this information, you can think about how this weekly average momentum factor might be used in a theoretical portfolio.
If, according to the hypothesis, the weekly average close-to-open return for a stock is high and positive, we might want to buy more shares of that stock, on the assumption that the positive returns will continue in the short run.
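As a rough illustration of the weekly average described above, the sketch below computes five overnight returns from six days of prices and averages them. The prices are made up for illustration, and the variable names are hypothetical; the paper's exact data handling is not reproduced here.

```python
# Hypothetical daily (open, close) prices for one stock: six trading days,
# which yields five overnight (close-to-open) returns.
prices = [
    (100.0, 101.0),
    (102.0, 101.5),
    (102.0, 103.0),
    (103.5, 103.0),
    (104.0, 105.0),
    (105.5, 106.0),
]

# Overnight return on day t: (open_t - close_{t-1}) / close_{t-1}
overnight = [
    (prices[t][0] - prices[t - 1][1]) / prices[t - 1][1]
    for t in range(1, len(prices))
]

# Average daily close-to-open return over the five-day window.
weekly_average = sum(overnight) / len(overnight)

print([round(r, 4) for r in overnight])
print(round(weekly_average, 4))
```

In this toy series every overnight return is positive, so under the persistence hypothesis the stock would be a candidate for a larger weight.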
Finally, the abstract also identifies a third possible factor, based on the hypothesis that, over the longer term, stocks with high overnight returns underperform.
In other words, over a longer time window, stocks exhibit mean reversion. If we go to section five, titled "Longer-Term Returns", the hypothesis, based on other papers, is that stocks that are more attractive to speculators, and therefore see more near-term positive returns, may end up underperforming over the next 12 months. However, this is also not the effect we will look to capture here.
Overnight Returns: Methods: Quantile Analysis
 Checking persistence of weekly close-to-open returns
 Using quantile analysis
 Decile 1: lowest returns
 Decile 10: highest returns

Potential factors
 Calculate overnight returns
 Aggregate weekly overnight returns
Momentum
 Overweight stocks with higher weekly overnight returns
 Underweight stocks with low weekly overnight returns
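The decile sort above can be sketched in a few lines of plain Python. The tickers and return values are invented, and the equal-count bucketing rule is an assumption for illustration; the paper may form its deciles differently.

```python
# Hypothetical weekly overnight returns for 20 tickers (made-up values).
weekly_returns = {f"STOCK{i:02d}": (i - 10) / 1000 for i in range(20)}

# Sort tickers from lowest to highest return, then cut into 10 equal buckets.
ordered = sorted(weekly_returns, key=weekly_returns.get)
n = len(ordered)
decile = {ticker: (rank * 10) // n + 1 for rank, ticker in enumerate(ordered)}

# Decile 1 holds the lowest returns, decile 10 the highest.
lowest = [t for t, d in decile.items() if d == 1]
highest = [t for t, d in decile.items() if d == 10]

print("Decile 1 (underweight):", lowest)
print("Decile 10 (overweight):", highest)
```

A momentum-style factor would then overweight the names in the top deciles and underweight those in the bottom ones.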
Overnight returns Exercise
Overnight Returns and Firm-Specific Investor Sentiment
Abstract: We explore the possibility that overnight returns can serve as a measure of firm-specific investor sentiment by analyzing whether they exhibit characteristics expected of a sentiment measure. First, we document short-term persistence in overnight returns, consistent with existing evidence of short-term persistence in share demand of sentiment-influenced retail investors. Second, we find that short-term persistence is stronger for harder-to-value firms, consistent with evidence that sentiment plays a larger role when there is less objective data available for valuation. Third, we show that stocks with high (low) overnight returns underperform (outperform) over the longer term, consistent with evidence of temporary sentiment-driven mispricing.
Install packages
import sys
!{sys.executable} -m pip install -r requirements.txt
import cvxpy as cvx
import numpy as np
import pandas as pd
import time
import os
import quiz_helper
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (14, 8)
data bundle
import os
import quiz_helper
from zipline.data import bundles
os.environ['ZIPLINE_ROOT'] = os.path.join(os.getcwd(), '..', '..','data','module_4_quizzes_eod')
ingest_func = bundles.csvdir.csvdir_equities(['daily'], quiz_helper.EOD_BUNDLE_NAME)
bundles.register(quiz_helper.EOD_BUNDLE_NAME, ingest_func)
print('Data Registered')
Build pipeline engine
from zipline.pipeline import Pipeline
from zipline.pipeline.factors import AverageDollarVolume
from zipline.utils.calendars import get_calendar
universe = AverageDollarVolume(window_length=120).top(500)
trading_calendar = get_calendar('NYSE')
bundle_data = bundles.load(quiz_helper.EOD_BUNDLE_NAME)
engine = quiz_helper.build_pipeline_engine(bundle_data, trading_calendar)
View Data
With the pipeline engine built, let's get the stocks that are in our universe at the end of the period. We'll use these tickers to generate the returns data for our risk model.
universe_end_date = pd.Timestamp('2016-01-05', tz='UTC')
universe_tickers = engine\
    .run_pipeline(
        Pipeline(screen=universe),
        universe_end_date,
        universe_end_date)\
    .index.get_level_values(1)\
    .values.tolist()
universe_tickers
Get Returns data
from zipline.data.data_portal import DataPortal
data_portal = DataPortal(
    bundle_data.asset_finder,
    trading_calendar=trading_calendar,
    first_trading_day=bundle_data.equity_daily_bar_reader.first_trading_day,
    equity_minute_reader=None,
    equity_daily_reader=bundle_data.equity_daily_bar_reader,
    adjustment_reader=bundle_data.adjustment_reader)
Get pricing data helper function
from quiz_helper import get_pricing
get pricing data into a dataframe
returns_df = \
    get_pricing(
        data_portal,
        trading_calendar,
        universe_tickers,
        universe_end_date - pd.DateOffset(years=5),
        universe_end_date)\
    .pct_change()[1:].fillna(0) # convert prices into returns
returns_df
Sector data helper function
We'll create an object for you, which defines a sector for each stock. The sectors are represented by integers. We inherit from the Classifier class; see the zipline documentation and source code for Classifier.
from zipline.pipeline.classifiers import Classifier
from zipline.utils.numpy_utils import int64_dtype
class Sector(Classifier):
    dtype = int64_dtype
    window_length = 0
    inputs = ()
    missing_value = -1
    def __init__(self):
        self.data = np.load('../../data/project_4_sector/data.npy')
    def _compute(self, arrays, dates, assets, mask):
        # Keep the sector code where the mask is True, else use missing_value
        return np.where(
            mask,
            self.data[assets],
            self.missing_value,
        )
sector = Sector()
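The np.where call inside _compute can be tried on its own. The sketch below is a simplified, one-dimensional stand-in with made-up sector codes; in the real pipeline the mask is supplied by zipline, and its exact shape is not shown here.

```python
import numpy as np

# Hypothetical sector codes, indexed by asset id (made-up data).
sector_by_asset = np.array([3, 7, 7, 1, 5])
missing_value = -1

assets = np.array([0, 2, 4])          # asset ids requested by the pipeline
mask = np.array([True, False, True])  # False means the asset is screened out

# Where the mask is True keep the sector code, otherwise use the placeholder.
result = np.where(mask, sector_by_asset[assets], missing_value)
print(result)
```

Using a sentinel such as -1 keeps the output an integer array while still marking the assets that were masked out.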
We'll use 2 years of data to calculate the factor
Note: Going back exactly 2 years lands on a day when the market is closed. The Pipeline package doesn't handle start or end dates that fall on days when the market is closed. To fix this, we go back 2 extra days so that the start date falls on a day when the market is open.
factor_start_date = universe_end_date - pd.DateOffset(years=2, days=2)
factor_start_date
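To check the weekday reasoning behind the extra two days, here is a small sketch that mirrors the date arithmetic with Python's standard datetime module instead of pandas. It only checks weekdays; whether a given weekday is an exchange holiday is not covered.

```python
from datetime import date, timedelta

universe_end = date(2016, 1, 5)

# Going back exactly two calendar years.
two_years_back = universe_end.replace(year=universe_end.year - 2)

# Going back two further days, as in the notebook.
adjusted_start = two_years_back - timedelta(days=2)

print(two_years_back, two_years_back.strftime("%A"))
print(adjusted_start, adjusted_start.strftime("%A"))
```

The first date falls on a weekend, which is why the notebook shifts the start back to the preceding Friday.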
Walk through "Returns" class
We'll walk through how the Returns class works, because we'll create a new class that inherits from Returns in order to calculate a customized return.
Returns inherits from CustomFactor
The zipline package has a class zipline.pipeline.factors.Returns which inherits from class zipline.pipeline.CustomFactor. The source code for Returns is here, and the source code for CustomFactor is here.
Please open the links to the documentation and source code and follow along with our notes about the code
Inputs variable
The CustomFactor class takes inputs as a parameter of its constructor; otherwise it looks for a class-level variable named inputs. inputs takes a list of BoundColumn instances. These help us choose what kind of price-volume data to use as input. The Returns class sets this to
inputs = [USEquityPricing.close]
USEquityPricing class
The class USEquityPricing has a couple BoundColumn instances that we can choose from.
close = USEquityPricing.close
high = USEquityPricing.high
low = USEquityPricing.low
open = USEquityPricing.open
volume = USEquityPricing.volume
Quiz 1
If we wish to calculate close-to-open returns, which columns from USEquityPricing do you think we'll want to put into the list and set as inputs?
Quiz 1 Answer
USEquityPricing.open and USEquityPricing.close
window_length variable
The CustomFactor class takes window_length (an integer) as a constructor parameter; otherwise it looks for a class-level variable named window_length. If we choose window_length = 2, we'll be passing two days' worth of data (two rows) into the compute function.
Quiz 2
What window length would you choose if you were calculating daily close to open returns? Assume we have daily data.
Answer 2
A window length of 2, so that we have 2 days of data.
Compute function
The definition of the Returns class includes the compute function:
def compute(self, today, assets, out, close):
    out[:] = (close[-1] - close[0]) / close[0]
 today: this is handled by parent classes; it has the datetime for the "today" row for the given subset of data. We won't use it for this function implementation.
 assets: this is handled by parent classes; it has the column header names for "out" and "close". We won't use it for this function implementation.
 out: this points to a numpy array that will store the result of our compute. It stores the "return" value of the compute function instead of explicitly returning a variable.
 *input: a tuple of numpy arrays that contain input data that we'll use to compute a signal. In the Returns definition of compute, the input is a single value, close, but we can list more if we need additional columns of data to compute a return.
If we set window_length=2, then the compute function gets two rows' worth of data from close. The index -1 value is the most recent value, and the index 0 value is the earliest in time. Recall that in Python, the -1 index is the same as getting the highest indexed value, so with a numpy array of just length two, -1 gives us the value at index 1.
So the line of code is calculating the one-day return using the close price, and storing that into the out variable.
$ Return = \frac{close_1 - close_0}{close_0} $
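To see the arithmetic outside zipline, here is a minimal sketch that mimics what compute receives when the window length is 2. The prices are made up, and the preallocated out array is only a stand-in for the one the pipeline engine would pass in.

```python
import numpy as np

# Hypothetical close prices: 2 rows (days, oldest first) x 3 columns (stocks).
close = np.array([
    [100.0, 50.0, 20.0],   # index 0: earlier day
    [110.0, 45.0, 20.0],   # index -1: most recent day
])

# Preallocated output, one slot per stock, filled in place like `out[:] = ...`.
out = np.empty(close.shape[1])
out[:] = (close[-1] - close[0]) / close[0]

print(out)
```

Each column is handled independently, so one line of vectorized arithmetic yields a return for every stock at once.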
Quiz 3
Given a numpy array for open prices called open and a numpy array for close prices called close, what code would you write to get the most recent open price? Assume that you have 2 days of data.
Answer 3
open[1] or open[-1] (because Python lets us do reverse indexing) are valid answers. Using the -1 index allows us to get the most recent price (the very last index of the numpy array) regardless of the window length, so you may prefer to use -1 to make your code easier to maintain or modify.
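The maintenance point can be seen with two windows of different lengths. This sketch uses plain Python lists with invented prices; the same indexing rule applies to numpy arrays.

```python
# Hypothetical open prices, oldest first.
two_day_window = [10.0, 11.0]
five_day_window = [10.0, 11.0, 12.0, 13.0, 14.0]

# With two rows, index 1 and index -1 point at the same (latest) element.
same_for_two_days = two_day_window[1] == two_day_window[-1]

# With five rows, a fixed index of 1 no longer reaches the latest element.
latest_fixed = five_day_window[1]      # second-oldest price
latest_relative = five_day_window[-1]  # latest price

print(same_for_two_days, latest_fixed, latest_relative)
```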
Close To Open Returns (Overnight Returns)
The close-to-open return is the change in price between when the market closed on one day and when it opened on the next. So it's
$ CloseToOpen = \frac{open_1 - close_0}{close_0}$
We'll now create a class CloseToOpenReturns that inherits from Returns, and override the compute function.
Quiz 4
Create a customized class CloseToOpenReturns that inherits from the Returns class. Define the compute function to calculate overnight returns.
from zipline.pipeline.data import USEquityPricing
from zipline.pipeline.factors import Returns
class CloseToOpenReturns(Returns):
    """
    Overnight return: from the previous day's close to the most recent open
    """
    # TODO: Set window_length (we're calculating daily returns)
    window_length = 2
    # TODO: set inputs
    inputs = [USEquityPricing.open, USEquityPricing.close]
    # The compute method is passed the current day, the assets list, a preallocated out vector, and the
    # factor's items in the list `inputs`
    def compute(self, today, assets, out, opens, closes):
        # TODO: calculate close-to-open return and save into out[:]
        out[:] = (opens[-1] - closes[0]) / closes[0]
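The body of this compute method can be checked without zipline. The sketch below feeds it hand-made arrays shaped like a two-day window; the prices are invented for illustration.

```python
import numpy as np

# Hypothetical prices: 2 rows (days, oldest first) x 2 columns (stocks).
opens = np.array([[100.0, 50.0],
                  [102.0, 49.0]])
closes = np.array([[101.0, 50.5],
                   [103.0, 48.0]])

# Same arithmetic as compute: latest open versus the previous day's close.
out = np.empty(opens.shape[1])
out[:] = (opens[-1] - closes[0]) / closes[0]

print(out)
```

Note that the latest close (closes[-1]) is never used: the overnight return only spans the previous close to the most recent open.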
Trailing overnight returns
The cumulative overnight returns over a week may be predictive of future returns; hence it's a kind of momentum signal.
$ TrailingOvernightReturns = \sum_{t=1}^{Days} CloseToOpen_t$
Where $Days$ could be 5 if we are looking at a weekly window.
So we want to take the CloseToOpenReturns as our input into another class, TrailingOvernightReturns, which also inherits from Returns.
mask
Note that we're going to create another class that inherits from Returns. Recall that Returns inherits from CustomFactor, which has a mask parameter for its constructor. The mask parameter takes in a Filter object, which determines which stock series get passed to the compute function. Note that when we used AverageDollarVolume and stored its output in the variable universe, this universe variable is of type Filter.
Quiz 5
If you wanted to create an object of type CloseToOpenReturns, and also define the object so that it only computes returns on the set of stocks in universe that we selected earlier in this notebook, what code would you write?
Answer 5
We could instantiate a CloseToOpenReturns object with CloseToOpenReturns(mask=universe), and this would only calculate close-to-open returns for the stocks defined in our universe variable.
numpy.nansum
Numpy has a nansum function that treats NaN (not a number) as zero. Note that by default, if we give numpy.nansum a 2D numpy array, it will calculate a single sum across all rows and columns. For our purposes, we want to compute a sum over 5 days (5 rows), and each column has daily close-to-open returns for a single stock. It helps to think of a matrix (2D numpy array) as a nested list of lists. This makes it easier to decide whether to set axis=0 or axis=1.
tmp =
[
    [stock1day1, stock2day1],
    [stock1day2, stock2day2],
    ...
]
If we look at the outermost list, each element is a list that represents one day's worth of data. If we used np.nansum(tmp, axis=0), this would sum across the days for each stock. If we think of this as a 2D matrix, setting axis=0 is like calculating a sum for each column.
If we set axis=0, this applies nansum to the outermost list (axis 0), so that we end up with:
[
    sum_of_stock_1, sum_of_stock_2
]
Alternatively, if we set axis=1, this applies nansum to the lists nested inside the outermost list. Each of these nested lists represents data for a single day, for all stocks, so that we get:
[
    sum_of_day_1,
    sum_of_day_2,
]
Example using numpy.nansum
tmp = np.array([
[1, 2, 3],
[np.nan, np.nan, np.nan],
[1, 1, 1]
])
print(f"Sum across rows and columns: numpy.nansum(tmp) \n{np.nansum(tmp)}")
print(f"Sum for each column: numpy.nansum(tmp,axis=0) \n{np.nansum(tmp,axis=0)}")
print(f"Sum for each row: numpy.nansum(tmp,axis=1) \n{np.nansum(tmp,axis=1)}")
Quiz 6
For our purposes, we want a sum for each stock series. Which axis do you think we should choose?
Answer 6
We want to set axis = 0 so that we have a sum for each stock (each column).
Quiz 7
Create a class TrailingOvernightReturns that inherits from Returns and takes the cumulative weekly sum of overnight returns.
class TrailingOvernightReturns(Returns):
    """
    Sum of trailing close-to-open returns; we expect sentiment persistence at short horizons, so we
    look at the 5-day (i.e., 1 week) window
    """
    # TODO: choose a window_length to calculate a weekly return
    window_length = 5
    # TODO: set inputs to a list containing the daily close to open returns
    # Filter the close to open returns by our stock universe
    inputs = [CloseToOpenReturns(mask=universe)]
    def compute(self, today, assets, out, close_to_open):
        # TODO: calculate the sum of close_to_open
        # choose the axis so that there is a sum for each stock (each column)
        # treat NaN as zeros
        out[:] = np.nansum(close_to_open, axis=0)
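As a quick check of the trailing sum, the sketch below applies the same nansum call to a hand-made 5-day window for two stocks. The return values are invented, and the NaN entries stand in for days with missing data.

```python
import numpy as np

# Hypothetical daily close-to-open returns: 5 rows (days) x 2 columns (stocks).
close_to_open = np.array([
    [0.01, np.nan],
    [0.02, 0.01],
    [-0.01, 0.01],
    [0.00, np.nan],
    [0.01, -0.03],
])

# Sum down the rows (axis=0) so each stock gets one weekly total; NaN counts as 0.
out = np.empty(close_to_open.shape[1])
out[:] = np.nansum(close_to_open, axis=0)

print(out)
```

A plain np.sum would return NaN for the second stock, which is why nansum is the safer choice when some days are missing.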
Quiz 8
Create a factor by instantiating the TrailingOvernightReturns class that you just defined. Demean by sector, rank, and convert to a z-score.
# TODO: create an overnight_returns_factor variable
overnight_returns_factor = (
    TrailingOvernightReturns().
    demean(groupby=Sector()).
    rank().
    zscore()
)
# create a pipeline called p
p = Pipeline(screen=universe)
p.add(overnight_returns_factor, 'Overnight_Sentiment')
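The three transformations in the chain (demean by sector, rank, z-score) can be imitated on a small array to see what each step does. This is a numpy sketch under stated assumptions: the factor values and sector labels are made up, ranks run ascending from 1, and the z-score uses the population standard deviation. It is not zipline's implementation.

```python
import numpy as np

# Hypothetical raw factor values and sector codes for six stocks.
values = np.array([0.03, 0.01, -0.02, 0.04, 0.00, -0.01])
sectors = np.array([0, 0, 0, 1, 1, 1])

# Step 1: subtract each sector's mean from its members.
demeaned = values.copy()
for s in np.unique(sectors):
    in_sector = sectors == s
    demeaned[in_sector] -= values[in_sector].mean()

# Step 2: rank ascending, 1 = smallest demeaned value.
ranks = demeaned.argsort().argsort() + 1

# Step 3: standardize the ranks to mean 0 and standard deviation 1.
z = (ranks - ranks.mean()) / ranks.std()

print(ranks)
print(np.round(z, 3))
```

Demeaning by sector removes sector-wide moves, ranking tames outliers, and the z-score puts the result on a comparable scale across days.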
Visualize pipeline
p.show_graph(format='png')
run pipeline and view the factor data
df = engine.run_pipeline(p, factor_start_date, universe_end_date)
df.head()
Visualize factor returns
These are returns that a theoretical portfolio would have if its stock weights were determined by a single alpha factor's values.
from quiz_helper import make_factor_plot
make_factor_plot(df, data_portal, trading_calendar, factor_start_date, universe_end_date);
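One way to read "stock weights determined by a single alpha factor" is sketched below. The internals of make_factor_plot are not shown in these notes, so the weighting scheme here (factor values scaled so absolute weights sum to 1) is an assumption, and the numbers are invented.

```python
# Hypothetical factor z-scores and next-period returns for four stocks.
factor = [1.5, 0.5, -0.5, -1.5]
forward_returns = [0.02, 0.01, -0.01, -0.02]

# Scale so absolute weights sum to 1: long positive scores, short negative ones.
gross = sum(abs(f) for f in factor)
weights = [f / gross for f in factor]

# Return of the theoretical factor-weighted portfolio for this period.
portfolio_return = sum(w * r for w, r in zip(weights, forward_returns))

print(weights)
print(round(portfolio_return, 4))
```

Repeating this for every date and compounding the results gives a factor return series of the kind the plot is meant to show.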