# AI For Trading: Alpha Factor Research Methods (75)

Case Studies intro

In this lesson, we'll take a deeper look at four academic papers and see how we can code up alpha factors from them.

• Propose, implement, evaluate alpha factors

These skills and experiences will serve you well as you prepare for roles in quantitative finance.

The research papers we'll cover
Note that you'll have to create a free SSRN account in order to download these papers.

Overnight Returns and Firm-Specific Investor Sentiment

The Formation Process of Winners and Losers in Momentum Investing

Expected Skewness and Momentum

Arbitrage Asymmetry and the Idiosyncratic Volatility Puzzle

## install libraries

### Install python libraries for all code exercises in this lesson

Hi friends! The code exercises use zipline, and it takes 8 to 9 minutes to install the packages the first time you get to the first jupyter notebook. To help limit your wait on the installation during your first code exercise, please run the installation now in a separate tab. So, open this page in a new tab on your browser, and run the cell in the notebook below:

import sys
!{sys.executable} -m pip install -r requirements.txt

Continue on with the next video while it's installing. By the time you get to the first coding exercise, if the installation has finished, you'll be able to work on the exercise without waiting for the installation to complete. You should only have to wait once for the installation for all the coding exercises in this lesson. After that, your workspace should have all the libraries installed as you get to the later code exercises.

requirements.txt

alphalens==0.3.2
colour==0.1.5
cvxpy==1.0.3
cycler==0.10.0
numpy==1.14.5
pandas==0.18.1
plotly==2.2.3
pyparsing==2.2.0
python-dateutil==2.6.1
pytz==2017.3
requests==2.18.4
scipy==1.0.0
scikit-learn==0.19.1
six==1.11.0
tqdm==4.19.5
zipline===1.2.0
graphviz==0.9


## Overnight Returns Abstract

Major sections

• Introduction
• Methodology
• Results
• Conclusion

### Title: Overnight Returns and Firm-Specific Investor Sentiment

Abstract:

We examine the suitability of using overnight returns to measure firm-specific investor sentiment by analyzing whether they possess characteristics expected of a sentiment measure. We document short-term overnight return persistence, consistent with existing evidence of short-term persistence in share demand of sentiment-influenced investors. We find that short-term persistence is stronger for harder-to-value firms, consistent with existing evidence that sentiment plays a larger role for such firms. We show that stocks with high (low) overnight returns underperform (outperform) over the longer-term, consistent with prior evidence of temporary sentiment-driven mispricing. Overall, our evidence supports using overnight returns to measure firm-specific sentiment.

Keywords: investor sentiment, firm-specific, earnings announcements, overnight returns, close-to-open returns

### Notes

p 2, I: The recent work of Berkman, Koch, Tuttle, and Zhang (2012) suggests that a stock’s overnight (close-to-open) return can serve as a measure of firm-level sentiment.

p 3, I: Specifically, Berkman et al. (2012) find that attention-generating events (high absolute returns or strong net buying by retail investors) on one day lead to higher demand by individual investors, concentrated near the open of the next trading day...This creates temporary price pressure at the open, resulting in elevated overnight returns that are reversed during the trading day.

p 3, I: We conduct three sets of analyses. In the first we test for short-run persistence in overnight returns. The basis for expecting this from a measure of sentiment is the evidence in Barber et al. (2009) that the order imbalances of retail investors, who are the investors most likely to exhibit sentiment, persist for periods extending over several weeks...In the third analysis we examine whether stocks with high overnight returns underperform those with low overnight returns over the long term.

## Overnight Returns Possible Alpha Factors

We see that the authors refer to investor sentiment as the positive or negative views of investors, especially individuals who tend to cluster their orders around a market open.

The authors also define overnight returns as close-to-open returns and describe the following hypothesis. Individual investors may notice attention-getting events and then choose to trade on them. Since many of them have day jobs, they may place trades after the market closes, and those trades are executed the following morning.

The hypothesis goes on to assume that the price effect of these overnight trades may mean-revert by the middle of that second day.

Based on this, you may start thinking about how this can be useful as an alpha factor.

Let's say, a stock's overnight returns are high. So, that morning, you see an increase in a stock's open price relative to the previous day's close.

According to this hypothesis, the stock is overbought, and we may expect a correction in the form of a drop in the price. So, if you calculate the overnight return of the stock, you can short the stock that same morning when the overnight return is high.

However, that's not the effect that we will look to capture here. There are ideas later in the paper that we will use to construct an alpha factor.

Also, in the introduction, the authors state that they test for short-run persistence in overnight returns. This indicates some momentum in the overnight returns over a short term window.

If we continue reading in section two, titled "Sample, variable definitions, descriptive statistics", we can see that they're interested in a window of one week, or five trading days.

They calculate the average daily close-to-open return over a week as a measure of the persistence, or momentum, in sentiment. Again, based on this information, you can think about how this weekly average momentum factor might be used in a theoretical portfolio.

If according to the hypothesis, the weekly average close to open return for a stock is high and positive, we might want to buy more shares of that stock with the assumption that the positive returns will continue in the short run.
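As a rough illustration of that weekly measure, the sketch below computes daily close-to-open returns for a single stock and averages the five most recent ones. The prices and variable names are invented for illustration and are not taken from the paper.

```python
import numpy as np

# Hypothetical prices for one stock over six trading days (oldest first).
opens = np.array([100.0, 101.0, 102.0, 101.5, 103.0, 104.0])
closes = np.array([100.5, 101.5, 101.0, 102.5, 103.5, 104.5])

# Overnight return on day t: open on day t relative to the close on day t-1.
overnight = opens[1:] / closes[:-1] - 1.0

# Average of the five most recent overnight returns (one trading week).
weekly_avg_overnight = overnight[-5:].mean()

print(overnight.round(4))
print(round(weekly_avg_overnight, 4))
```

A high, positive value of `weekly_avg_overnight` is the situation in which the momentum reading of the hypothesis would favor buying more of the stock.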

Finally, the abstract identifies a third possible factor, based on the hypothesis that stocks with high overnight returns underperform over the longer term.

In other words, over a longer time window, stocks exhibit mean reversion. In section five, titled "Longer-term returns", the hypothesis, which draws on other papers, is that stocks that are more attractive to speculators, and that therefore see more near-term positive returns, may end up underperforming over the next 12 months. However, this is also not the effect we will look to capture here.

## Overnight Returns: Methods: Quantile Analysis

• Checking persistence of weekly close-to-open returns
• Using quantile analysis
• Decile 1: lowest returns
• Decile 10: highest returns
• Potential factors

• Calculate overnight returns
• Aggregate weekly overnight returns

Momentum

• Overweight stocks with higher weekly overnight returns
• Underweight stocks with low weekly overnight returns
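The decile sort and the momentum tilt in the lists above can be sketched with numpy alone. The returns here are random numbers rather than market data, and the weighting scheme (decile number minus its mean, scaled so the absolute weights sum to 1) is an assumption chosen for simplicity, not the paper's method.

```python
import numpy as np

rng = np.random.RandomState(0)

# Hypothetical weekly overnight returns for 50 stocks.
weekly_overnight = rng.normal(loc=0.0, scale=0.01, size=50)

# Rank stocks from 0 (lowest return) to 49 (highest return).
ranks = weekly_overnight.argsort().argsort()

# Map ranks to deciles 1..10: decile 1 holds the lowest returns.
deciles = ranks * 10 // len(ranks) + 1

# Momentum tilt: positive weight for high deciles, negative for low ones.
raw = deciles - deciles.mean()
weights = raw / np.abs(raw).sum()

print(np.bincount(deciles)[1:])  # number of stocks in each decile
```

Stocks in decile 10 receive the largest positive weights and stocks in decile 1 the largest negative weights, which is the overweight/underweight pattern described under "Momentum".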

## Overnight returns Exercise

Overnight Returns and Firm-Specific Investor Sentiment

Abstract: We explore the possibility that overnight returns can serve as a measure of firm-specific investor sentiment by analyzing whether they exhibit characteristics expected of a sentiment measure. First, we document short-term persistence in overnight returns, consistent with existing evidence of short-term persistence in share demand of sentiment-influenced retail investors. Second, we find that short-term persistence is stronger for harder-to-value firms, consistent with evidence that sentiment plays a larger role when there is less objective data available for valuation. Third, we show that stocks with high (low) overnight returns underperform (outperform) over the longer-term, consistent with evidence of temporary sentiment-driven mispricing.

p 2, I: The recent work of Berkman, Koch, Tuttle, and Zhang (2012) suggests that a stock’s overnight (close-to-open) return can serve as a measure of firm-level sentiment.

p 3, I: Specifically, Berkman et al. (2012) find that attention-generating events (high absolute returns or strong net buying by retail investors) on one day lead to higher demand by individual investors, concentrated near the open of the next trading day...This creates temporary price pressure at the open, resulting in elevated overnight returns that are reversed during the trading day.

p 3, I: We conduct three sets of analyses. In the first we test for short-run persistence in overnight returns. The basis for expecting this from a measure of sentiment is the evidence in Barber et al. (2009) that the order imbalances of retail investors, who are the investors most likely to exhibit sentiment, persist for periods extending over several weeks...In the third analysis we examine whether stocks with high overnight returns underperform those with low overnight returns over the long term.

## Install packages

import sys
!{sys.executable} -m pip install -r requirements.txt
import cvxpy as cvx
import numpy as np
import pandas as pd
import time
import os
import quiz_helper
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (14, 8)

### data bundle

import os
import quiz_helper
from zipline.data import bundles
os.environ['ZIPLINE_ROOT'] = os.path.join(os.getcwd(), '..', '..','data','module_4_quizzes_eod')
ingest_func = bundles.csvdir.csvdir_equities(['daily'], quiz_helper.EOD_BUNDLE_NAME)
bundles.register(quiz_helper.EOD_BUNDLE_NAME, ingest_func)
print('Data Registered')

### Build pipeline engine

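from zipline.pipeline import Pipeline
from zipline.pipeline.factors import AverageDollarVolume
from zipline.utils.calendars import get_calendar

# Universe: the 500 stocks with the highest 120-day average dollar volume
universe = AverageDollarVolume(window_length=120).top(500)
# Assumption: use the NYSE trading calendar and the bundle registered above
trading_calendar = get_calendar('NYSE')
bundle_data = bundles.load(quiz_helper.EOD_BUNDLE_NAME)
engine = quiz_helper.build_pipeline_engine(bundle_data, trading_calendar)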

### View Data

With the pipeline engine built, let's get the stocks that are in our universe at the end of the period. We'll use these tickers to generate the returns data for our risk model.

universe_end_date = pd.Timestamp('2016-01-05', tz='UTC')

universe_tickers = engine\
    .run_pipeline(
        Pipeline(screen=universe),
        universe_end_date,
        universe_end_date)\
    .index.get_level_values(1)\
    .values.tolist()

universe_tickers

## Get Returns data

from zipline.data.data_portal import DataPortal

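data_portal = DataPortal(
    bundle_data.asset_finder,
    trading_calendar=trading_calendar,
    first_trading_day=bundle_data.equity_daily_bar_reader.first_trading_day,
    equity_daily_reader=bundle_data.equity_daily_bar_reader,
    adjustment_reader=bundle_data.adjustment_reader)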

## Get pricing data helper function

from quiz_helper import get_pricing

## get pricing data into a dataframe

returns_df = \
    get_pricing(
        data_portal,
        universe_tickers,
        universe_end_date - pd.DateOffset(years=5),
        universe_end_date)\
    .pct_change()[1:].fillna(0)  # convert prices into returns

returns_df

## Sector data helper function

We'll create an object for you, which defines a sector for each stock. The sectors are represented by integers. We inherit from the Classifier class; see the zipline documentation and source code for Classifier.

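from zipline.pipeline.classifiers import Classifier
from zipline.utils.numpy_utils import int64_dtype
class Sector(Classifier):
    dtype = int64_dtype
    window_length = 0
    inputs = ()
    missing_value = -1

    def __init__(self):
        # self.data must be set here to an array of integer sector codes,
        # indexed by asset sid (the data file path is not given here)
        pass

    def _compute(self, arrays, dates, assets, mask):
        # Use the stored sector code where mask is True, else missing_value
        return np.where(
            mask,
            self.data[assets],
            self.missing_value,
        )
sector = Sector()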

## We'll use 2 years of data to calculate the factor

Note: Going back exactly 2 years lands on a day when the market is closed. The Pipeline package doesn't handle start or end dates that don't fall on days when the market is open. To fix this, we go back 2 extra days so that the start date lands on a day when the market is open.

factor_start_date = universe_end_date - pd.DateOffset(years=2, days=2)
factor_start_date
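The weekend problem described in the note can be checked with the standard library alone. This sketch uses `datetime` rather than pandas, so it only mirrors the date arithmetic; it says nothing about market holidays.

```python
from datetime import date, timedelta

end = date(2016, 1, 5)

# Two calendar years before the end date.
two_years_back = end.replace(year=end.year - 2)

# Step back two more days, as the note describes.
start = two_years_back - timedelta(days=2)

print(two_years_back, two_years_back.strftime("%A"))
print(start, start.strftime("%A"))
```

The first date is a Sunday, while the adjusted start date is the Friday before it.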

## Walk through "Returns" class

We'll walk through how the Returns class works, because we'll create a new class that inherits from Returns in order to calculate a customized return.

### Returns inherits from CustomFactor

The zipline package has a class zipline.pipeline.factors.Returns which inherits from class zipline.pipeline.CustomFactor. The source code for both classes is available in the zipline repository.

### Inputs variable

The CustomFactor class takes the inputs as a parameter of the constructor for the class, otherwise it looks for a class-level variable named inputs. inputs takes a list of BoundColumn instances. These help us choose what kind of price-volume data to use as input. The Returns class sets this to

inputs = [USEquityPricing.close]

### USEquityPricing class

The class USEquityPricing has several BoundColumn instances that we can choose from:
close = USEquityPricing.close
high = USEquityPricing.high
low = USEquityPricing.low
open = USEquityPricing.open
volume = USEquityPricing.volume

## Quiz 1

If we wish to calculate close to open returns, which columns from USEquityPricing do you think we'll want to put into the list and set as inputs?

USEquityPricing.open and USEquityPricing.close

### window_length variable

The CustomFactor class takes window_length (an integer) as a constructor parameter, otherwise it looks for a class-level variable named window_length. If we chose a window_length = 2 then this means that we'll be passing two days' worth of data (two rows) into the compute function.

## Quiz 2

What window length would you choose if you were calculating daily close to open returns? Assume we have daily data.

A window length of 2, so that we have 2 days of data: the previous day's close and the current day's open.

### Compute function

The class definition of Returns includes the compute function

def compute(self, today, assets, out, close):
    out[:] = (close[-1] - close[0]) / close[0]

• today: this is handled by parent classes; it has the datetime for the "today" row for the given subset of data. We won't use it for this function implementation.
• assets: this is handled by parent classes: it has the column header names for the "out" and "close". We won't use it for this function implementation.
• out: this points to a numpy array that will store the result of our compute. It stores our "return" value of the compute function instead of explicitly returning a variable.
• *input: a tuple of numpy arrays that contain input data that we'll use to compute a signal. In the Returns definition of compute, the input is a single value close, but we can list more if we need additional columns of data to compute a return.

If we set the window_length=2, then the compute function gets two rows worth of data from close. The index 1 value is the most recent value, and the index 0 value is the earliest in time. Recall that in Python, the -1 index is the same as getting the highest indexed value, so with a numpy array of just length two, -1 gives us the value at index 1.

So the line of code is calculating the one-day return using the close price, and storing that into the out variable.

$Return = \frac{close_1 - close_0}{close_0}$
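To make the indexing concrete, here is a minimal numpy-only sketch of the same calculation. The `compute` function below is a simplified stand-in that drops the `self`, `today` and `assets` arguments; it is not zipline's method, and the prices are made up.

```python
import numpy as np

# Two rows (window_length=2) by three stocks of hypothetical close prices.
# Row 0 is the earlier day and row 1 (also row -1) is the most recent day.
close = np.array([
    [10.0, 20.0, 50.0],
    [11.0, 19.0, 50.0],
])

# Pre-allocated output array with one slot per stock.
out = np.empty(close.shape[1])


def compute(out, close):
    # Write into out in place instead of returning a value.
    out[:] = (close[-1] - close[0]) / close[0]


compute(out, close)
print(out)
```

Each column is one stock, so the result holds one return per stock: a gain for the first, a loss for the second, and no change for the third.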

## Quiz 3

Given a numpy array for open prices called open and a numpy array for close prices called close, what code would you write to get the most recent open price? Assume that you have 2 days of data.

open[1] or open[-1] (because Python lets us do reverse indexing) are valid answers. Using the -1 index lets us get the most recent price (the very last index of the numpy array) regardless of the window length, so you may prefer to use -1 to make your code easier to maintain or modify.

## Close To Open Returns (Overnight Returns)

The close-to-open return is the change in price between when the market closed on one day and when it opened on the next. So it's

$CloseToOpen = \frac{open_1 - close_0}{close_0}$

We'll now create a class CloseToOpenReturns that inherits from Returns, and override the compute function.

## Quiz 4

Create a customized class CloseToOpenReturns that inherits from the Returns class. Define the compute function to calculate overnight returns.

from zipline.pipeline.data import USEquityPricing
from zipline.pipeline.factors import Returns
class CloseToOpenReturns(Returns):
    """
    Overnight return: the most recent open relative to the previous day's close
    """

    # TODO: Set window_length (we're calculating daily returns)
    window_length = 2

    # TODO: set inputs
    inputs = [USEquityPricing.open, USEquityPricing.close]

    # The compute method is passed the current day, the assets list, a pre-allocated out vector, and the
    # factor's items in the list inputs
    def compute(self, today, assets, out, opens, closes):
        # TODO: calculate close-to-open return and save into out[:]
        out[:] = (opens[-1] - closes[0]) / closes[0]

## Trailing overnight returns

The cumulative overnight returns over a week may be predictive of future returns; hence it's a kind of momentum signal.

$TrailingOvernightReturns = \sum_{t=1}^{Days} CloseToOpen_t$
Where $Days$ could be 5 if we are looking at a weekly window.

So we want to take the CloseToOpenReturns as our input into another class, TrailingOvernightReturns, which also inherits from Returns.

Note that we're going to create another class that inherits from Returns. Recall that Returns inherits from CustomFactor, which has a mask parameter for its constructor. The mask parameter takes in a Filter object, which determines which stock series get passed to the compute function. Note that when we used AverageDollarVolume and stored its output in the variable universe, this universe variable is of type Filter.
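The effect of a mask can be pictured as a boolean array that selects which stock columns take part in the calculation. The sketch below is a numpy-only illustration of that idea with made-up numbers; it is not how zipline applies masks internally, and the names are hypothetical.

```python
import numpy as np

# Hypothetical average dollar volume for five stocks.
avg_dollar_volume = np.array([5e6, 9e7, 2e5, 4e7, 1e6])

# Boolean filter: True for the two most liquid stocks (a tiny "top 2" universe).
top_n = 2
cutoff = np.sort(avg_dollar_volume)[-top_n]
universe_mask = avg_dollar_volume >= cutoff

# Two days of hypothetical prices for the same five stocks.
opens = np.array([[10.0, 20.0, 30.0, 40.0, 50.0],
                  [10.2, 20.1, 29.0, 40.4, 50.5]])
closes = np.array([[10.1, 20.0, 29.5, 40.0, 50.0],
                   [10.3, 20.3, 29.2, 40.8, 50.1]])

# Only the masked columns are used in the calculation.
prev_close = closes[0, universe_mask]
overnight = (opens[-1, universe_mask] - prev_close) / prev_close

print(universe_mask)
print(overnight)
```

Only two overnight returns come out, one for each stock that passed the filter.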

## Quiz 5

If you wanted to create an object of type CloseToOpenReturns, and also define the object so that it only computes returns on the set of stocks in universe that we selected earlier in this notebook, what code would you write?

We could instantiate a CloseToOpenReturns object with CloseToOpenReturns(mask=universe), and this would only calculate close to open returns for the stocks defined in our universe variable.

## numpy.nansum

Numpy has a nansum function that treats NaN (not a number) as zero. Note that by default, if we give numpy.nansum a 2D numpy array, it will calculate a single sum across all rows and columns. For our purposes, we want to compute a sum over 5 days (5 rows), and each column has daily close to open returns for a single stock. It helps to think of a matrix (2D numpy array) as a nested list of lists. This makes it easier to decide whether to set axis=0 or axis=1.

tmp =
[
[stock1day1, stock2day1 ]
[stock1day2, stock2day2 ]
...
]

If we look at the outermost list, each element is a list that represents one day's worth of data. If we used np.nansum(tmp,axis=0), this would sum across the days for each stock. If we think of this as a 2D matrix, setting axis=0 is like calculating a sum for each column.

If we set axis=0, this applies nansum to the outermost list (axis 0), so that we end up with:

[
sum_of_stock_1, sum_of_stock_2
]

Alternatively, if we set axis=1, this applies nansum to the lists nested inside the outermost list. Each of these nested lists represents data for a single day, for all stocks, so that we get:

[
sum_of_day_1,
sum_of_day_2,
]

## Example using numpy.nansum

tmp = np.array([
[1, 2, 3],
[np.nan, np.nan, np.nan],
[1, 1, 1]
])

print(f"Sum across rows and columns: numpy.nansum(tmp) \n{np.nansum(tmp)}")
print(f"Sum for each column: numpy.nansum(tmp,axis=0) \n{np.nansum(tmp,axis=0)}")
print(f"Sum for each row: numpy.nansum(tmp,axis=1) \n{np.nansum(tmp,axis=1)}")

## Quiz 6

For our purposes, we want a sum for each stock series. Which axis do you think we should choose?

We want to set axis = 0 so that we have a sum for each stock (each column).

## Quiz 7

Create a class TrailingOvernightReturns that inherits from Returns and takes the cumulative weekly sum of overnight returns.

class TrailingOvernightReturns(Returns):
    """
    Sum of trailing close-to-open returns; we expect sentiment persistence at short horizons, so we
    look at the 5-day (i.e., 1 week) window
    """
    # TODO: choose a window_length to calculate a weekly return
    window_length = 5

    # TODO: set inputs to a list containing the daily close to open returns
    # Filter the close to open returns by our stock universe
    inputs = [CloseToOpenReturns(mask=universe)]

    def compute(self, today, assets, out, close_to_open):
        # TODO: calculate the sum of close_to_open
        # choose the axis so that there is a sum for each stock (each column)
        # treat NaN as zeros
        out[:] = np.nansum(close_to_open, axis=0)

## Quiz 8

Create a factor by instantiating the TrailingOvernightReturns class that you just defined. Demean by sector, rank and convert to a zscore.

# TODO: create an overnight_returns_factor variable
overnight_returns_factor = (
    TrailingOvernightReturns().
    demean(groupby=Sector()).
    rank().
    zscore()
)
# create a pipeline called p
p = Pipeline(screen=universe)
p.add(overnight_returns_factor, 'Overnight_Sentiment')
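To see what the three chained steps do to the numbers, here is a small numpy-only sketch for a single day. The factor values and sector codes are invented, and the details (ranks starting at 1, population standard deviation) are assumptions for the sketch rather than a description of zipline's internals.

```python
import numpy as np

# Hypothetical raw factor values and integer sector codes for six stocks.
raw_factor = np.array([0.02, 0.05, -0.01, 0.03, 0.00, -0.04])
sectors = np.array([0, 0, 0, 1, 1, 1])

# 1. Demean within each sector so sector-wide moves cancel out.
demeaned = raw_factor.copy()
for code in np.unique(sectors):
    in_sector = sectors == code
    demeaned[in_sector] -= raw_factor[in_sector].mean()

# 2. Rank across all stocks (1 = lowest value).
ranked = demeaned.argsort().argsort() + 1.0

# 3. Convert ranks to z-scores: zero mean, unit standard deviation.
zscored = (ranked - ranked.mean()) / ranked.std()

print(ranked)
print(zscored.round(3))
```

Ranking before z-scoring makes the result depend only on the ordering of the stocks, so a single extreme factor value cannot dominate the final scores.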

## Visualize pipeline

p.show_graph(format='png')

## run pipeline and view the factor data

df = engine.run_pipeline(p, factor_start_date, universe_end_date)
df.head()

## Visualize factor returns

These are returns that a theoretical portfolio would have if its stock weights were determined by a single alpha factor's values.

from quiz_helper import make_factor_plot
make_factor_plot(df, data_portal, trading_calendar, factor_start_date, universe_end_date);
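The idea of a factor return can be illustrated for a single day with a few lines of numpy. This is a simplified sketch with made-up numbers and an assumed weighting scheme (factor values scaled so the absolute weights sum to 1); it is not a description of what make_factor_plot does internally.

```python
import numpy as np

# Hypothetical factor values (already z-scored) for four stocks on one day.
factor = np.array([1.2, 0.3, -0.5, -1.0])

# Hypothetical returns of the same stocks over the following day.
forward_returns = np.array([0.010, 0.002, -0.004, -0.008])

# Scale factor values so the absolute weights sum to 1.
weights = factor / np.abs(factor).sum()

# The factor return is the weighted sum of the stocks' forward returns.
factor_return = (weights * forward_returns).sum()

print(weights.round(4))
print(round(factor_return, 6))
```

In this toy case the factor return is positive, because the stocks with positive factor values went up and the stocks with negative values went down.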