AI For Trading: Data Processing (5)

Tick data is sometimes called heterogeneous data because it is not sampled at regular time intervals, whereas minute-level or end-of-day data is called homogeneous. Converting heterogeneous data to a homogeneous form can be a good exercise!
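As a minimal sketch of that conversion, the snippet below resamples a handful of made-up ticks into one-minute bars with pandas. The prices, volumes, timestamps and the names `ticks` and `bars` are illustrative assumptions, not data from the course.

```python
import pandas as pd

# Hypothetical tick data: one row per trade, at irregular times.
ticks = pd.DataFrame(
    {
        "price": [100.0, 100.5, 100.2, 101.0, 100.8],
        "volume": [200, 100, 300, 150, 250],
    },
    index=pd.to_datetime(
        [
            "2024-01-02 09:30:05",
            "2024-01-02 09:30:40",
            "2024-01-02 09:31:10",
            "2024-01-02 09:31:55",
            "2024-01-02 09:33:20",
        ]
    ),
)

# Resample to regular one-minute bars, each labeled by the start of its minute.
bars = ticks["price"].resample("1min").ohlc()
bars["volume"] = ticks["volume"].resample("1min").sum()

print(bars)
```

The 09:32 minute has no trades, so its prices come out as missing values while its volume is zero. How to fill such empty bars (for example, carrying the last price forward) is a separate design choice.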

When to use time stamps

Assume that you are using minute-level stock data in which each row carries a timestamp marking the beginning of that minute. Let’s say the data spans a single month. In which of the following scenarios would you use these timestamps (check all that apply)?

Aggregating the volume of trades per day
Adjusting for gaps due to market closing and opening
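Both scenarios rely on the timestamps, as the rough sketch below suggests: the date part groups rows by day, and the difference between consecutive timestamps exposes the overnight gap. The sample bars and the names `minute`, `daily_volume` and `overnight` are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical minute bars: the last two minutes of one session
# and the first two minutes of the next.
idx = pd.to_datetime(
    [
        "2024-01-02 15:58",
        "2024-01-02 15:59",
        "2024-01-03 09:30",
        "2024-01-03 09:31",
    ]
)
minute = pd.DataFrame({"volume": [1000, 1500, 800, 1200]}, index=idx)

# Aggregate the traded volume per calendar day using the timestamp's date.
daily_volume = minute["volume"].groupby(minute.index.date).sum()

# Find gaps: consecutive rows that are more than one minute apart.
gaps = minute.index.to_series().diff()
overnight = gaps[gaps > pd.Timedelta(minutes=1)]

print(daily_volume)
print(overnight)
```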

Corporate Action: Stock Splits

Although a stock split shouldn’t theoretically affect the market cap of a stock, in reality it does! Researchers have observed some intriguing behavioral patterns among traders.

One of them suggests that after a stock splits and the price drops considerably, people seem to expect it to climb back to the pre-split price (double or triple the new price)!

This creates an artificial demand for the stock, which in turn actually pushes up the price.
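The theoretical baseline, that a split by itself leaves market cap unchanged, comes down to simple arithmetic. The sketch below assumes a 3-for-1 split and made-up numbers, and `apply_split` is a hypothetical helper that ignores any market reaction.

```python
def apply_split(price, shares_outstanding, ratio):
    """Return (price, shares) right after a ratio-for-1 split, ignoring market reaction."""
    return price / ratio, shares_outstanding * ratio


# Hypothetical company: 1,000,000 shares at 300.0 each, then a 3-for-1 split.
price, shares = 300.0, 1_000_000
new_price, new_shares = apply_split(price, shares, 3)

cap_before = price * shares
cap_after = new_price * new_shares

print(new_price, new_shares, cap_before, cap_after)
```

Any change in market cap after the split therefore has to come from trader behavior, not from the split mechanics.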

Technical Indicators

Moving-window or “rolling” statistics are typically calculated with respect to a past period.

Therefore, you won’t have a valid value at the beginning of the resulting time series until one complete period has elapsed.

For instance, when you compute the Simple Moving Average with a 1-month or 30-day window, the result is undefined for the first 29 days.

This is okay: smart data analysis libraries like Pandas mark these with a special “NA” or “NaN” value (not zero, because that would be indistinguishable from an actual zero value!). Subsequent processing or plotting will interpret those as missing data points.
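A small sketch of this behavior with pandas follows. The 40 days of evenly rising prices are made up so that the numbers are easy to check by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical daily closing prices: 100, 101, ..., 139 over 40 days.
dates = pd.date_range("2024-01-01", periods=40, freq="D")
close = pd.Series(np.linspace(100.0, 139.0, 40), index=dates)

# 30-day Simple Moving Average; the first 29 values are NaN
# because a full 30-day window is not yet available.
sma_30 = close.rolling(window=30).mean()

n_missing = int(sma_30.isna().sum())
first_valid = sma_30.first_valid_index()

print(n_missing, first_valid, sma_30.loc[first_valid])
```

If a partial-window average is acceptable, `rolling` also takes a `min_periods` argument, but the early values then average over fewer than 30 days.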

Trading Days

How many trading days are there in a typical year for the NYSE?
A: 365 days
B: 261 days
C: 252 days
D: 180 days

Answer: C

Yep! The NYSE and NASDAQ average about 252 trading days a year. This comes from 365.25 (days per year on average) × 5/7 (the proportion of weekdays) ≈ 260.89 weekdays, minus 9 holidays ≈ 251.89, which rounds to 252.
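The same arithmetic can be checked in a few lines. The variable names are illustrative, and the holiday count of 9 is taken from the explanation above.

```python
days_per_year = 365.25   # average length of a year, including leap years
holidays = 9             # market holidays, as stated above

weekdays = days_per_year * 5 / 7      # about 260.89
trading_days = weekdays - holidays    # about 251.89

print(round(weekdays, 2), round(trading_days, 2), round(trading_days))
```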

Trading Experiment

Experiment A: Randomly select 100 stocks that are trading today, simulate buying them in 2005 (or whenever they went public), investing equally in each, and hold on to them till the present day. Don’t try to apply any strategy, just pick stocks randomly!

Experiment B: Randomly select another collection of 100 stocks, but this time from those that were trading in 2005. Again, simulate buying them in 2005, investing equally in each, and hold on to them.

Repeat these experiments multiple times and calculate the total return on your investment in each case. Now, would you expect the mean return for A to be significantly higher or lower than that of B? See if you can spot a clear difference.

Would you expect the mean return for A to be significantly higher or lower than that of B?
A: Mean return from A would be higher.
B: Mean return from B would be higher.

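Answer: A
Reason: You're right! The average return from Experiment A would indeed be higher than that from B. Experiment A only draws from stocks that are still trading today, so companies that went bankrupt or were delisted since 2005 never enter the sample. This phenomenon is known as Survivor Bias (also called survivorship bias), which is the subject of the next video!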
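The effect can be illustrated with a toy simulation. Everything in it is an assumption made for illustration: the return distribution, the rule that stocks losing more than half their value get delisted, and the simplification that Experiment A ignores stocks listed after 2005.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical total returns since 2005 for every stock trading in 2005.
n_stocks = 5000
total_return = rng.lognormal(mean=0.3, sigma=1.0, size=n_stocks) - 1.0

# Assumption: stocks that lost more than half their value were delisted,
# so they are no longer "trading today".
delisted = total_return < -0.5


def mean_portfolio_return(pool, n_trials=1000, n_pick=100):
    """Average return of many equal-weight portfolios drawn at random from pool."""
    returns = [
        rng.choice(pool, size=n_pick, replace=False).mean()
        for _ in range(n_trials)
    ]
    return float(np.mean(returns))


# Experiment A: pick only among survivors. Experiment B: pick among all 2005 stocks.
mean_a = mean_portfolio_return(total_return[~delisted])
mean_b = mean_portfolio_return(total_return)

print(f"A (survivors only): {mean_a:.3f}")
print(f"B (all 2005 stocks): {mean_b:.3f}")
```

Because the worst performers are filtered out of A's pool before any stock is picked, A's mean return comes out higher even though no strategy is applied.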

Those who act often succeed; those who keep walking often arrive.