Python Pandas DataFrame 综合使用之股票信息统计 (二十四)

股票数据统计信息

在此 Lab 中,我们将在 Pandas DataFrame 中加载股票数据,并计算一些统计指标。我们将处理 Google、Apple 和 Amazon 的股票数据。所有数据都来自雅虎金融,格式为 CSV。在你的 workspace 中应该有一个包含 Google 股票数据的 GOOG.csv 文件,一个包含 Apple 股票数据的 AAPL.csv 文件和一个包含 Amazon 股票数据的 AMZN.csv 文件。所有文件都包含 7 列数据:

Date Open High Low Close Adj_Close Volume

首先,我们会将上述 CSV 文件读取到 DataFrame 中,看一下数据看起来如何。

# We import pandas into Python
import pandas as pd

# We read in a stock data data file into a data frame and see what it looks like
df = pd.read_csv('./GOOG.csv')

# We display the first 5 rows of the DataFrame
df.head()
Date Open High Low Close Adj Close Volume
0 2004-08-19 49.676899 51.693783 47.669952 49.845802 49.845802 44994500
1 2004-08-20 50.178635 54.187561 49.925285 53.805050 53.805050 23005800
2 2004-08-23 55.017166 56.373344 54.172661 54.346527 54.346527 18393200
3 2004-08-24 55.260582 55.439419 51.450363 52.096165 52.096165 15361800
4 2004-08-25 52.140873 53.651051 51.604362 52.657513 52.657513 9257400
Adj Close
Date
2004-08-19 49.845802
2004-08-20 53.805050
2004-08-23 54.346527
2004-08-24 52.096165
2004-08-25 52.657513

现在,你需要将上述三个 DataFrame 连接到一起,创建一个新的 DataFrame,其中包含所有股票的所有 `Adj Close`。我们先创建一个空的 DataFrame,其中具有在 `2000-01-01` 和 `2016-12-31` 之间的行索引日历日期。我们首先将使用 `pd.date_range()` 函数创建日历日期,然后创建一个将这些日期当做行索引的 DataFrame:

```python

# We create calendar dates between '2000-01-01' and '2016-12-31' dates = pd.date_range('2000-01-01', '2016-12-31') print(dates)

# We create and empty DataFrame that uses the above dates as indices all_stocks = pd.DataFrame(index = dates)

DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04', '2000-01-05', '2000-01-06', '2000-01-07', '2000-01-08', '2000-01-09', '2000-01-10', ... '2016-12-22', '2016-12-23', '2016-12-24', '2016-12-25', '2016-12-26', '2016-12-27', '2016-12-28', '2016-12-29', '2016-12-30', '2016-12-31'], dtype='datetime64[ns]', length=6210, freq='D')

```

# TODO

现在需要将单个 DataFrame `google_stock`、`apple_stock` 和 `amazon_stock` 连接成 `all_stocks` DataFrame。但是,在执行此操作之前,有必要更改三个 DataFrame 中的列名称。因为 `all_stocks` DataFrame 中的列标签必须是唯一的。因为三个 DataFrame 中的所有列都具有相同的名称 `Adj Close`,我们必须将它们更改为股票名称,然后再连接它们。在下面的空白处,将每个 DataFrame 的列标签 `Adj Close` 更改为相应股票的名称。为此,你可以使用 `pd.DataFrame.rename()` 函数。

```python

# Change the Adj Close column label to Google google_stock = google_stock.rename(columns = {'Adj Close': 'Google'})

# Change the Adj Close column label to Apple apple_stock = apple_stock.rename(columns = {'Adj Close': 'Apple'})

# Change the Adj Close column label to Amazon amazon_stock = amazon_stock.rename(columns = {'Adj Close': 'Amazon'})

```

你可以通过显示 DataFrame,检查是否正确地更改了列标签

```python

# We display the google_stock DataFrame google_stock.head()

```

Google
Date
2004-08-19 49.845802
2004-08-20 53.805050
2004-08-23 54.346527
2004-08-24 52.096165
2004-08-25 52.657513

现在列标签已经是唯一的,我们可以将单个 DataFrame 连接成 `all_stocks` DataFrame。为此,我们将使用 `dataframe.join()` 函数。函数 `dataframe1.join(dataframe2)` 将 `dataframe1` 与 `dataframe2`相连接。我们将逐个将每个 DataFrame 连接到 `all_stocks` DataFrame。请填写以下代码,以连接这些 DataFrame,我们已帮你完成第一次连接:

```python

# We join the Google stock to all_stocks all_stocks = all_stocks.join(google_stock)

# We join the Apple stock to all_stocks all_stocks = all_stocks.join(apple_stock)

# We join the Amazon stock to all_stocks all_stocks = all_stocks.join(amazon_stock)

```

你可以通过显示 `all_stocks` DataFrame 检查是否正确地连接了这些 DataFrame。

```python

# We display the google_stock DataFrame all_stocks.head()

```

Google Apple Amazon
2000-01-01 NaN NaN NaN
2000-01-02 NaN NaN NaN
2000-01-03 NaN 3.596616 89.3750
2000-01-04 NaN 3.293384 81.9375
2000-01-05 NaN 3.341579 69.7500

# 代办事项

在获取股票数据的统计信息之前,我们先检查下是否有任何 *NaN* 值。在下面的空白处,检查 `all_stocks` DataFrame 中是否有任何 *NaN* 值。如果有,请删除包含 *NaN* 值的所有行:

```python

# Check if there are any NaN values in the all_stocks dataframe all_stocks.isnull().any()

# Remove any rows that contain NaN values all_stocks.dropna(axis = 0)

```

Google Apple Amazon
2004-08-19 49.845802 1.973460 38.630001
2004-08-20 53.805050 1.979244 39.509998
2004-08-23 54.346527 1.997236 39.450001
2004-08-24 52.096165 2.053144 39.049999
2004-08-25 52.657513 2.123831 40.299999
2004-08-26 53.606342 2.227291 40.189999
2004-08-27 52.732029 2.207371 39.900002
2004-08-30 50.675404 2.192590 38.310001
2004-08-31 50.854240 2.216367 38.139999
2004-09-01 49.801090 2.304405 38.240002
2004-09-02 50.427021 2.291552 39.180000
2004-09-03 49.681866 2.263920 38.740002
2004-09-07 50.461796 2.297979 38.509998
2004-09-08 50.819469 2.335893 38.009998
2004-09-09 50.824436 2.294122 38.070000
2004-09-10 52.324677 2.305047 38.570000
2004-09-13 53.402668 2.287054 40.009998
2004-09-14 55.384777 2.280628 42.669998
2004-09-15 55.638126 2.261993 42.209999
2004-09-16 56.616764 2.335893 42.570000
2004-09-17 58.365391 2.386659 42.959999
2004-09-20 59.294346 2.423287 43.270000
2004-09-21 58.539257 2.442566 43.290001
2004-09-22 58.807514 2.372521 41.380001
2004-09-23 60.019630 2.395013 41.830002
2004-09-24 59.527828 2.396298 40.939999
2004-09-27 58.747902 2.411721 39.930000
2004-09-28 63.020115 2.444494 39.430000
2004-09-29 65.116478 2.485621 40.840000
2004-09-30 64.381264 2.490119 40.860001
... ... ... ...
2016-11-17 771.229980 108.598877 756.400024
2016-11-18 760.539978 108.707527 760.159973
2016-11-21 769.200012 110.357010 780.000000
2016-11-22 768.270020 110.426147 785.330017
2016-11-23 760.989990 109.863159 780.119995
2016-11-25 761.679993 110.416275 780.369995
2016-11-28 768.239990 110.198975 766.770020
2016-11-29 770.840027 110.090332 762.520020
2016-11-30 758.039978 109.161880 750.570007
2016-12-01 747.919983 108.144531 743.650024
2016-12-02 750.500000 108.549500 740.340027
2016-12-05 762.520020 107.769203 759.359985
2016-12-06 759.109985 108.598877 764.719971
2016-12-07 771.190002 109.665604 770.419983
2016-12-08 776.419983 110.742218 767.330017
2016-12-09 789.289978 112.549728 768.659973
2016-12-12 789.270020 111.907722 760.119995
2016-12-13 796.099976 113.774490 774.340027
2016-12-14 797.070007 113.774490 768.820007
2016-12-15 797.849976 114.396751 761.000000
2016-12-16 790.799988 114.544907 757.770020
2016-12-19 794.200012 115.206673 766.000000
2016-12-20 796.419983 115.512863 771.219971
2016-12-21 794.559998 115.621513 770.599976
2016-12-22 791.260010 114.860977 766.340027
2016-12-23 789.909973 115.088142 760.590027
2016-12-27 791.549988 115.819054 771.400024
2016-12-28 785.049988 115.325203 772.130005
2016-12-29 782.789978 115.295570 765.150024
2016-12-30 771.820007 114.396751 749.869995

3115 rows × 3 columns

你已经删除了任何 *NaN* 值,我们可以计算股价的一些基本统计指标了。请填充以下代码

```python

# Print the average stock price for each stock all_stocks.mean()

# Print the median stock price for each stock

# Print the standard deviation of the stock price for each stock

# Print the correlation between stocks

```

``` Google 347.420229 Apple 35.222976 Amazon 166.095436 dtype: float64

```

我们现在将了解如何计算滚动统计指标,也称之为移动统计指标。例如,我们可以通过使用 Pandas `dataframe.rolling().mean()` 方法,计算 Google 股价的滚动均值(移动平均值)。`dataframe.rolling(N).mean()` 会计算 `N` 天期限的滚动均值。换句话说,我们可以通过使用上述方法查看每隔 `N` 天的平均股价。请填写以下代码,计算 Google 股票每隔 150 天的平均股价。

```python

# We compute the rolling mean using a 150-Day window for Google stock rollingMean = all_stocks['Google'].rolling(150).mean() # print(rollingMean)

```

我们还可以通过绘制 DataFrame 中的数据可视化滚动均值。在下面的课程中,你将学习如何使用 **Matplotlib** 可视化数据。暂时我们将导入 matplotlib 并在滚动均值上方绘制 Google 股票数据。你可以更改滚动均值期限,看看图形有何变化。

```python

# We import matplotlib into Python import matplotlib.pyplot as plt

# We plot the Google stock data plt.plot(all_stocks['Google'])

# We plot the rolling mean ontop of our Google stock data plt.plot(rollingMean) plt.legend(['Google Stock Price', 'Rolling Mean']) plt.show()

```

![file](http://digtime.cn/uploads/images/201812/21/1/7SA5aAjbaF.png)

为者常成,行者常至