Python Pandas Series (Part 19)

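A Pandas Series is a one-dimensional, array-like object that can hold many types of data, such as numbers or strings. One of the main differences between a Pandas Series and a NumPy ndarray is that you can assign an index label to each element of a Pandas Series. In other words, you can give the entries of a Pandas Series index any names you like. Another notable difference is that a Pandas Series can hold data of different types.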

Creating a Series

Let's start by importing Pandas into Python. By convention, Pandas is imported as pd. You can therefore import it by typing the following command in a Jupyter Notebook:

import pandas as pd

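Let's begin by creating a Pandas Series. You can create one with the command pd.Series(data, index), where index is a list of index labels. We will use a Pandas Series to store a grocery list, with the food items as index labels and the quantities to buy as data.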

# We import Pandas as pd into Python
import pandas as pd

# We create a Pandas Series that stores a grocery list
groceries = pd.Series(data = [30, 6, 'Yes', 'No'], index = ['eggs', 'apples', 'milk', 'bread'])

# We display the Groceries Pandas Series
groceries

Output:

eggs           30
apples         6
milk         Yes
bread       No
dtype: object

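As you can see, a Pandas Series is displayed with the index in the first column and the data in the second column. Note that the data is not indexed from 0 to 3; instead it uses the food names we set: eggs, apples, and so on. Also note that the data in our Pandas Series includes both integers and strings.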
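As a small illustration of the mixed-type point, the sketch below builds the same grocery list from a dict and compares its dtype with that of a numbers-only Series. The names grocery_dict, groceries_from_dict, and numbers_only are hypothetical and do not appear elsewhere in this article.

```python
import pandas as pd

# Hypothetical dict mapping each food item to its entry.
grocery_dict = {'eggs': 30, 'apples': 6, 'milk': 'Yes', 'bread': 'No'}

# Dict keys become the index labels; dict values become the data.
groceries_from_dict = pd.Series(grocery_dict)

# A Series that holds only integers gets an integer dtype, while mixing
# integers and strings falls back to the generic object dtype.
numbers_only = pd.Series(data=[30, 6], index=['eggs', 'apples'])

print('Mixed Series dtype:', groceries_from_dict.dtype)
print('Numbers-only Series dtype:', numbers_only.dtype)
```

Passing a dict is just one alternative to the data/index form used above; both should produce the same labels and values.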

Just like NumPy ndarrays, Pandas Series have attributes that let us easily get information about the Series. Let's look at a few of them:

# We print some information about Groceries
print('Groceries has shape:', groceries.shape)
print('Groceries has dimension:', groceries.ndim)
print('Groceries has a total of', groceries.size, 'elements')

Output:

Groceries has shape: (4,)
Groceries has dimension: 1
Groceries has a total of 4 elements

We can also print the index labels and the data of a Pandas Series separately. This is useful if you don't know what the index labels of a Pandas Series are.

# We print the index and data of Groceries
print('The data in Groceries is:', groceries.values)
print('The index of Groceries is:', groceries.index)

Output:

The data in Groceries is: [30 6 'Yes' 'No']
The index of Groceries is: Index(['eggs', 'apples', 'milk', 'bread'], dtype='object')
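If plain Python objects are easier to work with than the Index and array shown above, the following sketch converts them. The variable names labels, values, and pairs are illustrative only.

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, 'Yes', 'No'],
                      index=['eggs', 'apples', 'milk', 'bread'])

labels = groceries.index.tolist()   # index labels as a plain Python list
values = groceries.tolist()         # data as a plain Python list

# items() yields (label, value) pairs, much like dict.items()
pairs = list(groceries.items())

print(labels)
print(values)
print(pairs)
```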

If you are working with a very large Pandas Series and are not sure whether a given index label exists, you can check with the in operator:

# We check whether bananas is a food item (an index) in Groceries
x = 'bananas' in groceries

# We check whether bread is a food item (an index) in Groceries
y = 'bread' in groceries

# We print the results
print('Is bananas an index label in Groceries:', x)
print('Is bread an index label in Groceries:', y)

Output:

Is bananas an index label in Groceries: False
Is bread an index label in Groceries: True
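One thing worth keeping in mind is that in looks at the index labels, not at the data. The sketch below contrasts the two and shows one possible way to test the data instead, using isin; the variable names are hypothetical.

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, 'Yes', 'No'],
                      index=['eggs', 'apples', 'milk', 'bread'])

# 'in' applied to a Series checks the index labels only.
label_found = 'bread' in groceries
value_as_label = 'Yes' in groceries   # 'Yes' is data, not a label

# To test the data, compare against the values instead.
value_found = 'Yes' in groceries.tolist()
value_mask = groceries.isin(['Yes'])  # Boolean Series, one flag per entry

print('bread is a label:', label_found)
print('Yes is a label:', value_as_label)
print('Yes is a value:', value_found)
print(value_mask)
```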

Accessing and Deleting Elements in a Series

.loc and .iloc

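We can access elements by placing index labels or numerical indices inside square brackets [ ], much as we access elements of a NumPy ndarray. Because numerical indices are allowed, we can use positive integers to access data from the beginning of the Series or negative integers to access it from the end. Since elements can be accessed in several ways, Pandas Series provide two attributes, .loc and .iloc, that make it explicit whether we mean an index label or a numerical index. The attribute .loc stands for location and signals that we are using a label index. Likewise, .iloc stands for integer location and signals that we are using a numerical index. Note that newer Pandas releases deprecate positional access through plain square brackets on a label-indexed Series, so .iloc is the safer choice for numerical indices. Let's look at some examples: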

# We access elements in Groceries using index labels:

# We use a single index label
print('How many eggs do we need to buy:', groceries['eggs'])
print()

# we can access multiple index labels
print('Do we need milk and bread:\n', groceries[['milk', 'bread']]) 
print()

# we use loc to access multiple index labels
print('How many eggs and apples do we need to buy:\n', groceries.loc[['eggs', 'apples']]) 
print()

# We access elements in Groceries using numerical indices.
# Newer Pandas releases deprecate positional access with plain [ ] on a
# label-indexed Series, so we use iloc throughout:

# we use multiple numerical indices
print('How many eggs and apples do we need to buy:\n', groceries.iloc[[0, 1]])
print()

# We use a negative numerical index
print('Do we need bread:\n', groceries.iloc[[-1]])
print()

# We use a single numerical index
print('How many eggs do we need to buy:', groceries.iloc[0])
print()

# we use iloc to access multiple numerical indices
print('Do we need milk and bread:\n', groceries.iloc[[2, 3]])

Output:

How many eggs do we need to buy: 30

Do we need milk and bread:
milk       Yes
bread     No
dtype: object

How many eggs and apples do we need to buy:
eggs       30
apples     6
dtype: object

How many eggs and apples do we need to buy:
eggs       30
apples     6
dtype: object

Do we need bread:
bread     No
dtype: object

How many eggs do we need to buy: 30

Do we need milk and bread:
milk       Yes
bread     No
dtype: object
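The difference between .loc and .iloc also shows up when slicing. As a minimal sketch (variable names are illustrative), a label slice includes its end label, while a positional slice excludes its end position, as in ordinary Python slicing.

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, 'Yes', 'No'],
                      index=['eggs', 'apples', 'milk', 'bread'])

# Label-based slice: both endpoints ('eggs' and 'milk') are included.
by_label = groceries.loc['eggs':'milk']

# Position-based slice: position 2 is excluded, as in a Python list slice.
by_position = groceries.iloc[0:2]

# Negative positions count from the end of the Series.
last_item = groceries.iloc[-1]

print(by_label)
print(by_position)
print('Last item:', last_item)
```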

Like NumPy ndarrays, Pandas Series are mutable, meaning we can change their elements after they have been created. For example, let's change the number of eggs on our grocery list:

# We display the original grocery list
print('Original Grocery List:\n', groceries)

# We change the number of eggs to 2
groceries['eggs'] = 2

# We display the changed grocery list
print()
print('Modified Grocery List:\n', groceries)

Output:

Original Grocery List:
eggs           30
apples         6
milk         Yes
bread       No
dtype: object

Modified Grocery List:
eggs             2
apples         6
milk         Yes
bread       No
dtype: object
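Assignment also works through .loc and .iloc. The sketch below, which works on a copy named updated (a hypothetical name), changes entries by label and by position and adds a new label by assigning to it.

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, 'Yes', 'No'],
                      index=['eggs', 'apples', 'milk', 'bread'])

# Work on a copy so the original grocery list stays untouched.
updated = groceries.copy()

updated.loc['eggs'] = 12      # change an entry by label
updated.iloc[1] = 10          # change an entry by position (apples)
updated.loc['bananas'] = 4    # assigning to a new label adds an entry

print(updated)
```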

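We can also delete entries from a Pandas Series with the .drop() method. Series.drop(label) removes the given label from the Series. Note that Series.drop(label) does not remove the element in place: it returns a new Series and leaves the original unchanged. Let's see how this works: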

# We display the original grocery list
print('Original Grocery List:\n', groceries)

# We remove apples from our grocery list. The drop function removes elements out of place
print()
print('We remove apples (out of place):\n', groceries.drop('apples'))

# When we remove elements out of place the original Series remains intact. To see this
# we display our grocery list again
print()
print('Grocery List after removing apples out of place:\n', groceries)

Output:

Original Grocery List:
eggs           30
apples         6
milk         Yes
bread       No
dtype: object

We remove apples (out of place):
eggs           30
milk         Yes
bread       No
dtype: object

Grocery List after removing apples out of place:
eggs           30
apples         6
milk         Yes
bread       No
dtype: object
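drop also accepts a list of labels, and by default it raises a KeyError for a label that does not exist. The sketch below shows both behaviours along with the errors='ignore' option; the variable names are illustrative.

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, 'Yes', 'No'],
                      index=['eggs', 'apples', 'milk', 'bread'])

# Drop several labels at once; the original Series is left intact.
fewer = groceries.drop(['apples', 'milk'])

# Dropping a label that is not in the index raises KeyError by default.
try:
    groceries.drop('bananas')
    missing_raised = False
except KeyError:
    missing_raised = True

# errors='ignore' skips labels that are not present.
unchanged = groceries.drop('bananas', errors='ignore')

print(fewer)
print('KeyError for a missing label:', missing_raised)
print(unchanged)
```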

We can delete entries from a Pandas Series in place by setting the keyword inplace to True in the .drop() method. Let's look at an example:

# We display the original grocery list
print('Original Grocery List:\n', groceries)

# We remove apples from our grocery list in place by setting the inplace keyword to True
groceries.drop('apples', inplace = True)

# When we remove elements in place the original Series is modified. To see this
# we display our grocery list again
print()
print('Grocery List after removing apples in place:\n', groceries)

Output:

Original Grocery List:
eggs           30
apples         6
milk         Yes
bread       No
dtype: object

Grocery List after removing apples in place:
eggs           30
milk         Yes
bread       No
dtype: object
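A common pitfall with inplace = True is that the call returns None, so its result should not be assigned back to a variable. The sketch below compares it with reassigning the result of a regular drop; make_groceries is a hypothetical helper introduced only for this illustration.

```python
import pandas as pd

def make_groceries():
    """Build a fresh copy of the grocery list used in this article."""
    return pd.Series(data=[30, 6, 'Yes', 'No'],
                     index=['eggs', 'apples', 'milk', 'bread'])

# With inplace=True the Series itself is modified and the call returns None.
in_place = make_groceries()
returned = in_place.drop('apples', inplace=True)

# Without inplace, drop returns a new Series that we bind back to the name.
reassigned = make_groceries()
reassigned = reassigned.drop('apples')

print('Return value with inplace=True:', returned)
print('Both approaches give the same Series:', in_place.equals(reassigned))
```

Both styles end with the same data; reassignment simply makes the data flow explicit.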

Arithmetic Operations

Arithmetic between a Series and a single number

# We create a Pandas Series that stores a grocery list of just fruits
fruits = pd.Series(data = [10, 6, 3], index = ['apples', 'oranges', 'bananas'])

# We display the fruits Pandas Series
fruits

Output:

apples         10
oranges        6
bananas       3
dtype: int64

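We can now perform basic arithmetic operations on the data in fruits. Each operation returns a new Series and leaves fruits itself unchanged. Let's look at some examples: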

# We print fruits for reference
print('Original grocery list of fruits:\n', fruits)

# We perform basic element-wise operations using arithmetic symbols
print()
print('fruits + 2:\n', fruits + 2) # We add 2 to each item in fruits
print()
print('fruits - 2:\n', fruits - 2) # We subtract 2 from each item in fruits
print()
print('fruits * 2:\n', fruits * 2) # We multiply each item in fruits by 2 
print()
print('fruits / 2:\n', fruits / 2) # We divide each item in fruits by 2
print()

Output:

Original grocery list of fruits:
apples         10
oranges        6
bananas       3
dtype: int64

fruits + 2:
apples         12
oranges        8
bananas       5
dtype: int64

fruits - 2:
apples           8
oranges        4
bananas       1
dtype: int64

fruits * 2:
apples         20
oranges      12
bananas       6
dtype: int64

fruits / 2:
apples           5.0
oranges        3.0
bananas       1.5
dtype: float64
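The output above shows that dividing with / turns the data into floats. As a small sketch with illustrative variable names, floor division // and the remainder operator % work element-wise in the same way but keep an integer dtype.

```python
import pandas as pd

fruits = pd.Series(data=[10, 6, 3], index=['apples', 'oranges', 'bananas'])

halved = fruits / 2         # true division produces floats
floor_halved = fruits // 2  # floor division keeps integers
remainder = fruits % 2      # element-wise remainder

print(halved)
print(floor_halved)
print(remainder)

# None of these operations changed fruits itself.
print(fruits)
```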

We can also apply NumPy's mathematical functions, such as sqrt(x), to all the elements of a Pandas Series:

# We import NumPy as np to be able to use the mathematical functions
import numpy as np

# We print fruits for reference
print('Original grocery list of fruits:\n', fruits)

# We apply different mathematical functions to all elements of fruits
print()
print('EXP(X) = \n', np.exp(fruits))
print() 
print('SQRT(X) =\n', np.sqrt(fruits))
print()
print('POW(X,2) =\n',np.power(fruits,2)) # We raise all elements of fruits to the power of 2

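Output:

Original grocery list of fruits:
apples         10
oranges        6
bananas       3
dtype: int64

We can also apply arithmetic operations to selected elements of fruits only, for example adding 2 to fruits['bananas'] or doubling fruits[['apples', 'oranges']]. This gives results like the following:

Amount of bananas + 2 = 5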

Amount of apples - 2 = 8

We double the amount of apples and oranges:
apples         20
oranges      12
dtype: int64

We half the amount of apples and oranges:
apples         5.0
oranges      3.0
dtype: float64
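The element-specific results above can be reproduced with a short sketch. The article does not show the code that generated them, so the variable names and the particular mix of label and positional access below are assumptions chosen for illustration.

```python
import pandas as pd

fruits = pd.Series(data=[10, 6, 3], index=['apples', 'oranges', 'bananas'])

# Arithmetic on a single element selected by label or by position.
bananas_plus_two = fruits['bananas'] + 2
apples_minus_two = fruits.iloc[0] - 2

# Arithmetic on a subset selected with a list of labels.
doubled = fruits[['apples', 'oranges']] * 2
halved = fruits.loc[['apples', 'oranges']] / 2

print('Amount of bananas + 2 =', bananas_plus_two)
print()
print('Amount of apples - 2 =', apples_minus_two)
print()
print('We double the amount of apples and oranges:\n', doubled)
print()
print('We half the amount of apples and oranges:\n', halved)
```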

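You can also apply arithmetic operations to a Pandas Series with mixed data types, provided the operation is defined for every data type in the Series; otherwise you will get an error. Let's see what happens when we multiply our grocery list by 2: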

# We multiply our grocery list by 2
groceries * 2

Output (for the grocery list as originally created):

eggs                 60
apples             12
milk         YesYes
bread        NoNo
dtype: object

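As you can see, multiplying by 2 doubled the data of every entry, including the strings. Pandas can do this because the multiplication operator * is defined for both numbers and strings. If you apply an operation that is valid for numbers but not for strings, such as /, you will get an error. When a Pandas Series holds mixed data types, make sure the arithmetic operation is valid for the data type of every element.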
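To make the failure case concrete, the sketch below divides the mixed grocery list by 2, catches the resulting TypeError, and then shows one possible workaround: keeping only the numeric entries with pd.to_numeric before dividing. The workaround and the variable names are suggestions for illustration, not part of the original lesson.

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, 'Yes', 'No'],
                      index=['eggs', 'apples', 'milk', 'bread'])

# Division is not defined for strings, so the whole operation fails.
try:
    groceries / 2
    division_failed = False
except TypeError as err:
    division_failed = True
    print('Division failed:', err)

# One possible workaround: turn non-numeric entries into NaN, drop them,
# and apply the operation to what remains.
numeric_only = pd.to_numeric(groceries, errors='coerce').dropna()
halved = numeric_only / 2

print(halved)
```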

Exercise

import pandas as pd

# Create a Pandas Series that contains the distance of some planets from the Sun.
# Use the name of the planets as the index to your Pandas Series, and the distance
# from the Sun as your data. The distance from the Sun is in units of 10^6 km

distance_from_sun = [149.6, 1433.5, 227.9, 108.2, 778.6]

planets = ['Earth','Saturn', 'Mars','Venus', 'Jupiter']

# Create a Pandas Series using the above data, with the name of the planets as
# the index and the distance from the Sun as your data.
dist_planets = pd.Series(data = distance_from_sun, index = planets)

# Calculate the number of minutes it takes sunlight to reach each planet. You can
# do this by dividing the distance from the Sun for each planet by the speed of light.
# Since in the data above the distance from the Sun is in units of 10^6 km, you can
# use a value for the speed of light of c = 18, since light travels 18 x 10^6 km/minute.
time_light = dist_planets / 18

# Use Boolean indexing to select only those planets for which sunlight takes less
# than 40 minutes to reach them.
close_planets = time_light[time_light < 40]
print(close_planets)

Output:

Earth     8.311111
Mars     12.661111
Venus     6.011111
dtype: float64
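Boolean indexing, used in the last step of the exercise, works in two stages: the comparison produces a Series of True/False flags, and indexing with that mask keeps only the entries flagged True. The sketch below spells out both stages; the names mask and closest_first are illustrative, and the sort at the end is an optional extra.

```python
import pandas as pd

distance_from_sun = [149.6, 1433.5, 227.9, 108.2, 778.6]
planets = ['Earth', 'Saturn', 'Mars', 'Venus', 'Jupiter']

dist_planets = pd.Series(data=distance_from_sun, index=planets)
time_light = dist_planets / 18   # minutes for sunlight to arrive

# Stage 1: the comparison yields one Boolean flag per planet.
mask = time_light < 40

# Stage 2: indexing with the mask keeps only the flagged planets.
close_planets = time_light[mask]

# Optional: order the result from nearest to farthest.
closest_first = close_planets.sort_values()

print(mask)
print(closest_first)
```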

Those who keep doing will succeed; those who keep walking will arrive.