Python Pandas Series (十九)

2018-12-18 14:51:02 ⋅ 19846 ⋅ 0 ⋅ 0

Pandas series 是一个像数组一样的一维对象，可以存储很多类型的数据，例如数字或字符串。Pandas Series 和 NumPy ndarray 之间的主要区别之一是你可以为 Pandas Series 中的每个元素分配索引标签。换句话说，你可以为 Pandas Series 索引指定任何名称。Pandas Series 和 NumPy ndarrays 之间的另一个明显区别是 Pandas Series 可以存储不同类型的数据。

创建

我们先在 Python 中导入 Pandas。通常，我们使用 pd 导入 Pandas。因此，你可以在 Jupyter Notebook 中输入以下命令，导入 Pandas：

import pandas as pd

我们先创建一个 Pandas Series。你可以使用 pd.Series(data, index)命令创建 Pandas Series，其中 index 是一个索引标签列表。我们使用 Pandas Series 存储一个购物清单。我们将使用食品条目作为索引标签，使用购买数量作为数据。

# We import Pandas as pd into Python
import pandas as pd

# We create a Pandas Series that stores a grocery list
groceries = pd.Series(data = [30, 6, 'Yes', 'No'], index = ['eggs', 'apples', 'milk', 'bread'])

# We display the Groceries Pandas Series
groceries

打印：

eggs           30
apples         6
milk         Yes
bread       No
dtype: object

可以看出 Pandas Series 的显示方式为：第一列是索引，第二列是数据。注意，数据的索引不是从 0 到 3，而是采用我们设置的食品名称，即鸡蛋、苹果、等...此外注意，我们的 Pandas Series 中的数据既包括整数，又包括字符串。

和 NumPy ndarray 一样，通过 Pandas Series 的一些属性，我们可以轻松地获取 series 中的信息。我们来看一些属性：

# We print some information about Groceries
print('Groceries has shape:', groceries.shape)
print('Groceries has dimension:', groceries.ndim)
print('Groceries has a total of', groceries.size, 'elements')

Groceries has shape: (4,)
Groceries has dimension: 1
Groceries has a total of 4 elements

我们还可以单独输出 Pandas Series 的索引标签和数据。如果你不知道 Pandas Series 的索引标签是什么，这种方法就很有用。

 We print the index and data of Groceries
print('The data in Groceries is:', groceries.values)
print('The index of Groceries is:', groceries.index)

打印：

The data in Groceries is: [30 6 'Yes' 'No']
The index of Groceries is: Index(['eggs', 'apples', 'milk', 'bread'], dtype='object')

如果你处理的是非常庞大的 Pandas Series，并且不清楚是否存在某个索引标签，可以使用 in 命令检查是否存在该标签：

# We check whether bananas is a food item (an index) in Groceries
x = 'bananas' in groceries

# We check whether bread is a food item (an index) in Groceries
y = 'bread' in groceries

# We print the results
print('Is bananas an index label in Groceries:', x)
print('Is bread an index label in Groceries:', y)

打印：

Is bananas an index label in Groceries: False
Is bread an index label in Groceries: True

访问和删除Series中的元素

.loc 和 .iloc

我们可以通过在方括号 [ ] 内添加索引标签或数字索引访问元素，就像访问 NumPy ndarray 中的元素一样。因为我们可以使用数字索引，因此可以使用正整数从 Series 的开头访问数据，或使用负整数从末尾访问。因为我们可以通过多种方式访问元素，为了清晰地表明我们指代的是索引标签还是数字索引，Pandas Series 提供了两个属性 .loc和 .iloc，帮助我们清晰地表明指代哪种情况。属性.loc表示位置，用于明确表明我们使用的是标签索引。同样，属性.iloc表示整型位置，用于明确表明我们使用的是数字索引。我们来看一些示例：

# We access elements in Groceries using index labels:

# We use a single index label
print('How many eggs do we need to buy:', groceries['eggs'])
print()

# we can access multiple index labels
print('Do we need milk and bread:\n', groceries[['milk', 'bread']]) 
print()

# we use loc to access multiple index labels
print('How many eggs and apples do we need to buy:\n', groceries.loc[['eggs', 'apples']]) 
print()

# We access elements in Groceries using numerical indices:

# we use multiple numerical indices
print('How many eggs and apples do we need to buy:\n',  groceries[[0, 1]]) 
print()

# We use a negative numerical index
print('Do we need bread:\n', groceries[[-1]]) 
print()

# We use a single numerical index
print('How many eggs do we need to buy:', groceries[0]) 
print()
# we use iloc to access multiple numerical indices
print('Do we need milk and bread:\n', groceries.iloc[[2, 3]])

打印：

How many eggs do we need to buy: 30

Do we need milk and bread:
milk       Yes
bread     No
dtype: object

How many eggs and apples do we need to buy:
eggs       30
apples     6
dtype: object

How many eggs and apples do we need to buy:
eggs       30
apples     6
dtype: object

Do we need bread:
bread     No
dtype: object

How many eggs do we need to buy: 30

Do we need milk and bread:
milk       Yes
bread     No
dtype: object

和 NumPy ndarray 一样，Pandas Series 也是可变的，也就是说，创建好 Pandas Series 后，我们可以更改其中的元素。例如，我们更改下购物清单中的鸡蛋购买数量

# We display the original grocery list
print('Original Grocery List:\n', groceries)

# We change the number of eggs to 2
groceries['eggs'] = 2

# We display the changed grocery list
print()
print('Modified Grocery List:\n', groceries)

打印：

Original Grocery List:
eggs           30
apples         6
milk         Yes
bread       No
dtype: object

Modified Grocery List:
eggs             2
apples         6
milk         Yes
bread       No
dtype: object

我们还可以使用 .drop()方法删除 Pandas Series 中的条目。Series.drop(label) 方法会从给定 Series 中删除给定的 label。请注意，Series.drop(label)方法不在原地地从 Series 中删除元素，即不会更改被修改的原始 Series。我们来看看代码编写方式

# We display the original grocery list
print('Original Grocery List:\n', groceries)

# We remove apples from our grocery list. The drop function removes elements out of place
print()
print('We remove apples (out of place):\n', groceries.drop('apples'))

# When we remove elements out of place the original Series remains intact. To see this
# we display our grocery list again
print()
print('Grocery List after removing apples out of place:\n', groceries)

打印：

Original Grocery List:
eggs           30
apples         6
milk         Yes
bread       No
dtype: object

We remove apples (out of place):
eggs           30
milk         Yes
bread       No
dtype: object

Grocery List after removing apples out of place:
eggs           30
apples         6
milk         Yes
bread       No
dtype: object

我们可以通过在.drop()方法中将关键字 inplace 设为True，原地地从 Pandas Series 中删除条目。我们来看一个示例：

# We display the original grocery list
print('Original Grocery List:\n', groceries)

# We remove apples from our grocery list in place by setting the inplace keyword to True
groceries.drop('apples', inplace = True)

# When we remove elements in place the original Series its modified. To see this
# we display our grocery list again
print()
print('Grocery List after removing apples in place:\n', groceries)

打印：

Original Grocery List:
eggs           30
apples         6
milk         Yes
bread       No
dtype: object

Grocery List after removing apples in place:
eggs           30
milk         Yes
bread       No
dtype: object

算术运算

单个数字之间的算术运算

# We create a Pandas Series that stores a grocery list of just fruits
fruits= pd.Series(data = [10, 6, 3,], index = ['apples', 'oranges', 'bananas'])

# We display the fruits Pandas Series
fruits

# print
apples         10
oranges        6
bananas       3
dtype: int64

我们现在可以通过执行基本的算术运算，修改 fruits 中的数据。我们来看一些示例：

# We print fruits for reference
print('Original grocery list of fruits:\n ', fruits)

# We perform basic element-wise operations using arithmetic symbols
print()
print('fruits + 2:\n', fruits + 2) # We add 2 to each item in fruits
print()
print('fruits - 2:\n', fruits - 2) # We subtract 2 to each item in fruits
print()
print('fruits * 2:\n', fruits * 2) # We multiply each item in fruits by 2 
print()
print('fruits / 2:\n', fruits / 2) # We divide each item in fruits by 2
print()

打印：

Original grocery list of fruits:
apples         10
oranges        6
bananas       3
dtype: int64

fruits + 2:
apples         12
oranges        8
bananas       5
dtype: int64

fruits - 2:
apples           8
oranges        4
bananas       1
dtype: int64

fruits * 2:
apples         20
oranges      12
bananas       6
dtype: int64

fruits / 2:
apples           5.0
oranges        3.0
bananas       1.5
dtype: float64

我们还可以对 Pandas Series 中的所有元素应用 NumPy 中的数学函数，例如sqrt(x)。

# We import NumPy as np to be able to use the mathematical functions
import numpy as np

# We print fruits for reference
print('Original grocery list of fruits:\n', fruits)

# We apply different mathematical functions to all elements of fruits
print()
print('EXP(X) = \n', np.exp(fruits))
print() 
print('SQRT(X) =\n', np.sqrt(fruits))
print()
print('POW(X,2) =\n',np.power(fruits,2)) # We raise all elements of fruits to the power of 2

打印：

Original grocery list of fruits:
apples         10
oranges        6
bananas       3
dtype: int64

Amount of bananas + 2 = 5

Amount of apples - 2 = 8

We double the amount of apples and oranges:
apples         20
oranges      12
dtype: int64

We half the amount of apples and oranges:
apples         5.0
oranges      3.0
dtype: float64

你还可以对具有混合数据类型的 Pandas Series 应用算术运算，前提是该算术运算适合 Series 中的所有数据类型，否则会出错。我们来看看将购物清单乘以 2 会发生什么

# We multiply our grocery list by 2
groceries * 2

# 打印：
eggs                 60
apples             12
milk         YesYes
bread        NoNo
dtype: object

可以看出，在上述示例中，我们乘以了 2，Pandas 使每个条目的数据翻倍，包括字符串。Pandas 能够这么操作是因为，乘法运算 * 对数字和字符串来说都可行。如果你要应用对数字有效但是对字符串无效的运算，例如 /，则会出错。如果 Pandas Series 中有混合类型的数据，确保对于所有的元素数据类型，这些算术运算都有效。

练习

import pandas as pd

# Create a Pandas Series that contains the distance of some planets from the Sun.
# Use the name of the planets as the index to your Pandas Series, and the distance
# from the Sun as your data. The distance from the Sun is in units of 10^6 km

distance_from_sun = [149.6, 1433.5, 227.9, 108.2, 778.6]

planets = ['Earth','Saturn', 'Mars','Venus', 'Jupiter']

# Create a Pandas Series using the above data, with the name of the planets as
# the index and the distance from the Sun as your data.
dist_planets = pd.Series(data = distance_from_sun, index = planets)

# Calculate the number of minutes it takes sunlight to reach each planet. You can
# do this by dividing the distance from the Sun for each planet by the speed of light.
# Since in the data above the distance from the Sun is in units of 10^6 km, you can
# use a value for the speed of light of c = 18, since light travels 18 x 10^6 km/minute.
time_light = dist_planets / 18

# Use Boolean indexing to select only those planets for which sunlight takes less
# than 40 minutes to reach them.
close_planets = close_planets = time_light[time_light < 40]
# print(close_planets)

打印：

Earth     8.311111
Mars     12.661111
Venus     6.011111
dtype: float64
Printing The Closest Planets To the Sun
The correct answer is
Earth     8.311111
Mars     12.661111
Venus     6.011111
dtype: float64
And your code returned
Earth     8.311111
Mars     12.661111
Venus     6.011111
dtype: float64

Correct!

为者常成，行者常至

Python Pandas Series (十九)

创建

访问和删除Series中的元素

.loc 和 .iloc

算术运算

单个数字之间的算术运算

练习

AI

作者：Corwien

专栏推荐

Python Pandas Series (十九)

创建

访问和删除Series中的元素

.loc 和 .iloc

算术运算

单个数字之间的算术运算

练习

添加附言

AI

作者：Corwien

专栏推荐