Python Pandas Series (Part 19)
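A Pandas Series is a one-dimensional, array-like object that can hold many types of data, such as numbers or strings. One of the main differences between a Pandas Series and a NumPy ndarray is that you can assign an index label to each element of a Series; in other words, you can name the entries of the index however you like. Another notable difference is that a single Pandas Series can hold elements of different data types.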
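Creating a Series
Let's start by importing Pandas into Python. By convention, Pandas is imported as pd, so you can enter the following command in a Jupyter Notebook:
import pandas as pd
Now let's create a Pandas Series. You can create one with pd.Series(data, index), where index is a list of index labels. We'll use a Pandas Series to store a grocery list, with the food items as the index labels and the quantities to buy as the data.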
# We import Pandas as pd into Python
import pandas as pd
# We create a Pandas Series that stores a grocery list
groceries = pd.Series(data = [30, 6, 'Yes', 'No'], index = ['eggs', 'apples', 'milk', 'bread'])
# We display the Groceries Pandas Series
groceries
Output:
eggs 30
apples 6
milk Yes
bread No
dtype: object
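As you can see, a Pandas Series is displayed with the index in the first column and the data in the second. Note that the data is not indexed from 0 to 3 but with the food names we set: eggs, apples, and so on. Also note that the data in this Series includes both integers and strings, which is why its dtype is reported as object.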
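To make the dtype behaviour concrete, here is a minimal sketch. The names `mixed` and `numbers` are illustrative and do not come from the original text.

```python
import pandas as pd

# A Series holding integers and strings falls back to the generic object dtype
mixed = pd.Series(data=[30, 6, 'Yes', 'No'], index=['eggs', 'apples', 'milk', 'bread'])

# A Series holding only integers gets a numeric dtype instead
numbers = pd.Series(data=[30, 6], index=['eggs', 'apples'])

print(mixed.dtype)    # object
print(numbers.dtype)  # an integer dtype, typically int64
```

Each element of `mixed` keeps its own Python type, so `mixed['milk']` is still a string and `mixed['eggs']` is still an integer.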
As with a NumPy ndarray, a Pandas Series has attributes that make it easy to get information about it. Let's look at a few:
# We print some information about Groceries
print('Groceries has shape:', groceries.shape)
print('Groceries has dimension:', groceries.ndim)
print('Groceries has a total of', groceries.size, 'elements')
Groceries has shape: (4,)
Groceries has dimension: 1
Groceries has a total of 4 elements
We can also print the index labels and the data of a Pandas Series separately. This is useful when you don't know what the index labels of a Series are.
# We print the index and data of Groceries
print('The data in Groceries is:', groceries.values)
print('The index of Groceries is:', groceries.index)
Output:
The data in Groceries is: [30 6 'Yes' 'No']
The index of Groceries is: Index(['eggs', 'apples', 'milk', 'bread'], dtype='object')
If you are working with a very large Pandas Series and are not sure whether a given index label exists, you can check with the in operator:
# We check whether bananas is a food item (an index) in Groceries
x = 'bananas' in groceries
# We check whether bread is a food item (an index) in Groceries
y = 'bread' in groceries
# We print the results
print('Is bananas an index label in Groceries:', x)
print('Is bread an index label in Groceries:', y)
Output:
Is bananas an index label in Groceries: False
Is bread an index label in Groceries: True
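One detail worth spelling out is that `in` tests the index labels, not the stored data. The sketch below shows the difference and two ways to search the values instead; the variable names are illustrative.

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, 'Yes', 'No'], index=['eggs', 'apples', 'milk', 'bread'])

# `in` looks at the index labels only
label_found = 'bread' in groceries       # 'bread' is a label
value_as_label = 'Yes' in groceries      # 'Yes' is a value, not a label

# To search the stored data, test against the values explicitly
value_found = 'Yes' in groceries.values

# isin() returns a Boolean Series marking which entries match
matches = groceries.isin(['Yes'])

print(label_found, value_as_label, value_found)
print(matches)
```

Here `label_found` and `value_found` are true while `value_as_label` is false, and `matches` is true only for milk.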
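Accessing and deleting elements in a Series
.loc and .iloc
We can access elements by placing either index labels or numerical indices inside square brackets [ ], much as we access elements of a NumPy ndarray. With numerical indices, positive integers count from the beginning of the Series and negative integers count from the end. Because elements can be accessed in several ways, Pandas Series provides two attributes, .loc and .iloc, that make explicit which kind of index we mean. The attribute .loc stands for location and signals that we are using a label index. Likewise, .iloc stands for integer location and signals that we are using a numerical (positional) index. Note that recent pandas releases deprecate passing integer positions to plain [ ] on a Series with non-integer labels, so .iloc is the safer way to index by position. Let's look at some examples: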
# We access elements in Groceries using index labels:
# We use a single index label
print('How many eggs do we need to buy:', groceries['eggs'])
print()
# we can access multiple index labels
print('Do we need milk and bread:\n', groceries[['milk', 'bread']])
print()
# we use loc to access multiple index labels
print('How many eggs and apples do we need to buy:\n', groceries.loc[['eggs', 'apples']])
print()
# We access elements in Groceries using numerical indices.
# Recent pandas versions deprecate integer positions inside plain [ ] on a
# label-indexed Series, so .iloc is used for every positional lookup here.
# we use multiple numerical indices
print('How many eggs and apples do we need to buy:\n', groceries.iloc[[0, 1]])
print()
# We use a negative numerical index
print('Do we need bread:\n', groceries.iloc[[-1]])
print()
# We use a single numerical index
print('How many eggs do we need to buy:', groceries.iloc[0])
print()
# we use iloc to access multiple numerical indices
print('Do we need milk and bread:\n', groceries.iloc[[2, 3]])
Output:
How many eggs do we need to buy: 30
Do we need milk and bread:
milk Yes
bread No
dtype: object
How many eggs and apples do we need to buy:
eggs 30
apples 6
dtype: object
How many eggs and apples do we need to buy:
eggs 30
apples 6
dtype: object
Do we need bread:
bread No
dtype: object
How many eggs do we need to buy: 30
Do we need milk and bread:
milk Yes
bread No
dtype: object
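The distinction between labels and positions matters most when the labels are themselves integers. The following sketch uses a hypothetical Series, `shelf`, whose integer labels do not match the positions.

```python
import pandas as pd

# Hypothetical Series whose labels are integers that differ from the positions
shelf = pd.Series(data=['eggs', 'apples', 'milk'], index=[10, 20, 30])

by_label = shelf.loc[10]      # label 10
by_position = shelf.iloc[0]   # position 0 (the first element)
last_item = shelf.iloc[-1]    # last position

# .loc slices include the end label; .iloc slices exclude the end position
label_slice = shelf.loc[10:20]     # labels 10 and 20
position_slice = shelf.iloc[0:1]   # position 0 only

print(by_label, by_position, last_item)
print(list(label_slice), list(position_slice))
```

Both `by_label` and `by_position` return 'eggs', but they get there by different routes, which is why stating .loc or .iloc explicitly avoids ambiguity.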
Like a NumPy ndarray, a Pandas Series is mutable, meaning we can change its elements after it has been created. For example, let's change the number of eggs on our grocery list:
# We display the original grocery list
print('Original Grocery List:\n', groceries)
# We change the number of eggs to 2
groceries['eggs'] = 2
# We display the changed grocery list
print()
print('Modified Grocery List:\n', groceries)
Output:
Original Grocery List:
eggs 30
apples 6
milk Yes
bread No
dtype: object
Modified Grocery List:
eggs 2
apples 6
milk Yes
bread No
dtype: object
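Assignment also works through .loc and .iloc, which makes it explicit whether the target is a label or a position. A minimal sketch, using new quantities chosen only for illustration:

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, 'Yes', 'No'], index=['eggs', 'apples', 'milk', 'bread'])

# Assign by label with .loc
groceries.loc['eggs'] = 2

# Assign by position with .iloc (position 1 is 'apples')
groceries.iloc[1] = 12

print(groceries['eggs'], groceries['apples'])
```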
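We can also delete entries from a Pandas Series with the .drop() method. Series.drop(label) removes the given label from the Series. Note that Series.drop(label) does not remove the element in place: it returns a new Series and leaves the original unchanged. Let's see how this looks in code: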
# We display the original grocery list
print('Original Grocery List:\n', groceries)
# We remove apples from our grocery list. The drop function removes elements out of place
print()
print('We remove apples (out of place):\n', groceries.drop('apples'))
# When we remove elements out of place the original Series remains intact. To see this
# we display our grocery list again
print()
print('Grocery List after removing apples out of place:\n', groceries)
Output:
Original Grocery List:
eggs 30
apples 6
milk Yes
bread No
dtype: object
We remove apples (out of place):
eggs 30
milk Yes
bread No
dtype: object
Grocery List after removing apples out of place:
eggs 30
apples 6
milk Yes
bread No
dtype: object
We can delete entries from a Pandas Series in place by setting the inplace keyword of the .drop() method to True. Let's look at an example:
# We display the original grocery list
print('Original Grocery List:\n', groceries)
# We remove apples from our grocery list in place by setting the inplace keyword to True
groceries.drop('apples', inplace = True)
# When we remove elements in place the original Series is modified. To see this
# we display our grocery list again
print()
print('Grocery List after removing apples in place:\n', groceries)
Output:
Original Grocery List:
eggs 30
apples 6
milk Yes
bread No
dtype: object
Grocery List after removing apples in place:
eggs 30
milk Yes
bread No
dtype: object
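As an alternative to inplace = True, the result of .drop() can simply be assigned to a name. The sketch below also shows dropping several labels at once and tolerating a missing label; the variable names are illustrative.

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, 'Yes', 'No'], index=['eggs', 'apples', 'milk', 'bread'])

# drop() returns a new Series; assigning it to a name keeps the result
trimmed = groceries.drop('apples')

# Several labels can be removed at once by passing a list
staples = groceries.drop(['milk', 'bread'])

# A missing label raises KeyError unless errors='ignore' is passed
unchanged = groceries.drop('bananas', errors='ignore')

print(list(trimmed.index))
print(list(staples.index))
print(len(unchanged), len(groceries))
```

In every case the original `groceries` Series still has all four entries.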
Arithmetic operations
Arithmetic between a Series and a single number
# We create a Pandas Series that stores a grocery list of just fruits
fruits = pd.Series(data = [10, 6, 3], index = ['apples', 'oranges', 'bananas'])
# We display the fruits Pandas Series
fruits
Output:
apples 10
oranges 6
bananas 3
dtype: int64
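We can now apply basic arithmetic operations to the data in fruits. Each operation returns a new Series and leaves fruits itself unchanged. Let's look at some examples: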
# We print fruits for reference
print('Original grocery list of fruits:\n ', fruits)
# We perform basic element-wise operations using arithmetic symbols
print()
print('fruits + 2:\n', fruits + 2) # We add 2 to each item in fruits
print()
print('fruits - 2:\n', fruits - 2) # We subtract 2 from each item in fruits
print()
print('fruits * 2:\n', fruits * 2) # We multiply each item in fruits by 2
print()
print('fruits / 2:\n', fruits / 2) # We divide each item in fruits by 2
print()
Output:
Original grocery list of fruits:
apples 10
oranges 6
bananas 3
dtype: int64
fruits + 2:
apples 12
oranges 8
bananas 5
dtype: int64
fruits - 2:
apples 8
oranges 4
bananas 1
dtype: int64
fruits * 2:
apples 20
oranges 12
bananas 6
dtype: int64
fruits / 2:
apples 5.0
oranges 3.0
bananas 1.5
dtype: float64
We can also apply NumPy mathematical functions, such as sqrt(x), to all elements of a Pandas Series.
# We import NumPy as np to be able to use the mathematical functions
import numpy as np
# We print fruits for reference
print('Original grocery list of fruits:\n', fruits)
# We apply different mathematical functions to all elements of fruits
print()
print('EXP(X) = \n', np.exp(fruits))
print()
print('SQRT(X) =\n', np.sqrt(fruits))
print()
print('POW(X,2) =\n',np.power(fruits,2)) # We raise all elements of fruits to the power of 2
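Output:
Original grocery list of fruits:
apples 10
oranges 6
bananas 3
dtype: int64
EXP(X) =
apples 22026.465795
oranges 403.428793
bananas 20.085537
dtype: float64
SQRT(X) =
apples 3.162278
oranges 2.449490
bananas 1.732051
dtype: float64
POW(X,2) =
apples 100
oranges 36
bananas 9
dtype: int64
Arithmetic can also be applied to selected elements only, for example adding 2 to the number of bananas or doubling just the apples and oranges:
Amount of bananas + 2 = 5
Amount of apples - 2 = 8
We double the amount of apples and oranges:
apples 20
oranges 12
dtype: int64
We half the amount of apples and oranges:
apples 5.0
oranges 3.0
dtype: float64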
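The selected-element results shown above (bananas + 2, apples - 2, and the doubled and halved apples and oranges) can be reproduced with a sketch like the following. The code is a reconstruction that matches those results, not necessarily the code that originally produced them, and the variable names are illustrative.

```python
import pandas as pd

fruits = pd.Series(data=[10, 6, 3], index=['apples', 'oranges', 'bananas'])

# Select one element by label, then add 2
bananas_plus_2 = fruits['bananas'] + 2

# Select one element by position with .iloc, then subtract 2
apples_minus_2 = fruits.iloc[0] - 2

# Select several elements by label, then multiply or divide the selection
doubled = fruits[['apples', 'oranges']] * 2
halved = fruits.loc[['apples', 'oranges']] / 2

print('Amount of bananas + 2 =', bananas_plus_2)
print('Amount of apples - 2 =', apples_minus_2)
print('We double the amount of apples and oranges:\n', doubled)
print('We half the amount of apples and oranges:\n', halved)
```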
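You can also apply arithmetic operations to a Pandas Series with mixed data types, provided the operation is defined for every data type in the Series; otherwise you will get an error. Let's see what happens when we multiply the original grocery list by 2: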
# We multiply our grocery list by 2
groceries * 2
Output:
eggs 60
apples 12
milk YesYes
bread NoNo
dtype: object
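As you can see, multiplying by 2 doubled the data of every entry, strings included. Pandas can do this because the multiplication operator * is defined for both numbers and strings. If you apply an operation that is valid for numbers but not for strings, such as /, you will get an error. When a Pandas Series holds mixed data types, make sure the arithmetic operation is valid for the data type of every element.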
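The sketch below shows the failure for division and one possible workaround: converting the entries to numbers and discarding those that cannot be converted. Using pd.to_numeric here is our own suggestion, not something the original text prescribes.

```python
import pandas as pd

groceries = pd.Series(data=[30, 6, 'Yes', 'No'], index=['eggs', 'apples', 'milk', 'bread'])

# Division is not defined for strings, so this raises TypeError
try:
    groceries / 2
    division_failed = False
except TypeError:
    division_failed = True

# One possible workaround: convert to numbers, turning non-numeric entries
# into NaN, then drop the NaN entries before doing arithmetic
quantities = pd.to_numeric(groceries, errors='coerce').dropna()
halved = quantities / 2

print(division_failed)
print(halved)
```

After the conversion only eggs and apples remain, so the division is applied to numeric data alone.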
Exercise
import pandas as pd
# Create a Pandas Series that contains the distance of some planets from the Sun.
# Use the name of the planets as the index to your Pandas Series, and the distance
# from the Sun as your data. The distance from the Sun is in units of 10^6 km
distance_from_sun = [149.6, 1433.5, 227.9, 108.2, 778.6]
planets = ['Earth','Saturn', 'Mars','Venus', 'Jupiter']
# Create a Pandas Series using the above data, with the name of the planets as
# the index and the distance from the Sun as your data.
dist_planets = pd.Series(data = distance_from_sun, index = planets)
# Calculate the number of minutes it takes sunlight to reach each planet. You can
# do this by dividing the distance from the Sun for each planet by the speed of light.
# Since in the data above the distance from the Sun is in units of 10^6 km, you can
# use a value for the speed of light of c = 18, since light travels 18 x 10^6 km/minute.
time_light = dist_planets / 18
# Use Boolean indexing to select only those planets for which sunlight takes less
# than 40 minutes to reach them.
close_planets = time_light[time_light < 40]
print(close_planets)
Output:
Earth 8.311111
Mars 12.661111
Venus 6.011111
dtype: float64
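To see why Boolean indexing selects these three planets, it can help to look at the intermediate mask. The sketch below repeats the exercise data and names the mask explicitly; `mask` is an illustrative name.

```python
import pandas as pd

dist_planets = pd.Series(
    data=[149.6, 1433.5, 227.9, 108.2, 778.6],
    index=['Earth', 'Saturn', 'Mars', 'Venus', 'Jupiter'],
)
time_light = dist_planets / 18

# The comparison yields a Boolean Series with one True/False per planet
mask = time_light < 40
print(mask)

# Indexing with the mask keeps only the entries where the mask is True
close_planets = time_light[mask]
print(list(close_planets.index))
```

Saturn (about 79.6 minutes) and Jupiter (about 43.3 minutes) are False in the mask, so they are filtered out.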
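Those who act often succeed; those who keep walking often arrive.
Free to repost: non-commercial, no derivatives, attribution required (Creative Commons 3.0 license)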