>>> from env_helper import info; info()
Page last updated: 2020-03-08 17:44:26
OS: Linux-4.19.0-8-amd64-x86_64-with-debian-10.3; Python: 3.7.3
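5.5. Pandas Basic Functionality
So far we have covered the three Pandas data structures and how to create them. This section focuses mainly on the DataFrame object, because of its importance in real-time data processing, and also discusses the other data structures.
Series Basic Functionality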
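No. | Attribute or method | Description |
---|---|---|
1 | axes | Returns a list of the row axis labels. |
2 | dtype | Returns the data type (dtype) of the object. |
3 | empty | Returns True if the Series is empty. |
4 | ndim | Returns the number of dimensions of the underlying data; by definition, 1. |
5 | size | Returns the number of elements in the underlying data. |
6 | values | Returns the Series as an ndarray. |
7 | head() | Returns the first n rows. |
8 | tail() | Returns the last n rows. |
Now create a Series and see how each of the attributes and methods listed above is used.
Example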
>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Create a Series with 4 random numbers
>>> s = pd.Series(np.random.randn(4))
>>> print(s)
0 -0.386940
1 1.812800
2 0.512619
3 0.228608
dtype: float64
axes example
Returns the list of axis labels of the Series. See the following example code:
>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Create a Series with 4 random numbers
>>> s = pd.Series(np.random.randn(4))
>>> print ("The axes are:")
>>> print(s.axes)
The axes are:
[RangeIndex(start=0, stop=4, step=1)]
The result above is a compact representation of the labels 0 through 3 (the stop value 4 is exclusive), that is, [0, 1, 2, 3].
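To make the compact `RangeIndex` form concrete, here is a minimal sketch that expands it into a plain list. The fixed values and the custom labels `a` to `d` are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# Fixed values (instead of random ones) so the result is reproducible
s = pd.Series([10, 20, 30, 40])

# RangeIndex(start=0, stop=4, step=1) expands to the labels 0..3
default_labels = list(s.axes[0])

# Hypothetical custom labels: axes reflects whatever index the Series has
t = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])
custom_labels = list(t.axes[0])

print(default_labels)  # [0, 1, 2, 3]
print(custom_labels)   # ['a', 'b', 'c', 'd']
```

For a Series, `axes` always holds exactly one entry, the row index.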
empty example
>>> # Returns a Boolean indicating whether the object is empty; True means the object is empty.
>>>
>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Create a Series with 4 random numbers
>>> s = pd.Series(np.random.randn(4))
>>> print("Is the Object empty?")
>>> print(s.empty)
Is the Object empty?
False
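The example above only shows the `False` case. As a small sketch of the other side, the following contrasts a Series with no elements against one that holds only a missing value; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# A Series with no elements; the explicit dtype avoids a default-dtype warning in some versions
empty_series = pd.Series([], dtype="float64")

# A Series whose only element is NaN still has one element
nan_series = pd.Series([np.nan])

print(empty_series.empty)  # True
print(nan_series.empty)    # False
```

In other words, `empty` checks the length of the object, not whether its values are missing.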
ndim example
Returns the number of dimensions of the object. By definition, a Series is a 1D data structure. See the following example code:
>>> import pandas as pd
>>> import numpy as np
>>>
>>> #Create a series with 4 random numbers
>>> s = pd.Series(np.random.randn(4))
>>> print(s)
>>>
>>> print("The dimensions of the object:")
>>> print(s.ndim)
0 0.245278
1 -0.797132
2 -0.616019
3 0.449783
dtype: float64
The dimensions of the object:
1
size example
Returns the size (length) of the Series. See the following example code:
>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Create a Series with 2 random numbers
>>> s = pd.Series(np.random.randn(2))
>>> print(s)
>>> print("The size of the object:")
>>> print(s.size)
0 -0.179602
1 2.994385
dtype: float64
The size of the object:
2
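As a rough illustration of what `size` counts, the sketch below compares it with `len()` and with `count()` on a Series that contains a missing value. The data is made up for the comparison.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

total_elements = s.size   # counts every element, including NaN
length = len(s)           # same as size for a Series
non_missing = s.count()   # counts only non-NaN elements

print(total_elements)  # 3
print(length)          # 3
print(non_missing)     # 2
```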
values example
Returns the actual data in the Series as an array.
>>> import pandas as pd
>>> import numpy as np
>>>
>>> #Create a series with 4 random numbers
>>> s = pd.Series(np.random.randn(4))
>>> print(s)
>>>
>>> print("The actual data series is:")
>>> print(s.values)
0 1.592711
1 1.221654
2 -0.214345
3 0.605158
dtype: float64
The actual data series is:
[ 1.59271081 1.22165418 -0.21434485 0.60515758]
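The following sketch checks that `values` really hands back a NumPy array. It also calls `to_numpy()`, a method that is not used in the original example, purely for comparison.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.5, -0.5, 2.0])

via_values = s.values        # attribute used in the example above
via_to_numpy = s.to_numpy()  # method form, shown here only for comparison

print(type(via_values).__name__)                 # ndarray
print(np.array_equal(via_values, via_to_numpy))  # True
```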
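head() and tail() example
To view a small sample of a Series or DataFrame object, use the head() and tail() methods.
head() returns the first n rows (note the index values). The default number of elements shown is 5, but a custom number can be passed.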
>>> import pandas as pd
>>> import numpy as np
>>>
>>> #Create a series with 4 random numbers
>>> s = pd.Series(np.random.randn(4))
>>> print("The original series is:")
>>> print(s)
>>>
>>> print("The first two rows of the data series:")
>>> print(s.head(2))
The original series is:
0 0.916933
1 -1.377796
2 1.280936
3 -0.432304
dtype: float64
The first two rows of the data series:
0 0.916933
1 -1.377796
dtype: float64
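tail() returns the last n rows (note the index values). The default number of elements shown is 5, but a custom number can be passed. See the following example code: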
>>> import pandas as pd
>>> import numpy as np
>>>
>>> #Create a series with 4 random numbers
>>> s = pd.Series(np.random.randn(4))
>>> print("The original series is:")
>>> print(s)
>>>
>>> print("The last two rows of the data series:")
>>> print(s.tail(2))
The original series is:
0 2.270088
1 -0.768469
2 1.221645
3 0.806462
dtype: float64
The last two rows of the data series:
2 1.221645
3 0.806462
dtype: float64
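The four-element Series above is too short to show the default of 5. Here is a minimal sketch with eight elements (an assumed size chosen for illustration) that also shows a negative argument.

```python
import pandas as pd

s = pd.Series(range(8))  # 8 values labelled 0..7

first_default = s.head()       # no argument: first 5 rows
last_default = s.tail()        # no argument: last 5 rows
all_but_last_two = s.head(-2)  # negative n: everything except the last 2 rows

print(list(first_default.index))     # [0, 1, 2, 3, 4]
print(list(last_default.index))      # [3, 4, 5, 6, 7]
print(list(all_but_last_two.index))  # [0, 1, 2, 3, 4, 5]
```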
DataFrame Basic Functionality
Now let us look at the basic functionality of the DataFrame. The following table lists its most important attributes and methods.
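No. | Attribute or method | Description |
---|---|---|
1 | T | Transposes rows and columns. |
2 | axes | Returns a list whose only members are the row axis labels and the column axis labels. |
3 | dtypes | Returns the data types (dtypes) in this object. |
4 | empty | True if the NDFrame is entirely empty (no items), that is, if any axis has length 0. |
5 | ndim | Number of axes / array dimensions. |
6 | shape | Returns a tuple representing the dimensionality of the DataFrame. |
7 | size | Number of elements in the NDFrame. |
8 | values | NumPy representation of the NDFrame. |
9 | head() | Returns the first n rows. |
10 | tail() | Returns the last n rows. |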
Now create a DataFrame and see how the attributes and methods above are used.
Example
>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Create a dictionary of Series
>>> d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
...      'Age':pd.Series([25,26,25,23,30,29,23]),
...      'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
>>>
>>> #Create a DataFrame
>>> df = pd.DataFrame(d)
>>> print("Our data frame is:")
>>> print(df)
Our data frame is:
Name Age Rating
0 Tom 25 4.23
1 James 26 3.24
2 Ricky 25 3.98
3 Vin 23 2.56
4 Steve 30 3.20
5 Minsu 29 4.60
6 Jack 23 3.80
T (transpose) example
Returns the transpose of the DataFrame; rows and columns are swapped. See the following example code:
>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Create a dictionary of Series
>>> d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
...      'Age':pd.Series([25,26,25,23,30,29,23]),
...      'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
>>>
>>> # Create a DataFrame
>>> df = pd.DataFrame(d)
>>> print("The transpose of the data frame is:")
>>> print(df.T)
The transpose of the data frame is:
0 1 2 3 4 5 6
Name Tom James Ricky Vin Steve Minsu Jack
Age 25 26 25 23 30 29 23
Rating 4.23 3.24 3.98 2.56 3.2 4.6 3.8
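One side effect of transposing a frame with mixed column types is worth seeing directly: each transposed column now mixes strings and numbers. The sketch below uses a two-row frame (a shortened, assumed version of the data above) to show this.

```python
import pandas as pd

# Shortened version of the example data, for illustration only
df = pd.DataFrame({"Name": ["Tom", "James"],
                   "Age": [25, 26],
                   "Rating": [4.23, 3.24]})

transposed = df.T
round_trip = transposed.T  # transposing twice restores the original layout

print(df.shape)           # (2, 3)
print(transposed.shape)   # (3, 2)
print(transposed.dtypes)  # every column holds mixed values, so dtype is object
print(round_trip.shape)   # (2, 3)
```

Note that the round trip restores the shape and labels, but the columns stay `object` unless they are converted back.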
axes example
Returns the list of row axis labels and column axis labels. See the following example code:
>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Create a dictionary of Series
>>> d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
...      'Age':pd.Series([25,26,25,23,30,29,23]),
...      'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
>>>
>>> #Create a DataFrame
>>> df = pd.DataFrame(d)
>>> print ("Row axis labels and column axis labels are:")
>>> print( df.axes)
Row axis labels and column axis labels are:
[RangeIndex(start=0, stop=7, step=1), Index(['Name', 'Age', 'Rating'], dtype='object')]
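The two entries in that list are the same objects exposed by `df.index` and `df.columns`. A minimal sketch, using a shortened frame assumed for illustration:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Tom", "James"], "Age": [25, 26]})

# axes is a two-item list: [row labels, column labels]
row_labels, column_labels = df.axes

print(row_labels.equals(df.index))       # True
print(column_labels.equals(df.columns))  # True
print(list(column_labels))               # ['Name', 'Age']
```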
dtypes example
Returns the data type of each column. See the following example code:
>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Create a dictionary of Series
>>> d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
...      'Age':pd.Series([25,26,25,23,30,29,23]),
...      'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
>>>
>>> #Create a DataFrame
>>> df = pd.DataFrame(d)
>>> print("The data types of each column are:")
>>> print(df.dtypes)
The data types of each column are:
Name object
Age int64
Rating float64
dtype: object
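Knowing the per-column types is often a first step toward picking columns by type. The sketch below uses `select_dtypes`, a method not covered in the original text, as one possible way to do that on a shortened copy of the data.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Tom", "James"],
                   "Age": [25, 26],
                   "Rating": [4.23, 3.24]})

# dtypes is itself a Series indexed by column name
age_dtype = str(df.dtypes["Age"])
rating_dtype = str(df.dtypes["Rating"])

# Keep only the numeric columns (select_dtypes is an addition for illustration)
numeric_only = df.select_dtypes(include="number")

print(age_dtype)                   # int64
print(rating_dtype)                # float64
print(list(numeric_only.columns))  # ['Age', 'Rating']
```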
empty example
Returns a Boolean indicating whether the object is empty; True means the object is empty.
>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Create a dictionary of Series
>>> d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
...      'Age':pd.Series([25,26,25,23,30,29,23]),
...      'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
>>>
>>> #Create a DataFrame
>>> df = pd.DataFrame(d)
>>> print ("Is the object empty?")
>>> print( df.empty)
Is the object empty?
False
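For contrast with the `False` result above, this sketch shows cases where `empty` is `True`, namely when at least one axis has length 0. The frames are made up for illustration.

```python
import numpy as np
import pandas as pd

no_rows_no_columns = pd.DataFrame()
columns_but_no_rows = pd.DataFrame(columns=["Name", "Age"])
only_missing_values = pd.DataFrame({"Age": [np.nan]})

print(no_rows_no_columns.empty)   # True
print(columns_but_no_rows.empty)  # True: the row axis has length 0
print(only_missing_values.empty)  # False: one row exists, even though it is NaN
```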
ndim example
Returns the number of dimensions of the object. By definition, a DataFrame is a 2D object. See the following example code:
>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Create a dictionary of Series
>>> d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
...      'Age':pd.Series([25,26,25,23,30,29,23]),
...      'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
>>>
>>> #Create a DataFrame
>>> df = pd.DataFrame(d)
>>> print ("Our object is:")
>>> print (df)
>>> print ("The dimension of the object is:")
>>> print (df.ndim)
Our object is:
Name Age Rating
0 Tom 25 4.23
1 James 26 3.24
2 Ricky 25 3.98
3 Vin 23 2.56
4 Steve 30 3.20
5 Minsu 29 4.60
6 Jack 23 3.80
The dimension of the object is:
2
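A quick way to see the 1D versus 2D distinction is to select a column in two different ways. This is a minimal sketch on assumed data, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({"Age": [25, 26, 25], "Rating": [4.23, 3.24, 3.98]})

one_column_series = df["Age"]   # single brackets return a Series
one_column_frame = df[["Age"]]  # a list of columns returns a DataFrame

print(one_column_series.ndim)  # 1
print(one_column_frame.ndim)   # 2
```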
shape example
Returns a tuple representing the dimensionality of the DataFrame. In the tuple (a, b), a is the number of rows and b is the number of columns.
>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Create a dictionary of Series
>>> d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
...      'Age':pd.Series([25,26,25,23,30,29,23]),
...      'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
>>>
>>> #Create a DataFrame
>>> df = pd.DataFrame(d)
>>> print ("Our object is:")
>>> print (df)
>>> print ("The shape of the object is:")
>>> print (df.shape)
Our object is:
Name Age Rating
0 Tom 25 4.23
1 James 26 3.24
2 Ricky 25 3.98
3 Vin 23 2.56
4 Steve 30 3.20
5 Minsu 29 4.60
6 Jack 23 3.80
The shape of the object is:
(7, 3)
size example
Returns the number of elements in the DataFrame. See the following example code:
>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Create a dictionary of Series
>>> d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
...      'Age':pd.Series([25,26,25,23,30,29,23]),
...      'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
>>>
>>> #Create a DataFrame
>>> df = pd.DataFrame(d)
>>> print ("Our object is:")
>>> print (df)
>>> print ("The total number of elements in our object is:")
>>> print (df.size)
Our object is:
Name Age Rating
0 Tom 25 4.23
1 James 26 3.24
2 Ricky 25 3.98
3 Vin 23 2.56
4 Steve 30 3.20
5 Minsu 29 4.60
6 Jack 23 3.80
The total number of elements in our object is:
21
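The value 21 is simply 7 rows times 3 columns. The sketch below makes the relationship between `shape`, `size`, and `len()` explicit using the same data as the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tom", "James", "Ricky", "Vin", "Steve", "Minsu", "Jack"],
    "Age": [25, 26, 25, 23, 30, 29, 23],
    "Rating": [4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8],
})

rows, cols = df.shape  # (number of rows, number of columns)

print(rows, cols)      # 7 3
print(df.size)         # 21, which equals rows * cols
print(len(df))         # 7, len() counts rows only
```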
values example
Returns the actual data in the DataFrame as a NumPy ndarray. See the following example code:
>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Create a dictionary of Series
>>> d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
...      'Age':pd.Series([25,26,25,23,30,29,23]),
...      'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
>>>
>>> #Create a DataFrame
>>> df = pd.DataFrame(d)
>>> print ("Our object is:")
>>> print (df)
>>> print ("The actual data in our data frame is:")
>>> print (df.values)
Our object is:
Name Age Rating
0 Tom 25 4.23
1 James 26 3.24
2 Ricky 25 3.98
3 Vin 23 2.56
4 Steve 30 3.20
5 Minsu 29 4.60
6 Jack 23 3.80
The actual data in our data frame is:
[['Tom' 25 4.23]
['James' 26 3.24]
['Ricky' 25 3.98]
['Vin' 23 2.56]
['Steve' 30 3.2]
['Minsu' 29 4.6]
['Jack' 23 3.8]]
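Because a NumPy array has a single dtype, the array returned for mixed columns falls back to `object`, which is why the output above shows strings and numbers side by side. A minimal sketch on shortened, assumed data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["Tom", "James"],
                   "Age": [25, 26],
                   "Rating": [4.23, 3.24]})

mixed = df.values                       # strings and numbers together
numeric = df[["Age", "Rating"]].values  # int64 and float64 columns only

print(mixed.dtype)    # object
print(numeric.dtype)  # float64: the integers are upcast to float
print(mixed.shape)    # (2, 3)
```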
head() and tail() example
To view a small sample of a DataFrame object, use the head() and tail() methods. head() returns the first n rows (note the index values). The default number of rows shown is 5, but a custom number can be passed. See the following example code:
>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Create a dictionary of Series
>>> d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
...      'Age':pd.Series([25,26,25,23,30,29,23]),
...      'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
>>>
>>> #Create a DataFrame
>>> df = pd.DataFrame(d)
>>> print ("Our data frame is:")
>>> print (df)
>>> print("The first two rows of the data frame are:")
>>> print(df.head(2))
Our data frame is:
Name Age Rating
0 Tom 25 4.23
1 James 26 3.24
2 Ricky 25 3.98
3 Vin 23 2.56
4 Steve 30 3.20
5 Minsu 29 4.60
6 Jack 23 3.80
The first two rows of the data frame are:
Name Age Rating
0 Tom 25 4.23
1 James 26 3.24
tail() returns the last n rows (note the index values). The default number of rows shown is 5, but a custom number can be passed.
>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Create a dictionary of Series
>>> d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
...      'Age':pd.Series([25,26,25,23,30,29,23]),
...      'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
>>>
>>> #Create a DataFrame
>>> df = pd.DataFrame(d)
>>> print ("Our data frame is:")
>>> print (df)
>>> print("The last two rows of the data frame are:")
>>> print(df.tail(2))
Our data frame is:
Name Age Rating
0 Tom 25 4.23
1 James 26 3.24
2 Ricky 25 3.98
3 Vin 23 2.56
4 Steve 30 3.20
5 Minsu 29 4.60
6 Jack 23 3.80
The last two rows of the data frame are:
Name Age Rating
5 Minsu 29 4.6
6 Jack 23 3.8