>>> from env_helper import info; info()
Page last updated: 2024-01-19 23:25:15
Runtime environment:
    Linux distribution: Debian GNU/Linux 12 (bookworm)
    OS kernel: Linux-6.1.0-17-amd64-x86_64-with-glibc2.36
    Python version: 3.11.2

6.5. Pandas Basic Functionality

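So far, we have looked at the three Pandas data structures and how to create them. This section focuses mainly on the DataFrame object, because of its importance in real-time data processing, and also discusses the other data structures.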

6.5.1. Series Basic Functionality

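No.  Attribute or method  Description
1    axes                 Returns a list of the row axis labels.
2    dtype                Returns the data type (dtype) of the object.
3    empty                Returns True if the Series is empty.
4    ndim                 Returns the number of dimensions of the underlying data; by definition, 1.
5    size                 Returns the number of elements in the underlying data.
6    values               Returns the Series as an ndarray.
7    head()               Returns the first n rows.
8    tail()               Returns the last n rows.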

Now let's create a Series and see how each of the attributes listed above works.

Example

>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Create a Series with 4 random numbers
>>> s = pd.Series(np.random.randn(4))
>>> print(s)
0   -0.773842
1    1.383208
2    1.232809
3    0.784548
dtype: float64

axes example

Returns the list of axis labels of the Series. See the following example code:

>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Create a Series with 4 random numbers
>>> s = pd.Series(np.random.randn(4))
>>> print("The axes are:")
>>> print(s.axes)
The axes are:
[RangeIndex(start=0, stop=4, step=1)]

The result above is a compact representation of the list of labels from 0 up to, but not including, 4, that is: [0, 1, 2, 3].
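As a small sketch of what the compact RangeIndex stands for, the following expands the axis into an explicit list of labels. The Series values and the custom labels 'a' to 'd' are made up for illustration and do not come from the example above.

```python
import pandas as pd

# Default integer labels: the axis is stored as a compact RangeIndex
s = pd.Series([10, 20, 30, 40])
labels = list(s.axes[0])          # expand the RangeIndex into a plain list
print(labels)

# With explicit (hypothetical) labels, axes holds a regular Index instead
t = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
custom_labels = list(t.axes[0])
print(custom_labels)
```

For a Series, axes always contains exactly one entry, which is the same object that s.index returns.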

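empty example

Returns a Boolean value indicating whether the object is empty. True means the object is empty.

As an example, create a Series of 4 random numbers: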

>>> s = pd.Series(np.random.randn(4))
>>> print("Is the Object empty?", s.empty)
Is the Object empty? False
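The example above only shows the False case. A minimal sketch of the other case, under the assumption that "empty" means "has no elements" rather than "has no valid values", might look like this (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

# A Series with no elements is empty
no_items = pd.Series(dtype='float64')
print("No elements:", no_items.empty)

# A Series that contains only NaN still has one element, so it is not empty
only_nan = pd.Series([np.nan])
print("Only NaN:", only_nan.empty)
```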

ndim example

Returns the number of dimensions of the object. By definition, a Series is a 1D data structure. See the following example code:

Create a Series with 4 random numbers:

>>>
>>> s = pd.Series(np.random.randn(4))
>>> print(s)
>>>
>>> print("The dimensions of the object:", s.ndim)
0    0.323513
1   -0.334400
2    0.807396
3   -0.618657
dtype: float64
The dimensions of the object: 1

size example

Returns the size (length) of the Series. See the following example code:

Create a Series with 2 random numbers:

>>> s = pd.Series(np.random.randn(2))
>>> print(s)
>>> print("The size of the object:", s.size)
0   -0.924024
1   -2.098308
dtype: float64
The size of the object: 2

values example

Returns the actual data in the Series as an array.

Create a Series with 4 random numbers:

>>>
>>> s = pd.Series(np.random.randn(4))
>>> print(s)
>>>
>>> print("The actual data series is:", s.values)
0   -1.776687
1   -0.957471
2   -0.213437
3   -0.279561
dtype: float64
The actual data series is: [-1.77668702 -0.95747061 -0.21343704 -0.27956121]
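To confirm that values really hands back a NumPy array, a short sketch such as the following can help. It uses made-up values and also calls to_numpy(), a method that is not covered in this section but that pandas provides as an alternative way to obtain the same array.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.5, 2.5, 3.5])    # made-up values for illustration

arr = s.values                    # the attribute described above
arr2 = s.to_numpy()               # method-based alternative

print(type(arr))
print(arr.dtype)
print(np.array_equal(arr, arr2))
```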

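head() and tail() methods example

To view a small sample of a Series or DataFrame object, use the head() and tail() methods.

head() returns the first n rows (observe the index values). By default 5 elements are displayed, but you can pass a custom number.

Create a Series with 4 random numbers: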

>>> s = pd.Series(np.random.randn(4))
>>> print("The original series is:")
>>> print(s)
>>>
>>> print("The first two rows of the data series:")
>>> print(s.head(2))
The original series is:
0    0.095068
1    0.998429
2    0.266191
3   -0.215365
dtype: float64
The first two rows of the data series:
0    0.095068
1    0.998429
dtype: float64

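tail() returns the last n rows (observe the index values). By default 5 elements are displayed, but you can pass a custom number. See the following example code:

Create a Series with 4 random numbers: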

>>> s = pd.Series(np.random.randn(4))
>>> print("The original series is:")
>>> print(s)
>>>
>>> print("The last two rows of the data series:")
>>> print(s.tail(2))
The original series is:
0    0.922678
1   -0.279250
2   -0.404947
3    0.997478
dtype: float64
The last two rows of the data series:
2   -0.404947
3    0.997478
dtype: float64
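The Series in the examples above have only 4 elements, so the default of 5 never comes into play. The sketch below uses a hypothetical 10-element Series to show the default, and also a negative n, which pandas treats as "everything except the last |n| rows" for head().

```python
import pandas as pd

s = pd.Series(range(10))          # ten made-up values, labelled 0..9

first_default = s.head()          # no argument: first 5 rows
last_default = s.tail()           # no argument: last 5 rows
all_but_last_two = s.head(-2)     # negative n: drop the last 2 rows

print(list(first_default.index))
print(list(last_default.index))
print(list(all_but_last_two.index))
```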


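6.5.2. DataFrame Basic Functionality

Let's now look at the basic functionality of the DataFrame. The following table lists the important attributes and methods that make up DataFrame basic functionality.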

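No.  Attribute or method  Description
1    T                    Transposes rows and columns.
2    axes                 Returns a list with the row axis labels and the column axis labels as its only members.
3    dtypes               Returns the data types (dtypes) in this object.
4    empty                Returns True if the NDFrame is entirely empty [no items], that is, if any of the axes has length 0.
5    ndim                 Number of axes / array dimensions.
6    shape                Returns a tuple representing the dimensionality of the DataFrame.
7    size                 Number of elements in the NDFrame.
8    values               NumPy representation of the NDFrame.
9    head()               Returns the first n rows.
10   tail()               Returns the last n rows.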

Let's now create a DataFrame and see how the attributes and methods above work.

Example

Create a dictionary of Series:

>>> d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
...      'Age':pd.Series([25,26,25,23,30,29,23]),
...      'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

Create a DataFrame:

>>> df = pd.DataFrame(d)
>>> print("Our data frame is:")
>>> print(df)
Our data frame is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Minsu   29    4.60
6   Jack   23    3.80

T (transpose) example

Returns the transpose of the DataFrame: rows and columns are swapped. See the following example code:

>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Create a dictionary of Series
>>> d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
...      'Age':pd.Series([25,26,25,23,30,29,23]),
...      'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
>>>
>>> # Create a DataFrame
>>> df = pd.DataFrame(d)
>>> print("The transpose of the data frame is:")
>>> print(df.T)
The transpose of the data frame is:
           0      1      2     3      4      5     6
Name     Tom  James  Ricky   Vin  Steve  Minsu  Jack
Age       25     26     25    23     30     29    23
Rating  4.23   3.24   3.98  2.56    3.2    4.6   3.8
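One consequence of transposing a frame with mixed column types is easy to miss in the printed output. The sketch below, using a shortened made-up version of the data, checks the shape and the column dtypes after the transpose. It assumes the usual pandas behaviour that a column holding strings, integers and floats together falls back to the generic object dtype.

```python
import pandas as pd

# Shortened, made-up version of the example data (4 rows, 3 columns)
df = pd.DataFrame({'Name': ['Tom', 'James', 'Ricky', 'Vin'],
                   'Age': [25, 26, 25, 23],
                   'Rating': [4.23, 3.24, 3.98, 2.56]})

transposed = df.T

# Rows and columns swap places
print(df.shape, transposed.shape)

# Each transposed column now mixes a name, an age and a rating
print(transposed.dtypes)
```

Because of this, arithmetic on a transposed mixed-type frame may behave differently from arithmetic on the original numeric columns.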

axes example

Returns the list of row axis labels and column axis labels. See the following example code:

>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Create a dictionary of Series
>>> d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
...      'Age':pd.Series([25,26,25,23,30,29,23]),
...      'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
>>>
>>> # Create a DataFrame
>>> df = pd.DataFrame(d)
>>> print("Row axis labels and column axis labels are:")
>>> print(df.axes)
Row axis labels and column axis labels are:
[RangeIndex(start=0, stop=7, step=1), Index(['Name', 'Age', 'Rating'], dtype='object')]

dtypes example

Returns the data type of each column. See the following example code:

>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Create a dictionary of Series
>>> d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
...      'Age':pd.Series([25,26,25,23,30,29,23]),
...      'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
>>>
>>> # Create a DataFrame
>>> df = pd.DataFrame(d)
>>> print("The data types of each column are:")
>>> print(df.dtypes)
The data types of each column are:
Name       object
Age         int64
Rating    float64
dtype: object
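The trailing "dtype: object" line in the output hints that dtypes is itself a Series indexed by column name. A minimal sketch of how that can be used, with made-up numeric data, is shown below. The astype() call is not part of this section and is included only as an assumption about how one might change a column type.

```python
import pandas as pd

# Made-up numeric data for illustration
df = pd.DataFrame({'Age': [25, 26, 25], 'Rating': [4.23, 3.24, 3.98]})

# dtypes is a Series, so a single column's type can be looked up by name
rating_dtype = df.dtypes['Rating']
print(rating_dtype)

# astype returns a new DataFrame; the original df is left unchanged
converted = df.astype({'Age': 'float64'})
print(converted.dtypes)
```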

empty example

Returns a Boolean value indicating whether the object is empty. True means the object is empty.

>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Create a dictionary of Series
>>> d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
...      'Age':pd.Series([25,26,25,23,30,29,23]),
...      'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
>>>
>>> # Create a DataFrame
>>> df = pd.DataFrame(d)
>>> print("Is the object empty?")
>>> print(df.empty)
Is the object empty?
False
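The rule from the table (any axis of length 0 makes the object empty) can be illustrated with a short sketch. The frames below are hypothetical and reuse only the column names from the example.

```python
import numpy as np
import pandas as pd

# Columns are defined but there are no rows: the row axis has length 0
no_rows = pd.DataFrame(columns=['Name', 'Age', 'Rating'])
print("No rows:", no_rows.empty)

# A frame holding only NaN still has one row, so it is not empty
only_nan = pd.DataFrame({'Rating': [np.nan]})
print("Only NaN:", only_nan.empty)
```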

ndim example

Returns the number of dimensions of the object. By definition, a DataFrame is a 2D object. See the following example code:

>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Create a dictionary of Series
>>> d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
...      'Age':pd.Series([25,26,25,23,30,29,23]),
...      'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
>>>
>>> # Create a DataFrame
>>> df = pd.DataFrame(d)
>>> print("Our object is:")
>>> print(df)
>>> print("The dimension of the object is:")
>>> print(df.ndim)
Our object is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Minsu   29    4.60
6   Jack   23    3.80
The dimension of the object is:
2

shape example

Returns a tuple representing the dimensionality of the DataFrame. In the tuple (a, b), a is the number of rows and b is the number of columns.

>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Create a dictionary of Series
>>> d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
...      'Age':pd.Series([25,26,25,23,30,29,23]),
...      'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
>>>
>>> # Create a DataFrame
>>> df = pd.DataFrame(d)
>>> print("Our object is:")
>>> print(df)
>>> print("The shape of the object is:")
>>> print(df.shape)
Our object is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Minsu   29    4.60
6   Jack   23    3.80
The shape of the object is:
(7, 3)

size example

Returns the number of elements in the DataFrame. See the following example code:

>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Create a dictionary of Series
>>> d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
...      'Age':pd.Series([25,26,25,23,30,29,23]),
...      'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
>>>
>>> # Create a DataFrame
>>> df = pd.DataFrame(d)
>>> print("Our object is:")
>>> print(df)
>>> print("The total number of elements in our object is:")
>>> print(df.size)
Our object is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Minsu   29    4.60
6   Jack   23    3.80
The total number of elements in our object is:
21
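The value 21 is simply the number of rows times the number of columns. The following sketch rebuilds the same data from plain lists (a simplification of the dictionary of Series used above) and checks that relationship between shape, size and len().

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Minsu', 'Jack'],
    'Age': [25, 26, 25, 23, 30, 29, 23],
    'Rating': [4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8],
})

rows, cols = df.shape             # (number of rows, number of columns)
print(rows, cols)

# size counts every cell, so it equals rows * cols
print(df.size == rows * cols)

# len() of a DataFrame counts rows only
print(len(df))
```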

values example

Returns the actual data in the DataFrame as an ndarray. See the following example code:

>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Create a dictionary of Series
>>> d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
...      'Age':pd.Series([25,26,25,23,30,29,23]),
...      'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
>>>
>>> # Create a DataFrame
>>> df = pd.DataFrame(d)
>>> print("Our object is:")
>>> print(df)
>>> print("The actual data in our data frame is:")
>>> print(df.values)
Our object is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Minsu   29    4.60
6   Jack   23    3.80
The actual data in our data frame is:
[['Tom' 25 4.23]
 ['James' 26 3.24]
 ['Ricky' 25 3.98]
 ['Vin' 23 2.56]
 ['Steve' 30 3.2]
 ['Minsu' 29 4.6]
 ['Jack' 23 3.8]]
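Note that the array above mixes strings and numbers, so NumPy has to store it with a generic element type. The sketch below, on a shortened made-up version of the data, compares the dtype of the full array with the dtype obtained from the numeric columns only. It assumes the standard behaviour that integer and float columns are combined into a float64 array.

```python
import pandas as pd

# Shortened, made-up version of the example data
df = pd.DataFrame({'Name': ['Tom', 'James', 'Ricky'],
                   'Age': [25, 26, 25],
                   'Rating': [4.23, 3.24, 3.98]})

mixed = df.values                        # strings and numbers together
numeric = df[['Age', 'Rating']].values   # numeric columns only

print(mixed.dtype)
print(numeric.dtype)
print(numeric.shape)
```

Selecting only the numeric columns before calling values is one way to get an array that is suitable for numerical computation.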

head() and tail() example

To view a small sample of a DataFrame object, use the head() and tail() methods. head() returns the first n rows (observe the index values). By default 5 rows are displayed, but you can pass a custom number. See the following example code:

>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Create a dictionary of Series
>>> d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
...      'Age':pd.Series([25,26,25,23,30,29,23]),
...      'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
>>>
>>> # Create a DataFrame
>>> df = pd.DataFrame(d)
>>> print("Our data frame is:")
>>> print(df)
>>> print("The first two rows of the data frame are:")
>>> print(df.head(2))
Our data frame is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Minsu   29    4.60
6   Jack   23    3.80
The first two rows of the data frame are:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24

tail() returns the last n rows (observe the index values). By default 5 rows are displayed, but you can pass a custom number.

>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Create a dictionary of Series
>>> d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
...      'Age':pd.Series([25,26,25,23,30,29,23]),
...      'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
>>>
>>> # Create a DataFrame
>>> df = pd.DataFrame(d)
>>> print("Our data frame is:")
>>> print(df)
>>> print("The last two rows of the data frame are:")
>>> print(df.tail(2))
Our data frame is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Minsu   29    4.60
6   Jack   23    3.80
The last two rows of the data frame are:
    Name  Age  Rating
5  Minsu   29     4.6
6   Jack   23     3.8
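Both examples pass n=2 explicitly. As a rough sketch of the remaining cases on the same 7-row data (rebuilt here from plain lists for brevity), the following shows the default n=5 and what happens when n is larger than the number of rows.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Minsu', 'Jack'],
    'Age': [25, 26, 25, 23, 30, 29, 23],
    'Rating': [4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8],
})

default_head = df.head()      # no argument: first 5 of the 7 rows
default_tail = df.tail()      # no argument: last 5 rows, index 2..6
oversized = df.head(10)       # n larger than the frame: all 7 rows, no error

print(list(default_head.index))
print(list(default_tail.index))
print(len(oversized))
```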