pandas.Grouper

class pandas.Grouper(*args, **kwargs)

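A Grouper allows the user to specify a groupby instruction for an object.

This specification selects a column via the key parameter or, if the level and/or axis parameters are given, a level of the index of the target object.

If axis and/or level are passed as keywords to both Grouper and groupby, the values passed to Grouper take precedence.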

Parameters

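key : str, default None

Groupby key, which selects the grouping column of the target.

level : name/number, default None

The level of the target index.

freq : str / frequency object, default None

If the target selection (via key or level) is a datetime-like object, this groups by the specified frequency. For the full specification of available frequencies, see the offset aliases in the pandas time series documentation.

axis : str, int, default 0

Number/name of the axis.

sort : bool, default False

Whether to sort the resulting labels.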

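closed : {'left' or 'right'}

Closed end of the interval. Only used when the freq parameter is passed.

label : {'left' or 'right'}

Interval boundary to use for labeling. Only used when the freq parameter is passed.

convention : {'start', 'end', 'e', 's'}

Only used if the grouper is a PeriodIndex and the freq parameter is passed.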

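base : int, default 0

Only used when the freq parameter is passed. For frequencies that evenly subdivide 1 day, the "origin" of the aggregated intervals. For example, for a '5min' frequency, base can range from 0 through 4.

Deprecated since version 1.1.0: The new arguments that you should use are 'offset' or 'origin'.

loffset : str, DateOffset, timedelta object

Only used when the freq parameter is passed.

Deprecated since version 1.1.0: loffset only works for .resample(...) and not for Grouper (GH28302). However, loffset is also deprecated for .resample(...). See: DataFrame.resample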

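origin : Timestamp or str, default 'start_day'

The timestamp on which to adjust the grouping. The timezone of origin must match the timezone of the index. If a string is given that does not represent a timestamp, it must be one of the following:

'epoch': origin is 1970-01-01
'start': origin is the first value of the time series
'start_day': origin is midnight of the first day of the time series

New in version 1.1.0.

'end': origin is the last value of the time series
'end_day': origin is the ceiling midnight of the last day

New in version 1.3.0.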

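offset : Timedelta or str, default None

An offset timedelta added to the origin.

New in version 1.1.0.

dropna : bool, default True

If True, and if group keys contain NA values, the NA values together with their row/column are dropped. If False, NA values are also treated as keys in groups.

New in version 1.2.0.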

Returns

A specification for a groupby instruction

Examples

Syntactic sugar for df.groupby('Animal'):

>>> df = pd.DataFrame(
...     {
...         "Animal": ["Falcon", "Parrot", "Falcon", "Falcon", "Parrot"],
...         "Speed": [100, 5, 200, 300, 15],
...     }
... )
>>> df
   Animal  Speed
0  Falcon    100
1  Parrot      5
2  Falcon    200
3  Falcon    300
4  Parrot     15
>>> df.groupby(pd.Grouper(key="Animal")).mean()
        Speed
Animal
Falcon  200.0
Parrot   10.0
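
The level parameter works much like key, except that it targets an index level rather than a column. The following is a minimal sketch, assuming a hypothetical frame `df_level` with a two-level index; the data and the level names "Animal" and "Type" are illustrative and do not come from the pandas documentation.

```python
import pandas as pd

# Hypothetical data: a two-level index whose first level is named "Animal".
index = pd.MultiIndex.from_arrays(
    [
        ["Falcon", "Parrot", "Falcon", "Parrot"],
        ["Captive", "Captive", "Wild", "Wild"],
    ],
    names=["Animal", "Type"],
)
df_level = pd.DataFrame({"Speed": [390.0, 30.0, 350.0, 20.0]}, index=index)

# Group on the "Animal" index level instead of on a column.
by_level = df_level.groupby(pd.Grouper(level="Animal")).mean()
print(by_level)
```

Here each animal's two speeds are averaged, so the result has one row per value of the "Animal" level.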

Specify a resample operation on the column 'Publish date':

>>> df = pd.DataFrame(
...     {
...         "Publish date": [
...             pd.Timestamp("2000-01-02"),
...             pd.Timestamp("2000-01-02"),
...             pd.Timestamp("2000-01-09"),
...             pd.Timestamp("2000-01-16"),
...         ],
...         "ID": [0, 1, 2, 3],
...         "Price": [10, 20, 30, 40],
...     }
... )
>>> df
  Publish date  ID  Price
0   2000-01-02   0     10
1   2000-01-02   1     20
2   2000-01-09   2     30
3   2000-01-16   3     40
>>> df.groupby(pd.Grouper(key="Publish date", freq="1W")).mean()
               ID  Price
Publish date
2000-01-02    0.5   15.0
2000-01-09    2.0   30.0
2000-01-16    3.0   40.0
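
The closed and label parameters control which side of each time bin is inclusive and which bin edge is used as the label. The sketch below illustrates this with a hypothetical one-minute series `s` (the data is made up for illustration); it compares the default behavior for a '3min' frequency with closed='right' and label='right'.

```python
import pandas as pd

# Hypothetical series: one value per minute from 00:00 to 00:05.
rng = pd.date_range("2000-01-01 00:00", periods=6, freq="min")
s = pd.Series([1, 2, 3, 4, 5, 6], index=rng)

# Default for a '3min' frequency: bins are closed and labeled on the left,
# i.e. [00:00, 00:03) and [00:03, 00:06).
left = s.groupby(pd.Grouper(freq="3min")).sum()

# Right-closed, right-labeled bins: (23:57, 00:00], (00:00, 00:03], (00:03, 00:06].
right = s.groupby(pd.Grouper(freq="3min", closed="right", label="right")).sum()

print(left)
print(right)
```

With right-closed bins, the 00:00 observation falls into a bin of its own that ends at 00:00, so the result has three groups instead of two.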

If you want to adjust the start of the bins based on a fixed timestamp:

>>> start, end = '2000-10-01 23:30:00', '2000-10-02 00:30:00'
>>> rng = pd.date_range(start, end, freq='7min')
>>> ts = pd.Series(np.arange(len(rng)) * 3, index=rng)
>>> ts
2000-10-01 23:30:00     0
2000-10-01 23:37:00     3
2000-10-01 23:44:00     6
2000-10-01 23:51:00     9
2000-10-01 23:58:00    12
2000-10-02 00:05:00    15
2000-10-02 00:12:00    18
2000-10-02 00:19:00    21
2000-10-02 00:26:00    24
Freq: 7T, dtype: int64

>>> ts.groupby(pd.Grouper(freq='17min')).sum()
2000-10-01 23:14:00     0
2000-10-01 23:31:00     9
2000-10-01 23:48:00    21
2000-10-02 00:05:00    54
2000-10-02 00:22:00    24
Freq: 17T, dtype: int64

>>> ts.groupby(pd.Grouper(freq='17min', origin='epoch')).sum()
2000-10-01 23:18:00     0
2000-10-01 23:35:00    18
2000-10-01 23:52:00    27
2000-10-02 00:09:00    39
2000-10-02 00:26:00    24
Freq: 17T, dtype: int64

>>> ts.groupby(pd.Grouper(freq='17min', origin='2000-01-01')).sum()
2000-10-01 23:24:00     3
2000-10-01 23:41:00    15
2000-10-01 23:58:00    45
2000-10-02 00:15:00    45
Freq: 17T, dtype: int64
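
The 'end' and 'end_day' origins listed in the parameters anchor the bins to the end of the series rather than to its start. The following sketch rebuilds the same series as above in a self-contained form (using `range` instead of `np.arange`) and assumes pandas 1.3.0 or later, where these two origins are available.

```python
import pandas as pd

# Same series as in the examples above: 9 points, 7 minutes apart.
rng = pd.date_range("2000-10-01 23:30:00", "2000-10-02 00:30:00", freq="7min")
ts = pd.Series(range(0, 27, 3), index=rng)

# origin='end': the last bin ends exactly at the last timestamp (00:26).
# With these origins, bins are right-closed and right-labeled by default.
by_end = ts.groupby(pd.Grouper(freq="17min", origin="end")).sum()

# origin='end_day': bin edges are aligned to the ceiling midnight of the last day.
by_end_day = ts.groupby(pd.Grouper(freq="17min", origin="end_day")).sum()

print(by_end)
print(by_end_day)
```

Because the bins are counted backward from the origin, any partial bin appears at the start of the result rather than at the end.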

If you want to adjust the start of the bins with an offset Timedelta, the following two lines are equivalent:

>>> ts.groupby(pd.Grouper(freq='17min', origin='start')).sum()
2000-10-01 23:30:00     9
2000-10-01 23:47:00    21
2000-10-02 00:04:00    54
2000-10-02 00:21:00    24
Freq: 17T, dtype: int64

>>> ts.groupby(pd.Grouper(freq='17min', offset='23h30min')).sum()
2000-10-01 23:30:00     9
2000-10-01 23:47:00    21
2000-10-02 00:04:00    54
2000-10-02 00:21:00    24
Freq: 17T, dtype: int64

To replace the deprecated base argument, you can now use offset. In this example it is equivalent to base=2:

>>> ts.groupby(pd.Grouper(freq='17min', offset='2min')).sum()
2000-10-01 23:16:00     0
2000-10-01 23:33:00     9
2000-10-01 23:50:00    36
2000-10-02 00:07:00    39
2000-10-02 00:24:00    24
Freq: 17T, dtype: int64
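
Grouping with a freq-based Grouper should give the same bins as calling resample directly with the same arguments. The sketch below checks this for the offset example; it rebuilds the series in a self-contained form and assumes pandas 1.1.0 or later, where offset is available on both Grouper and resample.

```python
import pandas as pd

# Same series as in the examples above: 9 points, 7 minutes apart.
rng = pd.date_range("2000-10-01 23:30:00", "2000-10-02 00:30:00", freq="7min")
ts = pd.Series(range(0, 27, 3), index=rng)

# Group through a Grouper specification ...
grouped = ts.groupby(pd.Grouper(freq="17min", offset="2min")).sum()

# ... and through resample with the same frequency and offset.
resampled = ts.resample("17min", offset="2min").sum()

# Both approaches are expected to yield the same values and index.
print(grouped.equals(resampled))
```

A Grouper is mainly useful when the time-based grouping needs to be passed around as an object or combined with other keys in a single groupby call; for a plain time-based aggregation, resample is usually the more direct spelling.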

Attributes

ax
groups
