>>> from env_helper import info; info()
Page last updated: 2023-07-09 19:06:41
Runtime environment:
Linux distribution: Debian GNU/Linux 12 (bookworm)
OS kernel: Linux-6.1.0-10-amd64-x86_64-with-glibc2.36
Python version: 3.11.2
7.7. Pandas Aggregation
Once a rolling, expanding or ewm object has been created, there are several ways to perform aggregations on its data.
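The aggregate() method is available on all three window types. The sketch below uses a small hypothetical DataFrame with round numbers so the results can be checked by hand; the window size of 2 and the ewm smoothing factor alpha=0.5 are arbitrary choices for illustration. Passing a string such as "sum" names the corresponding built-in window method.

```python
import pandas as pd

# Hypothetical data with round numbers so results are easy to verify
df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]})

# The three window object types mentioned above
rolling = df.rolling(window=2)          # fixed-size sliding window (first row is NaN)
expanding = df.expanding()              # window grows from the first row onward
ewm = df.ewm(alpha=0.5, adjust=False)   # exponentially weighted window

# Each object exposes aggregate(); a string selects the built-in method by name
rolling_sum = rolling.aggregate("sum")
expanding_sum = expanding.aggregate("sum")
ewm_mean = ewm.aggregate("mean")

print(rolling_sum)
print(expanding_sum)
print(ewm_mean)
```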
7.7.1. Applying Aggregations on a DataFrame
Let's create a DataFrame and apply aggregations to it.
>>> import pandas as pd
>>> import numpy as np
>>>
>>> df = pd.DataFrame(np.random.randn(10, 4),
...                   index=pd.date_range('1/1/2019', periods=10),
...                   columns=['A', 'B', 'C', 'D'])
>>>
>>> print(df)
>>> print("=======================================")
>>> # 3-row sliding window; min_periods=1 yields a value as soon as one row is available
>>> r = df.rolling(window=3, min_periods=1)
>>> print(r)
                   A         B         C         D
2019-01-01  0.409221  0.366230  0.811195 -0.625772
2019-01-02 -0.797305 -0.167884 -0.747666  0.319435
2019-01-03 -0.224666 -0.755982 -0.018736  0.017640
2019-01-04 -0.541295 -0.878131  0.437026  0.268451
2019-01-05 -0.307103 -1.058285  2.348750 -0.399490
2019-01-06  0.448267 -0.487242 -1.544957  1.734672
2019-01-07 -1.226280 -1.205490 -0.555893 -0.950993
2019-01-08  0.018771 -0.388378  1.314139 -0.727108
2019-01-09 -0.722677  0.254524 -0.502587 -0.254569
2019-01-10  1.530029  0.839409  0.587075 -0.306621
=======================================
Rolling [window=3,min_periods=1,center=False,axis=0,method=single]
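Printing r only shows the window settings: the Rolling object describes the window, and values are computed when an aggregation is called. The min_periods=1 argument used throughout this section decides what happens in the first rows. A minimal sketch with hypothetical deterministic data (instead of np.random.randn) compares it with the default, where min_periods equals the window size:

```python
import pandas as pd

# Hypothetical deterministic data so the results are predictable
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Default: min_periods equals the window size, so the first window-1 results are NaN
default_sum = s.rolling(window=3).sum()

# min_periods=1: a result is produced as soon as one observation is in the window,
# which is why the first rows of the aggregated tables in this section are not NaN
partial_sum = s.rolling(window=3, min_periods=1).sum()

print(pd.DataFrame({"default": default_sum, "min_periods=1": partial_sum}))
```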
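An aggregation can be applied to the whole DataFrame by passing it a function, or a single column can first be selected with standard item access (indexing with []).
Applying an aggregation to the whole DataFrame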
>>> import pandas as pd
>>> import numpy as np
>>>
>>> df = pd.DataFrame(np.random.randn(10, 4),
...                   index=pd.date_range('1/1/2000', periods=10),
...                   columns=['A', 'B', 'C', 'D'])
>>> print(df)
>>>
>>> r = df.rolling(window=3, min_periods=1)
>>> print(r.aggregate(np.sum))
                   A         B         C         D
2000-01-01 -1.191295 -2.159337  1.657109 -0.782601
2000-01-02 -0.419242  0.399465  0.381738 -0.212911
2000-01-03  1.052838  0.561114 -1.704994  0.503670
2000-01-04  0.493795  1.457303 -1.945349  0.488218
2000-01-05  0.056389 -0.168218  0.021157  1.903208
2000-01-06 -0.922407 -0.041264 -0.390733  0.184032
2000-01-07  2.167636 -1.502741 -0.854621  0.237260
2000-01-08 -1.574361 -0.982496  1.451461  1.313502
2000-01-09 -0.055895  2.102627  1.120571  0.622188
2000-01-10  0.240397  0.637192  0.429423  0.654151
                   A         B         C         D
2000-01-01 -1.191295 -2.159337  1.657109 -0.782601
2000-01-02 -1.610538 -1.759872  2.038847 -0.995512
2000-01-03 -0.557699 -1.198758  0.333853 -0.491843
2000-01-04  1.127391  2.417882 -3.268604  0.778976
2000-01-05  1.603022  1.850199 -3.629186  2.895096
2000-01-06 -0.372223  1.247821 -2.314925  2.575458
2000-01-07  1.301618 -1.712223 -1.224197  2.324500
2000-01-08 -0.329132 -2.526501  0.206107  1.734793
2000-01-09  0.537380 -0.382610  1.717411  2.172949
2000-01-10 -1.389859  1.757323  3.001455  2.589841
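Each cell of the aggregated table above is the sum of the current row and up to two preceding rows (fewer at the start, because of min_periods=1). The sketch below checks this on hypothetical integer-valued data and shows that naming the method as a string gives the same result as calling Rolling.sum() directly. Depending on the pandas version, passing NumPy callables such as np.sum to aggregate() may emit a FutureWarning, so the string form is used here as an assumption-free alternative.

```python
import pandas as pd

# Hypothetical integer-valued data so the window sums are exact
df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 0.0, -10.0, 5.0]},
                  index=pd.date_range("2000-01-01", periods=4))
r = df.rolling(window=3, min_periods=1)

by_method = r.sum()              # dedicated Rolling method
by_string = r.aggregate("sum")   # aggregate() given the method's name

# Row i holds the sum of rows max(0, i-2) .. i of each column
print(by_string)
```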
7.7.2. Applying an Aggregation to a Single Column of a DataFrame
Example code
>>> import pandas as pd
>>> import numpy as np
>>>
>>> df = pd.DataFrame(np.random.randn(10, 4),
...                   index=pd.date_range('1/1/2000', periods=10),
...                   columns=['A', 'B', 'C', 'D'])
>>> print(df)
>>> print("====================================")
>>> r = df.rolling(window=3, min_periods=1)
>>> print(r['A'].aggregate(np.sum))
                   A         B         C         D
2000-01-01  0.899614  1.904443  0.620118 -1.196649
2000-01-02 -0.816152  1.095829  0.526991  0.077817
2000-01-03 -0.728594  0.907431  1.122774  1.202263
2000-01-04 -0.339719  1.229053  0.833520 -1.121457
2000-01-05 -0.066104 -0.177040 -1.296439  0.868574
2000-01-06 -1.024873 -0.415325 -2.467596  0.027099
2000-01-07  0.388763 -0.720107  0.586934  0.599713
2000-01-08  1.114513 -0.775295  1.241403 -1.623362
2000-01-09 -0.864455  1.063592  1.433591  0.119835
2000-01-10 -0.304665  2.047228  0.151332 -1.078892
====================================
2000-01-01    0.899614
2000-01-02    0.083462
2000-01-03   -0.645132
2000-01-04   -1.884465
2000-01-05   -1.134418
2000-01-06   -1.430696
2000-01-07   -0.702214
2000-01-08    0.478404
2000-01-09    0.638822
2000-01-10   -0.054606
Freq: D, Name: A, dtype: float64
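Selecting a column from the Rolling object and then aggregating should give the same Series as building the rolling window on that column alone. A short sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "B": [5.0, 6.0, 7.0, 8.0]},
                  index=pd.date_range("2000-01-01", periods=4))
r = df.rolling(window=3, min_periods=1)

# Selecting column "A" from the Rolling object, then aggregating ...
from_rolling = r["A"].aggregate("sum")
# ... matches building the window directly on the column
from_column = df["A"].rolling(window=3, min_periods=1).sum()

print(from_rolling)
```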
Applying an aggregation to multiple columns of a DataFrame
>>> import pandas as pd
>>> import numpy as np
>>>
>>> df = pd.DataFrame(np.random.randn(10, 4),
...                   index=pd.date_range('1/1/2018', periods=10),
...                   columns=['A', 'B', 'C', 'D'])
>>> print(df)
>>> print("==========================================")
>>> r = df.rolling(window=3, min_periods=1)
>>> print(r[['A', 'B']].aggregate(np.sum))
                   A         B         C         D
2018-01-01  2.768425 -0.497260 -0.430518 -0.390839
2018-01-02 -1.017491 -0.478303 -2.269947  2.662471
2018-01-03  2.135067 -0.363699  1.756624  0.529155
2018-01-04  1.865168 -1.485002  1.045523  0.323405
2018-01-05  1.021525  1.022261 -0.209354 -1.515144
2018-01-06  0.655466  0.436306  1.572498 -0.631222
2018-01-07  1.756663 -0.963968 -0.040532  0.142463
2018-01-08 -0.526707  1.167008  0.881830 -0.394477
2018-01-09 -2.439931  0.353109  0.653103 -1.738840
2018-01-10  0.554504  1.012518  1.751990  0.428687
==========================================
                   A         B
2018-01-01  2.768425 -0.497260
2018-01-02  1.750934 -0.975563
2018-01-03  3.886001 -1.339262
2018-01-04  2.982743 -2.327003
2018-01-05  5.021760 -0.826440
2018-01-06  3.542159 -0.026435
2018-01-07  3.433654  0.494599
2018-01-08  1.885422  0.639346
2018-01-09 -1.209975  0.556149
2018-01-10 -2.412134  2.532636
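With a list of column labels, the result stays a DataFrame that contains only the selected columns; the unselected column C in the sketch below is dropped. The data and window=2 are hypothetical choices:

```python
import pandas as pd

# Hypothetical data with a third column that is not selected
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]})
r = df.rolling(window=2, min_periods=1)

# A list of labels restricts the aggregation (and the result) to those columns
subset = r[["A", "B"]].aggregate("sum")

print(subset)
```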
Applying multiple functions to a single column of a DataFrame
>>> import pandas as pd
>>> import numpy as np
>>>
>>> df = pd.DataFrame(np.random.randn(10, 4),
...                   index=pd.date_range('2019/01/01', periods=10),
...                   columns=['A', 'B', 'C', 'D'])
>>> print(df)
>>>
>>> print("==========================================")
>>>
>>> r = df.rolling(window=3, min_periods=1)
>>> print(r['A'].aggregate([np.sum, np.mean]))
                   A         B         C         D
2019-01-01  1.289568 -0.150813  0.819723  0.333205
2019-01-02 -0.777984 -1.418614  0.166166 -0.173467
2019-01-03 -1.478802 -0.522707  1.214267 -0.376789
2019-01-04  0.213583 -0.434050 -0.669066  0.875161
2019-01-05 -0.505610  0.403382  0.449065  1.161292
2019-01-06  1.255886  0.138499 -0.669481  0.009081
2019-01-07  1.504702  0.710545  0.102133  0.123100
2019-01-08  0.616911 -0.559758  0.399398  0.690950
2019-01-09  0.781351  0.925963 -0.634495  0.734642
2019-01-10  0.066432  0.540052  1.074223  0.479327
==========================================
                 sum      mean
2019-01-01  1.289568  1.289568
2019-01-02  0.511584  0.255792
2019-01-03 -0.967218 -0.322406
2019-01-04 -2.043203 -0.681068
2019-01-05 -1.770829 -0.590276
2019-01-06  0.963859  0.321286
2019-01-07  2.254979  0.751660
2019-01-08  3.377500  1.125833
2019-01-09  2.902964  0.967655
2019-01-10  1.464693  0.488231
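A list of functions produces one output column per function, named after it. Because min_periods=1 allows shorter windows at the start, the mean in the first rows divides by fewer than three values. The sketch uses hypothetical data and the string names "sum", "mean" and "max":

```python
import pandas as pd

# Hypothetical data; column "B" exists only to show that r["A"] ignores it
df = pd.DataFrame({"A": [2.0, 4.0, 6.0, 8.0], "B": [1.0, 1.0, 1.0, 1.0]})
r = df.rolling(window=3, min_periods=1)

# One output column per function, named "sum", "mean" and "max"
stats = r["A"].aggregate(["sum", "mean", "max"])

# In the first two rows the window holds 1 and 2 values, so mean = sum / 1 and sum / 2
print(stats)
```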
7.7.3. Applying Multiple Functions to Multiple Columns of a DataFrame
>>> import pandas as pd
>>> import numpy as np
>>>
>>> df = pd.DataFrame(np.random.randn(10, 4),
...                   index=pd.date_range('2020/01/01', periods=10),
...                   columns=['A', 'B', 'C', 'D'])
>>>
>>> print(df)
>>> print("==========================================")
>>> r = df.rolling(window=3, min_periods=1)
>>> print(r[['A', 'B']].aggregate([np.sum, np.mean]))
                   A         B         C         D
2020-01-01 -0.738247 -1.631835  1.505123  1.898867
2020-01-02 -1.382268  1.860687 -0.500002  0.652996
2020-01-03 -0.001419  1.867492  0.789780 -1.060373
2020-01-04 -0.926137 -1.108752 -1.351586 -1.110365
2020-01-05 -0.834301 -1.775205  0.517520  0.620481
2020-01-06 -0.080035 -0.110765  0.078427 -1.494267
2020-01-07 -1.323718 -1.042942  0.913473  0.627442
2020-01-08 -0.830897 -0.415440 -0.119706 -1.384088
2020-01-09  0.701403  0.205893  0.331462 -0.102985
2020-01-10  0.802444  0.592147  0.417783  0.742442
==========================================
                   A                   B
                 sum      mean       sum      mean
2020-01-01 -0.738247 -0.738247 -1.631835 -1.631835
2020-01-02 -2.120515 -1.060257  0.228851  0.114426
2020-01-03 -2.121934 -0.707311  2.096343  0.698781
2020-01-04 -2.309824 -0.769941  2.619426  0.873142
2020-01-05 -1.761857 -0.587286 -1.016466 -0.338822
2020-01-06 -1.840473 -0.613491 -2.994722 -0.998241
2020-01-07 -2.238054 -0.746018 -2.928912 -0.976304
2020-01-08 -2.234650 -0.744883 -1.569147 -0.523049
2020-01-09 -1.453212 -0.484404 -1.252490 -0.417497
2020-01-10  0.672950  0.224317  0.382600  0.127533
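When several functions are applied to several columns, the result has two-level (MultiIndex) columns: the original column label on the first level and the function name on the second. A sketch of how individual results can be picked out of such a table, using tuple indexing and DataFrame.xs on hypothetical data:

```python
import pandas as pd

# Hypothetical data; column "C" is not selected
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0], "C": [0.0, 0.0, 0.0]})
r = df.rolling(window=3, min_periods=1)

result = r[["A", "B"]].aggregate(["sum", "mean"])

# Columns are (column label, function name) pairs
print(result.columns.tolist())

# One result column is selected with a tuple ...
a_mean = result[("A", "mean")]
# ... and every "sum" column with a cross-section on the second column level
sums = result.xs("sum", axis=1, level=1)
print(sums)
```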
Applying different functions to different columns of a DataFrame
>>> import pandas as pd
>>> import numpy as np
>>>
>>> df = pd.DataFrame(np.random.randn(3, 4),
...                   index=pd.date_range('2020/01/01', periods=3),
...                   columns=['A', 'B', 'C', 'D'])
>>> print(df)
>>> print("==========================================")
>>> r = df.rolling(window=3, min_periods=1)
>>> print(r.aggregate({'A': np.sum, 'B': np.mean}))
                   A         B         C         D
2020-01-01 -0.760202 -0.315296 -1.161479  0.742776
2020-01-02  0.536496 -2.534675  0.472423 -1.866547
2020-01-03  1.186027 -0.767979 -1.401393 -0.319669
==========================================
                   A         B
2020-01-01 -0.760202 -0.315296
2020-01-02 -0.223706 -1.424986
2020-01-03  0.962321 -1.205983
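In the dictionary form, the keys choose the columns and the values give the aggregation for each; columns that are not listed (C and D in the output above) do not appear in the result. The sketch below, on hypothetical data, checks that each output column matches running that aggregation on its own column:

```python
import pandas as pd

# Hypothetical data; column "C" is deliberately left out of the dict
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 8.0, 12.0], "C": [5.0, 5.0, 5.0]})
r = df.rolling(window=3, min_periods=1)

# Keys select columns; each value is the aggregation applied to that column
mixed = r.aggregate({"A": "sum", "B": "mean"})

# The same values computed one column at a time
expected_a = r["A"].sum()
expected_b = r["B"].mean()

print(mixed)
```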