6.7. Pandas Aggregation
Once a rolling, expanding, or ewm object has been created, several methods are available for aggregating the data.
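As a minimal sketch of the three kinds of window objects mentioned above, the following uses a small hand-checkable Series. The variable names and the choice of `window=2` and `span=2` are illustrative assumptions, not part of the original tutorial.

```python
import pandas as pd

# Small deterministic series so the results are easy to verify by hand.
s = pd.Series([1.0, 2.0, 3.0, 4.0])

rolling_obj = s.rolling(window=2, min_periods=1)  # fixed-size sliding window
expanding_obj = s.expanding(min_periods=1)        # window grows from the first row
ewm_obj = s.ewm(span=2)                           # exponentially weighted window

rolling_sum = rolling_obj.sum()      # sum over the last (up to) 2 values
expanding_sum = expanding_obj.sum()  # cumulative sum from the start
ewm_mean = ewm_obj.mean()            # exponentially weighted mean

print(rolling_sum.tolist())    # [1.0, 3.0, 5.0, 7.0]
print(expanding_sum.tolist())  # [1.0, 3.0, 6.0, 10.0]
print(ewm_mean.tolist())
```

Each object is only a description of the window; no computation happens until an aggregation method such as `sum()` or `mean()` is called on it.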
Applying Aggregation on a DataFrame
Let's create a DataFrame and apply aggregation to it.
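>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Build a 10x4 DataFrame of random values indexed by consecutive dates.
>>> df = pd.DataFrame(np.random.randn(10, 4),
...                   index=pd.date_range('1/1/2019', periods=10),
...                   columns=['A', 'B', 'C', 'D'])
>>>
>>> print(df)
>>> print("=======================================")
>>> # Rolling window of 3 rows; min_periods=1 yields a result from the first row on.
>>> r = df.rolling(window=3, min_periods=1)
>>> print(r)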
A B C D
2019-01-01 -0.297161 0.083323 -0.974163 -2.315540
2019-01-02 -1.109668 -0.187732 1.702003 -0.634991
2019-01-03 0.028106 4.080762 -0.353134 0.529624
2019-01-04 -0.766226 0.140721 -1.683656 -0.177402
2019-01-05 0.865192 0.400481 -1.123899 1.497569
2019-01-06 1.202430 0.149648 -0.443906 -0.849716
2019-01-07 -0.222641 1.438695 0.559307 0.180533
2019-01-08 -0.007848 1.071640 -1.223527 -0.313666
2019-01-09 -0.202040 -0.167778 -0.065897 1.358541
2019-01-10 -0.328471 0.260195 -0.757174 0.544447
=======================================
Rolling [window=3,min_periods=1,center=False,axis=0]
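To make the effect of `min_periods=1` concrete, here is a small sketch with fixed values instead of random ones. The frame `df_small` is a hypothetical example introduced only for illustration.

```python
import pandas as pd

# Hypothetical frame with known values, so window sums can be checked by hand.
df_small = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0]},
                        index=pd.date_range("2019-01-01", periods=5))

# min_periods=1: a value is produced as soon as one observation is in the window.
sum_min1 = df_small.rolling(window=3, min_periods=1).sum()

# Default min_periods (the window size for an integer window):
# the first two rows are NaN because the window is not yet full.
sum_default = df_small.rolling(window=3).sum()

print(sum_min1["A"].tolist())  # [1.0, 3.0, 6.0, 9.0, 12.0]
print(sum_default)
```

With `min_periods=1` the first rows are aggregated over a shorter window, which is why the examples in this section never show `NaN` at the top of the result.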
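You can aggregate either by passing a function to the whole DataFrame, or by first selecting a column through the standard item-access (`__getitem__`) syntax.
Applying Aggregation on a Whole DataFrame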
>>> import pandas as pd
>>> import numpy as np
>>>
>>> df = pd.DataFrame(np.random.randn(10, 4),
...                   index=pd.date_range('1/1/2000', periods=10),
...                   columns=['A', 'B', 'C', 'D'])
>>> print(df)
>>>
>>> r = df.rolling(window=3, min_periods=1)
>>> # Apply one function to every column: the rolling sum over each window.
>>> print(r.aggregate(np.sum))
A B C D
2000-01-01 -1.057510 -1.319452 1.438169 0.354097
2000-01-02 -1.665754 0.732910 0.895425 -0.170069
2000-01-03 -0.425298 -1.728492 0.614785 -1.327529
2000-01-04 1.641398 1.054154 -0.307831 1.634812
2000-01-05 0.692571 -1.762662 0.959677 -0.404613
2000-01-06 0.601088 0.391329 -0.338605 0.366265
2000-01-07 -2.333054 -1.024928 -2.117509 -2.236876
2000-01-08 0.441563 0.084132 -0.210743 -0.425921
2000-01-09 0.291801 -0.401348 -0.854690 -0.492878
2000-01-10 -1.457577 0.634589 0.853480 0.617184
A B C D
2000-01-01 -1.057510 -1.319452 1.438169 0.354097
2000-01-02 -2.723264 -0.586543 2.333593 0.184028
2000-01-03 -3.148562 -2.315034 2.948378 -1.143501
2000-01-04 -0.449654 0.058572 1.202379 0.137214
2000-01-05 1.908671 -2.436999 1.266631 -0.097330
2000-01-06 2.935057 -0.317179 0.313241 1.596464
2000-01-07 -1.039395 -2.396261 -1.496437 -2.275224
2000-01-08 -1.290403 -0.549467 -2.666857 -2.296533
2000-01-09 -1.599690 -1.342143 -3.182942 -3.155675
2000-01-10 -0.724213 0.317374 -0.211953 -0.301615
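As a hedged alternative to passing a NumPy function, the aggregation can also be named by a string or called as a dedicated method. Depending on the pandas version, passing `np.sum` may trigger a warning, so the string form is often the safer choice. The names `df_demo` and `r_demo` and the fixed seed are assumptions made for reproducibility.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible
df_demo = pd.DataFrame(rng.standard_normal((10, 4)),
                       index=pd.date_range("2000-01-01", periods=10),
                       columns=["A", "B", "C", "D"])
r_demo = df_demo.rolling(window=3, min_periods=1)

via_string = r_demo.aggregate("sum")  # aggregation named by a string alias
via_method = r_demo.sum()             # dedicated Rolling method

# Both routes should produce the same rolling sums.
pd.testing.assert_frame_equal(via_string, via_method)
print(via_string.head(3))
```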
Applying Aggregation on a Single Column of a DataFrame
Example Code
>>> import pandas as pd
>>> import numpy as np
>>>
>>> df = pd.DataFrame(np.random.randn(10, 4),
...                   index=pd.date_range('1/1/2000', periods=10),
...                   columns=['A', 'B', 'C', 'D'])
>>> print(df)
>>> print("====================================")
>>> r = df.rolling(window=3, min_periods=1)
>>> # Select column 'A' from the Rolling object; the result is a Series.
>>> print(r['A'].aggregate(np.sum))
A B C D
2000-01-01 -0.143118 -1.240426 0.436630 -0.444383
2000-01-02 0.628114 -1.783910 -0.175060 -0.343566
2000-01-03 -1.109600 1.086021 1.626791 0.800103
2000-01-04 0.793569 -0.472492 1.034773 -0.601865
2000-01-05 0.191128 -2.123454 -0.177607 -0.947208
2000-01-06 -0.218268 1.726078 0.074371 -0.828348
2000-01-07 -1.723376 1.690724 -0.086129 1.057527
2000-01-08 -0.240308 0.695771 -0.076730 1.058622
2000-01-09 0.519189 -0.543348 -0.912696 -1.193401
2000-01-10 0.381399 1.135461 1.238793 1.627239
====================================
2000-01-01 -0.143118
2000-01-02 0.484995
2000-01-03 -0.624605
2000-01-04 0.312082
2000-01-05 -0.124903
2000-01-06 0.766429
2000-01-07 -1.750516
2000-01-08 -2.181952
2000-01-09 -1.444495
2000-01-10 0.660280
Freq: D, Name: A, dtype: float64
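The sketch below illustrates, on hypothetical fixed data, that selecting a column on the Rolling object appears to be equivalent to selecting the column first and then building the window. The names `df_demo` and `r_demo` are illustrative only.

```python
import pandas as pd

# Hypothetical frame with known values.
df_demo = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0],
                        "B": [10.0, 20.0, 30.0, 40.0]},
                       index=pd.date_range("2000-01-01", periods=4))
r_demo = df_demo.rolling(window=3, min_periods=1)

# Route 1: select the column on the Rolling object.
col_from_window = r_demo["A"].aggregate("sum")
# Route 2: select the column first, then create the rolling window.
col_from_frame = df_demo["A"].rolling(window=3, min_periods=1).sum()

print(col_from_window.tolist())  # [1.0, 3.0, 6.0, 9.0]
print(col_from_window.equals(col_from_frame))
```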
Applying Aggregation on Multiple Columns of a DataFrame
>>> import pandas as pd
>>> import numpy as np
>>>
>>> df = pd.DataFrame(np.random.randn(10, 4),
...                   index=pd.date_range('1/1/2018', periods=10),
...                   columns=['A', 'B', 'C', 'D'])
>>> print(df)
>>> print("==========================================")
>>> r = df.rolling(window=3, min_periods=1)
>>> # Select several columns with a list; the result is a DataFrame.
>>> print(r[['A', 'B']].aggregate(np.sum))
A B C D
2018-01-01 1.161324 0.882373 0.525064 -0.004068
2018-01-02 0.565859 0.457815 0.879255 -0.750593
2018-01-03 -1.139034 -0.927946 0.186392 0.015014
2018-01-04 -1.227686 0.395136 -0.064800 -0.157918
2018-01-05 1.086757 -0.214619 0.134422 0.504477
2018-01-06 -0.845733 0.557141 0.312587 -1.226506
2018-01-07 1.228012 -0.210966 -1.160787 -0.724112
2018-01-08 0.213551 1.456926 1.207446 -1.483803
2018-01-09 -0.398389 0.798960 -0.194087 0.175509
2018-01-10 0.168422 -0.723523 -1.462455 -0.605016
==========================================
A B
2018-01-01 1.161324 0.882373
2018-01-02 1.727184 1.340188
2018-01-03 0.588150 0.412242
2018-01-04 -1.800861 -0.074996
2018-01-05 -1.279963 -0.747429
2018-01-06 -0.986662 0.737658
2018-01-07 1.469036 0.131556
2018-01-08 0.595829 1.803101
2018-01-09 1.043173 2.044921
2018-01-10 -0.016416 1.532363
Applying Multiple Functions on a Single Column of a DataFrame
>>> import pandas as pd
>>> import numpy as np
>>>
>>> df = pd.DataFrame(np.random.randn(10, 4),
...                   index=pd.date_range('2019/01/01', periods=10),
...                   columns=['A', 'B', 'C', 'D'])
>>> print(df)
>>> print("==========================================")
>>> r = df.rolling(window=3, min_periods=1)
>>> # A list of functions yields one result column per function.
>>> print(r['A'].aggregate([np.sum, np.mean]))
A B C D
2019-01-01 -1.417711 0.509442 -1.495424 1.027103
2019-01-02 0.962894 0.455878 0.785055 0.163292
2019-01-03 1.275789 1.077626 0.092726 -0.273991
2019-01-04 -0.574358 1.312393 0.138201 0.017391
2019-01-05 0.694693 0.406216 -0.589011 -1.944672
2019-01-06 0.939143 -0.987033 -2.131934 0.867440
2019-01-07 -1.286568 -1.014230 0.419372 -0.247521
2019-01-08 -0.252256 -0.431310 -1.350533 0.463370
2019-01-09 0.051010 0.063909 0.029560 -0.139782
2019-01-10 1.941964 -1.049217 1.630443 -1.600684
==========================================
sum mean
2019-01-01 -1.417711 -1.417711
2019-01-02 -0.454818 -0.227409
2019-01-03 0.820971 0.273657
2019-01-04 1.664324 0.554775
2019-01-05 1.396123 0.465374
2019-01-06 1.059477 0.353159
2019-01-07 0.347268 0.115756
2019-01-08 -0.599681 -0.199894
2019-01-09 -1.487814 -0.495938
2019-01-10 1.740718 0.580239
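The following sketch repeats the idea on hand-checkable values, using string aliases in place of the NumPy functions. The data and the names `s_demo` and `stats` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical single column with known values.
s_demo = pd.Series([1.0, 2.0, 3.0, 4.0],
                   index=pd.date_range("2019-01-01", periods=4), name="A")

# One output column per requested function, labelled by the function name.
stats = s_demo.rolling(window=3, min_periods=1).aggregate(["sum", "mean"])

print(list(stats.columns))     # ['sum', 'mean']
print(stats["sum"].tolist())   # [1.0, 3.0, 6.0, 9.0]
print(stats["mean"].tolist())  # [1.0, 1.5, 2.0, 3.0]
```

For the first rows the mean is taken over fewer than three values, because `min_periods=1` allows partially filled windows.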
Applying Multiple Functions on Multiple Columns of a DataFrame
>>> import pandas as pd
>>> import numpy as np
>>>
>>> df = pd.DataFrame(np.random.randn(10, 4),
...                   index=pd.date_range('2020/01/01', periods=10),
...                   columns=['A', 'B', 'C', 'D'])
>>>
>>> print(df)
>>> print("==========================================")
>>> r = df.rolling(window=3, min_periods=1)
>>> # Several columns and several functions give two-level column labels.
>>> print(r[['A', 'B']].aggregate([np.sum, np.mean]))
A B C D
2020-01-01 -0.690567 0.793077 1.584981 -0.359210
2020-01-02 1.213710 -1.628300 -1.503879 1.389749
2020-01-03 -1.522211 0.376837 2.072661 -2.662566
2020-01-04 -0.974572 -0.190313 0.746054 -0.062266
2020-01-05 0.893743 0.771649 0.100636 0.857923
2020-01-06 -1.924732 -0.263355 -0.358141 -0.361869
2020-01-07 -1.387412 0.106727 1.869404 -1.408780
2020-01-08 -1.762411 0.945474 -1.610881 -0.181529
2020-01-09 -0.640737 1.468332 -0.735777 1.154540
2020-01-10 0.353086 -0.107727 -0.514243 0.787290
==========================================
A B
sum mean sum mean
2020-01-01 -0.690567 -0.690567 0.793077 0.793077
2020-01-02 0.523144 0.261572 -0.835223 -0.417611
2020-01-03 -0.999068 -0.333023 -0.458386 -0.152795
2020-01-04 -1.283074 -0.427691 -1.441775 -0.480592
2020-01-05 -1.603041 -0.534347 0.958174 0.319391
2020-01-06 -2.005562 -0.668521 0.317981 0.105994
2020-01-07 -2.418401 -0.806134 0.615021 0.205007
2020-01-08 -5.074555 -1.691518 0.788846 0.262949
2020-01-09 -3.790560 -1.263520 2.520533 0.840178
2020-01-10 -2.050062 -0.683354 2.306079 0.768693
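The result above has two-level column labels (column name, then function name). A minimal sketch of how such a result might be indexed, on hypothetical fixed data with illustrative names `df_demo` and `result`:

```python
import pandas as pd

# Hypothetical frame with known values.
df_demo = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0],
                        "B": [10.0, 20.0, 30.0, 40.0]},
                       index=pd.date_range("2020-01-01", periods=4))

result = (df_demo.rolling(window=3, min_periods=1)[["A", "B"]]
          .aggregate(["sum", "mean"]))

# A (column, function) tuple selects one series from the two-level columns.
a_sum = result[("A", "sum")]
# Indexing with only the first level returns all statistics for that column.
b_stats = result["B"]

print(list(result.columns))
print(a_sum.tolist())             # [1.0, 3.0, 6.0, 9.0]
print(b_stats["mean"].tolist())   # [10.0, 15.0, 20.0, 30.0]
```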
Applying Different Functions to Different Columns of a DataFrame
>>> import pandas as pd
>>> import numpy as np
>>>
>>> df = pd.DataFrame(np.random.randn(3, 4),
...                   index=pd.date_range('2020/01/01', periods=3),
...                   columns=['A', 'B', 'C', 'D'])
>>> print(df)
>>> print("==========================================")
>>> r = df.rolling(window=3, min_periods=1)
>>> # A dict maps each column to its own function; unlisted columns are omitted.
>>> print(r.aggregate({'A': np.sum, 'B': np.mean}))
A B C D
2020-01-01 -1.015361 -2.742839 -0.223676 -1.516577
2020-01-02 -0.769286 1.455454 -1.774750 0.454370
2020-01-03 -0.139466 0.166692 0.272592 -0.273266
==========================================
A B
2020-01-01 -1.015361 -2.742839
2020-01-02 -1.784648 -0.643692
2020-01-03 -1.924113 -0.373564
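The same dict-based aggregation can be sketched on fixed values, which makes it easy to confirm that column `A` is summed, column `B` is averaged, and the remaining columns are left out of the result. The frame and names below are hypothetical.

```python
import pandas as pd

# Hypothetical frame with known values; column C is not mentioned in the dict.
df_demo = pd.DataFrame({"A": [1.0, 2.0, 3.0],
                        "B": [10.0, 20.0, 30.0],
                        "C": [100.0, 200.0, 300.0]},
                       index=pd.date_range("2020-01-01", periods=3))

per_column = (df_demo.rolling(window=3, min_periods=1)
              .aggregate({"A": "sum", "B": "mean"}))

print(list(per_column.columns))  # ['A', 'B']
print(per_column["A"].tolist())  # [1.0, 3.0, 6.0]
print(per_column["B"].tolist())  # [10.0, 15.0, 20.0]
```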