Version 0.18.1 (May 3, 2016)#

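This is a minor bug-fix release from 0.18.0 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements. We recommend that all users upgrade to this version.

Highlights include: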

.groupby(...) has been enhanced to provide convenient syntax when working with .rolling(..), .expanding(..) and .resample(..) per group, see here
pd.to_datetime() has gained the ability to assemble dates from a DataFrame, see here
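Method chaining improvements, see here.
Custom business hour offset, see here.
Many bug fixes in the handling of sparse data, see here
Expanded the Tutorials section with a feature on modern pandas, courtesy of @TomAugsburger. (GH13045).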

New features#

Custom business hour#

The CustomBusinessHour is a mixture of BusinessHour and CustomBusinessDay which allows you to specify arbitrary holidays. For details, see Custom Business Hour (GH11514)

In [1]: from pandas.tseries.offsets import CustomBusinessHour

In [2]: from pandas.tseries.holiday import USFederalHolidayCalendar

In [3]: bhour_us = CustomBusinessHour(calendar=USFederalHolidayCalendar())

Friday before MLK Day

In [4]: import datetime

In [5]: dt = datetime.datetime(2014, 1, 17, 15)

In [6]: dt + bhour_us
Out[6]: Timestamp('2014-01-17 16:00:00')

Tuesday after MLK Day (Monday is skipped because it's a holiday)

In [7]: dt + bhour_us * 2
Out[7]: Timestamp('2014-01-21 09:00:00')
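
As a minimal sketch of the same idea without a calendar object, holidays can also be listed directly through the `holidays` argument. The single holiday date below (MLK Day 2014) and the 16:30 start time are illustrative assumptions, chosen so that one business hour spills over into the next business day.

```python
import pandas as pd
from pandas.tseries.offsets import BusinessHour, CustomBusinessHour

# Illustrative holiday list: Monday 2014-01-20 (MLK Day).
cbh = CustomBusinessHour(holidays=["2014-01-20"])
bh = BusinessHour()

# Friday afternoon, 30 minutes before the default 17:00 close.
start = pd.Timestamp("2014-01-17 16:30")

plain = start + bh    # rolls over to Monday 09:30
custom = start + cbh  # Monday is a holiday, so it rolls over to Tuesday 09:30

print(plain)
print(custom)
```

Comparing the two results side by side shows that the only difference between the offsets is which days count as business days.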

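Method `.groupby(..)` syntax with window and resample operations#

.groupby(...) has been enhanced to provide convenient syntax when working with .rolling(..), .expanding(..) and .resample(..) per group, see (GH12486, GH12738).

You can now use .rolling(..) and .expanding(..) as methods on groupbys. These return another deferred object (similar to what .rolling() and .expanding() do on ungrouped pandas objects). You can then operate on these RollingGroupby objects in a similar manner.

Previously you would have to do this to get a rolling window mean per group: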

In [8]: df = pd.DataFrame({"A": [1] * 20 + [2] * 12 + [3] * 8, "B": np.arange(40)})

In [9]: df
Out[9]: 
    A   B
0   1   0
1   1   1
2   1   2
3   1   3
4   1   4
.. ..  ..
35  3  35
36  3  36
37  3  37
38  3  38
39  3  39

[40 rows x 2 columns]

In [10]: df.groupby("A").apply(lambda x: x.rolling(4).B.mean())
Out[10]: 
A    
1  0      NaN
   1      NaN
   2      NaN
   3      1.5
   4      2.5
         ... 
3  35    33.5
   36    34.5
   37    35.5
   38    36.5
   39    37.5
Name: B, Length: 40, dtype: float64

Now you can do:

In [11]: df.groupby("A").rolling(4).B.mean()
Out[11]: 
A    
1  0      NaN
   1      NaN
   2      NaN
   3      1.5
   4      2.5
         ... 
3  35    33.5
   36    34.5
   37    35.5
   38    36.5
   39    37.5
Name: B, Length: 40, dtype: float64
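
The grouped window syntax can be sketched on a toy frame as follows. The column names and values are illustrative, and the example selects a single column before applying `.rolling(..)` and `.expanding(..)`; the result is indexed by group key and original row label.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 2, 2], "B": [0, 1, 2, 3, 4]})

# Rolling mean over a window of 2 rows, restarted within each group of A.
rolling_mean = df.groupby("A")["B"].rolling(2).mean()

# Expanding (cumulative) sum, also restarted within each group.
expanding_sum = df.groupby("A")["B"].expanding().sum()

print(rolling_mean)
print(expanding_sum)
```

The first row of every group is NaN in the rolling result because a window of 2 is not yet full there.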

For .resample(..) type of operations, previously you would have to:

In [12]: df = pd.DataFrame(
   ....:     {
   ....:         "date": pd.date_range(start="2016-01-01", periods=4, freq="W"),
   ....:         "group": [1, 1, 2, 2],
   ....:         "val": [5, 6, 7, 8],
   ....:     }
   ....: ).set_index("date")
   ....: 

In [13]: df
Out[13]: 
            group  val
date                  
2016-01-03      1    5
2016-01-10      1    6
2016-01-17      2    7
2016-01-24      2    8

[4 rows x 2 columns]

In [14]: df.groupby("group").apply(lambda x: x.resample("1D").ffill())
Out[14]: 
                  group  val
group date                  
1     2016-01-03      1    5
      2016-01-04      1    5
      2016-01-05      1    5
      2016-01-06      1    5
      2016-01-07      1    5
...                 ...  ...
2     2016-01-20      2    7
      2016-01-21      2    7
      2016-01-22      2    7
      2016-01-23      2    7
      2016-01-24      2    8

[16 rows x 2 columns]

Now you can do:

In [15]: df.groupby("group").resample("1D").ffill()
Out[15]: 
                  group  val
group date                  
1     2016-01-03      1    5
      2016-01-04      1    5
      2016-01-05      1    5
      2016-01-06      1    5
      2016-01-07      1    5
...                 ...  ...
2     2016-01-20      2    7
      2016-01-21      2    7
      2016-01-22      2    7
      2016-01-23      2    7
      2016-01-24      2    8

[16 rows x 2 columns]
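
A compact variant of the per-group resample, sketched here on a single selected column so that the result is a Series. The explicit date list is an assumption that mirrors the weekly dates shown above.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2016-01-03", "2016-01-10", "2016-01-17", "2016-01-24"]
        ),
        "group": [1, 1, 2, 2],
        "val": [5, 6, 7, 8],
    }
).set_index("date")

# Upsample each group to daily frequency and forward-fill the gaps.
daily = df.groupby("group")["val"].resample("1D").ffill()

print(daily)
```

Each group is resampled only between its own first and last dates, so no rows are created in the gap between the groups.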

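Method chaining improvements#

The following methods / indexers now accept a callable. It is intended to make these more useful in method chains, see the documentation. (GH11485, GH12533)

.where() and .mask()
.loc[], .iloc[] and .ix[]
[] indexing

Methods `.where()` and `.mask()`#

These can accept a callable for the condition and for the other argument.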

In [16]: df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

In [17]: df.where(lambda x: x > 4, lambda x: x + 10)
Out[17]: 
    A   B  C
0  11  14  7
1  12   5  8
2  13   6  9

[3 rows x 3 columns]
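
The same pattern on a Series, as a small sketch with illustrative values. It also shows `.mask()`, which replaces values where the condition holds rather than where it fails.

```python
import pandas as pd

s = pd.Series([1, 5, 10])

# Keep values where the condition holds; elsewhere use the callable `other`.
kept = s.where(lambda x: x > 4, lambda x: x + 10)

# mask is the inverse: replace values where the condition holds.
masked = s.mask(lambda x: x > 4, 0)

print(kept.tolist())
print(masked.tolist())
```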

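Methods `.loc[]`, `.iloc[]`, `.ix[]`#

These can accept a callable, or a tuple of callables, as a slicer. The callable can return a valid boolean indexer or anything else that is valid input for these indexers.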

# callable returns bool indexer
In [18]: df.loc[lambda x: x.A >= 2, lambda x: x.sum() > 10]
Out[18]: 
   B  C
1  5  8
2  6  9

[2 rows x 2 columns]

# callable returns list of labels
In [19]: df.loc[lambda x: [1, 2], lambda x: ["A", "B"]]
Out[19]: 
   A  B
1  2  5
2  3  6

[2 rows x 2 columns]

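Indexing with `[]`#

Finally, you can use a callable in [] indexing of Series, DataFrame and Panel. The callable must return a valid input for [] indexing depending on its class and index type.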

In [20]: df[lambda x: "A"]
Out[20]: 
0    1
1    2
2    3
Name: A, Length: 3, dtype: int64

Using these methods / indexers, you can chain data selection operations without using temporary variables.

In [21]: bb = pd.read_csv("data/baseball.csv", index_col="id")

In [22]: (bb.groupby(["year", "team"]).sum().loc[lambda df: df.r > 100])
Out[22]: 
           stint    g    ab    r    h  X2b  X3b  hr    rbi    sb   cs   bb     so   ibb   hbp    sh    sf  gidp
year team                                                                                                      
2007 CIN       6  379   745  101  203   35    2  36  125.0  10.0  1.0  105  127.0  14.0   1.0   1.0  15.0  18.0
     DET       5  301  1062  162  283   54    4  37  144.0  24.0  7.0   97  176.0   3.0  10.0   4.0   8.0  28.0
     HOU       4  311   926  109  218   47    6  14   77.0  10.0  4.0   60  212.0   3.0   9.0  16.0   6.0  17.0
     LAN      11  413  1021  153  293   61    3  36  154.0   7.0  5.0  114  141.0   8.0   9.0   3.0   8.0  29.0
     NYN      13  622  1854  240  509  101    3  61  243.0  22.0  4.0  174  310.0  24.0  23.0  18.0  15.0  48.0
     SFN       5  482  1305  198  337   67    6  40  171.0  26.0  7.0  235  188.0  51.0   8.0  16.0   6.0  41.0
     TEX       2  198   729  115  200   40    4  28  115.0  21.0  4.0   73  140.0   4.0   5.0   2.0   8.0  16.0
     TOR       4  459  1408  187  378   96    2  58  223.0   4.0  2.0  190  265.0  16.0  12.0   4.0  16.0  38.0

[8 rows x 18 columns]
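
For readers without the CSV file at hand, the chained selection can be sketched on inline data. The frame below is made up for illustration and only borrows the `team`, `r` and `h` column names.

```python
import pandas as pd

# Toy data standing in for a file such as the baseball CSV.
raw = pd.DataFrame(
    {
        "team": ["CIN", "CIN", "DET", "DET", "HOU"],
        "r": [60, 50, 90, 80, 40],
        "h": [100, 110, 150, 140, 90],
    }
)

# Aggregate, then filter on the aggregated values in the same chain.
# Each callable receives the intermediate frame produced by .sum().
result = (
    raw.groupby("team")
    .sum()
    .loc[lambda d: d.r > 100, lambda d: ["r", "h"]]
)

print(result)
```

Because the callables are evaluated against the aggregated frame, no temporary variable is needed to refer to it.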

Partial string indexing on `DatetimeIndex` when part of a `MultiIndex`#

Partial string indexing now matches on DatetimeIndex when part of a MultiIndex (GH10331)

In [23]: dft2 = pd.DataFrame(
   ....:     np.random.randn(20, 1),
   ....:     columns=["A"],
   ....:     index=pd.MultiIndex.from_product(
   ....:         [pd.date_range("20130101", periods=10, freq="12H"), ["a", "b"]]
   ....:     ),
   ....: )
   ....: 

In [24]: dft2
Out[24]: 
                              A
2013-01-01 00:00:00 a  0.469112
                    b -0.282863
2013-01-01 12:00:00 a -1.509059
                    b -1.135632
2013-01-02 00:00:00 a  1.212112
...                         ...
2013-01-04 12:00:00 b  0.271860
2013-01-05 00:00:00 a -0.424972
                    b  0.567020
2013-01-05 12:00:00 a  0.276232
                    b -1.087401

[20 rows x 1 columns]

In [25]: dft2.loc["2013-01-05"]
Out[25]: 
                              A
2013-01-05 00:00:00 a -0.424972
                    b  0.567020
2013-01-05 12:00:00 a  0.276232
                    b -1.087401

[4 rows x 1 columns]

On other levels

In [26]: idx = pd.IndexSlice

In [27]: dft2 = dft2.swaplevel(0, 1).sort_index()

In [28]: dft2
Out[28]: 
                              A
a 2013-01-01 00:00:00  0.469112
  2013-01-01 12:00:00 -1.509059
  2013-01-02 00:00:00  1.212112
  2013-01-02 12:00:00  0.119209
  2013-01-03 00:00:00 -0.861849
...                         ...
b 2013-01-03 12:00:00  1.071804
  2013-01-04 00:00:00 -0.706771
  2013-01-04 12:00:00  0.271860
  2013-01-05 00:00:00  0.567020
  2013-01-05 12:00:00 -1.087401

[20 rows x 1 columns]

In [29]: dft2.loc[idx[:, "2013-01-05"], :]
Out[29]: 
                              A
a 2013-01-05 00:00:00 -0.424972
  2013-01-05 12:00:00  0.276232
b 2013-01-05 00:00:00  0.567020
  2013-01-05 12:00:00 -1.087401

[4 rows x 1 columns]
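
A deterministic version of both selections, sketched with integer values instead of random data so the selected rows are easy to verify. The timestamps are written out explicitly; all names are illustrative.

```python
import pandas as pd

dates = pd.to_datetime(
    [
        "2013-01-01 00:00",
        "2013-01-01 12:00",
        "2013-01-02 00:00",
        "2013-01-02 12:00",
    ]
)
index = pd.MultiIndex.from_product([dates, ["a", "b"]])
dft = pd.DataFrame({"A": range(8)}, index=index)

# A date string selects every timestamp of that day on the datetime level.
day_two = dft.loc["2013-01-02"]

# The same selection when the datetime level is not the first level.
swapped = dft.swaplevel(0, 1).sort_index()
day_two_swapped = swapped.loc[pd.IndexSlice[:, "2013-01-02"], :]

print(day_two)
print(day_two_swapped)
```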

Assembling datetimes#

pd.to_datetime() has gained the ability to assemble datetimes from a passed-in DataFrame or a dict. (GH8158).

In [30]: df = pd.DataFrame(
   ....:     {"year": [2015, 2016], "month": [2, 3], "day": [4, 5], "hour": [2, 3]}
   ....: )
   ....: 

In [31]: df
Out[31]: 
   year  month  day  hour
0  2015      2    4     2
1  2016      3    5     3

[2 rows x 4 columns]

Assembling using the passed frame.

In [32]: pd.to_datetime(df)
Out[32]: 
0   2015-02-04 02:00:00
1   2016-03-05 03:00:00
Length: 2, dtype: datetime64[ns]

You can pass only the columns that you need to assemble.

In [33]: pd.to_datetime(df[["year", "month", "day"]])
Out[33]: 
0   2015-02-04
1   2016-03-05
Length: 2, dtype: datetime64[ns]
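
A short sketch of the two input forms mentioned above, a DataFrame and a dict. The `minute` column is an illustrative choice of optional component.

```python
import pandas as pd

# Component columns: "year", "month" and "day" are required,
# finer units such as "hour" or "minute" are optional.
parts = pd.DataFrame(
    {"year": [2015, 2016], "month": [2, 3], "day": [4, 5], "minute": [30, 45]}
)
stamps = pd.to_datetime(parts)

# A dict of equal-length lists is assembled the same way.
from_dict = pd.to_datetime({"year": [2015], "month": [2], "day": [4]})

print(stamps)
print(from_dict)
```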

Other enhancements#

pd.read_csv() now supports delim_whitespace=True for the Python engine (GH12958)
pd.read_csv() now supports opening ZIP files that contain a single CSV, via extension inference or explicit compression='zip' (GH12175)
pd.read_csv() now supports opening files using xz compression, via extension inference or when compression='xz' is specified explicitly; xz compression is also supported by DataFrame.to_csv in the same way (GH11852)
pd.read_msgpack() now always gives writeable ndarrays even when compression is used (GH12359).
pd.read_msgpack() now supports serializing and de-serializing categoricals with msgpack (GH12573)
.to_json() now supports NDFrames that contain categorical and sparse data (GH10778)
interpolate() now supports method='akima' (GH7588).
pd.read_excel() now accepts path objects (e.g. pathlib.Path, py.path.local) for the file path, in line with other read_* functions (GH12655)
Added .weekday_name property as a component to DatetimeIndex and the .dt accessor. (GH11128)

Index.take now handles allow_fill and fill_value consistently (GH12631)

In [34]: idx = pd.Index([1.0, 2.0, 3.0, 4.0], dtype="float")

# default, allow_fill=True, fill_value=None
In [35]: idx.take([2, -1])
Out[35]: Float64Index([3.0, 4.0], dtype='float64')

In [36]: idx.take([2, -1], fill_value=True)
Out[36]: Float64Index([3.0, nan], dtype='float64')

Index now supports .str.get_dummies() which returns MultiIndex, see Creating Indicator Variables (GH10008, GH10103)

In [37]: idx = pd.Index(["a|b", "a|c", "b|c"])

In [38]: idx.str.get_dummies("|")
Out[38]: 
MultiIndex([(1, 1, 0),
            (1, 0, 1),
            (0, 1, 1)],
           names=['a', 'b', 'c'])

pd.crosstab() has gained a normalize argument for normalizing frequency tables (GH12569). Examples in the updated docs here.
.resample(..).interpolate() is now supported (GH12925)
.isin() now accepts passed sets (GH12988)
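
Two of the items above, the `normalize` argument of `pd.crosstab()` and set input for `.isin()`, can be sketched on toy data. The series names and values are illustrative.

```python
import pandas as pd

a = pd.Series([1, 1, 2, 2], name="a")
b = pd.Series(["x", "y", "x", "x"], name="b")

# Fractions of the grand total.
overall = pd.crosstab(a, b, normalize=True)

# Fractions within each row.
by_row = pd.crosstab(a, b, normalize="index")

# .isin() with a set instead of a list.
flags = a.isin({1})

print(overall)
print(by_row)
print(flags.tolist())
```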

Sparse changes#

These changes conform sparse handling to return the correct types and work to make a smoother experience with indexing.

SparseArray.take now returns a scalar for scalar input, SparseArray for others. Furthermore, it handles a negative indexer with the same rule as Index (GH10560, GH12796)

s = pd.SparseArray([np.nan, np.nan, 1, 2, 3, np.nan, 4, 5, np.nan, 6])
s.take(0)  # scalar input returns a scalar
s.take([1, 2, 3])  # list-like input returns a SparseArray

Bug in SparseSeries[] indexing with Ellipsis raises KeyError (GH9467)
Bug in SparseArray[] where indexing with tuples was not handled properly (GH12966)
Bug in SparseSeries.loc[] with list-like input raises TypeError (GH10560)
Bug in SparseSeries.iloc[] with scalar input may raise IndexError (GH10560)
Bug in SparseSeries.loc[], .iloc[] with slice returns SparseArray, rather than SparseSeries (GH10560)
Bug in SparseDataFrame.loc[], .iloc[] may result in dense Series, rather than SparseSeries (GH12787)
Bug in SparseArray addition ignores fill_value of right hand side (GH12910)
Bug in SparseArray mod raises AttributeError (GH12910)
Bug in SparseArray pow calculates 1 ** np.nan as np.nan which must be 1 (GH12910)
Bug in SparseArray comparison output may give an incorrect result or raise ValueError (GH12971)
Bug in SparseSeries.__repr__ raises TypeError when it is longer than max_rows (GH10560)
Bug in SparseSeries.shape ignores fill_value (GH10452)
Bug in SparseSeries and SparseArray may have different dtype from its dense values (GH12908)
Bug in SparseSeries.reindex incorrectly handle fill_value (GH12797)
Bug in SparseArray.to_frame() results in DataFrame, rather than SparseDataFrame (GH9850)
Bug in SparseSeries.value_counts() does not count fill_value (GH6749)
Bug in SparseArray.to_dense() does not preserve dtype (GH10648)
Bug in SparseArray.to_dense() incorrectly handle fill_value (GH12797)
Bug in pd.concat() of SparseSeries results in dense (GH10536)
Bug in pd.concat() of SparseDataFrame incorrectly handle fill_value (GH9765)
Bug in pd.concat() of SparseDataFrame may raise AttributeError (GH12174)
Bug in SparseArray.shift() may raise NameError or TypeError (GH12908)

API changes#

Method `.groupby(..).nth()` changes#

The index in .groupby(..).nth() output is now more consistent when the as_index argument is passed (GH11039):

In [39]: df = pd.DataFrame({"A": ["a", "b", "a"], "B": [1, 2, 3]})

In [40]: df
Out[40]: 
   A  B
0  a  1
1  b  2
2  a  3

[3 rows x 2 columns]

Previous behavior:

In [3]: df.groupby('A', as_index=True)['B'].nth(0)
Out[3]:
0    1
1    2
Name: B, dtype: int64

In [4]: df.groupby('A', as_index=False)['B'].nth(0)
Out[4]:
0    1
1    2
Name: B, dtype: int64

New behavior:

In [41]: df.groupby("A", as_index=True)["B"].nth(0)
Out[41]: 
A
a    1
b    2
Name: B, Length: 2, dtype: int64

In [42]: df.groupby("A", as_index=False)["B"].nth(0)
Out[42]: 
0    1
1    2
Name: B, Length: 2, dtype: int64

Furthermore, previously, a .groupby would always sort, regardless of whether sort=False was passed with .nth().

In [43]: np.random.seed(1234)

In [44]: df = pd.DataFrame(np.random.randn(100, 2), columns=["a", "b"])

In [45]: df["c"] = np.random.randint(0, 4, 100)

Previous behavior:

In [4]: df.groupby('c', sort=True).nth(1)
Out[4]:
          a         b
c
0 -0.334077  0.002118
1  0.036142 -2.074978
2 -0.720589  0.887163
3  0.859588 -0.636524

In [5]: df.groupby('c', sort=False).nth(1)
Out[5]:
          a         b
c
0 -0.334077  0.002118
1  0.036142 -2.074978
2 -0.720589  0.887163
3  0.859588 -0.636524

New behavior:

In [46]: df.groupby("c", sort=True).nth(1)
Out[46]: 
          a         b
c                    
0 -0.334077  0.002118
1  0.036142 -2.074978
2 -0.720589  0.887163
3  0.859588 -0.636524

[4 rows x 2 columns]

In [47]: df.groupby("c", sort=False).nth(1)
Out[47]: 
          a         b
c                    
2 -0.720589  0.887163
3  0.859588 -0.636524
0 -0.334077  0.002118
1  0.036142 -2.074978

[4 rows x 2 columns]

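NumPy function compatibility#

Compatibility between pandas array-like methods (e.g. sum and take) and their numpy counterparts has been greatly increased by augmenting the signatures of the pandas methods so as to accept arguments that can be passed in from numpy, even if they are not necessarily used in the pandas implementation (GH12644, GH12638, GH12687)

.searchsorted() for Index and TimedeltaIndex now accept a sorter argument to maintain compatibility with numpy's searchsorted function (GH12238)
Bug in numpy compatibility of np.round() on a Series (GH12600)

An example of this signature augmentation is illustrated below: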

sp = pd.SparseDataFrame([1, 2, 3])
sp

Previous behavior:

In [2]: np.cumsum(sp, axis=0)
...
TypeError: cumsum() takes at most 2 arguments (4 given)

New behavior:

np.cumsum(sp, axis=0)
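
The same signature compatibility can be sketched on a plain Series, which is an illustrative substitute for the sparse frame used above. NumPy dispatches to the object's own `cumsum` method and passes its extra keyword arguments along.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])

# NumPy forwards axis (plus its dtype/out defaults) to Series.cumsum,
# whose signature accepts them for compatibility.
result = np.cumsum(s, axis=0)

print(type(result).__name__)
print(result.tolist())
```

The result stays a pandas object rather than being converted to a bare ndarray.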

Using `.apply` on GroupBy resampling#

Using apply on resampling groupby operations (using a pd.TimeGrouper) now has the same output types as similar apply calls on other groupby operations. (GH11742).

In [48]: df = pd.DataFrame(
   ....:     {"date": pd.to_datetime(["10/10/2000", "11/10/2000"]), "value": [10, 13]}
   ....: )
   ....: 

In [49]: df
Out[49]: 
        date  value
0 2000-10-10     10
1 2000-11-10     13

[2 rows x 2 columns]

Previous behavior:

In [1]: df.groupby(pd.TimeGrouper(key='date',
   ...:                           freq='M')).apply(lambda x: x.value.sum())
Out[1]:
...
TypeError: cannot concatenate a non-NDFrame object

# Output is a Series
In [2]: df.groupby(pd.TimeGrouper(key='date',
   ...:                           freq='M')).apply(lambda x: x[['value']].sum())
Out[2]:
date
2000-10-31  value    10
2000-11-30  value    13
dtype: int64

New behavior:

# Output is a Series
In [55]: df.groupby(pd.TimeGrouper(key='date',
    ...:                           freq='M')).apply(lambda x: x.value.sum())
Out[55]:
date
2000-10-31    10
2000-11-30    13
Freq: M, dtype: int64

# Output is a DataFrame
In [56]: df.groupby(pd.TimeGrouper(key='date',
    ...:                           freq='M')).apply(lambda x: x[['value']].sum())
Out[56]:
            value
date
2000-10-31     10
2000-11-30     13

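Changes in `read_csv` exceptions#

In order to standardize the read_csv API for both the c and python engines, both will now raise an EmptyDataError, a subclass of ValueError, in response to empty columns or header (GH12493, GH12506)

Previous behavior: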

In [1]: import io

In [2]: df = pd.read_csv(io.StringIO(''), engine='c')
...
ValueError: No columns to parse from file

In [3]: df = pd.read_csv(io.StringIO(''), engine='python')
...
StopIteration

New behavior:

In [1]: df = pd.read_csv(io.StringIO(''), engine='c')
...
pandas.io.common.EmptyDataError: No columns to parse from file

In [2]: df = pd.read_csv(io.StringIO(''), engine='python')
...
pandas.io.common.EmptyDataError: No columns to parse from file

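In addition to this error change, several others have been made as well:

CParserError now sub-classes ValueError instead of just an Exception (GH12551)
A CParserError is now raised instead of a generic Exception in read_csv when the c engine cannot parse a column (GH12506)
A ValueError is now raised instead of a generic Exception in read_csv when the c engine encounters a NaN in an integer column (GH12506)
A ValueError is now raised instead of a generic Exception in read_csv when true_values is specified, and the c engine encounters an element in a column containing unencodable bytes (GH12506)
pandas.parser.OverflowError exception has been removed and has been replaced with Python's built-in OverflowError exception (GH12506)
pd.read_csv() no longer allows a combination of strings and integers for the usecols parameter (GH12678)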
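
Because EmptyDataError is a subclass of ValueError, code that already catches ValueError keeps working. A minimal sketch, which inspects the exception by class name so it does not depend on where the class is importable from:

```python
import io

import pandas as pd

caught = None
try:
    pd.read_csv(io.StringIO(""))
except ValueError as exc:
    # EmptyDataError subclasses ValueError, so this handler catches it.
    caught = exc

print(type(caught).__name__, "-", caught)
```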

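Method `to_datetime` error changes#

Bugs in pd.to_datetime() when passing a unit with convertible entries and errors='coerce' or non-convertible with errors='ignore'. Furthermore, an OutOfBoundsDatetime exception will be raised when an out-of-range value is encountered for that unit when errors='raise'. (GH11758, GH13052, GH13059)

Previous behavior: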

In [27]: pd.to_datetime(1420043460, unit='s', errors='coerce')
Out[27]: NaT

In [28]: pd.to_datetime(11111111, unit='D', errors='ignore')
OverflowError: Python int too large to convert to C long

In [29]: pd.to_datetime(11111111, unit='D', errors='raise')
OverflowError: Python int too large to convert to C long

New behavior:

In [2]: pd.to_datetime(1420043460, unit='s', errors='coerce')
Out[2]: Timestamp('2014-12-31 16:31:00')

In [3]: pd.to_datetime(11111111, unit='D', errors='ignore')
Out[3]: 11111111

In [4]: pd.to_datetime(11111111, unit='D', errors='raise')
OutOfBoundsDatetime: cannot convert input with unit 'D'
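
A minimal sketch of the convertible case: with a `unit`, valid epoch values convert normally even when `errors='coerce'` is passed. The list input is an illustrative addition.

```python
import pandas as pd

# A valid epoch value in seconds is converted, not coerced to NaT.
ok = pd.to_datetime(1420043460, unit="s", errors="coerce")

# List input with the same unit returns a DatetimeIndex.
many = pd.to_datetime([0, 86400], unit="s", errors="coerce")

print(ok)
print(many)
```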

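Other API changes#

.swaplevel() for Series, DataFrame, Panel, and MultiIndex now features defaults for its first two parameters i and j that swap the two innermost levels of the index. (GH12934)
.searchsorted() for Index and TimedeltaIndex now accept a sorter argument to maintain compatibility with numpy's searchsorted function (GH12238)
Period and PeriodIndex now raise IncompatibleFrequency error which inherits ValueError rather than raw ValueError (GH12615)
Series.apply for category dtype now applies the passed function to each of the .categories (and not the .codes), and returns a category dtype if possible (GH12473)
read_csv will now raise a TypeError if parse_dates is neither a boolean, list, or dictionary (matches the doc-string) (GH5636)
The default for .query()/.eval() is now engine=None, which will use numexpr if it's installed; otherwise it will fall back to the python engine. This mimics the pre-0.18.1 behavior if numexpr is installed (previously, if numexpr was not installed, .query()/.eval() would raise). (GH12749)
pd.show_versions() now includes pandas_datareader version (GH12740)
Provide proper __name__ and __qualname__ attributes for generic functions (GH12021)
pd.concat(ignore_index=True) now uses RangeIndex as default (GH12695)
pd.merge() and DataFrame.join() will show a UserWarning when merging/joining a single-level with a multi-leveled dataframe (GH9455, GH12219)
Compat with scipy > 0.17 for deprecated piecewise_polynomial interpolation method; support for the replacement from_derivatives method (GH12887)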
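
Two of the changes above, the `.swaplevel()` defaults and the RangeIndex produced by `pd.concat(ignore_index=True)`, can be sketched as follows. The level names and values are illustrative.

```python
import pandas as pd

# swaplevel() with no arguments swaps the two innermost levels.
mi = pd.MultiIndex.from_tuples(
    [("x", 1, "a"), ("y", 2, "b")], names=["outer", "mid", "inner"]
)
swapped = mi.swaplevel()

# concat with ignore_index=True yields a RangeIndex.
combined = pd.concat([pd.Series([1, 2]), pd.Series([3])], ignore_index=True)

print(list(swapped.names))
print(combined.index)
```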

Deprecations#

The method name Index.sym_diff() is deprecated and can be replaced by Index.symmetric_difference() (GH12591)
The method name Categorical.sort() is deprecated in favor of Categorical.sort_values() (GH12882)
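
A short sketch of the two replacement names in use; the values are illustrative.

```python
import pandas as pd

left = pd.Index([1, 2, 3])
right = pd.Index([2, 3, 4])

# Elements that appear in exactly one of the two indexes.
diff = left.symmetric_difference(right)

# Sort a Categorical by its category order using the non-deprecated name.
cat = pd.Categorical(["b", "a", "c"], categories=["c", "b", "a"], ordered=True)
sorted_cat = cat.sort_values()

print(list(diff))
print(list(sorted_cat))
```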

Performance improvements#

Improved speed of SAS reader (GH12656, GH12961)
Performance improvements in .groupby(..).cumcount() (GH11039)
Improved memory usage in pd.read_csv() when using skiprows=an_integer (GH13005)
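Improved performance of DataFrame.to_sql when checking case sensitivity for tables. Now only checks if table has been created correctly when table name is not lower case. (GH12876)
Improved performance of Period construction and time series plotting (GH12903, GH11831).
Improved performance of .str.encode() and .str.decode() methods (GH13008)
Improved performance of to_numeric if input is numeric dtype (GH12777)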
Improved performance of sparse arithmetic with IntIndex (GH13036)

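Bug fixes#

usecols parameter in pd.read_csv is now respected even when the lines of a CSV file are not even (GH12203)
Bug in groupby.transform(..) when axis=1 is specified with a non-monotonic ordered index (GH12713)
Bug in Period and PeriodIndex creation raises KeyError if freq="Minute" is specified. Note that "Minute" freq is deprecated in v0.17.0, and it is recommended to use freq="T" instead (GH11854)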
Bug in .resample(...).count() with a PeriodIndex always raising a TypeError (GH12774)
Bug in .resample(...) with a PeriodIndex casting to a DatetimeIndex when empty (GH12868)
Bug in .resample(...) with a PeriodIndex when resampling to an existing frequency (GH12770)
Bug in printing data which contains Period with different freq raises ValueError (GH12615)
Bug in Series construction with Categorical when dtype='category' is specified (GH12574)
Bugs in concatenation with a coercible dtype was too aggressive, resulting in different dtypes in output formatting when an object was longer than display.max_rows (GH12411, GH12045, GH11594, GH10571, GH12211)
Bug in float_format option with option not being validated as a callable. (GH12706)
Bug in GroupBy.filter when dropna=False and no groups fulfilled the criteria (GH12768)
Bug in __name__ of .cum* functions (GH12021)
Bug in .astype() of a Float64Index/Int64Index to an Int64Index (GH12881)
Bug in round tripping an integer based index in .to_json()/.read_json() when orient='index' (the default) (GH12866)
Bug in plotting Categorical dtypes causes an error when attempting a stacked bar plot (GH13019)
Compat with >= numpy 1.11 for NaT comparisons (GH12969)
Bug in .drop() with a non-unique MultiIndex. (GH12701)
Bug in .concat of datetime tz-aware and naive DataFrames (GH12467)
Bug in correctly raising a ValueError in .resample(..).fillna(..) when passing a non-string (GH12952)
Bug fixes in various encoding and header processing issues in pd.read_sas() (GH12659, GH12654, GH12647, GH12809)
Bug in pd.crosstab() where would silently ignore aggfunc if values=None (GH12569).
Potential segfault in DataFrame.to_json when serialising datetime.time (GH11473).
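Potential segfault in DataFrame.to_json when attempting to serialise 0d array (GH11299).
Segfault in to_json when attempting to serialise a DataFrame or Series with non-ndarray values; now supports serialization of category, sparse, and datetime64[ns, tz] dtypes (GH10778).
Bug in DataFrame.to_json with unsupported dtype not passed to default handler (GH12554).
Bug in .align not returning the sub-class (GH12983)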
Bug in aligning a Series with a DataFrame (GH13037)
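Bug in ABCPanel in which Panel4D was not being considered as a valid instance of this generic type (GH12810)
Bug in consistency of .name on .groupby(..).apply(..) cases (GH12363)
Bug in Timestamp.__repr__ that caused pprint to fail in nested structures (GH12622)
Bug in Timedelta.min and Timedelta.max, the properties now report the true minimum/maximum timedeltas as recognized by pandas. See the documentation. (GH12727)
Bug in .quantile() with interpolation may coerce to float unexpectedly (GH12772)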
Bug in .quantile() with empty Series may return scalar rather than empty Series (GH12772)
Bug in .loc with out-of-bounds in a large indexer would raise IndexError rather than KeyError (GH12527)
Bug in resampling when using a TimedeltaIndex and .asfreq(), would previously not include the final fencepost (GH12926)
Bug in equality testing with a Categorical in a DataFrame (GH12564)
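Bug in GroupBy.first(), .last() returns incorrect row when TimeGrouper is used (GH7453)
Bug in pd.read_csv() with the c engine when specifying skiprows with newlines in quoted items (GH10911, GH12775)
Bug in DataFrame timezone lost when assigning tz-aware datetime Series with alignment (GH12981)
Bug in .value_counts() when normalize=True and dropna=True where nulls still contributed to the normalized count (GH12558)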
Bug in Series.value_counts() loses name if its dtype is category (GH12835)
Bug in Series.value_counts() loses timezone info (GH12835)
Bug in Series.value_counts(normalize=True) with Categorical raises UnboundLocalError (GH12835)
Bug in Panel.fillna() ignoring inplace=True (GH12633)
Bug in pd.read_csv() when specifying names, usecols, and parse_dates simultaneously with the c engine (GH9755)
Bug in pd.read_csv() when specifying delim_whitespace=True and lineterminator simultaneously with the c engine (GH12912)
Bug in Series.rename, DataFrame.rename and DataFrame.rename_axis not treating Series as mappings to relabel (GH12623).
Clean in .rolling.min and .rolling.max to enhance dtype handling (GH12373)
Bug in groupby where complex types are coerced to float (GH12902)
Bug in Series.map raises TypeError if its dtype is category or tz-aware datetime (GH12473)
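Bugs on 32bit platforms for some test comparisons (GH12972)
Bug in index coercion when falling back from RangeIndex construction (GH12893)
Better error message in window functions when an invalid argument (e.g. a float window) is passed (GH12669)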
Bug in slicing subclassed DataFrame defined to return subclassed Series may return normal Series (GH11559)
Bug in .str accessor methods may raise ValueError if input has name and the result is DataFrame or MultiIndex (GH12617)
Bug in DataFrame.last_valid_index() and DataFrame.first_valid_index() on empty frames (GH12800)
Bug in CategoricalIndex.get_loc returns different result from regular Index (GH12531)
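Bug in PeriodIndex.resample where name not propagated (GH12769)
Bug in date_range closed keyword and timezones (GH12684).
Bug in pd.concat raises AttributeError when input data contains tz-aware datetime and timedelta (GH12620)
Bug in pd.concat did not handle empty Series properly (GH11082)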
Bug in .plot.bar alignment when width is specified with int (GH12979)
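Bug in fill_value is ignored if the argument to a binary operator is a constant (GH12723)
Bug in pd.read_html() when using bs4 flavor and parsing table with a header and only one column (GH9178)
Bug in .pivot_table when margins=True and dropna=True where nulls still contributed to margin count (GH12577)
Bug in .pivot_table when dropna=False where table index/column names disappear (GH12133)
Bug in pd.crosstab() when margins=True and dropna=False which raised (GH12642)
Bug in Series.name when name attribute can be a hashable type (GH12610)
Bug in .describe() resets categorical columns information (GH11558)
Bug where loffset argument was not applied when calling resample().count() on a timeseries (GH12725)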
pd.read_excel() now accepts column names associated with keyword argument names (GH12870)
Bug in pd.to_numeric() with Index returns np.ndarray, rather than Index (GH12777)
Bug in pd.to_numeric() with datetime-like may raise TypeError (GH12777)
Bug in pd.to_numeric() with scalar raises ValueError (GH12777)

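Contributors#

A total of 60 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.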

Andrew Fiore-Gartland +
Bastiaan +
Benoît Vinot +
Brandon Rhodes +
DaCoEx +
Drew Fustin +
Ernesto Freitas +
Filip Ter +
Gregory Livschitz +
Gábor Lipták
Hassan Kibirige +
Iblis Lin
Israel Saeta Pérez +
Jason Wolosonovich +
Jeff Reback
Joe Jevnik
Joris Van den Bossche
Joshua Storck +
Ka Wo Chen
Kerby Shedden
Kieran O'Mahony
Leif Walsh +
Mahmoud Lababidi +
Maoyuan Liu +
Mark Roth +
Matt Wittmann
MaxU +
Maximilian Roos
Michael Droettboom +
Nick Eubank
Nicolas Bonnotte
OXPHOS +
Pauli Virtanen +
Peter Waller +
Pietro Battiston
Prabhjot Singh +
Robin Wilson
Roger Thomas +
Sebastian Bank
Stephen Hoover
Tim Hopper +
Tom Augspurger
WANG Aiyong
Wes Turner
Winand +
Xbar +
Yan Facai +
adneu +
ajenkins-cargometrics +
behzad nouri
chinskiy +
gfyoung
jeps-journal +
jonaslb +
kotrfa +
nileracecrew +
onesandzeroes
rs2 +
sinhrks
tsdlovell +
