Version 0.16.2 (June 12, 2015)#

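This is a minor bug-fix release from 0.16.1. It includes a large number of bug fixes along with some new features (the pipe() method), enhancements, and performance improvements.

We recommend that all users upgrade to this version.

Highlights include:

A new pipe method, see the Pipe section below
Documentation on how to use numba with pandas, see the Enhancing Performance section of the pandas documentation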

What's new in v0.16.2

New features
- Pipe
- Other enhancements
API changes
Performance improvements
Bug fixes
Contributors

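New features#

Pipe#

We've introduced a new method, DataFrame.pipe(). As the name suggests, pipe should be used to pass data through a chain of function calls. The goal is to avoid confusing nested function calls like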

# df is a DataFrame
# f, g, and h are functions that take and return DataFrames
f(g(h(df), arg1=1), arg2=2, arg3=3)  # noqa F821

The logic flows from the inside out, and function names are separated from their keyword arguments. This can be rewritten as

(
    df.pipe(h)  # noqa F821
    .pipe(g, arg1=1)  # noqa F821
    .pipe(f, arg2=2, arg3=3)  # noqa F821
)

Now both the code and the logic flow from top to bottom. Keyword arguments sit next to their functions. Overall, the code is much more readable.
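To make the comparison concrete, here is a minimal runnable sketch. The bodies of f, g, and h are hypothetical stand-ins invented for illustration (the release notes only say that they take and return DataFrames), and the column name x is likewise an assumption.

```python
import pandas as pd


def h(df):
    """Hypothetical step: drop rows with missing values."""
    return df.dropna()


def g(df, arg1):
    """Hypothetical step: add arg1 to every value in column 'x'."""
    return df.assign(x=df["x"] + arg1)


def f(df, arg2, arg3):
    """Hypothetical step: scale 'x' by arg2 and keep the first arg3 rows."""
    return df.assign(x=df["x"] * arg2).head(arg3)


df = pd.DataFrame({"x": [1.0, None, 3.0, 4.0]})

# Nested form: reads from the inside out.
nested = f(g(h(df), arg1=1), arg2=2, arg3=3)

# Piped form: reads from top to bottom.
piped = (
    df.pipe(h)
    .pipe(g, arg1=1)
    .pipe(f, arg2=2, arg3=3)
)

# Both spellings produce the same DataFrame.
print(piped.equals(nested))
```

Each pipe call simply invokes the given function with the DataFrame as its first argument, so the two forms are interchangeable; the piped form only changes the reading order.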

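In the example above, the functions f, g, and h each expect the DataFrame as the first positional argument. When the function you want to apply takes its data anywhere other than the first argument, pass a tuple of (function, keyword) indicating where the DataFrame should flow. For example: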

# Assumes numpy and pandas are already imported as np and pd,
# and that statsmodels is installed.
In [1]: import statsmodels.formula.api as sm

In [2]: bb = pd.read_csv("data/baseball.csv", index_col="id")

# sm.ols takes (formula, data)
In [3]: (
   ...:     bb.query("h > 0")
   ...:     .assign(ln_h=lambda df: np.log(df.h))
   ...:     .pipe((sm.ols, "data"), "hr ~ ln_h + year + g + C(lg)")
   ...:     .fit()
   ...:     .summary()
   ...: )
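The (function, keyword) form can also be tried without statsmodels or the baseball data. The sketch below uses a hypothetical function, scale, whose data argument comes second; the function and its parameter names are assumptions made purely for illustration.

```python
import pandas as pd


def scale(factor, data):
    """Hypothetical function that takes its DataFrame as the 'data' keyword,
    not as the first positional argument."""
    return data * factor


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# The tuple tells pipe to pass the DataFrame as the keyword argument 'data';
# the remaining positional argument (10) becomes 'factor'.
result = df.pipe((scale, "data"), 10)

print(result)
```

This is equivalent to calling scale(10, data=df) directly.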

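The pipe method is inspired by Unix pipes, which stream text through processes. More recently, dplyr and magrittr have introduced the popular (%>%) pipe operator for R.

See the documentation for more. (GH10129)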

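Other enhancements#

Added rsplit to Index/Series StringMethods (GH10303)
Removed the hard-coded size limits on the DataFrame HTML representation in the IPython notebook, leaving this to IPython itself (only for IPython v3.0 or greater). This eliminates the duplicate scroll bars that appeared in the notebook with large frames (GH10231).

Note that the notebook has a toggle output scrolling feature to limit the display of very large frames (by clicking to the left of the output). You can also configure how DataFrames are displayed using the pandas options; see the pandas options documentation.
The axis parameter of DataFrame.quantile now also accepts index and column. (GH9543)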
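As a rough illustration of two of the enhancements above, the sketch below splits strings from the right with rsplit and passes a string axis to DataFrame.quantile. The sample data is invented, and the spelling "columns" is an assumption based on current pandas releases (the note above writes column).

```python
import pandas as pd

# rsplit splits from the right; n=1 performs at most one split.
s = pd.Series(["a_b_c", "d_e_f"])
right_split = s.str.rsplit("_", n=1)

df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# axis="index" computes the quantile down each column.
per_column = df.quantile(0.5, axis="index")

# axis="columns" computes the quantile across each row.
per_row = df.quantile(0.5, axis="columns")

print(right_split.tolist())
print(per_column.tolist())
print(per_row.tolist())
```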

API changes#

Holiday now raises NotImplementedError if both offset and observance are given, instead of returning an incorrect result. (GH10217)
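A minimal sketch of this behavior follows. The holiday name and date are made up for illustration; Holiday and nearest_workday come from pandas.tseries.holiday.

```python
import pandas as pd
from pandas.tseries.holiday import Holiday, nearest_workday

try:
    # Supplying both offset and observance is rejected up front.
    Holiday(
        "Example Day",  # hypothetical holiday name
        month=7,
        day=4,
        offset=pd.DateOffset(days=1),
        observance=nearest_workday,
    )
    raised = False
except NotImplementedError:
    raised = True

print(raised)
```

Passing only one of the two arguments constructs the Holiday as usual.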

Performance improvements#

Improved Series.resample performance with dtype=datetime64[ns] (GH7754)
Increase performance of str.split when expand=True (GH10081)
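For reference, this is the kind of call the str.split improvement applies to. The sketch only shows the API shape with invented data; it makes no claim about timings.

```python
import pandas as pd

s = pd.Series(["a,b,c", "d,e,f"])

# expand=True returns a DataFrame with one column per split piece
# instead of a Series of lists.
wide = s.str.split(",", expand=True)

print(wide)
```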

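Bug fixes#

Bug in Series.hist raising an error when a one-row Series was given (GH10214)
Bug where HDFStore.select modified the passed columns list (GH7212)
Bug in Categorical repr with display.width of None in Python 3 (GH10087)
Bug in to_json where certain orients combined with a CategoricalIndex would segfault (GH10317)
Bug where some of the nan functions did not have consistent return dtypes (GH10251)
Bug in DataFrame.quantile when checking that a valid axis was passed (GH9543)
Bug in groupby.apply aggregation for Categorical not preserving categories (GH10138)
Bug in to_csv where date_format was ignored if the datetime is fractional (GH10209)
Bug in DataFrame.to_json with mixed data types (GH10289)
Bug in cache updating when consolidating (GH10264)
Bug in mean() where integer dtypes could overflow (GH10172)
Bug where Panel.from_dict did not set dtype when specified (GH10058)
Bug in Index.union raising AttributeError when passed array-likes. (GH10149)
Bug in Timestamp's microsecond, quarter, dayofyear, week and daysinmonth properties returning np.int type instead of built-in int. (GH10050)
Bug in NaT raising AttributeError when accessing the daysinmonth and dayofweek properties. (GH10096)
Bug in Index repr when using the max_seq_items=None setting (GH10182).
Bug in getting timezone data with dateutil on various platforms (GH9059, GH8639, GH9663, GH10121)
Bug in displaying datetimes with mixed frequencies; 'ms' datetimes are now displayed to the proper precision. (GH10170)
Bug in setitem where type promotion was applied to the entire block (GH10280)
Bug in Series arithmetic methods that could incorrectly retain names (GH10068)
Bug in GroupBy.get_group when grouping on multiple keys, one of which is categorical. (GH10132)
Bug in DatetimeIndex and TimedeltaIndex where names were lost after timedelta arithmetic (GH9926)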
Bug in DataFrame construction from nested dict with datetime64 (GH10160)
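Bug in Series construction from dict with datetime64 keys (GH9456)
Bug in Series.plot(label="LABEL") not setting the label correctly (GH10119)
Bug in plot not defaulting to the matplotlib axes.grid setting (GH9792)
Bug causing strings containing an exponent but no decimal to be parsed as int instead of float with engine='python' in the read_csv parser (GH9565)
Bug in Series.align resetting name when fill_value is specified (GH10067)
Bug in read_csv causing the index name not to be set on an empty DataFrame (GH10184)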
Bug in SparseSeries.abs resets name (GH10241)
Bug in TimedeltaIndex slicing that could reset freq (GH10292)
Bug in GroupBy.get_group raises ValueError when group key contains NaT (GH6992)
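Bug in SparseSeries constructor ignoring the input data name (GH10258)
Bug in Categorical.remove_categories causing a ValueError when removing the NaN category if the underlying dtype is floating-point (GH10156)
Bug where infer_freq inferred a time rule (WOM-5XXX) unsupported by to_offset (GH9425)
Bug in DataFrame.to_hdf() where table format would raise a seemingly unrelated error for invalid (non-string) column names. This is now explicitly forbidden. (GH9057)
Bug in handling masking of an empty DataFrame (GH10126).
Bug where the MySQL interface could not handle numeric table/column names (GH10255)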
Bug in read_csv with a date_parser that returned a datetime64 array of other time resolution than [ns] (GH10245)
Bug in Panel.apply when the result has ndim=0 (GH10332)
Bug in read_hdf where auto_close could not be passed (GH9327).
Bug in read_hdf where open stores could not be used (GH10330).
Bug in adding empty DataFrames, now results in a DataFrame that .equals an empty DataFrame (GH10181).
Bug in to_hdf and HDFStore which did not check that complib choices were valid (GH4582, GH8874).
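A few of the fixes above can be spot-checked interactively. The sketch below is written against a current pandas release, which is an assumption; it does not reproduce the pre-0.16.2 failures. It looks at the Timestamp property types (GH10050), the NaT properties (GH10096), and exponent parsing with the Python read_csv engine (GH9565), using invented sample values.

```python
import io
import math

import pandas as pd

# GH10050: these Timestamp properties should be built-in ints.
ts = pd.Timestamp("2015-06-12 10:30:00.000123")
ts_fields = [ts.microsecond, ts.quarter, ts.dayofyear, ts.week, ts.daysinmonth]
all_builtin_ints = all(isinstance(value, int) for value in ts_fields)

# GH10096: accessing these properties on NaT should not raise.
nat_dayofweek = pd.NaT.dayofweek
nat_daysinmonth = pd.NaT.daysinmonth

# GH9565: strings with an exponent but no decimal point should parse as float.
csv_text = "value\n1e3\n2e3\n"
parsed = pd.read_csv(io.StringIO(csv_text), engine="python")

print(all_builtin_ints)
print(nat_dayofweek, nat_daysinmonth)
print(parsed["value"].dtype)
```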

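Contributors#

A total of 34 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.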

Andrew Rosenfeld
Artemy Kolchinsky
Bernard Willers +
Christer van der Meeren
Christian Hudon +
Constantine Glen Evans +
Daniel Julius Lasiman +
Evan Wright
Francesco Brundu +
Gaëtan de Menten +
Jake VanderPlas
James Hiebert +
Jeff Reback
Joris Van den Bossche
Justin Lecher +
Ka Wo Chen +
Kevin Sheppard
Mortada Mehyar
Morton Fox +
Robin Wilson +
Sinhrks
Stephan Hoyer
Thomas Grainger
Tom Ajamian
Tom Augspurger
Yoshiki Vázquez Baeza
Younggun Kim
austinc +
behzad nouri
jreback
lexual
rekcahpassyla +
scls19fr
sinhrks
