Version 0.17.1 (November 21, 2015)

Note

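We are proud to announce that pandas has become a sponsored project of the NumFOCUS organization. This will help ensure the success of the development of pandas as a world-class open-source project.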

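This is a minor bug-fix release from 0.17.0 that includes a large number of bug fixes along with several new features, enhancements, and performance improvements. We recommend that all users upgrade to this version.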

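Highlights include:

Support for conditional HTML formatting, see here
Releasing the GIL on the CSV reader and other operations, see here
Fixed a regression in DataFrame.drop_duplicates from 0.16.2 that caused incorrect results on integer values (GH11376)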

What's new in v0.17.1

New features
- Conditional HTML formatting
Enhancements
API changes
- Deprecations
Performance improvements
Bug fixes
Contributors

New features

Conditional HTML formatting

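Warning

This is a new feature and is under active development. We will be adding features, and possibly making breaking changes, in future releases. Feedback is welcome in GH11610.

We have added experimental support for conditional HTML formatting: the visual styling of a DataFrame based on the data. The styling is accomplished with HTML and CSS. Access the styler class through the pandas.DataFrame.style attribute, which returns an instance of Styler with your data attached.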

Here is a quick example:

In [1]: np.random.seed(123)

In [2]: df = pd.DataFrame(np.random.randn(10, 5), columns=list("abcde"))

In [3]: html = df.style.background_gradient(cmap="viridis", low=0.5)

We can render the HTML to get the following table.

	a	b	c	d	e
0	-1.085631	0.997345	0.282978	-1.506295	-0.5786
1	1.651437	-2.426679	-0.428913	1.265936	-0.86674
2	-0.678886	-0.094709	1.49139	-0.638902	-0.443982
3	-0.434351	2.20593	2.186786	1.004054	0.386186
4	0.737369	1.490732	-0.935834	1.175829	-1.253881
5	-0.637752	0.907105	-1.428681	-0.140069	-0.861755
6	-0.255619	-2.798589	-1.771533	-0.699877	0.927462
7	-0.173636	0.002846	0.688223	-0.879536	0.283627
8	-0.805367	-1.727669	-0.3909	0.573806	0.338589
9	-0.01183	2.392365	0.412912	0.978736	2.238143

Styler interacts nicely with the Jupyter Notebook. See the documentation for more.

Enhancements

DatetimeIndex now supports conversion to strings with astype(str) (GH10442)
Support for compression (gzip/bz2) in pandas.DataFrame.to_csv() (GH7615)
pd.read_* functions can now also accept pathlib.Path or py._path.local.LocalPath objects for the filepath_or_buffer argument. (GH11033)
The DataFrame and Series functions .to_csv(), .to_html() and .to_latex() can now handle paths beginning with tildes (e.g. ~/Documents/) (GH11438)
DataFrame now uses the fields of a namedtuple as columns, if columns are not supplied (GH11181)
DataFrame.itertuples() now returns namedtuple objects, when possible. (GH11269, GH11625)
Added axvlines_kwds to the parallel coordinates plot (GH10709)
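A minimal sketch exercising a few of the enhancements above, assuming a current pandas install (output details may differ from 0.17.1). It builds a DataFrame from namedtuples, iterates with itertuples(), and round-trips a gzip-compressed CSV through a pathlib.Path. The Point type and the points.csv.gz file name are illustrative only.

```python
import tempfile
from collections import namedtuple
from pathlib import Path

import pandas as pd

# Field names of the namedtuple become the column labels
Point = namedtuple("Point", ["x", "y"])  # illustrative type, not part of pandas
df = pd.DataFrame([Point(1, 2), Point(3, 4)])
print(list(df.columns))  # ['x', 'y']

# itertuples() yields namedtuples, so fields are reachable by attribute
for row in df.itertuples():
    print(row.Index, row.x, row.y)

# Write a gzip-compressed CSV and read it back through a pathlib.Path
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "points.csv.gz"  # illustrative file name
    df.to_csv(path, index=False, compression="gzip")
    roundtrip = pd.read_csv(path, compression="gzip")

print(roundtrip)
```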

Added an option to .info() and .memory_usage() for deep introspection of memory consumption. Note that this can be expensive to compute, so it is an optional parameter. (GH11595)

In [4]: df = pd.DataFrame({"A": ["foo"] * 1000})

In [5]: df["B"] = df["A"].astype("category")

# shows the '+' as we have object dtypes
In [6]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype   
---  ------  --------------  -----   
 0   A       1000 non-null   object  
 1   B       1000 non-null   category
dtypes: category(1), object(1)
memory usage: 9.0+ KB

# we have an accurate memory assessment (but can be expensive to compute this)
In [7]: df.info(memory_usage="deep")
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype   
---  ------  --------------  -----   
 0   A       1000 non-null   object  
 1   B       1000 non-null   category
dtypes: category(1), object(1)
memory usage: 59.9 KB
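The same comparison can be made per column with DataFrame.memory_usage(), whose deep flag has the same meaning. This is a minimal sketch assuming a recent pandas; the column is created with an explicit object dtype so that newer string-dtype defaults do not change what is being measured.

```python
import pandas as pd

# Explicit object dtype: each cell references a Python str object
df = pd.DataFrame({"A": pd.Series(["foo"] * 1000, dtype=object)})
df["B"] = df["A"].astype("category")

shallow = df.memory_usage()        # object column: only the pointer array is counted
deep = df.memory_usage(deep=True)  # also adds the size of each referenced str

print(shallow)
print(deep)
```

The category column stays small in both views because it stores integer codes plus a single category value.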

Index now has a fillna method (GH10089)

In [8]: pd.Index([1, np.nan, 3]).fillna(2)
Out[8]: Float64Index([1.0, 2.0, 3.0], dtype='float64')

Series of type category now make the .str.<...> and .dt.<...> accessor methods and properties available, if the categories are of that type. (GH10661)

In [9]: s = pd.Series(list("aabb")).astype("category")

In [10]: s
Out[10]: 
0    a
1    a
2    b
3    b
Length: 4, dtype: category
Categories (2, object): ['a', 'b']

In [11]: s.str.contains("a")
Out[11]: 
0     True
1     True
2    False
3    False
Length: 4, dtype: bool

In [12]: date = pd.Series(pd.date_range("1/1/2015", periods=5)).astype("category")

In [13]: date
Out[13]: 
0   2015-01-01
1   2015-01-02
2   2015-01-03
3   2015-01-04
4   2015-01-05
Length: 5, dtype: category
Categories (5, datetime64[ns]): [2015-01-01, 2015-01-02, 2015-01-03, 2015-01-04, 2015-01-05]

In [14]: date.dt.day
Out[14]: 
0    1
1    2
2    3
3    4
4    5
Length: 5, dtype: int64

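pivot_table now has a margins_name argument so you can use something other than the default of 'All' (GH3335)
Implement export of datetime64[ns, tz] dtypes with a fixed HDF5 store (GH11411)
Pretty printing sets (e.g. in DataFrame cells) now uses set literal syntax ({x, y}) instead of legacy Python syntax (set([x, y])) (GH11215)
Improved the error message in pandas.io.gbq.to_gbq() when a streaming insert fails (GH11285) and when the DataFrame does not match the schema of the destination table (GH11359)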
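A short sketch of the margins_name argument, assuming a recent pandas. The column names k, c, v and the label "Total" are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "k": ["x", "x", "y", "y"],
        "c": ["p", "q", "p", "q"],
        "v": [1, 2, 3, 4],
    }
)

# margins=True adds row and column totals; margins_name replaces the default "All" label
table = pd.pivot_table(
    df,
    values="v",
    index="k",
    columns="c",
    aggfunc="sum",
    margins=True,
    margins_name="Total",
)
print(table)
```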

API changes

Raise NotImplementedError in Index.shift for non-supported index types (GH8038)
min and max reductions on datetime64 and timedelta64 dtyped series now result in NaT and not nan (GH11245).
Indexing with a null key will raise a TypeError, instead of a ValueError (GH11356)
Series.ptp will now ignore missing values by default (GH11163)
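A minimal sketch of two of these API changes as they behave in a recent pandas (an assumption; the example was not run against 0.17.1). It reduces an all-missing datetime64 Series and calls Index.shift on a non-datetime-like index.

```python
import pandas as pd

# Reductions over datetime64 data with no valid values return NaT, not float nan
s = pd.Series([pd.NaT, pd.NaT], dtype="datetime64[ns]")
lo, hi = s.min(), s.max()
print(lo, hi)  # NaT NaT

# Index.shift is only defined for datetime-like indexes; others raise
try:
    pd.Index(["a", "b", "c"]).shift(1)
    shift_raised = False
except NotImplementedError:
    shift_raised = True
print(shift_raised)  # True
```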

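Deprecations

The pandas.io.ga module, which implements google-analytics support, is deprecated and will be removed in a future version (GH11308)
Deprecate the engine keyword in .to_csv(), which will be removed in a future version (GH11274)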

Performance improvements

Checking monotonicity before sorting on an index (GH11080)
Series.dropna performance improvement when its dtype can't contain NaN (GH11159)
Release the GIL on most datetime field operations (e.g. DatetimeIndex.year, Series.dt.year), normalization, and conversion to and from Period, DatetimeIndex.to_period and PeriodIndex.to_timestamp (GH11263)
Release the GIL on some rolling algos: rolling_median, rolling_mean, rolling_max, rolling_min, rolling_var, rolling_kurt, rolling_skew (GH11450)
Release the GIL when reading and parsing text files in read_csv, read_table (GH11272)
Improved performance of rolling_median (GH11450)
Improved performance of to_excel (GH11352)
Performance bug in the repr of Categorical categories, which was rendering the strings before chopping them for display (GH11305)
Performance improvement in Categorical.remove_unused_categories (GH11643).
Improved performance of Series constructor with no data and DatetimeIndex (GH11433)
Improved performance of shift, cumprod, and cumsum with groupby (GH4095)
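The groupby operations whose speed improved can be sketched as follows, assuming a recent pandas; the timing gains themselves are not shown. The sort_index_if_needed helper is hypothetical and only illustrates, at user level, the idea of checking monotonicity before sorting; it is not the internal pandas code.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "b", "a"], "val": [1, 2, 3, 4, 5]})
g = df.groupby("key")["val"]

# Group-wise running total, running product and lag, aligned to the original rows
df["cumsum"] = g.cumsum()
df["cumprod"] = g.cumprod()
df["prev"] = g.shift(1)  # NaN for the first row of each group
print(df)


def sort_index_if_needed(frame):
    """Hypothetical helper: skip the sort when the index is already monotonic."""
    if frame.index.is_monotonic_increasing:
        return frame
    return frame.sort_index()


print(sort_index_if_needed(df.iloc[::-1]).index.tolist())  # [0, 1, 2, 3, 4]
```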

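Bug fixes

SparseArray.__iter__() no longer causes a PendingDeprecationWarning in Python 3.5 (GH11622)
Fixed a regression from 0.16.2 in the output formatting of long floats/NaN (GH11302)
Series.sort_index() now correctly handles the inplace option (GH11402)
An incorrectly distributed .c file in the PyPI build caused an exception when reading a CSV of floats and passing na_values=<a scalar> (GH11374)
Bug in .to_latex() output being broken when the index has a name (GH10660)
Bug in HDFStore.append with strings whose encoded length exceeded the max unencoded length (GH11234)
Bug in merging datetime64[ns, tz] dtypes (GH11405)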
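Bug in HDFStore.select when comparing with a numpy scalar in a where clause (GH11283)
Bug in using DataFrame.ix with a MultiIndex indexer (GH11372)
Bug in date_range with ambiguous endpoints (GH11626)
Prevent adding new attributes to the accessors .str, .dt and .cat. Retrieving such a value was not possible, so setting one now raises an error. (GH10673)
Bug in tz-conversions with an ambiguous time and .dt accessors (GH11295)
Bug in output formatting when using an index of ambiguous times (GH11619)
Bug in comparisons of Series vs list-likes (GH11339)
Bug in DataFrame.replace with a datetime64[ns, tz] and an incompatible to_replace (GH11326, GH11153)
Bug in isnull where numpy.datetime64('NaT') in a numpy.array was not determined to be null (GH11206)
Bug in list-like indexing with a mixed-integer Index (GH11320)
Bug in pivot_table with margins=True when indexes are of Categorical dtype (GH10993)
Bug in DataFrame.plot where hex string colors could not be used (GH10299)
Regression in DataFrame.drop_duplicates from 0.16.2, causing incorrect results on integer values (GH11376)
Bug in pd.eval where unary ops in a list raised an error (GH11235)
Bug in squeeze() with zero-length arrays (GH11230, GH8999)
Bug in describe() dropping column names for hierarchical indexes (GH11517)
Bug in DataFrame.pct_change() not propagating the axis keyword to the .fillna method (GH11150)
Bug in .to_csv() when a mix of integer and string column names is passed as the columns parameter (GH11637)
Bug in indexing with a range (GH11652)
Bug in inference of numpy scalars and preserving dtype when setting columns (GH11638)
Bug in to_sql using Unicode column names giving a UnicodeEncodeError (GH11431).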
Fix regression in setting of xticks in plot (GH11529).
Bug in holiday.dates where observance rules could not be applied to holidays, and documentation enhancement (GH11477, GH11533)
Fix plotting issues when having plain Axes instances instead of SubplotAxes (GH11520, GH11556).
Bug in DataFrame.to_latex() producing an extra rule when header=False (GH7124)
Bug in df.groupby(...).apply(func) when func returns a Series containing a new datetimelike column (GH11324)
Bug in pandas.json when the file to load is big (GH11344)
Bugs in to_excel with duplicate columns (GH11007, GH10982, GH10970)
Fixed a bug that prevented the construction of an empty series of dtype datetime64[ns, tz] (GH11245).
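Bug in read_excel with a MultiIndex containing integers (GH11317)
Bug in to_excel with openpyxl 2.2+ and merging (GH11408)
Bug in DataFrame.to_dict() producing a np.datetime64 object instead of a Timestamp when only datetimes are present in the data (GH11327)
Bug in DataFrame.corr() raising an exception when computing the Kendall correlation for DataFrames with boolean and non-boolean columns (GH11560)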
Link-time error caused by C inline functions on FreeBSD 10+ (with clang) (GH10510)
Bug in DataFrame.to_csv in passing through arguments for formatting MultiIndexes, including date_format (GH7791)
Bug in DataFrame.join() with how='right' producing a TypeError (GH11519)
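Bug in Series.quantile with an empty list resulting in an Index with object dtype (GH11588)
Bug in pd.merge resulting in an empty Int64Index rather than Index(dtype=object) when the merge result is empty (GH11588)
Bug in Categorical.remove_unused_categories when there are NaN values (GH11599)
Bug in DataFrame.to_sparse() losing column names for MultiIndexes (GH11600)
Bug in DataFrame.round() with a non-unique column index producing a fatal Python error (GH11611)
Bug in DataFrame.round() with decimals being a non-unique indexed Series producing extra columns (GH11618)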
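A minimal sketch of the corrected behavior for three of the fixes above, assuming a recent pandas. These are illustrative checks, not the regression tests from the pandas test suite; the attribute name new_attr is arbitrary.

```python
import pandas as pd

# drop_duplicates on integer columns (the regression highlighted in GH11376)
df = pd.DataFrame({"a": [1, 1, 2], "b": [10, 10, 20]})
deduped = df.drop_duplicates()
print(deduped.index.tolist())  # [0, 2]

# DataFrame.round with a per-column decimals Series (unique index)
vals = pd.DataFrame({"x": [1.234, 2.345], "y": [3.456, 4.567]})
rounded = vals.round(pd.Series({"x": 1, "y": 0}))
print(rounded)

# Accessors reject new attributes instead of silently storing them
s = pd.Series(["a", "b"])
try:
    s.str.new_attr = 1
    accessor_rejected = False
except AttributeError:
    accessor_rejected = True
print(accessor_rejected)  # True
```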

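Contributors

A total of 63 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.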

Aleksandr Drozd +
Alex Chase +
Anthonios Partheniou
BrenBarn +
Brian J. McGuirk +
Chris
Christian Berendt +
Christian Perez +
Cody Piersall +
Data & Code Expert Experimenting with Code on Data
DrIrv +
Evan Wright
Guillaume Gay
Hamed Saljooghinejad +
Iblis Lin +
Jake VanderPlas
Jan Schulz
Jean-Mathieu Deschenes +
Jeff Reback
Jimmy Callin +
Joris Van den Bossche
K.-Michael Aye
Ka Wo Chen
Loïc Séguin-C +
Luo Yicheng +
Magnus Jöud +
Manuel Leonhardt +
Matthew Gilbert
Maximilian Roos
Michael +
Nicholas Stahl +
Nicolas Bonnotte +
Pastafarianist +
Petra Chong +
Phil Schaf +
Philipp A +
Rob deCarvalho +
Roman Khomenko +
Rémy Léone +
Sebastian Bank +
Sinhrks
Stephan Hoyer
Thierry Moisan
Tom Augspurger
Tux1 +
Varun +
Wieland Hoffmann +
Winterflower
Yoav Ram +
Younggun Kim
Zeke +
ajcr
azuranski +
behzad nouri
cel4
emilydolson +
hironow +
lexual
llllllllll +
rockg
silentquasar +
sinhrks
taeold +
