Version 0.15.1 (November 9, 2014)

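This is a minor bug-fix release from 0.15.0. It includes a small number of API changes, several new features, enhancements, and performance improvements, along with a large number of bug fixes. We recommend that all users upgrade to this version.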

Enhancements
API Changes
Bug Fixes

API changes

s.dt.hour and the other .dt accessors now return np.nan for missing values (rather than -1 as before) (GH8689):

In [1]: s = pd.Series(pd.date_range("20130101", periods=5, freq="D"))

In [2]: s.iloc[2] = np.nan

In [3]: s
Out[3]: 
0   2013-01-01
1   2013-01-02
2          NaT
3   2013-01-04
4   2013-01-05
Length: 5, dtype: datetime64[ns]

Previous behavior:

In [6]: s.dt.hour
Out[6]:
0    0
1    0
2   -1
3    0
4    0
dtype: int64

Current behavior:

In [4]: s.dt.hour
Out[4]: 
0    0.0
1    0.0
2    NaN
3    0.0
4    0.0
Length: 5, dtype: float64
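
Code that previously detected missing datetimes by comparing the result against -1 needs to test for NaN instead. The following is a minimal sketch of that migration, reusing the series from the example above; the names `hours`, `missing`, and `valid_hours` are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series(pd.date_range("20130101", periods=5, freq="D"))
s.iloc[2] = np.nan  # stored as NaT in a datetime64 series

hours = s.dt.hour

# Missing entries are now NaN, so test with isnull() instead of `== -1`.
missing = hours.isnull()

# Drop the missing entries before converting back to integers.
valid_hours = hours[~missing].astype("int64")
```

Because NaN forces a float result, any downstream code that relied on an integer dtype has to filter or fill the missing entries first, as the last line does.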

groupby with as_index=False no longer adds erroneous extra columns to the result (GH8582):

In [5]: np.random.seed(2718281)

In [6]: df = pd.DataFrame(np.random.randint(0, 100, (10, 2)), columns=["jim", "joe"])

In [7]: df.head()
Out[7]: 
   jim  joe
0   61   81
1   96   49
2   55   65
3   72   51
4   77   12

[5 rows x 2 columns]

In [8]: ts = pd.Series(5 * np.random.randint(0, 3, 10))

Previous behavior:

In [4]: df.groupby(ts, as_index=False).max()
Out[4]:
   NaN  jim  joe
0    0   72   83
1    5   77   84
2   10   96   65

Current behavior:

In [9]: df.groupby(ts, as_index=False).max()
Out[9]: 
   jim  joe
0   72   83
1   77   84
2   96   65

[3 rows x 2 columns]

groupby no longer erroneously excludes columns when a column name conflicts with the grouper name (GH8112):

In [10]: df = pd.DataFrame({"jim": range(5), "joe": range(5, 10)})

In [11]: df
Out[11]: 
   jim  joe
0    0    5
1    1    6
2    2    7
3    3    8
4    4    9

[5 rows x 2 columns]

In [12]: gr = df.groupby(df["jim"] < 2)

Previous behavior (excludes the first column from the output):

In [4]: gr.apply(sum)
Out[4]:
       joe
jim
False   24
True    11

Current behavior:

In [13]: gr.apply(sum)
Out[13]: 
       jim  joe
jim            
False    9   24
True     1   11

[2 rows x 2 columns]

Support for slicing with monotonic decreasing indexes, even if start or stop is not found in the index (GH7860):

In [14]: s = pd.Series(["a", "b", "c", "d"], [4, 3, 2, 1])

In [15]: s
Out[15]: 
4    a
3    b
2    c
1    d
Length: 4, dtype: object

Previous behavior:

In [8]: s.loc[3.5:1.5]
KeyError: 3.5

Current behavior:

In [16]: s.loc[3.5:1.5]
Out[16]: 
3    b
2    c
Length: 2, dtype: object
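
To illustrate the slicing rule above: on a monotonic decreasing index, label bounds that are absent from the index select the labels lying between them. This is a small sketch only; the series `prices` and its values are invented for illustration and are not part of the original notes.

```python
import pandas as pd

# The index runs from high to low, so slices must also run high to low.
prices = pd.Series([10.0, 20.0, 30.0, 40.0], index=[4, 3, 2, 1])

# Neither 3.5 nor 1.5 is a label; the slice keeps the labels between them.
middle = prices.loc[3.5:1.5]

# A start bound above the largest label simply begins at the first row.
head = prices.loc[10:2.5]
```

Here `middle` covers labels 3 and 2, and `head` covers labels 4 and 3.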

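io.data.Options has been fixed for a change in the format of the Yahoo Options page (GH8612), (GH8741)

Note

As a result of a change in Yahoo's option page layout, when an expiry date is given, Options methods now return data for a single expiry date. Previously, methods returned all data for the selected month.

The month and year parameters have been undeprecated and can be used to get all options data for a given month.

If an expiry date that is not valid is given, data for the next expiry after the given date is returned.

Option data frames are now saved on the instance as callsYYMMDD or putsYYMMDD. Previously they were saved as callsMMYY and putsMMYY. The next expiry is saved as calls and puts.

New features:

The expiry parameter can now be a single date or a list-like object containing dates.
A new property expiry_dates was added, which returns all available expiry dates.

Current behavior: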

In [17]: from pandas.io.data import Options

In [18]: aapl = Options('aapl', 'yahoo')

In [19]: aapl.get_call_data().iloc[0:5, 0:1]
Out[19]:
                                             Last
Strike Expiry     Type Symbol
80     2014-11-14 call AAPL141114C00080000  29.05
84     2014-11-14 call AAPL141114C00084000  24.80
85     2014-11-14 call AAPL141114C00085000  24.05
86     2014-11-14 call AAPL141114C00086000  22.76
87     2014-11-14 call AAPL141114C00087000  21.74

In [20]: aapl.expiry_dates
Out[20]:
[datetime.date(2014, 11, 14),
 datetime.date(2014, 11, 22),
 datetime.date(2014, 11, 28),
 datetime.date(2014, 12, 5),
 datetime.date(2014, 12, 12),
 datetime.date(2014, 12, 20),
 datetime.date(2015, 1, 17),
 datetime.date(2015, 2, 20),
 datetime.date(2015, 4, 17),
 datetime.date(2015, 7, 17),
 datetime.date(2016, 1, 15),
 datetime.date(2017, 1, 20)]

In [21]: aapl.get_near_stock_price(expiry=aapl.expiry_dates[0:3]).iloc[0:5, 0:1]
Out[21]:
                                            Last
Strike Expiry     Type Symbol
109    2014-11-22 call AAPL141122C00109000  1.48
       2014-11-28 call AAPL141128C00109000  1.79
110    2014-11-14 call AAPL141114C00110000  0.55
       2014-11-22 call AAPL141122C00110000  1.02
       2014-11-28 call AAPL141128C00110000  1.32

pandas now also registers the datetime64 dtype in matplotlib's units registry to plot such values as datetimes. This is activated once pandas is imported. In previous versions, plotting an array of datetime64 values would have resulted in plotted integer values. To keep the previous behaviour, you can do del matplotlib.units.registry[np.datetime64] (GH8614).

Enhancements

concat permits a wider variety of iterables of pandas objects to be passed as the first parameter (GH8645):

In [17]: from collections import deque

In [18]: df1 = pd.DataFrame([1, 2, 3])

In [19]: df2 = pd.DataFrame([4, 5, 6])

Previous behavior:

In [7]: pd.concat(deque((df1, df2)))
TypeError: first argument must be a list-like of pandas objects, you passed an object of type "deque"

Current behavior:

In [20]: pd.concat(deque((df1, df2)))
Out[20]: 
   0
0  1
1  2
2  3
0  4
1  5
2  6

[6 rows x 1 columns]
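
Accepting other iterables makes patterns such as a bounded buffer of recent frames straightforward. The following is only a sketch of one such use; the `window` buffer and the column name `x` are made up for illustration.

```python
from collections import deque

import pandas as pd

# Keep only the two most recent chunks; older ones are discarded.
window = deque(maxlen=2)
for start in (0, 3, 6):
    window.append(pd.DataFrame({"x": range(start, start + 3)}))

# The deque is passed directly, without converting it to a list first.
combined = pd.concat(window, ignore_index=True)
```

Only the last two chunks survive in the deque, so `combined` holds the six values from 3 through 8.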

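Represent MultiIndex labels with a dtype that uses memory based on the level size. In prior versions, the memory usage was a constant 8 bytes per element in each level. In addition, the memory usage reported in prior versions was incorrect, because it did not include the memory occupied by the underlying data array. (GH8456)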

In [21]: dfi = pd.DataFrame(
   ....:     1, index=pd.MultiIndex.from_product([["a"], range(1000)]), columns=["A"]
   ....: )
   ....: 

Previous behavior:

# this was underreported in prior versions
In [1]: dfi.memory_usage(index=True)
Out[1]:
Index    8000 # took about 24008 bytes in < 0.15.1
A        8000
dtype: int64

Current behavior:

In [22]: dfi.memory_usage(index=True)
Out[22]: 
Index    44212
A         8000
Length: 2, dtype: int64
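
One way to observe the size-dependent label dtype is to inspect the integer arrays behind each level directly. This sketch assumes a recent pandas, where those arrays are exposed as `MultiIndex.codes` (in 0.15.1 the attribute was named `labels`). The byte counts reported by `memory_usage` vary between versions, so none are shown here.

```python
import pandas as pd

mi = pd.MultiIndex.from_product([["a"], range(1000)])

# One integer array per level: a 1-value level and a 1000-value level.
code_dtypes = [codes.dtype for codes in mi.codes]

# Bytes used per element by each level's label array.
bytes_per_label = [dtype.itemsize for dtype in code_dtypes]
```

Each entry of `bytes_per_label` should be smaller than the former constant of 8 bytes, and the level with fewer distinct values should need no more bytes per label than the larger one.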

Added Index properties is_monotonic_increasing and is_monotonic_decreasing (GH8680).
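Added option to select columns when importing Stata files (GH7935)
Qualify memory usage in DataFrame.info() by adding + if it is a lower bound (GH8578)
Raise errors in certain aggregation cases where an argument such as numeric_only is not handled (GH8592).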
Added support for 3-character ISO and non-standard country codes in io.wb.download() (GH8482)
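World Bank data requests now warn or raise based on an errors argument, as well as on a list of hard-coded country codes and the World Bank's JSON response. In prior versions, the error messages did not take the World Bank's JSON response into account, and problem-inducing input was simply dropped before the request. The issue was that many valid countries were cut off by the hard-coded approach. All countries now work, but some invalid countries raise exceptions because certain edge cases break the entire response. (GH8482)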
Added option to Series.str.split() to return a DataFrame rather than a Series (GH8428)
Added option to df.info(null_counts=None|True|False) to override the default display options and force showing of the null counts (GH8701)
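
The Index properties is_monotonic_increasing and is_monotonic_decreasing listed above can be sketched as follows. The helper `direction` and the three sample indexes are hypothetical and exist only for this illustration.

```python
import pandas as pd

rising = pd.Index([1, 2, 2, 5])   # repeated values still count as increasing
falling = pd.Index([4, 3, 2, 1])
mixed = pd.Index([1, 3, 2])


def direction(index):
    """Classify an index as 'increasing', 'decreasing', or 'unordered'."""
    if index.is_monotonic_increasing:
        return "increasing"
    if index.is_monotonic_decreasing:
        return "decreasing"
    return "unordered"


labels = [direction(ix) for ix in (rising, falling, mixed)]
```

Both properties are non-strict, so an index with repeated neighbouring values still qualifies as monotonic.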

Bug fixes

Bug in unpickling of a CustomBusinessDay object (GH8591)
Bug in coercing Categorical to a records array, e.g. df.to_records() (GH8626)
Bug in Categorical not created properly with Series.to_frame() (GH8626)
Bug in coercing with astype of a Categorical when passed a pd.Categorical (this now raises TypeError correctly) (GH8626)
Bug in cut/qcut when using Series and retbins=True (GH8589)
Bug in writing Categorical columns to an SQL database with to_sql (GH8624).
Bug in Categorical comparisons when comparing against a scalar datetimelike (GH8687)
Bug in selecting from a Categorical with .iloc (GH8623)
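Bug in groupby-transform (GH8623)
Bug in duplicated/drop_duplicates (GH8623)
Bug in Categorical where reflected comparison operators raised if the first argument was a NumPy array scalar (e.g. np.int64) (GH8658)
Bug in Panel indexing with a list-like (GH8710)
Compatibility issue in DataFrame.dtypes when options.mode.use_inf_as_null is True (GH8722)
Bug in read_csv where the dialect parameter would not accept a string (GH8703)
Bug in slicing a MultiIndex level with an empty list (GH8737)
Bug in numeric index add/sub operations between a Float/Index index and NumPy arrays (GH8608)
Bug in setitem with an empty indexer causing unwanted coercion of dtypes (GH8669)
Bug in ix/loc block splitting on setitem (manifests with integer-like dtypes, e.g. datetime64) (GH8607)
Bug when doing label-based indexing with integers not found in the index for non-unique but monotonic indexes (GH8680).
Bug when indexing a Float64Index with np.nan on NumPy 1.7 (GH8980).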
Fix shape attribute for MultiIndex (GH8609)
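Bug in GroupBy where a name conflict between the grouper and columns would break groupby operations (GH7115, GH8112)
Fixed a bug where plotting a column y and specifying a label would mutate the index name of the original DataFrame (GH8494)
Fix regression in plotting of a DatetimeIndex directly with matplotlib (GH8614).
Bug in date_range where partially specified dates would incorporate the current date (GH6961)
Bug where setting by indexer to a scalar value on a mixed-dtype Panel4d failed (GH8702)
Bug where DataReader would fail if one of the symbols passed was invalid. It now returns data for valid symbols and np.nan for invalid ones (GH8494)
Bug in get_quote_yahoo that would not allow non-float return values (GH5229).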
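
As an illustration of the read_csv dialect fix listed above, a dialect can be given by name as a string. This sketch assumes Python 3 and a recent pandas; the sample data in `raw` is invented.

```python
import io

import pandas as pd

raw = "a\tb\n1\t2\n3\t4\n"

# "excel-tab" is a dialect name registered by the standard csv module.
df = pd.read_csv(io.StringIO(raw), dialect="excel-tab")
```

The named dialect supplies the tab delimiter, so no separate sep argument is needed.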

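Contributors

A total of 23 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.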

Aaron Staple +
Andrew Rosenfeld
Anton I. Sipos
Artemy Kolchinsky
Bill Letson +
Dave Hughes +
David Stephens
Guillaume Horel +
Jeff Reback
Joris Van den Bossche
Kevin Sheppard
Nick Stahl +
Sanghee Kim +
Stephan Hoyer
Tom Augspurger
TomAugspurger
WANG Aiyong +
behzad nouri
immerrr
jnmclarty
jreback
pallav-fdsi +
unutbu
