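Version 0.7.3 (April 12, 2012)

This is a minor release from 0.7.2. It fixes many minor bugs and adds a number of nice new features. There are also a couple of API changes to note; these should not affect very many users, and we are inclined to call them "bug fixes" even though they do constitute a change in behavior. See the full release notes or the issue tracker on GitHub for a complete list.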

New features

New fixed width file reader, read_fwf
New scatter_matrix function for making a scatter plot matrix

from pandas.tools.plotting import scatter_matrix

scatter_matrix(df, alpha=0.2)  # noqa F821

Add a stacked argument to the plot method of Series and DataFrame for producing stacked bar plots.

df.plot(kind="bar", stacked=True)  # noqa F821

df.plot(kind="barh", stacked=True)  # noqa F821

Add log x and y scaling options to DataFrame.plot and Series.plot
Add kurt methods to Series and DataFrame for computing kurtosis
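As a rough illustration of two of the non-plotting additions above, the sketch below reads an in-memory fixed-width table with read_fwf and computes kurtosis with kurt. The sample rows, column names, and field widths are made up for the example, and the calls assume a recent pandas release rather than 0.7.3 itself.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data: id (3 chars), name (6 chars), score (5 chars).
rows = [("001", "Steve", 9.5), ("002", "Joe", 7.25), ("003", "Ann", 10.0)]
text = "\n".join(f"{i}{name:<6}{score:>5}" for i, name, score in rows)

# widths gives the character width of each consecutive field.
frame = pd.read_fwf(
    StringIO(text),
    widths=[3, 6, 5],
    header=None,
    names=["id", "name", "score"],
)

# kurt() is available on both Series and DataFrame.
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
kurtosis = values.kurt()
```

Padding whitespace around each field is stripped on read, so the name column comes back without trailing spaces.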

NA boolean comparison API change

Reverted some changes to how NA values (typically represented as NaN or None) are handled in non-numeric Series:

In [1]: series = pd.Series(["Steve", np.nan, "Joe"])

In [2]: series == "Steve"
Out[2]:
0     True
1    False
2    False
Length: 3, dtype: bool

In [3]: series != "Steve"
Out[3]:
0    False
1     True
2     True
Length: 3, dtype: bool

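In comparisons, NA / NaN will always come through as False, except with != where it is True. Be very careful with boolean arithmetic, especially negation, in the presence of NA data. If you are worried about this, you may wish to add an explicit NA filter to boolean array operations: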

In [4]: mask = series == "Steve"

In [5]: series[mask & series.notnull()]
Out[5]:
0    Steve
Length: 1, dtype: object
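The negation pitfall mentioned above can be sketched as follows. This is a minimal example that reuses the same three-element Series; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

series = pd.Series(["Steve", np.nan, "Joe"])

# The NA row compares as False here...
is_steve = series == "Steve"

# ...so negating the mask turns the NA row into True.
not_steve = ~is_steve

# An explicit NA filter keeps the missing row out of the result.
not_steve_safe = not_steve & series.notnull()
selected = series[not_steve_safe]
```

Without the notnull() filter, indexing with not_steve would also return the missing entry alongside "Joe".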

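While propagating NA in comparisons may seem like the right behavior to some users (and you could argue on purely technical grounds that this is the right thing to do), the evaluation was made that propagating NA everywhere, including in numerical arrays, would cause a large number of problems for users. Thus, a "practicality beats purity" approach was taken. This issue may be revisited at some point in the future.

Other API changes

When calling apply on a grouped Series, the return value will also be a Series, to be more consistent with the groupby behavior with DataFrame: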

In [6]: df = pd.DataFrame(
   ...:     {
   ...:         "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
   ...:         "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
   ...:         "C": np.random.randn(8),
   ...:         "D": np.random.randn(8),
   ...:     }
   ...: )
   ...:

In [7]: df
Out[7]:
   A      B         C         D
0  foo    one  0.469112 -0.861849
1  bar    one -0.282863 -2.104569
2  foo    two -1.509059 -0.494929
3  bar  three -1.135632  1.071804
4  foo    two  1.212112  0.721555
5  bar    two -0.173215 -0.706771
6  foo    one  0.119209 -1.039575
7  foo  three -1.044236  0.271860

[8 rows x 4 columns]

In [8]: grouped = df.groupby("A")["C"]

In [9]: grouped.describe()
Out[9]:
   count      mean       std       min       25%       50%       75%       max
A
bar    3.0 -0.530570  0.526860 -1.135632 -0.709248 -0.282863 -0.228039 -0.173215
foo    5.0 -0.150572  1.113308 -1.509059 -1.044236  0.119209  0.469112  1.212112

[2 rows x 8 columns]

In [10]: grouped.apply(lambda x: x.sort_values()[-2:])  # top 2 values
Out[10]:
A
bar  1   -0.282863
     5   -0.173215
foo  0    0.469112
     4    1.212112
Name: C, Length: 4, dtype: float64
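Because the session above uses random data, a deterministic variant may be easier to check by hand. The sketch below uses made-up values and nlargest instead of sorting and slicing, and assumes a recent pandas release; the result is a Series indexed by the group key and the original row label.

```python
import pandas as pd

# Hypothetical data chosen so the result is easy to verify by eye.
df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar", "foo"],
        "C": [3.0, 1.0, 2.0, 5.0, 4.0],
    }
)

grouped = df.groupby("A")["C"]

# apply on a grouped Series returns a Series with a two-level index:
# the group key "A" and the original row label.
top2 = grouped.apply(lambda s: s.nlargest(2))
```

Selecting top2.loc["foo"] then gives the two largest "foo" values, keyed by their original row labels.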

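Contributors

A total of 15 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.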

Abraham Flaxman +
Adam Klein
Andreas H. +
Chang She
Dieter Vandenbussche
Jacques Kvam +
K.-Michael Aye +
Kamil Kisiel +
Martin Blais +
Skipper Seabold
Thomas Kluyver
Wes McKinney
Wouter Overmeire
Yaroslav Halchenko
lgautier +
