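Version 0.18.0 (March 13, 2016)#

This is a major release from 0.17.1 and includes a small number of API changes, several new features, enhancements, and performance improvements, along with a large number of bug fixes. We recommend that all users upgrade to this version.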

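Warning

pandas >= 0.18.0 no longer supports compatibility with Python versions 2.6 and 3.3 (GH7718, GH11273)

Warning

numexpr version 2.4.4 will now show a warning and will not be used as a computation back-end for pandas because of some buggy behavior. This does not affect other versions (>= 2.1 and >= 2.4.6). (GH12489)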

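Highlights include:

Moving and expanding window functions are now methods on Series and DataFrame, similar to .groupby, see here.
Adding support for a RangeIndex as a specialized form of the Int64Index for memory savings, see here.
API breaking change to the .resample method to make it more .groupby like, see here.
Removal of support for positional indexing with floats, which was deprecated since 0.14.0. This will now raise a TypeError, see here.
The .to_xarray() function has been added for compatibility with the xarray package, see here.
The read_sas function has been enhanced to read sas7bdat files, see here.
Addition of the .str.extractall() method, and API changes to the .str.extract() method and .str.cat() method.
pd.test() top-level nose test runner is available (GH4327).

Check the API Changes and deprecations before updating.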

What's new in v0.18.0

New features#

Window functions are now methods#

Window functions have been refactored to be methods on Series/DataFrame objects, rather than top-level functions, which are now deprecated. This allows these window-type functions to have a similar API to that of .groupby. See the full documentation here (GH11603, GH12373)

In [1]: np.random.seed(1234)

In [2]: df = pd.DataFrame({'A': range(10), 'B': np.random.randn(10)})

In [3]: df
Out[3]: 
   A         B
0  0  0.471435
1  1 -1.190976
2  2  1.432707
3  3 -0.312652
4  4 -0.720589
5  5  0.887163
6  6  0.859588
7  7 -0.636524
8  8  0.015696
9  9 -2.242685

[10 rows x 2 columns]

Previous behavior:

In [8]: pd.rolling_mean(df, window=3)
        FutureWarning: pd.rolling_mean is deprecated for DataFrame and will be removed in a future version, replace with
                       DataFrame.rolling(window=3,center=False).mean()
Out[8]:
    A         B
0 NaN       NaN
1 NaN       NaN
2   1  0.237722
3   2 -0.023640
4   3  0.133155
5   4 -0.048693
6   5  0.342054
7   6  0.370076
8   7  0.079587
9   8 -0.954504

New behavior:

In [4]: r = df.rolling(window=3)

These show a descriptive repr

In [5]: r
Out[5]: Rolling [window=3,center=False,axis=0,method=single]

with tab-completion of available methods and properties.

In [9]: r.<TAB>  # noqa E225, E999
r.A           r.agg         r.apply       r.count       r.exclusions  r.max         r.median      r.name        r.skew        r.sum
r.B           r.aggregate   r.corr        r.cov         r.kurt        r.mean        r.min         r.quantile    r.std         r.var

The methods operate on the Rolling object itself

In [6]: r.mean()
Out[6]: 
     A         B
0  NaN       NaN
1  NaN       NaN
2  1.0  0.237722
3  2.0 -0.023640
4  3.0  0.133155
5  4.0 -0.048693
6  5.0  0.342054
7  6.0  0.370076
8  7.0  0.079587
9  8.0 -0.954504

[10 rows x 2 columns]

They provide getitem accessors

In [7]: r['A'].mean()
Out[7]: 
0    NaN
1    NaN
2    1.0
3    2.0
4    3.0
5    4.0
6    5.0
7    6.0
8    7.0
9    8.0
Name: A, Length: 10, dtype: float64

And multiple aggregations

In [8]: r.agg({'A': ['mean', 'std'],
   ...:        'B': ['mean', 'std']})
   ...: 
Out[8]: 
     A              B          
  mean  std      mean       std
0  NaN  NaN       NaN       NaN
1  NaN  NaN       NaN       NaN
2  1.0  1.0  0.237722  1.327364
3  2.0  1.0 -0.023640  1.335505
4  3.0  1.0  0.133155  1.143778
5  4.0  1.0 -0.048693  0.835747
6  5.0  1.0  0.342054  0.920379
7  6.0  1.0  0.370076  0.871850
8  7.0  1.0  0.079587  0.750099
9  8.0  1.0 -0.954504  1.162285

[10 rows x 4 columns]
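The same method style applies to expanding windows. The following is a minimal sketch, not taken from the release notes, that assumes a small hand-written Series and compares a rolling sum with an expanding sum:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

# Rolling window of two observations; the first result has no full window.
rolling_sum = s.rolling(window=2).sum()

# Expanding window: every observation seen so far is included.
expanding_sum = s.expanding().sum()

print(rolling_sum.tolist())    # [nan, 3.0, 5.0, 7.0]
print(expanding_sum.tolist())  # [1.0, 3.0, 6.0, 10.0]
```

Both calls return an intermediate window object first, so the aggregation (here sum) is chosen in a second step, as with .groupby.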

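Changes to rename#

Series.rename and NDFrame.rename_axis can now take a scalar or list-like argument for altering the Series or axis name, in addition to their old behaviors of altering labels. (GH9494, GH11965)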

In [9]: s = pd.Series(np.random.randn(5))

In [10]: s.rename('newname')
Out[10]: 
0    1.150036
1    0.991946
2    0.953324
3   -2.021255
4   -0.334077
Name: newname, Length: 5, dtype: float64

In [11]: df = pd.DataFrame(np.random.randn(5, 2))

In [12]: (df.rename_axis("indexname")
   ....:    .rename_axis("columns_name", axis="columns"))
   ....: 
Out[12]: 
columns_name         0         1
indexname                       
0             0.002118  0.405453
1             0.289092  1.321158
2            -1.546906 -0.202646
3            -0.655969  0.193421
4             0.553439  1.318152

[5 rows x 2 columns]

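The new functionality works well in method chains. Previously these methods only accepted functions or dicts mapping a label to a new label. This continues to work as before for function or dict-like values.

Range Index#

A RangeIndex has been added to the Int64Index sub-classes to support a memory-saving alternative for common use cases. Its implementation is similar to the Python range object (xrange in Python 2), in that it only stores the start, stop, and step values for the index. It interacts transparently with the user API, converting to Int64Index if needed.

This is now the default constructed index for NDFrame objects, rather than an Int64Index as before. (GH939, GH12070, GH12071, GH12109, GH12888)

Previous behavior: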

In [3]: s = pd.Series(range(1000))

In [4]: s.index
Out[4]:
Int64Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,
            ...
            990, 991, 992, 993, 994, 995, 996, 997, 998, 999], dtype='int64', length=1000)

In [6]: s.index.nbytes
Out[6]: 8000

New behavior:

In [13]: s = pd.Series(range(1000))

In [14]: s.index
Out[14]: RangeIndex(start=0, stop=1000, step=1)

In [15]: s.index.nbytes
Out[15]: 128
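As a rough illustration of why the index is so much smaller, the sketch below inspects the three stored values and compares the footprint with a materialized int64 array. It assumes a recent pandas in which the start, stop and step attributes are public; the exact byte count of a RangeIndex varies by version and platform, so only the comparison is shown.

```python
import numpy as np
import pandas as pd

s = pd.Series(range(1000))
idx = s.index

# The default index stores only three integers, not 1000 labels.
print(type(idx).__name__)             # RangeIndex
print(idx.start, idx.stop, idx.step)  # 0 1000 1

# A materialized int64 array of the same labels costs 8 bytes per element.
materialized = np.arange(1000, dtype="int64")
print(idx.nbytes < materialized.nbytes)  # True
```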

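Changes to str.extract#

The .str.extract method takes a regular expression with capture groups, finds the first match in each subject string, and returns the contents of the capture groups (GH11386).

In v0.18.0, the expand argument was added to extract.

expand=False: it returns a Series, Index, or DataFrame, depending on the subject and regular expression pattern (same behavior as pre-0.18.0).
expand=True: it always returns a DataFrame, which is more consistent and less confusing from the perspective of a user.

Currently the default is expand=None, which gives a FutureWarning and uses expand=False. To avoid this warning, explicitly specify expand.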

In [1]: pd.Series(['a1', 'b2', 'c3']).str.extract(r'[ab](\d)', expand=None)
FutureWarning: currently extract(expand=None) means expand=False (return Index/Series/DataFrame)
but in a future version of pandas this will be changed to expand=True (return DataFrame)

Out[1]:
0      1
1      2
2    NaN
dtype: object

Extracting a regular expression with one group returns a Series if expand=False.

In [16]: pd.Series(['a1', 'b2', 'c3']).str.extract(r'[ab](\d)', expand=False)
Out[16]: 
0      1
1      2
2    NaN
Length: 3, dtype: object

It returns a DataFrame with one column if expand=True.

In [17]: pd.Series(['a1', 'b2', 'c3']).str.extract(r'[ab](\d)', expand=True)
Out[17]: 
     0
0    1
1    2
2  NaN

[3 rows x 1 columns]

Calling on an Index with a regex with exactly one capture group returns an Index if expand=False.

In [18]: s = pd.Series(["a1", "b2", "c3"], ["A11", "B22", "C33"])

In [19]: s.index
Out[19]: Index(['A11', 'B22', 'C33'], dtype='object')

In [20]: s.index.str.extract("(?P<letter>[a-zA-Z])", expand=False)
Out[20]: Index(['A', 'B', 'C'], dtype='object', name='letter')

It returns a DataFrame with one column if expand=True.

In [21]: s.index.str.extract("(?P<letter>[a-zA-Z])", expand=True)
Out[21]: 
  letter
0      A
1      B
2      C

[3 rows x 1 columns]

Calling on an Index with a regex with more than one capture group raises ValueError if expand=False.

>>> s.index.str.extract("(?P<letter>[a-zA-Z])([0-9]+)", expand=False)
ValueError: only one regex group is supported with Index

It returns a DataFrame if expand=True.

In [22]: s.index.str.extract("(?P<letter>[a-zA-Z])([0-9]+)", expand=True)
Out[22]: 
  letter   1
0      A  11
1      B  22
2      C  33

[3 rows x 2 columns]

In summary, extract(expand=True) always returns a DataFrame with a row for every subject string, and a column for every capture group.
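The return types can be checked directly. This is a small sketch that reuses the single-group pattern from the examples above and always passes expand explicitly:

```python
import pandas as pd

s = pd.Series(["a1", "b2", "c3"])

as_series = s.str.extract(r"[ab](\d)", expand=False)
as_frame = s.str.extract(r"[ab](\d)", expand=True)

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (3, 1)
```

Passing expand explicitly keeps the result type independent of the library's default.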

Addition of str.extractall#

The .str.extractall method was added (GH11386). It differs from extract, which returns only the first match.

In [23]: s = pd.Series(["a1a2", "b1", "c1"], ["A", "B", "C"])

In [24]: s
Out[24]: 
A    a1a2
B      b1
C      c1
Length: 3, dtype: object

In [25]: s.str.extract(r"(?P<letter>[ab])(?P<digit>\d)", expand=False)
Out[25]: 
  letter digit
A      a     1
B      b     1
C    NaN   NaN

[3 rows x 2 columns]

The extractall method returns all matches.

In [26]: s.str.extractall(r"(?P<letter>[ab])(?P<digit>\d)")
Out[26]: 
        letter digit
  match             
A 0          a     1
  1          a     2
B 0          b     1

[3 rows x 2 columns]
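One way to see how the two methods relate is to select match number 0 from the extractall result and compare it with extract. The sketch below assumes the same Series and pattern as above; using xs on the match level is an illustrative choice, not something the release notes prescribe.

```python
import pandas as pd

s = pd.Series(["a1a2", "b1", "c1"], index=["A", "B", "C"])
pattern = r"(?P<letter>[ab])(?P<digit>\d)"

all_matches = s.str.extractall(pattern)

# Keep only match number 0 for each subject string.
first_matches = all_matches.xs(0, level="match")

# extract() reports non-matching subjects as NaN rows; drop them to compare.
extracted = s.str.extract(pattern, expand=True).dropna()

print(first_matches.values.tolist())  # [['a', '1'], ['b', '1']]
print(extracted.values.tolist())      # [['a', '1'], ['b', '1']]
```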

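Changes to str.cat#

The method .str.cat() concatenates the members of a Series. Before, if NaN values were present in the Series, calling .str.cat() on it would return NaN, unlike the rest of the Series.str.* API. This behavior has been amended to ignore NaN values by default. (GH11435).

A new, friendlier ValueError is added to protect against the mistake of supplying the sep as an arg, rather than as a kwarg. (GH11334).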

In [27]: pd.Series(['a', 'b', np.nan, 'c']).str.cat(sep=' ')
Out[27]: 'a b c'

In [28]: pd.Series(['a', 'b', np.nan, 'c']).str.cat(sep=' ', na_rep='?')
Out[28]: 'a b ? c'

In [2]: pd.Series(['a', 'b', np.nan, 'c']).str.cat(' ')
ValueError: Did you mean to supply a ``sep`` keyword?

Datetimelike rounding#

DatetimeIndex, Timestamp, TimedeltaIndex, Timedelta have gained the .round(), .floor() and .ceil() methods for datetimelike rounding, flooring and ceiling. (GH4314, GH11963)

Naive datetimes

In [29]: dr = pd.date_range('20130101 09:12:56.1234', periods=3)

In [30]: dr
Out[30]: 
DatetimeIndex(['2013-01-01 09:12:56.123400', '2013-01-02 09:12:56.123400',
               '2013-01-03 09:12:56.123400'],
              dtype='datetime64[ns]', freq='D')

In [31]: dr.round('s')
Out[31]: 
DatetimeIndex(['2013-01-01 09:12:56', '2013-01-02 09:12:56',
               '2013-01-03 09:12:56'],
              dtype='datetime64[ns]', freq=None)

# Timestamp scalar
In [32]: dr[0]
Out[32]: Timestamp('2013-01-01 09:12:56.123400', freq='D')

In [33]: dr[0].round('10s')
Out[33]: Timestamp('2013-01-01 09:13:00')

Tz-aware values are rounded, floored and ceiled in local times

In [34]: dr = dr.tz_localize('US/Eastern')

In [35]: dr
Out[35]: 
DatetimeIndex(['2013-01-01 09:12:56.123400-05:00',
               '2013-01-02 09:12:56.123400-05:00',
               '2013-01-03 09:12:56.123400-05:00'],
              dtype='datetime64[ns, US/Eastern]', freq=None)

In [36]: dr.round('s')
Out[36]: 
DatetimeIndex(['2013-01-01 09:12:56-05:00', '2013-01-02 09:12:56-05:00',
               '2013-01-03 09:12:56-05:00'],
              dtype='datetime64[ns, US/Eastern]', freq=None)

Timedeltas

In [37]: t = pd.timedelta_range('1 days 2 hr 13 min 45 us', periods=3, freq='d')

In [38]: t
Out[38]: 
TimedeltaIndex(['1 days 02:13:00.000045', '2 days 02:13:00.000045',
                '3 days 02:13:00.000045'],
               dtype='timedelta64[ns]', freq='D')

In [39]: t.round('10min')
Out[39]: TimedeltaIndex(['1 days 02:10:00', '2 days 02:10:00', '3 days 02:10:00'], dtype='timedelta64[ns]', freq=None)

# Timedelta scalar
In [40]: t[0]
Out[40]: Timedelta('1 days 02:13:00.000045')

In [41]: t[0].round('2h')
Out[41]: Timedelta('1 days 02:00:00')

In addition, .round(), .floor() and .ceil() are available through the .dt accessor of Series.

In [42]: s = pd.Series(dr)

In [43]: s
Out[43]: 
0   2013-01-01 09:12:56.123400-05:00
1   2013-01-02 09:12:56.123400-05:00
2   2013-01-03 09:12:56.123400-05:00
Length: 3, dtype: datetime64[ns, US/Eastern]

In [44]: s.dt.round('D')
Out[44]: 
0   2013-01-01 00:00:00-05:00
1   2013-01-02 00:00:00-05:00
2   2013-01-03 00:00:00-05:00
Length: 3, dtype: datetime64[ns, US/Eastern]
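The examples above only exercise .round(). As a complementary sketch, the snippet below applies all three methods to the same scalar values with a 10-minute resolution; the resolution is an arbitrary choice for illustration.

```python
import pandas as pd

ts = pd.Timestamp("2013-01-01 09:12:56.1234")

print(ts.round("10min"))  # 2013-01-01 09:10:00
print(ts.floor("10min"))  # 2013-01-01 09:10:00
print(ts.ceil("10min"))   # 2013-01-01 09:20:00

td = pd.Timedelta("1 days 02:13:00.000045")
print(td.floor("10min"))  # 1 days 02:10:00
print(td.ceil("10min"))   # 1 days 02:20:00
```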

Formatting of integers in FloatIndex#

Integers in FloatIndex, e.g. 1., are now formatted with a decimal point and a 0 digit, e.g. 1.0 (GH11713). This change not only affects the display to the console, but also the output of IO methods like .to_csv or .to_html.

Previous behavior:

In [2]: s = pd.Series([1, 2, 3], index=np.arange(3.))

In [3]: s
Out[3]:
0    1
1    2
2    3
dtype: int64

In [4]: s.index
Out[4]: Float64Index([0.0, 1.0, 2.0], dtype='float64')

In [5]: print(s.to_csv(path=None))
0,1
1,2
2,3

New behavior:

In [45]: s = pd.Series([1, 2, 3], index=np.arange(3.))

In [46]: s
Out[46]: 
0.0    1
1.0    2
2.0    3
Length: 3, dtype: int64

In [47]: s.index
Out[47]: Float64Index([0.0, 1.0, 2.0], dtype='float64')

In [48]: print(s.to_csv(path_or_buf=None, header=False))
0.0,1
1.0,2
2.0,3

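Changes to dtype assignment behaviors#

When a DataFrame's slice is updated with a new slice of the same dtype, the dtype of the DataFrame will now remain the same. (GH10503)

Previous behavior: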

In [5]: df = pd.DataFrame({'a': [0, 1, 1],
                           'b': pd.Series([100, 200, 300], dtype='uint32')})

In [7]: df.dtypes
Out[7]:
a     int64
b    uint32
dtype: object

In [8]: ix = df['a'] == 1

In [9]: df.loc[ix, 'b'] = df.loc[ix, 'b']

In [11]: df.dtypes
Out[11]:
a    int64
b    int64
dtype: object

New behavior:

In [49]: df = pd.DataFrame({'a': [0, 1, 1],
   ....:                    'b': pd.Series([100, 200, 300], dtype='uint32')})
   ....: 

In [50]: df.dtypes
Out[50]: 
a     int64
b    uint32
Length: 2, dtype: object

In [51]: ix = df['a'] == 1

In [52]: df.loc[ix, 'b'] = df.loc[ix, 'b']

In [53]: df.dtypes
Out[53]: 
a     int64
b    uint32
Length: 2, dtype: object

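When a DataFrame's integer slice is partially updated with a new slice of floats that could potentially be down-casted to integer without losing precision, the dtype of the slice will be set to float instead of integer.

Previous behavior: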

In [4]: df = pd.DataFrame(np.array(range(1,10)).reshape(3,3),
                          columns=list('abc'),
                          index=[[4,4,8], [8,10,12]])

In [5]: df
Out[5]:
      a  b  c
4 8   1  2  3
  10  4  5  6
8 12  7  8  9

In [7]: df.ix[4, 'c'] = np.array([0., 1.])

In [8]: df
Out[8]:
      a  b  c
4 8   1  2  0
  10  4  5  1
8 12  7  8  9

New behavior:

In [54]: df = pd.DataFrame(np.array(range(1,10)).reshape(3,3),
   ....:                   columns=list('abc'),
   ....:                   index=[[4,4,8], [8,10,12]])
   ....: 

In [55]: df
Out[55]: 
      a  b  c
4 8   1  2  3
  10  4  5  6
8 12  7  8  9

[3 rows x 3 columns]

In [56]: df.loc[4, 'c'] = np.array([0., 1.])

In [57]: df
Out[57]: 
      a  b  c
4 8   1  2  0
  10  4  5  1
8 12  7  8  9

[3 rows x 3 columns]

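Method to_xarray#

In a future version of pandas, we will be deprecating Panel and other > 2 ndim objects. In order to provide for continuity, all NDFrame objects have gained the .to_xarray() method in order to convert to xarray objects, which have a pandas-like interface for > 2 ndim. (GH11972)

See the xarray full-documentation here.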

In [1]: p = Panel(np.arange(2*3*4).reshape(2,3,4))

In [2]: p.to_xarray()
Out[2]:
<xarray.DataArray (items: 2, major_axis: 3, minor_axis: 4)>
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
Coordinates:
  * items       (items) int64 0 1
  * major_axis  (major_axis) int64 0 1 2
  * minor_axis  (minor_axis) int64 0 1 2 3

Latex representation#

DataFrame has gained a ._repr_latex_() method in order to allow for conversion to latex in an ipython/jupyter notebook using nbconvert. (GH11778)

Note that this must be activated by setting the option pd.display.latex.repr=True (GH12182)

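For example, if you have a jupyter notebook you plan to convert to latex using nbconvert, place the statement pd.display.latex.repr=True in the first cell to have the contained DataFrame output also stored as latex.

The options display.latex.escape and display.latex.longtable have also been added to the configuration and are used automatically by the to_latex method. See the available options docs for more info.

pd.read_sas() changes#

read_sas has gained the ability to read SAS7BDAT files, including compressed files. The files can be read in entirety, or incrementally. For full details see here. (GH4052)

Other enhancements#

Handle truncated floats in SAS xport files (GH11713)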
Added option to hide index in Series.to_string (GH11729)
read_excel now supports s3 urls of the format s3://bucketname/filename (GH11447)
Add support for AWS_S3_HOST env variable when reading from s3 (GH12198)
A simple version of Panel.round() is now implemented (GH11763)
For Python 3.x, round(DataFrame), round(Series), round(Panel) will work (GH11763)
sys.getsizeof(obj) returns the memory usage of a pandas object, including the values it contains (GH11597)
Series gained an is_unique attribute (GH11946)
DataFrame.quantile and Series.quantile now accept the interpolation keyword (GH10174).
Added DataFrame.style.format for more flexible formatting of cell values (GH11692)
DataFrame.select_dtypes now allows the np.float16 typecode (GH11990)
pivot_table() now accepts most iterables for the values parameter (GH12017)
Added Google BigQuery service account authentication support, which enables authentication on remote servers. (GH11881, GH12572). For further details see here
HDFStore is now iterable: for k in store is equivalent to for k in store.keys() (GH12221).
Add missing methods/fields to .dt for Period (GH8848)
The entire code base has been PEP-ified (GH12096)
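Two of the enhancements listed above can be tried in a few lines. This is an illustrative sketch on hand-written data; the interpolation values "lower" and "higher" are two of the accepted options and are chosen here only because the median falls between two observations.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

# is_unique reports whether any value is repeated.
print(s.is_unique)                       # True
print(pd.Series([1, 1, 2]).is_unique)    # False

# The median falls between 2.0 and 3.0; interpolation decides what is reported.
print(s.quantile(0.5))                           # 2.5
print(s.quantile(0.5, interpolation="lower"))    # 2.0
print(s.quantile(0.5, interpolation="higher"))   # 3.0
```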

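Backwards incompatible API changes#

The leading whitespace has been removed from the output of the .to_string(index=False) method (GH11833)
The out parameter has been removed from the Series.round() method. (GH11763)
DataFrame.round() leaves non-numeric columns unchanged in its return, rather than raising. (GH11885)
DataFrame.head(0) and DataFrame.tail(0) return empty frames, rather than self. (GH11937)
Series.head(0) and Series.tail(0) return empty series, rather than self. (GH11937)
to_msgpack and read_msgpack encoding now defaults to 'utf-8'. (GH12170)
The order of keyword arguments to text file parsing functions (.read_csv(), .read_table(), .read_fwf()) changed to group related arguments. (GH11555)
NaTType.isoformat now returns the string 'NaT' to allow the result to be passed to the constructor of Timestamp. (GH12300)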
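The head(0) and round() changes from the list above are easy to observe. The frame below is a made-up example with one numeric and one string column:

```python
import pandas as pd

df = pd.DataFrame({"x": [1.26, 2.71], "label": ["a", "b"]})

# head(0) gives a new, empty frame that keeps the columns.
empty = df.head(0)
print(len(empty), list(empty.columns))  # 0 ['x', 'label']
print(empty is df)                      # False

# round() only touches the numeric column; the strings pass through.
rounded = df.round(1)
print(rounded["x"].tolist())            # [1.3, 2.7]
print(rounded["label"].tolist())        # ['a', 'b']
```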

NaT and Timedelta operations#

NaT and Timedelta have expanded arithmetic operations, which are extended to Series arithmetic where applicable. Operations defined for datetime64[ns] or timedelta64[ns] are now also defined for NaT (GH11564).

NaT now supports arithmetic operations with integers and floats.

In [58]: pd.NaT * 1
Out[58]: NaT

In [59]: pd.NaT * 1.5
Out[59]: NaT

In [60]: pd.NaT / 2
Out[60]: NaT

In [61]: pd.NaT * np.nan
Out[61]: NaT

NaT defines more arithmetic operations with datetime64[ns] and timedelta64[ns].

In [62]: pd.NaT / pd.NaT
Out[62]: nan

In [63]: pd.Timedelta('1s') / pd.NaT
Out[63]: nan

NaT may represent either a datetime64[ns] null or a timedelta64[ns] null. Given the ambiguity, it is treated as a timedelta64[ns], which allows more operations to succeed.

In [64]: pd.NaT + pd.NaT
Out[64]: NaT

# same as
In [65]: pd.Timedelta('1s') + pd.Timedelta('1s')
Out[65]: Timedelta('0 days 00:00:02')

as opposed to

In [3]: pd.Timestamp('19900315') + pd.Timestamp('19900315')
TypeError: unsupported operand type(s) for +: 'Timestamp' and 'Timestamp'

However, when wrapped in a Series whose dtype is datetime64[ns] or timedelta64[ns], the dtype information is respected.

In [1]: pd.Series([pd.NaT], dtype='<M8[ns]') + pd.Series([pd.NaT], dtype='<M8[ns]')
TypeError: can only operate on a datetimes for subtraction,
           but the operator [__add__] was passed

In [66]: pd.Series([pd.NaT], dtype='<m8[ns]') + pd.Series([pd.NaT], dtype='<m8[ns]')
Out[66]: 
0   NaT
Length: 1, dtype: timedelta64[ns]

Timedelta division by floats now works.

In [67]: pd.Timedelta('1s') / 2.0
Out[67]: Timedelta('0 days 00:00:00.500000')

Subtracting a Series of Timedelta values from a Timestamp now works (GH11925)

In [68]: ser = pd.Series(pd.timedelta_range('1 day', periods=3))

In [69]: ser
Out[69]: 
0   1 days
1   2 days
2   3 days
Length: 3, dtype: timedelta64[ns]

In [70]: pd.Timestamp('2012-01-01') - ser
Out[70]: 
0   2011-12-31
1   2011-12-30
2   2011-12-29
Length: 3, dtype: datetime64[ns]

NaT.isoformat() now returns 'NaT'. This change allows pd.Timestamp to rehydrate any timestamp-like object from its isoformat (GH12300).
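The round trip described here can be sketched as follows, using NaT and one arbitrary timestamp:

```python
import pandas as pd

text = pd.NaT.isoformat()
print(text)                                # NaT

# Feeding the string back to the constructor yields the NaT singleton again.
print(pd.Timestamp(text) is pd.NaT)        # True

# A real timestamp survives the same round trip.
ts = pd.Timestamp("2016-03-13 12:00:00")
print(pd.Timestamp(ts.isoformat()) == ts)  # True
```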

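Changes to msgpack#

Forward incompatible changes in the msgpack writing format were made over 0.17.0 and 0.18.0; older versions of pandas cannot read files packed by newer versions (GH12129, GH10527)

Bugs in to_msgpack and read_msgpack, introduced in 0.17.0 and fixed in 0.18.0, caused files packed in Python 2 to be unreadable by Python 3 (GH12142). The following table describes the backward and forward compatibility of msgpacks.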

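Packed with	Can be unpacked with
pre-0.17 / Python 2	any
pre-0.17 / Python 3	any
0.17 / Python 2	==0.17 / Python 2; >=0.18 / any Python
0.17 / Python 3	>=0.18 / any Python
0.18	>=0.18

Warning

0.18.0 is backward-compatible for reading files packed by older versions, except for files packed with 0.17 in Python 2, which can only be unpacked in Python 2.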

Signature change for .rank#

Series.rank and DataFrame.rank now have the same signature (GH11759)

Previous signature

In [3]: pd.Series([0,1]).rank(method='average', na_option='keep',
                              ascending=True, pct=False)
Out[3]:
0    1
1    2
dtype: float64

In [4]: pd.DataFrame([0,1]).rank(axis=0, numeric_only=None,
                                 method='average', na_option='keep',
                                 ascending=True, pct=False)
Out[4]:
   0
0  1
1  2

New signature

In [71]: pd.Series([0,1]).rank(axis=0, method='average', numeric_only=False,
   ....:                       na_option='keep', ascending=True, pct=False)
   ....: 
Out[71]: 
0    1.0
1    2.0
Length: 2, dtype: float64

In [72]: pd.DataFrame([0,1]).rank(axis=0, method='average', numeric_only=False,
   ....:                          na_option='keep', ascending=True, pct=False)
   ....: 
Out[72]: 
     0
0  1.0
1  2.0

[2 rows x 1 columns]

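Bug in QuarterBegin with n=0#

In previous versions, the behavior of the QuarterBegin offset was inconsistent depending on the date when the n parameter was 0. (GH11406)

The general semantics of anchored offsets for n=0 is to not move the date when it is an anchor point (e.g., a quarter start date), and otherwise roll forward to the next anchor point.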

In [73]: d = pd.Timestamp('2014-02-01')

In [74]: d
Out[74]: Timestamp('2014-02-01 00:00:00')

In [75]: d + pd.offsets.QuarterBegin(n=0, startingMonth=2)
Out[75]: Timestamp('2014-02-01 00:00:00')

In [76]: d + pd.offsets.QuarterBegin(n=0, startingMonth=1)
Out[76]: Timestamp('2014-04-01 00:00:00')

For the QuarterBegin offset in previous versions, the date would be rolled backwards if it was in the same month as the quarter start date.

In [3]: d = pd.Timestamp('2014-02-15')

In [4]: d + pd.offsets.QuarterBegin(n=0, startingMonth=2)
Out[4]: Timestamp('2014-02-01 00:00:00')

This behavior has been corrected in version 0.18.0, which is consistent with other anchored offsets like MonthBegin and YearBegin.

In [77]: d = pd.Timestamp('2014-02-15')

In [78]: d + pd.offsets.QuarterBegin(n=0, startingMonth=2)
Out[78]: Timestamp('2014-05-01 00:00:00')
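To make the consistency with MonthBegin concrete, the sketch below applies both offsets with n=0 to a date that sits on an anchor point and to one that does not. The two dates are the ones used above; placing MonthBegin side by side with QuarterBegin is an illustrative addition.

```python
import pandas as pd

on_anchor = pd.Timestamp("2014-02-01")
off_anchor = pd.Timestamp("2014-02-15")

month = pd.offsets.MonthBegin(n=0)
quarter = pd.offsets.QuarterBegin(n=0, startingMonth=2)

# On an anchor point, n=0 leaves the date where it is.
print(on_anchor + month)     # 2014-02-01 00:00:00
print(on_anchor + quarter)   # 2014-02-01 00:00:00

# Off an anchor point, n=0 rolls forward to the next anchor.
print(off_anchor + month)    # 2014-03-01 00:00:00
print(off_anchor + quarter)  # 2014-05-01 00:00:00
```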

Resample API#

Like the change in the window functions API above, .resample(...) is changing to have a more groupby-like API. (GH11732, GH12702, GH12202, GH12332, GH12334, GH12348, GH12448).

In [79]: np.random.seed(1234)

In [80]: df = pd.DataFrame(np.random.rand(10,4),
   ....:                   columns=list('ABCD'),
   ....:                   index=pd.date_range('2010-01-01 09:00:00',
   ....:                                       periods=10, freq='s'))
   ....: 

In [81]: df
Out[81]: 
                            A         B         C         D
2010-01-01 09:00:00  0.191519  0.622109  0.437728  0.785359
2010-01-01 09:00:01  0.779976  0.272593  0.276464  0.801872
2010-01-01 09:00:02  0.958139  0.875933  0.357817  0.500995
2010-01-01 09:00:03  0.683463  0.712702  0.370251  0.561196
2010-01-01 09:00:04  0.503083  0.013768  0.772827  0.882641
2010-01-01 09:00:05  0.364886  0.615396  0.075381  0.368824
2010-01-01 09:00:06  0.933140  0.651378  0.397203  0.788730
2010-01-01 09:00:07  0.316836  0.568099  0.869127  0.436173
2010-01-01 09:00:08  0.802148  0.143767  0.704261  0.704581
2010-01-01 09:00:09  0.218792  0.924868  0.442141  0.909316

[10 rows x 4 columns]

Previous API:

You would write a resampling operation that immediately evaluates. If a how parameter was not provided, it would default to how='mean'.

In [6]: df.resample('2s')
Out[6]:
                         A         B         C         D
2010-01-01 09:00:00  0.485748  0.447351  0.357096  0.793615
2010-01-01 09:00:02  0.820801  0.794317  0.364034  0.531096
2010-01-01 09:00:04  0.433985  0.314582  0.424104  0.625733
2010-01-01 09:00:06  0.624988  0.609738  0.633165  0.612452
2010-01-01 09:00:08  0.510470  0.534317  0.573201  0.806949

You could also specify a how directly

In [7]: df.resample('2s', how='sum')
Out[7]:
                         A         B         C         D
2010-01-01 09:00:00  0.971495  0.894701  0.714192  1.587231
2010-01-01 09:00:02  1.641602  1.588635  0.728068  1.062191
2010-01-01 09:00:04  0.867969  0.629165  0.848208  1.251465
2010-01-01 09:00:06  1.249976  1.219477  1.266330  1.224904
2010-01-01 09:00:08  1.020940  1.068634  1.146402  1.613897

New API:

Now, you can write .resample(..) as a 2-stage operation like .groupby(...), which yields a Resampler.

In [82]: r = df.resample('2s')

In [83]: r
Out[83]: <pandas.core.resample.DatetimeIndexResampler object at 0x7f0df9abcc70>

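Downsampling#

You can then use this object to perform operations. These are downsampling operations (going from a higher frequency to a lower one).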

In [84]: r.mean()
Out[84]: 
                            A         B         C         D
2010-01-01 09:00:00  0.485748  0.447351  0.357096  0.793615
2010-01-01 09:00:02  0.820801  0.794317  0.364034  0.531096
2010-01-01 09:00:04  0.433985  0.314582  0.424104  0.625733
2010-01-01 09:00:06  0.624988  0.609738  0.633165  0.612452
2010-01-01 09:00:08  0.510470  0.534317  0.573201  0.806949

[5 rows x 4 columns]

In [85]: r.sum()
Out[85]: 
                            A         B         C         D
2010-01-01 09:00:00  0.971495  0.894701  0.714192  1.587231
2010-01-01 09:00:02  1.641602  1.588635  0.728068  1.062191
2010-01-01 09:00:04  0.867969  0.629165  0.848208  1.251465
2010-01-01 09:00:06  1.249976  1.219477  1.266330  1.224904
2010-01-01 09:00:08  1.020940  1.068634  1.146402  1.613897

[5 rows x 4 columns]

Furthermore, resample now supports getitem operations to perform the resample on specific columns.

In [86]: r[['A','C']].mean()
Out[86]: 
                            A         C
2010-01-01 09:00:00  0.485748  0.357096
2010-01-01 09:00:02  0.820801  0.364034
2010-01-01 09:00:04  0.433985  0.424104
2010-01-01 09:00:06  0.624988  0.633165
2010-01-01 09:00:08  0.510470  0.573201

[5 rows x 2 columns]

and .aggregate type operations.

In [87]: r.agg({'A' : 'mean', 'B' : 'sum'})
Out[87]: 
                            A         B
2010-01-01 09:00:00  0.485748  0.894701
2010-01-01 09:00:02  0.820801  1.588635
2010-01-01 09:00:04  0.433985  0.629165
2010-01-01 09:00:06  0.624988  1.219477
2010-01-01 09:00:08  0.510470  1.068634

[5 rows x 2 columns]

These accessors can, of course, be combined

In [88]: r[['A','B']].agg(['mean','sum'])
Out[88]: 
                            A                   B          
                         mean       sum      mean       sum
2010-01-01 09:00:00  0.485748  0.971495  0.447351  0.894701
2010-01-01 09:00:02  0.820801  1.641602  0.794317  1.588635
2010-01-01 09:00:04  0.433985  0.867969  0.314582  0.629165
2010-01-01 09:00:06  0.624988  1.249976  0.609738  1.219477
2010-01-01 09:00:08  0.510470  1.020940  0.534317  1.068634

[5 rows x 4 columns]

Upsampling#

Upsampling operations take you from a lower frequency to a higher frequency. These are now performed with the Resampler objects with backfill(), ffill(), fillna() and asfreq() methods.

In [89]: s = pd.Series(np.arange(5, dtype='int64'),
   ....:               index=pd.date_range('2010-01-01', periods=5, freq='Q'))
   ....: 

In [90]: s
Out[90]: 
2010-03-31    0
2010-06-30    1
2010-09-30    2
2010-12-31    3
2011-03-31    4
Freq: Q-DEC, Length: 5, dtype: int64

Previously

In [6]: s.resample('M', fill_method='ffill')
Out[6]:
2010-03-31    0
2010-04-30    0
2010-05-31    0
2010-06-30    1
2010-07-31    1
2010-08-31    1
2010-09-30    2
2010-10-31    2
2010-11-30    2
2010-12-31    3
2011-01-31    3
2011-02-28    3
2011-03-31    4
Freq: M, dtype: int64

New API

In [91]: s.resample('M').ffill()
Out[91]: 
2010-03-31    0
2010-04-30    0
2010-05-31    0
2010-06-30    1
2010-07-31    1
2010-08-31    1
2010-09-30    2
2010-10-31    2
2010-11-30    2
2010-12-31    3
2011-01-31    3
2011-02-28    3
2011-03-31    4
Freq: M, Length: 13, dtype: int64
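The difference between the upsampling methods shows up in how the newly created slots are filled. The sketch below is a minimal example on a made-up minute-frequency Series upsampled to 30 seconds; the frequencies are arbitrary and were chosen because their aliases are spelled the same way across pandas versions.

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 1, 2],
              index=pd.date_range("2010-01-01", periods=3, freq="min"))

r = s.resample("30s")

# ffill() copies the last known value into each new slot.
forward_filled = r.ffill()
print(forward_filled.tolist())      # [0, 0, 1, 1, 2]

# asfreq() only reindexes, so the new slots stay missing.
reindexed = r.asfreq()
print(reindexed.isna().tolist())    # [False, True, False, True, False]
```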

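Note

In the new API, you can either downsample OR upsample. The prior implementation would allow you to pass an aggregator function (like mean) even though you were upsampling, which caused some confusion.

Previous API will work but with deprecations#

Warning

This new API for resample includes some internal changes for the prior-to-0.18.0 API, so that it works with a deprecation warning in most cases, as the resample operation returns a deferred object. We can intercept operations and just do what the (pre 0.18.0) API did (with a warning). Here is a typical use case: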

In [4]: r = df.resample('2s')

In [6]: r*10
pandas/tseries/resample.py:80: FutureWarning: .resample() is now a deferred operation
use .resample(...).mean() instead of .resample(...)

Out[6]:
                      A         B         C         D
2010-01-01 09:00:00  4.857476  4.473507  3.570960  7.936154
2010-01-01 09:00:02  8.208011  7.943173  3.640340  5.310957
2010-01-01 09:00:04  4.339846  3.145823  4.241039  6.257326
2010-01-01 09:00:06  6.249881  6.097384  6.331650  6.124518
2010-01-01 09:00:08  5.104699  5.343172  5.732009  8.069486

However, getting and assignment operations directly on a Resampler will raise a ValueError:

In [7]: r.iloc[0] = 5
ValueError: .resample() is now a deferred operation
use .resample(...).mean() instead of .resample(...)

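There is a situation where the new API cannot perform all the operations when using original code. This code is intended to resample every 2s, take the mean AND then take the min of those results.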

In [4]: df.resample('2s').min()
Out[4]:
A    0.433985
B    0.314582
C    0.357096
D    0.531096
dtype: float64

The new API will:

In [92]: df.resample('2s').min()
Out[92]: 
                            A         B         C         D
2010-01-01 09:00:00  0.191519  0.272593  0.276464  0.785359
2010-01-01 09:00:02  0.683463  0.712702  0.357817  0.500995
2010-01-01 09:00:04  0.364886  0.013768  0.075381  0.368824
2010-01-01 09:00:06  0.316836  0.568099  0.397203  0.436173
2010-01-01 09:00:08  0.218792  0.143767  0.442141  0.704581

[5 rows x 4 columns]

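The good news is that the return dimensions differ between the new API and the old API, so this should loudly raise an exception.

To replicate the original operation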

In [93]: df.resample('2s').mean().min()
Out[93]: 
A    0.433985
B    0.314582
C    0.357096
D    0.531096
Length: 4, dtype: float64

Changes to eval#

In prior versions, new column assignments in an eval expression resulted in an inplace change to the DataFrame. (GH9297, GH8664, GH10486)

In [94]: df = pd.DataFrame({'a': np.linspace(0, 10, 5), 'b': range(5)})

In [95]: df
Out[95]: 
      a  b
0   0.0  0
1   2.5  1
2   5.0  2
3   7.5  3
4  10.0  4

[5 rows x 2 columns]

In [12]: df.eval('c = a + b')
FutureWarning: eval expressions containing an assignment currently default to operating inplace.
This will change in a future version of pandas, use inplace=True to avoid this warning.

In [13]: df
Out[13]:
      a  b     c
0   0.0  0   0.0
1   2.5  1   3.5
2   5.0  2   7.0
3   7.5  3  10.5
4  10.0  4  14.0

In version 0.18.0, a new inplace keyword was added to choose whether the assignment should be done inplace or return a copy.

In [96]: df
Out[96]: 
      a  b     c
0   0.0  0   0.0
1   2.5  1   3.5
2   5.0  2   7.0
3   7.5  3  10.5
4  10.0  4  14.0

[5 rows x 3 columns]

In [97]: df.eval('d = c - b', inplace=False)
Out[97]: 
      a  b     c     d
0   0.0  0   0.0   0.0
1   2.5  1   3.5   2.5
2   5.0  2   7.0   5.0
3   7.5  3  10.5   7.5
4  10.0  4  14.0  10.0

[5 rows x 4 columns]

In [98]: df
Out[98]: 
      a  b     c
0   0.0  0   0.0
1   2.5  1   3.5
2   5.0  2   7.0
3   7.5  3  10.5
4  10.0  4  14.0

[5 rows x 3 columns]

In [99]: df.eval('d = c - b', inplace=True)

In [100]: df
Out[100]: 
      a  b     c     d
0   0.0  0   0.0   0.0
1   2.5  1   3.5   2.5
2   5.0  2   7.0   5.0
3   7.5  3  10.5   7.5
4  10.0  4  14.0  10.0

[5 rows x 4 columns]

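Warning

For backwards compatibility, inplace defaults to True if not specified. This will change in a future version of pandas. If your code depends on an inplace assignment you should update it to explicitly set inplace=True

The inplace keyword parameter was also added to the query method.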

In [101]: df.query('a > 5')
Out[101]: 
      a  b     c     d
3   7.5  3  10.5   7.5
4  10.0  4  14.0  10.0

[2 rows x 4 columns]

In [102]: df.query('a > 5', inplace=True)

In [103]: df
Out[103]: 
      a  b     c     d
3   7.5  3  10.5   7.5
4  10.0  4  14.0  10.0

[2 rows x 4 columns]

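Warning

Note that the default value for inplace in a query is False, which is consistent with prior versions.

eval has also been updated to allow multi-line expressions for multiple assignments. These expressions will be evaluated one at a time in order. Only assignments are valid for multi-line expressions.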

In [104]: df
Out[104]: 
      a  b     c     d
3   7.5  3  10.5   7.5
4  10.0  4  14.0  10.0

[2 rows x 4 columns]

In [105]: df.eval("""
   .....: e = d + a
   .....: f = e - 22
   .....: g = f / 2.0""", inplace=True)
   .....: 

In [106]: df
Out[106]: 
      a  b     c     d     e    f    g
3   7.5  3  10.5   7.5  15.0 -7.0 -3.5
4  10.0  4  14.0  10.0  20.0 -2.0 -1.0

[2 rows x 7 columns]

Other API changes#

DataFrame.between_time and Series.between_time now only parse a fixed set of time strings. Parsing of date strings is no longer supported and raises a ValueError. (GH11818)

In [107]: s = pd.Series(range(10), pd.date_range('2015-01-01', freq='H', periods=10))

In [108]: s.between_time("7:00am", "9:00am")
Out[108]: 
2015-01-01 07:00:00    7
2015-01-01 08:00:00    8
2015-01-01 09:00:00    9
Freq: H, Length: 3, dtype: int64

This will now raise.

In [2]: s.between_time('20150101 07:00:00','20150101 09:00:00')
ValueError: Cannot convert arg ['20150101 07:00:00'] to a time.

.memory_usage() now includes values in the index, as does memory_usage in .info() (GH11597)
DataFrame.to_latex() now supports non-ascii encodings (eg utf-8) in Python 2 with the parameter encoding (GH7061)
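pandas.merge() and DataFrame.merge() will show a specific error message when trying to merge with an object that is not of type DataFrame or a subclass (GH12081)
DataFrame.unstack and Series.unstack now take a fill_value keyword to allow direct replacement of missing values when an unstack results in missing values in the resulting DataFrame. As an added benefit, specifying fill_value will preserve the data type of the original stacked data. (GH9746)
As part of the new API for window functions and resampling, aggregation functions have been clarified, raising more informative error messages on invalid aggregations. (GH9052). A full set of examples is presented in groupby.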
Statistical functions for NDFrame objects (like sum(), mean(), min()) will now raise if non-numpy-compatible arguments are passed in for **kwargs (GH12301)
.to_latex and .to_html gain a decimal parameter like .to_csv; the default is '.' (GH12031)
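More helpful error message when constructing a DataFrame with empty data but with indices (GH8020)
.describe() will now properly handle bool dtype as a categorical (GH6625)
More helpful error message with an invalid .transform with user defined input (GH10165)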
Exponentially weighted functions now allow specifying alpha directly (GH10789) and raise ValueError if parameters violate 0 < alpha <= 1 (GH12492)
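Two items from the list above, the unstack fill_value keyword and the alpha check on exponentially weighted functions, can be sketched together. The data and the MultiIndex labels are made up for illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([("a", "x"), ("a", "y"), ("b", "x")])
stacked = pd.Series([1, 2, 3], index=index)

# Without fill_value the hole at ("b", "y") becomes NaN and forces float64.
unfilled = stacked.unstack()
print(unfilled.dtypes.tolist())

# With fill_value the integer dtype of the stacked data is kept.
filled = stacked.unstack(fill_value=0)
print(filled.dtypes.tolist())

# alpha is passed directly; values outside 0 < alpha <= 1 are rejected.
smoothed = pd.Series([1.0, 2.0, 3.0]).ewm(alpha=0.5).mean()
try:
    pd.Series([1.0, 2.0, 3.0]).ewm(alpha=0.0).mean()
    alpha_rejected = False
except ValueError:
    alpha_rejected = True
print(alpha_rejected)  # True
```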

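Deprecations#

The functions pd.rolling_*, pd.expanding_*, and pd.ewm* are deprecated and replaced by the corresponding method call. Note that the new suggested syntax includes all of the arguments (even if default) (GH11603)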

In [1]: s = pd.Series(range(3))

In [2]: pd.rolling_mean(s,window=2,min_periods=1)
        FutureWarning: pd.rolling_mean is deprecated for Series and
             will be removed in a future version, replace with
             Series.rolling(min_periods=1,window=2,center=False).mean()
Out[2]:
        0    0.0
        1    0.5
        2    1.5
        dtype: float64

In [3]: pd.rolling_cov(s, s, window=2)
        FutureWarning: pd.rolling_cov is deprecated for Series and
             will be removed in a future version, replace with
             Series.rolling(window=2).cov(other=<Series>)
Out[3]:
        0    NaN
        1    0.5
        2    0.5
        dtype: float64

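The freq and how arguments to the .rolling, .expanding, and .ewm (new) functions are deprecated and will be removed in a future version. You can simply resample the input prior to creating a window function. (GH11603).

For example, instead of s.rolling(window=5,freq='D').max() to get the max value on a rolling 5-day window, one could use s.resample('D').mean().rolling(window=5).max(), which first resamples the data to daily data, then provides a rolling 5-day window.
The pd.tseries.frequencies.get_offset_name function is deprecated. Use the offset's .freqstr property as an alternative (GH11192)
pandas.stats.fama_macbeth routines are deprecated and will be removed in a future version (GH6077)
pandas.stats.ols, pandas.stats.plm and pandas.stats.var routines are deprecated and will be removed in a future version (GH6077)
Show a FutureWarning rather than a DeprecationWarning on using long-time deprecated syntax in HDFStore.select, where the where clause is not a string-like (GH12027)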
The pandas.options.display.mpl_style configuration has been deprecated and will be removed in a future version of pandas. This functionality is better handled by matplotlib's style sheets (GH11783).
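The resample-then-roll replacement for the deprecated freq argument can be sketched on a tiny irregular Series. The timestamps, values, and the 2-day window below are made up to keep the arithmetic easy to follow.

```python
import pandas as pd

index = pd.to_datetime(["2016-01-01 00:00", "2016-01-01 12:00",
                        "2016-01-02 00:00", "2016-01-03 00:00"])
s = pd.Series([1.0, 3.0, 5.0, 2.0], index=index)

# Step 1: bring the data to a regular daily frequency.
daily = s.resample("D").mean()
print(daily.tolist())    # [2.0, 5.0, 2.0]

# Step 2: apply the window function to the regular series.
result = daily.rolling(window=2).max()
print(result.tolist())   # [nan, 5.0, 5.0]
```

Splitting the two steps makes the aggregation used for the frequency conversion (here mean) explicit instead of hiding it behind a how argument.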

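Removal of deprecated float indexers#

In GH4892 indexing with floating point numbers on a non-Float64Index was deprecated (in version 0.14.0). In 0.18.0, this deprecation warning is removed and these will now raise a TypeError. (GH12165, GH12333)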

In [109]: s = pd.Series([1, 2, 3], index=[4, 5, 6])

In [110]: s
Out[110]: 
4    1
5    2
6    3
Length: 3, dtype: int64

In [111]: s2 = pd.Series([1, 2, 3], index=list('abc'))

In [112]: s2
Out[112]: 
a    1
b    2
c    3
Length: 3, dtype: int64

Previous behavior:

# this is label indexing
In [2]: s[5.0]
FutureWarning: scalar indexers for index type Int64Index should be integers and not floating point
Out[2]: 2

# this is positional indexing
In [3]: s.iloc[1.0]
FutureWarning: scalar indexers for index type Int64Index should be integers and not floating point
Out[3]: 2

# this is label indexing
In [4]: s.loc[5.0]
FutureWarning: scalar indexers for index type Int64Index should be integers and not floating point
Out[4]: 2

# .ix would coerce 1.0 to the positional 1, and index
In [5]: s2.ix[1.0] = 10
FutureWarning: scalar indexers for index type Index should be integers and not floating point

In [6]: s2
Out[6]:
a     1
b    10
c     3
dtype: int64

New behavior:

For iloc, getting and setting via a float scalar will always raise.

In [3]: s.iloc[2.0]
TypeError: cannot do label indexing on <class 'pandas.indexes.numeric.Int64Index'> with these indexers [2.0] of <type 'float'>
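Code that computes positions arithmetically often ends up with integer-like floats. One way to adapt such code is to convert explicitly before calling .iloc. The sketch below assumes a small helper, to_position, which is hypothetical and not part of pandas.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=[4, 5, 6])


def to_position(value):
    """Hypothetical helper: turn an integer-like float into an int position."""
    if isinstance(value, float):
        if not value.is_integer():
            raise ValueError("not an integer-like position: %r" % value)
        return int(value)
    return value


# A float position is rejected by .iloc ...
try:
    s.iloc[2.0]
    raised = False
except TypeError:
    raised = True

# ... so convert it explicitly before indexing by position.
value = s.iloc[to_position(2.0)]

print(raised, value)
```

Rejecting non-integer floats in the helper keeps a value such as 2.5 from being silently truncated to a position.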

Other indexers will coerce to a like integer for both getting and setting. The FutureWarning has been dropped for .loc, .ix and [].

In [113]: s[5.0]
Out[113]: 2

In [114]: s.loc[5.0]
Out[114]: 2

and setting

In [115]: s_copy = s.copy()

In [116]: s_copy[5.0] = 10

In [117]: s_copy
Out[117]: 
4     1
5    10
6     3
Length: 3, dtype: int64

In [118]: s_copy = s.copy()

In [119]: s_copy.loc[5.0] = 10

In [120]: s_copy
Out[120]: 
4     1
5    10
6     3
Length: 3, dtype: int64

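Positional setting with .ix and a float indexer will add this value to the index, rather than setting the value by position as it did previously.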

In [3]: s2.ix[1.0] = 10
In [4]: s2
Out[4]:
a       1
b       2
c       3
1.0    10
dtype: int64
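When the intent matters, the two outcomes can be requested explicitly instead of relying on how .ix interprets a float. The following is a sketch using .iloc for positional assignment and .loc for label assignment; the variable names are chosen here for illustration.

```python
import pandas as pd

s2 = pd.Series([1, 2, 3], index=list("abc"))

# Positional assignment: change the second element in place.
by_position = s2.copy()
by_position.iloc[1] = 10

# Label assignment: 1.0 is not an existing label, so the Series is enlarged
# with a new entry whose label is 1.0.
by_label = s2.copy()
by_label.loc[1.0] = 10

print(by_position)
print(by_label)
```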

Slicing will also coerce integer-like floats to integers for a non-Float64Index.

In [121]: s.loc[5.0:6]
Out[121]: 
5    2
6    3
Length: 2, dtype: int64

Note that for floats that are not coercible to ints, the label-based bounds will be excluded.

In [122]: s.loc[5.1:6]
Out[122]: 
6    3
Length: 1, dtype: int64

Float indexing on a Float64Index is unchanged.

In [123]: s = pd.Series([1, 2, 3], index=np.arange(3.))

In [124]: s[1.0]
Out[124]: 2

In [125]: s[1.0:2.5]
Out[125]: 
1.0    2
2.0    3
Length: 2, dtype: int64

Removal of prior version deprecations/changes#

Removal of rolling_corr_pairwise in favor of .rolling().corr(pairwise=True) (GH4950)
Removal of expanding_corr_pairwise in favor of .expanding().corr(pairwise=True) (GH4950)
Removal of the DataMatrix module. This was not imported into the pandas namespace in any event (GH12111)
Removal of cols keyword in favor of subset in DataFrame.duplicated() and DataFrame.drop_duplicates() (GH6680)
Removal of the read_frame and frame_query (both aliases for pd.read_sql) and write_frame (alias of to_sql) functions in the pd.io.sql namespace, deprecated since 0.14.0 (GH6292).
Removal of the order keyword from .factorize() (GH6930)
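The replacements named in the list above can be sketched together as follows. The sample frames, the in-memory SQLite database, and the table name demo are assumptions made for illustration only.

```python
import sqlite3

import numpy as np
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 3]})

# `subset` replaces the removed `cols` keyword.
dup_mask = df.duplicated(subset=["key"])
deduped = df.drop_duplicates(subset=["key"])

# DataFrame.to_sql and pd.read_sql replace write_frame, read_frame
# and frame_query. "demo" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
df.to_sql("demo", conn, index=False)
round_trip = pd.read_sql("SELECT * FROM demo", conn)
conn.close()

# .rolling().corr(pairwise=True) replaces rolling_corr_pairwise.
# The container type of the pairwise result depends on the pandas version.
rng = np.random.RandomState(0)
data = pd.DataFrame(rng.randn(6, 2), columns=["x", "y"])
pairwise_corr = data.rolling(window=3).corr(pairwise=True)

print(deduped)
print(round_trip)
```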

Performance improvements#

Improved performance of andrews_curves (GH11534)
Improved huge DatetimeIndex, PeriodIndex and TimedeltaIndex's ops performance including NaT (GH10277)
Improved performance of pandas.concat (GH11958)
Improved performance of StataReader (GH11591)
Improved performance in construction of Categoricals with Series of datetimes containing NaT (GH12077)
Improved performance of ISO 8601 date parsing for dates without separators (GH11899), with leading zeros (GH11871), and with white space preceding the time zone (GH9714)

Bug fixes#

Bug in GroupBy.size when the data frame is empty. (GH11699)
Bug in Period.end_time when a multiple of a time period is requested (GH11738)
Regression in .clip with tz-aware datetimes (GH11838)
Bug in date_range when the boundaries fell on the frequency (GH11804, GH12409)
Bug in consistency of passing nested dicts to .groupby(...).agg(...) (GH9052)
Accept unicode in the Timedelta constructor (GH11995)
Bug in value label reading for StataReader when reading incrementally (GH12014)
Bug in vectorized DateOffset when n parameter is 0 (GH11370)
Compat for NumPy 1.11 w.r.t. NaT comparison changes (GH12049)
Bug in read_csv when reading from a StringIO in threads (GH11790)
Bug in not treating NaT as a missing value in datetimelikes when factorizing & with Categoricals (GH12077)
Bug in getitem when the values of a Series were tz-aware (GH12089)
Bug in Series.str.get_dummies when one of the variables was 'name' (GH12180)
Bug in pd.concat while concatenating tz-aware NaT series. (GH11693, GH11755, GH12217)
Bug in pd.read_stata with version <= 108 files (GH12232)
Bug in Series.resample using a frequency of Nano when the index is a DatetimeIndex and contains non-zero nanosecond parts (GH12037)
Bug in resampling with .nunique and a sparse index (GH12352)
Removed some compiler warnings (GH12471)
Work around compat issues with boto in Python 3.5 (GH11915)
Bug in NaT subtraction from Timestamp or DatetimeIndex with timezones (GH11718)
Bug in subtraction of Series of a single tz-aware Timestamp (GH12290)
Use compat iterators in PY2 to support .next() (GH12299)
Bug in Timedelta.round with negative values (GH11690)
Bug in .loc against CategoricalIndex may result in normal Index (GH11586)
Bug in DataFrame.info when duplicated column names exist (GH11761)
Bug in .copy of datetime tz-aware objects (GH11794)
Bug in Series.apply and Series.map where timedelta64 was not boxed (GH11349)
Bug in DataFrame.set_index() with tz-aware Series (GH12358)
Bug in subclasses of DataFrame where AttributeError did not propagate (GH11808)
Bug in groupby on tz-aware data where selection was not returning Timestamp (GH11616)
Bug in pd.read_clipboard and pd.to_clipboard functions not supporting Unicode; upgrade included pyperclip to v1.5.15 (GH9263)
Bug in DataFrame.query containing an assignment (GH8664)
Bug in from_msgpack where __contains__() fails for columns of the unpacked DataFrame, if the DataFrame has object columns. (GH11880)
Bug in .resample on categorical data with TimedeltaIndex (GH12169)
Bug in timezone info lost when broadcasting scalar datetime to DataFrame (GH11682)
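Bug in Index creation from Timestamp with mixed tz coerces to UTC (GH11488)
Bug in to_numeric where it does not raise if input is more than one dimension (GH11776)
Bug in parsing timezone offset strings with non-zero minutes (GH11708)
Bug in df.plot using incorrect colors for bar plots under matplotlib 1.5+ (GH11614)
Bug in the groupby plot method when using keyword arguments (GH11805).
Bug in DataFrame.duplicated and drop_duplicates causing spurious matches when setting keep=False (GH11864)
Bug in .loc result with duplicated key may have Index with incorrect dtype (GH11497)
Bug in pd.rolling_median where memory allocation failed even with sufficient memory (GH11696)
Bug in DataFrame.style with spurious zeros (GH12134)
Bug in DataFrame.style with integer columns not starting at 0 (GH12125)
Bug in .style.bar where output may not render properly in specific browsers (GH11678)
Bug in rich comparison of Timedelta with a numpy.array of Timedelta that caused an infinite recursion (GH11835)
Bug in DataFrame.round dropping column index name (GH11986)
Bug in df.replace while replacing value in mixed dtype DataFrame (GH11698)
Bug in Index that prevented copying the name of a passed Index when a new name is not provided (GH11193)
Bug in read_excel failing to read any non-empty sheets when empty sheets exist and sheetname=None (GH11711)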
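Bug in read_excel failing to raise a NotImplemented error when the keywords parse_dates and date_parser are both provided (GH11544)
Bug in read_sql with pymysql connections failing to return chunked data (GH11522)
Bug in .to_csv ignoring formatting parameters decimal, na_rep, float_format for float indexes (GH11553)
Bug in Int64Index and Float64Index preventing the use of the modulo operator (GH9244)
Bug in MultiIndex.drop for MultiIndexes that are not lexsorted (GH12078)
Bug in DataFrame when masking an empty DataFrame (GH11859)
Bug in .plot potentially modifying the colors input when the number of columns didn't match the number of series provided (GH12039).
Bug in Series.plot failing when the index has a CustomBusinessDay frequency (GH7222).
Bug in .to_sql for datetime.time values with sqlite fallback (GH8341)
Bug in read_excel failing to read data with one column when squeeze=True (GH12157)
Bug in read_excel failing to read one empty column (GH12292, GH9002)
Bug in .groupby where a KeyError was not raised for a wrong column if there was only one row in the dataframe (GH11741)
Bug in .read_csv with dtype specified on empty data producing an error (GH12048)
Bug in .read_csv where strings like '2E' are treated as valid floats (GH12237)
Bug in building pandas with debugging symbols (GH12123)
Removed millisecond property of DatetimeIndex. This would always raise a ValueError (GH12019).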
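Bug in Series constructor with read-only data (GH11502)
Removed pandas._testing.choice(). Should use np.random.choice() instead. (GH12386)
Bug in .loc setitem indexer preventing the use of a tz-aware DatetimeIndex (GH12050)
Bug in .style where indexes and MultiIndexes were not appearing (GH11655)
Bug in to_msgpack and from_msgpack which did not correctly serialize or deserialize NaT (GH12307).
Bug in .skew and .kurt due to roundoff error for highly similar values (GH11974)
Bug in Timestamp constructor where microsecond resolution was lost if HHMMSS were not separated with ':' (GH10041)
Bug in buffer_rd_bytes where src->buffer could be freed more than once if reading failed, causing a segfault (GH12098)
Bug in crosstab where arguments with non-overlapping indexes would return a KeyError (GH10291)
Bug in DataFrame.apply in which reduction was not being prevented for cases in which dtype was not a NumPy dtype (GH12244)
Bug when initializing categorical series with a scalar value. (GH12336)
Bug when specifying a UTC DatetimeIndex by setting utc=True in .to_datetime (GH11934)
Bug when increasing the buffer size of CSV reader in read_csv (GH12494)
Bug when setting columns of a DataFrame with duplicate column names (GH12344)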

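Contributors#

A total of 101 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.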

ARF +
Alex Alekseyev +
Andrew McPherson +
Andrew Rosenfeld
Andy Hayden
Anthonios Partheniou
Anton I. Sipos
Ben +
Ben North +
Bran Yang +
Chris
Chris Carroux +
Christopher C. Aycock +
Christopher Scanlin +
Cody +
Da Wang +
Daniel Grady +
Dorozhko Anton +
Dr-Irv +
Erik M. Bray +
Evan Wright
Francis T. O'Donovan +
Frank Cleary +
Gianluca Rossi
Graham Jeffries +
Guillaume Horel
Henry Hammond +
Isaac Schwabacher +
Jean-Mathieu Deschenes
Jeff Reback
Joe Jevnik +
John Freeman +
John Fremlin +
Jonas Hoersch +
Joris Van den Bossche
Joris Vankerschaver
Justin Lecher
Justin Lin +
Ka Wo Chen
Keming Zhang +
Kerby Shedden
Kyle +
Marco Farrugia +
MasonGallo +
MattRijk +
Matthew Lurie +
Maximilian Roos
Mayank Asthana +
Mortada Mehyar
Moussa Taifi +
Navreet Gill +
Nicolas Bonnotte
Paul Reiners +
Philip Gura +
Pietro Battiston
RahulHP +
Randy Carnevale
Rinoc Johnson
Rishipuri +
Sangmin Park +
Scott E Lasley
Sereger13 +
Shannon Wang +
Skipper Seabold
Thierry Moisan
Thomas A Caswell
Toby Dylan Hocking +
Tom Augspurger
Travis +
Trent Hauck
Tux1
Varun
Wes McKinney
Will Thompson +
Yoav Ram
Yoong Kang Lim +
Yoshiki Vázquez Baeza
Young Joong Kim +
Younggun Kim
Yuval Langer +
alex argunov +
behzad nouri
boombard +
brian-pantano +
chromy +
daniel +
dgram0 +
gfyoung +
hack-c +
hcontrast +
jfoo +
kaustuv deolal +
llllllllll
ranarag +
rockg
scls19fr
seales +
sinhrks
srib +
surveymedia.ca +
tworec +
