Version 0.13.0 (January 3, 2014)#

This is a major release from 0.12.0 and includes a number of API changes, several new features and enhancements, and a large number of bug fixes.

Highlights include:

  • support for a new index type Float64Index, and other indexing enhancements

  • HDFStore has a new string-based syntax for query specification

  • support for new methods of interpolation

  • updated timedelta operations

  • a new string manipulation method extract

  • nanosecond support for offsets

  • isin for DataFrames

Several experimental features are added, including:

  • new eval/query methods for expression evaluation

  • support for msgpack serialization

  • an i/o interface to Google's BigQuery

There are several new or updated docs sections, including:

Warning

In 0.13.0 Series has internally been refactored to no longer subclass ndarray but instead subclass NDFrame, like the rest of the pandas containers. This should be a transparent change with only very limited API implications. See Internal Refactoring
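As a quick check of the refactoring, a Series is no longer an ndarray instance, while its underlying data is still reachable as one:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])

# Since 0.13.0, Series subclasses NDFrame rather than ndarray
print(isinstance(s, np.ndarray))  # False

# The underlying ndarray remains accessible via .values
print(type(s.values))
```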

API changes#

  • read_excel now supports an integer in its sheetname argument giving the index of the sheet to read in (GH4301).

  • Text parser now treats anything that looks like inf ("inf", "Inf", "-Inf", "iNf", etc.) as infinity (GH4220, GH4219), affecting read_table, read_csv, etc.
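For example, the variant spellings below all parse to floating point infinities (the column name and data are made up for illustration):

```python
import io

import pandas as pd

# All of these spellings are recognized as +/- infinity by the parser
data = "value\ninf\nInf\n-Inf\niNf\n"

df = pd.read_csv(io.StringIO(data))
print(df["value"].dtype)  # float64
print(df["value"].tolist())
```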

  • pandas is now Python 2/3 compatible without the need for 2to3, thanks to @jtratner. As a result, pandas now uses iterators more heavily, and substantive parts of Benjamin Peterson's six library have been brought into pandas compat. (GH4384, GH4375, GH4372)

  • pandas.util.compat and pandas.util.py3compat have been merged into pandas.compat. pandas.compat now includes many functions allowing 2/3 compatibility. It contains both list and iterator versions of range, filter, map and zip, plus other necessary elements for Python 3 compatibility. lmap, lzip, lrange and lfilter all produce lists instead of iterators, for compatibility with numpy, subscripting and pandas constructors. (GH4384, GH4375, GH4372)
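A minimal sketch of what those list-producing helpers do (illustrative definitions, not the actual pandas.compat source):

```python
# Illustrative stand-ins for lmap/lzip/lrange/lfilter: each wraps the
# iterator-returning Python 3 builtin and materializes the result as a list.
def lmap(*args):
    return list(map(*args))


def lzip(*args):
    return list(zip(*args))


def lrange(*args):
    return list(range(*args))


def lfilter(*args):
    return list(filter(*args))


print(lmap(abs, [-1, 2, -3]))    # [1, 2, 3]
print(lzip([1, 2], ["a", "b"]))  # [(1, 'a'), (2, 'b')]
print(lrange(3))                 # [0, 1, 2]
```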

  • Series.get with negative indexers now returns the same as [] (GH4390)
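For reference, get behaves like dict.get, looking a key up by label and returning an optional default when it is missing (example data is illustrative):

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])

# Label lookup, with dict.get-style default handling
print(s.get("b"))      # 2
print(s.get("z"))      # None
print(s.get("z", -1))  # -1
```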

  • Index and MultiIndex changed the way metadata is handled (levels, labels, and names) (GH4039):

    # previously, you would have set levels or labels directly
    >>> pd.index.levels = [[1, 2, 3, 4], [1, 2, 4, 4]]
    
    # now, you use the set_levels or set_labels methods
    >>> index = pd.index.set_levels([[1, 2, 3, 4], [1, 2, 4, 4]])
    
    # similarly, for names, you can rename the object
    # but setting names is not deprecated
    >>> index = pd.index.set_names(["bob", "cranberry"])
    
    # and all methods take an inplace kwarg - but return None
    >>> pd.index.set_names(["bob", "cranberry"], inplace=True)
    
  • All division with NDFrame objects is now true division, regardless of the future import. This means that operating on pandas objects will by default use floating point division and return a floating point dtype. You can use // and floordiv to do integer division.

    Integer division

    In [3]: arr = np.array([1, 2, 3, 4])
    
    In [4]: arr2 = np.array([5, 3, 2, 1])
    
    In [5]: arr / arr2
    Out[5]: array([0, 0, 1, 4])
    
    In [6]: pd.Series(arr) // pd.Series(arr2)
    Out[6]:
    0    0
    1    0
    2    1
    3    4
    dtype: int64
    

    True division

    In [7]: pd.Series(arr) / pd.Series(arr2)  # no future import required
    Out[7]:
    0    0.200000
    1    0.666667
    2    1.500000
    3    4.000000
    dtype: float64
    
  • Infer and downcast dtype if downcast='infer' is passed to fillna/ffill/bfill (GH4604)

  • __nonzero__ for all NDFrame objects will now raise a ValueError; this reverts back to the (GH1073, GH4633) behavior. See gotchas for a more detailed discussion.

    This prevents evaluating the truth value of an entire pandas object, which is inherently ambiguous. These will all raise a ValueError.

    >>> df = pd.DataFrame({'A': np.random.randn(10),
    ...                    'B': np.random.randn(10),
    ...                    'C': pd.date_range('20130101', periods=10)
    ...                    })
    ...
    >>> if df:
    ...     pass
    ...
    Traceback (most recent call last):
        ...
    ValueError: The truth value of a DataFrame is ambiguous.  Use a.empty,
    a.bool(), a.item(), a.any() or a.all().
    
    >>> df1 = df
    >>> df2 = df
    >>> df1 and df2
    Traceback (most recent call last):
        ...
    ValueError: The truth value of a DataFrame is ambiguous.  Use a.empty,
    a.bool(), a.item(), a.any() or a.all().
    
    >>> d = [1, 2, 3]
    >>> s1 = pd.Series(d)
    >>> s2 = pd.Series(d)
    >>> s1 and s2
    Traceback (most recent call last):
        ...
    ValueError: The truth value of a Series is ambiguous.  Use a.empty,
    a.bool(), a.item(), a.any() or a.all().
    

    Added a .bool() method to NDFrame objects to facilitate evaluating single-element boolean Series:

    In [1]: pd.Series([True]).bool()
    Out[1]: True
    
    In [2]: pd.Series([False]).bool()
    Out[2]: False
    
    In [3]: pd.DataFrame([[True]]).bool()
    Out[3]: True
    
    In [4]: pd.DataFrame([[False]]).bool()
    Out[4]: False
    
  • All non-Index NDFrames (Series, DataFrame, Panel, Panel4D, SparsePanel, etc.) now support the entire set of arithmetic operators and arithmetic flex methods (add, sub, mul, etc.). SparsePanel does not support pow or mod with non-scalars. (GH3765)
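The flex methods mirror the operators while exposing alignment options; a small sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
s = pd.Series([10, 20, 30])

# add/sub/mul/div mirror +, -, *, / but allow choosing the alignment
# axis, e.g. broadcasting a Series down the rows with axis=0
result = df.add(s, axis=0)
print(result)
```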

  • SeriesDataFrame 现在有一个 mode() 按AXIS/Series计算统计模式的方法。 (GH5367 )

  • Chained assignment will now by default warn if the user is assigning to a copy. This can be changed with the option mode.chained_assignment; allowed options are raise/warn/None. See the docs.

    In [5]: dfc = pd.DataFrame({'A': ['aaa', 'bbb', 'ccc'], 'B': [1, 2, 3]})
    
    In [6]: pd.set_option('chained_assignment', 'warn')
    

    The following warning / exception will show if this is attempted.

    In [7]: dfc.loc[0]['A'] = 1111
    
    Traceback (most recent call last)
       ...
    SettingWithCopyWarning:
       A value is trying to be set on a copy of a slice from a DataFrame.
       Try using .loc[row_index,col_indexer] = value instead
    

    Here is the correct method of assignment.

    In [8]: dfc.loc[0, 'A'] = 11
    
    In [9]: dfc
    Out[9]: 
         A  B
    0   11  1
    1  bbb  2
    2  ccc  3
    
  • Panel.reindex has the following call signature Panel.reindex(items=None, major_axis=None, minor_axis=None, **kwargs)

    to conform with other NDFrame objects. See Internal Refactoring for more information.

  • Series.argmin and Series.argmax are now aliased to Series.idxmin and Series.idxmax. These return the index of the min or max element respectively. Prior to 0.13.0 these would return the position of the min/max element. (GH6214)
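A small illustration of the distinction (note that in much later pandas versions argmin/argmax went back to returning positions, so idxmin/idxmax are the reliable way to get labels):

```python
import pandas as pd

s = pd.Series([30, 10, 20], index=["a", "b", "c"])

# idxmin/idxmax return the index *label* of the extreme value
print(s.idxmin())  # 'b'
print(s.idxmax())  # 'a'
```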

Prior version deprecations/changes#

These are changes announced in 0.12 or prior that take effect as of 0.13.0.

  • Remove deprecated Factor (GH3650)

  • Remove deprecated set_printoptions/reset_printoptions (GH3046)

  • Remove deprecated _verbose_info (GH3215)

  • Remove deprecated read_clipboard/to_clipboard/ExcelFile/ExcelWriter from pandas.io.parsers (GH3717) These are available as functions in the main pandas namespace (e.g. pd.read_clipboard)

  • The default for tupleize_cols is now False for both to_csv and read_csv. Fair warning in 0.12 (GH3604).

  • The default for display.max_seq_len is now 100 rather than None. This activates truncated display ("...") of long sequences in various places. (GH3391)

Deprecations#

Deprecated in 0.13.0

  • deprecated iterkv, which will be removed in a future release (this was an alias of iteritems used to bypass 2to3's changes). (GH4384, GH4375, GH4372)

  • deprecated the string method match, whose role is now performed more idiomatically by extract. In a future release the default behavior of match will change to become analogous to contains, which returns a boolean indexer. (Their distinction is strictness: match relies on re.match while contains relies on re.search.) In this release, the deprecated behavior is the default, but the new behavior is available through the keyword argument as_indexer=True.
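A short comparison of the two string methods with made-up data:

```python
import pandas as pd

s = pd.Series(["a1", "b2", "c3"])

# extract pulls regex groups into columns; rows that don't match get NaN
extracted = s.str.extract(r"([ab])(\d)")
print(extracted)

# contains relies on re.search and returns a boolean indexer
mask = s.str.contains(r"\d")
print(mask.tolist())  # [True, True, True]
```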

Indexing API changes#

Prior to 0.13, it was impossible to use a label indexer (.loc/.ix) to set a value that was not contained in the index of a particular axis (GH2578). See the docs

In the Series case this is effectively an appending operation.

In [10]: s = pd.Series([1, 2, 3])

In [11]: s
Out[11]: 
0    1
1    2
2    3
dtype: int64

In [12]: s[5] = 5.

In [13]: s
Out[13]: 
0    1.0
1    2.0
2    3.0
5    5.0
dtype: float64

In [14]: dfi = pd.DataFrame(np.arange(6).reshape(3, 2),
   ....:                    columns=['A', 'B'])
   ....: 

In [15]: dfi
Out[15]: 
   A  B
0  0  1
1  2  3
2  4  5

Previously this would raise a KeyError.

In [16]: dfi.loc[:, 'C'] = dfi.loc[:, 'A']

In [17]: dfi
Out[17]: 
   A  B  C
0  0  1  0
1  2  3  2
2  4  5  4

This is like an append operation.

In [18]: dfi.loc[3] = 5

In [19]: dfi
Out[19]: 
   A  B  C
0  0  1  0
1  2  3  2
2  4  5  4
3  5  5  5

A Panel setting operation on an arbitrary axis aligns the input to the Panel.

In [20]: p = pd.Panel(np.arange(16).reshape(2, 4, 2),
   ....:              items=['Item1', 'Item2'],
   ....:              major_axis=pd.date_range('2001/1/12', periods=4),
   ....:              minor_axis=['A', 'B'], dtype='float64')
   ....:

In [21]: p
Out[21]:
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 4 (major_axis) x 2 (minor_axis)
Items axis: Item1 to Item2
Major_axis axis: 2001-01-12 00:00:00 to 2001-01-15 00:00:00
Minor_axis axis: A to B

In [22]: p.loc[:, :, 'C'] = pd.Series([30, 32], index=p.items)

In [23]: p
Out[23]:
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 4 (major_axis) x 3 (minor_axis)
Items axis: Item1 to Item2
Major_axis axis: 2001-01-12 00:00:00 to 2001-01-15 00:00:00
Minor_axis axis: A to C

In [24]: p.loc[:, :, 'C']
Out[24]:
            Item1  Item2
2001-01-12   30.0   32.0
2001-01-13   30.0   32.0
2001-01-14   30.0   32.0
2001-01-15   30.0   32.0

Float64Index API change#

  • Added a new index type, Float64Index. This will be automatically created when passing floating point values in index creation. This enables a pure label-based slicing paradigm that makes [], ix, loc for scalar indexing and slicing work exactly the same. See the docs, (GH263)

    Construction is by default for floating type values.

    In [20]: index = pd.Index([1.5, 2, 3, 4.5, 5])
    
    In [21]: index
    Out[21]: Float64Index([1.5, 2.0, 3.0, 4.5, 5.0], dtype='float64')
    
    In [22]: s = pd.Series(range(5), index=index)
    
    In [23]: s
    Out[23]: 
    1.5    0
    2.0    1
    3.0    2
    4.5    3
    5.0    4
    dtype: int64
    

    Scalar selection for [], .ix, .loc will always be label based. An integer will match an equal float index (e.g. 3 is equivalent to 3.0).

    In [24]: s[3]
    Out[24]: 2
    
    In [25]: s.loc[3]
    Out[25]: 2
    

    The only positional indexing is via iloc.

    In [26]: s.iloc[3]
    Out[26]: 3
    

    A scalar index that is not found will raise a KeyError.

    Slicing is always on the values of the index for [], ix, loc, and always positional with iloc.

    In [27]: s[2:4]
    Out[27]: 
    2.0    1
    3.0    2
    dtype: int64
    
    In [28]: s.loc[2:4]
    Out[28]: 
    2.0    1
    3.0    2
    dtype: int64
    
    In [29]: s.iloc[2:4]
    Out[29]: 
    3.0    2
    4.5    3
    dtype: int64
    

    In float indexes, slicing using floats is allowed.

    In [30]: s[2.1:4.6]
    Out[30]: 
    3.0    2
    4.5    3
    dtype: int64
    
    In [31]: s.loc[2.1:4.6]
    Out[31]: 
    3.0    2
    4.5    3
    dtype: int64
    
  • Indexing on other index types is preserved (and positional fallback for [], ix), with the exception that floating point slicing on indexes of non-Float64Index will now raise a TypeError.

    In [1]: pd.Series(range(5))[3.5]
    TypeError: the label [3.5] is not a proper indexer for this index type (Int64Index)
    
    In [1]: pd.Series(range(5))[3.5:4.5]
    TypeError: the slice start [3.5] is not a proper indexer for this index type (Int64Index)
    

    Using a scalar float indexer will be deprecated in a future version, but is allowed for now.

    In [3]: pd.Series(range(5))[3.0]
    Out[3]: 3
    

HDFStore API changes#

  • Query format changes. A much more string-like query format is now supported. See the docs.

    In [32]: path = 'test.h5'
    
    In [33]: dfq = pd.DataFrame(np.random.randn(10, 4),
       ....:                    columns=list('ABCD'),
       ....:                    index=pd.date_range('20130101', periods=10))
       ....: 
    
    In [34]: dfq.to_hdf(path, 'dfq', format='table', data_columns=True)
    

    Use boolean expressions, with in-line function evaluation.

    In [35]: pd.read_hdf(path, 'dfq',
       ....:             where="index>Timestamp('20130104') & columns=['A', 'B']")
       ....: 
    

    Use an inline column reference.

    In [36]: pd.read_hdf(path, 'dfq',
       ....:             where="A>0 or C>0")
       ....: 
    
  • The format keyword now replaces the table keyword; allowed values are fixed(f) or table(t). The same defaults as prior to 0.13.0 remain, e.g. put implies fixed format and append implies table format. This default format can be set as an option by setting io.hdf.default_format.

    In [37]: path = 'test.h5'
    
    In [38]: df = pd.DataFrame(np.random.randn(10, 2))
    
    In [39]: df.to_hdf(path, 'df_table', format='table')
    
    In [40]: df.to_hdf(path, 'df_table2', append=True)
    
    In [41]: df.to_hdf(path, 'df_fixed')
    
    In [42]: with pd.HDFStore(path) as store:
       ....:     print(store)
       ....: 
    
  • Significantly improved table writing performance.

  • Handle a passed Series in table format (GH4330).

  • Can now serialize a timedelta64[ns] dtype in a table (GH3577), see the docs.

  • Added an is_open property to indicate if the underlying file handle is open; a closed store will now report "CLOSED" when viewing the store (rather than raising an error) (GH4409).

  • A close of a HDFStore now closes that instance of the HDFStore, but will only close the actual file if the reference count (kept by PyTables) w.r.t. all of the open handles is 0. Essentially you have a local instance of HDFStore referenced by a variable; once you close it, it will report closed. Other references (to the same file) will continue to operate until they themselves are closed. Performing an action on a closed file will raise ClosedFileError.

    In [43]: path = 'test.h5'
    
    In [44]: df = pd.DataFrame(np.random.randn(10, 2))
    
    In [45]: store1 = pd.HDFStore(path)
    
    In [46]: store2 = pd.HDFStore(path)
    ---------------------------------------------------------------------------
    ModuleNotFoundError                       Traceback (most recent call last)
    File /usr/local/lib/python3.10/dist-packages/pandas-1.5.0.dev0+697.gf9762d8f52-py3.10-linux-x86_64.egg/pandas/compat/_optional.py:139, in import_optional_dependency(name, extra, errors, min_version)
        138 try:
    --> 139     module = importlib.import_module(name)
        140 except ImportError:
    
    File /usr/lib/python3.10/importlib/__init__.py:126, in import_module(name, package)
        125         level += 1
    --> 126 return _bootstrap._gcd_import(name[level:], package, level)
    
    File <frozen importlib._bootstrap>:1050, in _gcd_import(name, package, level)
    
    File <frozen importlib._bootstrap>:1027, in _find_and_load(name, import_)
    
    File <frozen importlib._bootstrap>:1004, in _find_and_load_unlocked(name, import_)
    
    ModuleNotFoundError: No module named 'tables'
    
    During handling of the above exception, another exception occurred:
    
    ImportError                               Traceback (most recent call last)
    Input In [46], in <cell line: 1>()
    ----> 1 store2 = pd.HDFStore(path)
    
    File /usr/local/lib/python3.10/dist-packages/pandas-1.5.0.dev0+697.gf9762d8f52-py3.10-linux-x86_64.egg/pandas/io/pytables.py:573, in HDFStore.__init__(self, path, mode, complevel, complib, fletcher32, **kwargs)
        570 if "format" in kwargs:
        571     raise ValueError("format is not a defined argument for HDFStore")
    --> 573 tables = import_optional_dependency("tables")
        575 if complib is not None and complib not in tables.filters.all_complibs:
        576     raise ValueError(
        577         f"complib only supports {tables.filters.all_complibs} compression."
        578     )
    
    File /usr/local/lib/python3.10/dist-packages/pandas-1.5.0.dev0+697.gf9762d8f52-py3.10-linux-x86_64.egg/pandas/compat/_optional.py:142, in import_optional_dependency(name, extra, errors, min_version)
        140 except ImportError:
        141     if errors == "raise":
    --> 142         raise ImportError(msg)
        143     else:
        144         return None
    
    ImportError: Missing optional dependency 'pytables'.  Use pip or conda to install pytables.
    
    In [47]: store1.append('df', df)
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    Input In [47], in <cell line: 1>()
    ----> 1 store1.append('df', df)
    
    NameError: name 'store1' is not defined
    
    In [48]: store2.append('df2', df)
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    Input In [48], in <cell line: 1>()
    ----> 1 store2.append('df2', df)
    
    NameError: name 'store2' is not defined
    
    In [49]: store1
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    Input In [49], in <cell line: 1>()
    ----> 1 store1
    
    NameError: name 'store1' is not defined
    
    In [50]: store2
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    Input In [50], in <cell line: 1>()
    ----> 1 store2
    
    NameError: name 'store2' is not defined
    
    In [51]: store1.close()
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    Input In [51], in <cell line: 1>()
    ----> 1 store1.close()
    
    NameError: name 'store1' is not defined
    
    In [52]: store2
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    Input In [52], in <cell line: 1>()
    ----> 1 store2
    
    NameError: name 'store2' is not defined
    
    In [53]: store2.close()
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    Input In [53], in <cell line: 1>()
    ----> 1 store2.close()
    
    NameError: name 'store2' is not defined
    
    In [54]: store2
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    Input In [54], in <cell line: 1>()
    ----> 1 store2
    
    NameError: name 'store2' is not defined
    
  • Removed the _quiet attribute, replaced by a DuplicateWarning if retrieving duplicate rows from a table (GH4367)

  • Removed the warn argument from open. Instead a PossibleDataLossError exception will be raised if you try to use mode='w' with an OPEN file handle (GH4367)

  • Allow a passed locations array or mask as a where condition (GH4467). See the docs for an example.

  • add the keyword dropna=True to append to change whether ALL nan rows are not written to the store (default is True, ALL nan rows are NOT written), also settable via the option io.hdf.dropna_table (GH4625)

  • The store creation arguments are now passed through; can be used to support in-memory stores

DataFrame repr changes#

The HTML and plain text representations of DataFrame now show a truncated view of the table once it exceeds a certain size, rather than switching to the short info view (GH4886, GH5550). This makes the representation more consistent as small DataFrames get larger.

Truncated HTML representation of a DataFrame

To get the info view, call DataFrame.info(). If you prefer the info view as the repr for large DataFrames, you can set this by running set_option('display.large_repr', 'info').
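The option round-trip can be sketched as follows (a minimal illustration, not taken from the release notes):

```python
import pandas as pd

# Prefer the info view as the repr for large frames
pd.set_option('display.large_repr', 'info')
assert pd.get_option('display.large_repr') == 'info'

# Restore the default truncated view
pd.reset_option('display.large_repr')
print(pd.get_option('display.large_repr'))  # truncate
```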

Enhancements#

  • df.to_clipboard() learned a new excel keyword that lets you paste df data directly into Excel (enabled by default). (GH5070).

  • read_html now raises a URLError instead of catching and raising a ValueError (GH4303, GH4305)

  • Added a test for read_clipboard() and to_clipboard() (GH4282)

  • Clipboard functionality now works with PySide (GH4282)

  • Added a more informative error message when plot arguments contain overlapping color and style arguments (GH4402)

  • to_dict now takes records as a possible out-type. Returns an array of column-keyed dictionaries. (GH4936)
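For example (a small sketch with a hypothetical frame, not from the original notes):

```python
import pandas as pd

# Hypothetical frame for illustration
df = pd.DataFrame({'A': [1, 2], 'B': ['x', 'y']})

# orient='records' returns a list of row dicts keyed by column name
records = df.to_dict(orient='records')
print(records)  # [{'A': 1, 'B': 'x'}, {'A': 2, 'B': 'y'}]
```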

  • NaN handling in get_dummies (GH4446) with dummy_na

    # previously, nan was erroneously counted as 2 here
    # now it is not counted at all
    In [55]: pd.get_dummies([1, 2, np.nan])
    Out[55]: 
       1.0  2.0
    0    1    0
    1    0    1
    2    0    0
    
    # unless requested
    In [56]: pd.get_dummies([1, 2, np.nan], dummy_na=True)
    Out[56]: 
       1.0  2.0  NaN
    0    1    0    0
    1    0    1    0
    2    0    0    1
    
  • timedelta64[ns] operations. See the docs.

    Warning

    Most of these operations require numpy >= 1.7

    Using the new top-level to_timedelta, you can convert a scalar or array from the standard timedelta format (produced by to_csv) into a timedelta type (np.timedelta64 in nanoseconds).

    In [57]: pd.to_timedelta('1 days 06:05:01.00003')
    Out[57]: Timedelta('1 days 06:05:01.000030')
    
    In [58]: pd.to_timedelta('15.5us')
    Out[58]: Timedelta('0 days 00:00:00.000015500')
    
    In [59]: pd.to_timedelta(['1 days 06:05:01.00003', '15.5us', 'nan'])
    Out[59]: TimedeltaIndex(['1 days 06:05:01.000030', '0 days 00:00:00.000015500', NaT], dtype='timedelta64[ns]', freq=None)
    
    In [60]: pd.to_timedelta(np.arange(5), unit='s')
    Out[60]: 
    TimedeltaIndex(['0 days 00:00:00', '0 days 00:00:01', '0 days 00:00:02',
                    '0 days 00:00:03', '0 days 00:00:04'],
                   dtype='timedelta64[ns]', freq=None)
    
    In [61]: pd.to_timedelta(np.arange(5), unit='d')
    Out[61]: TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq=None)
    

    A Series of dtype timedelta64[ns] can now be divided by another timedelta64[ns] object, or astyped, to yield a float64 dtyped Series. This is frequency conversion. See the docs.

    In [62]: import datetime
    
    In [63]: td = pd.Series(pd.date_range('20130101', periods=4)) - pd.Series(
       ....:     pd.date_range('20121201', periods=4))
       ....: 
    
    In [64]: td[2] += np.timedelta64(datetime.timedelta(minutes=5, seconds=3))
    
    In [65]: td[3] = np.nan
    
    In [66]: td
    Out[66]: 
    0   31 days 00:00:00
    1   31 days 00:00:00
    2   31 days 00:05:03
    3                NaT
    dtype: timedelta64[ns]
    
    # to days
    In [67]: td / np.timedelta64(1, 'D')
    Out[67]: 
    0    31.000000
    1    31.000000
    2    31.003507
    3          NaN
    dtype: float64
    
    In [68]: td.astype('timedelta64[D]')
    Out[68]: 
    0    31.0
    1    31.0
    2    31.0
    3     NaN
    dtype: float64
    
    # to seconds
    In [69]: td / np.timedelta64(1, 's')
    Out[69]: 
    0    2678400.0
    1    2678400.0
    2    2678703.0
    3          NaN
    dtype: float64
    
    In [70]: td.astype('timedelta64[s]')
    Out[70]: 
    0    2678400.0
    1    2678400.0
    2    2678703.0
    3          NaN
    dtype: float64
    

    Division of or multiplication of a timedelta64[ns] Series by an integer or integer Series

    In [71]: td * -1
    Out[71]: 
    0   -31 days +00:00:00
    1   -31 days +00:00:00
    2   -32 days +23:54:57
    3                  NaT
    dtype: timedelta64[ns]
    
    In [72]: td * pd.Series([1, 2, 3, 4])
    Out[72]: 
    0   31 days 00:00:00
    1   62 days 00:00:00
    2   93 days 00:15:09
    3                NaT
    dtype: timedelta64[ns]
    

    Absolute DateOffset objects can act equivalently to timedeltas

    In [73]: from pandas import offsets
    
    In [74]: td + offsets.Minute(5) + offsets.Milli(5)
    Out[74]: 
    0   31 days 00:05:00.005000
    1   31 days 00:05:00.005000
    2   31 days 00:10:03.005000
    3                       NaT
    dtype: timedelta64[ns]
    

    fillna is now supported for timedeltas

    In [75]: td.fillna(pd.Timedelta(0))
    Out[75]: 
    0   31 days 00:00:00
    1   31 days 00:00:00
    2   31 days 00:05:03
    3    0 days 00:00:00
    dtype: timedelta64[ns]
    
    In [76]: td.fillna(datetime.timedelta(days=1, seconds=5))
    Out[76]: 
    0   31 days 00:00:00
    1   31 days 00:00:00
    2   31 days 00:05:03
    3    1 days 00:00:05
    dtype: timedelta64[ns]
    

    You can do numeric reduction operations on timedeltas.

    In [77]: td.mean()
    Out[77]: Timedelta('31 days 00:01:41')
    
    In [78]: td.quantile(.1)
    Out[78]: Timedelta('31 days 00:00:00')
    
  • plot(kind='kde') now accepts the optional parameters bw_method and ind, passed to scipy.stats.gaussian_kde() (for scipy >= 0.11.0) to set the bandwidth, and to gkde.evaluate() to specify the indices at which it is evaluated, respectively. See the scipy docs. (GH4298)

  • The DataFrame constructor now accepts a numpy masked record array (GH3478)

  • The new vectorized string method extract returns regular expression matches more conveniently.

    In [79]: pd.Series(['a1', 'b2', 'c3']).str.extract('[ab](\\d)')
    Out[79]: 
         0
    0    1
    1    2
    2  NaN
    

    Elements that do not match return NaN. Extracting a regular expression with more than one group returns a DataFrame with one column per group.

    In [80]: pd.Series(['a1', 'b2', 'c3']).str.extract('([ab])(\\d)')
    Out[80]: 
         0    1
    0    a    1
    1    b    2
    2  NaN  NaN
    

    Elements that do not match return a row of NaN. Thus, a Series of messy strings can be converted into a like-indexed Series or DataFrame of cleaned-up or more useful strings, without necessitating get() to access tuples or re.match objects.

    Named groups like

    In [81]: pd.Series(['a1', 'b2', 'c3']).str.extract(
       ....:     '(?P<letter>[ab])(?P<digit>\\d)')
       ....: 
    Out[81]: 
      letter digit
    0      a     1
    1      b     2
    2    NaN   NaN
    

    and optional groups can also be used.

    In [82]: pd.Series(['a1', 'b2', '3']).str.extract(
       ....:      '(?P<letter>[ab])?(?P<digit>\\d)')
       ....: 
    Out[82]: 
      letter digit
    0      a     1
    1      b     2
    2    NaN     3
    
  • read_stata now accepts Stata 13 format (GH4291)

  • read_fwf now infers the column specifications from the first 100 rows of the file if the data has correctly separated and properly aligned columns, using the delimiter provided to the function (GH4488).
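A minimal sketch of the inference, using hypothetical fixed-width data:

```python
import io
import pandas as pd

# Hypothetical aligned fixed-width data; the column specs are inferred
data = "name  value\nfoo      1\nbar     10\n"
df = pd.read_fwf(io.StringIO(data))
print(list(df.columns))  # ['name', 'value']
```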

  • support for nanosecond times as an offset

    Warning

    These operations require numpy >= 1.7

    Period conversions in the range of seconds and below were reworked and extended up to nanoseconds. Periods in the nanosecond range are now available.

    In [83]: pd.date_range('2013-01-01', periods=5, freq='5N')
    Out[83]: 
    DatetimeIndex([          '2013-01-01 00:00:00',
                   '2013-01-01 00:00:00.000000005',
                   '2013-01-01 00:00:00.000000010',
                   '2013-01-01 00:00:00.000000015',
                   '2013-01-01 00:00:00.000000020'],
                  dtype='datetime64[ns]', freq='5N')
    

    or with frequency as offset

    In [84]: pd.date_range('2013-01-01', periods=5, freq=pd.offsets.Nano(5))
    Out[84]: 
    DatetimeIndex([          '2013-01-01 00:00:00',
                   '2013-01-01 00:00:00.000000005',
                   '2013-01-01 00:00:00.000000010',
                   '2013-01-01 00:00:00.000000015',
                   '2013-01-01 00:00:00.000000020'],
                  dtype='datetime64[ns]', freq='5N')
    

    Timestamps can be modified in the nanosecond range

    In [85]: t = pd.Timestamp('20130101 09:01:02')
    
    In [86]: t + pd.tseries.offsets.Nano(123)
    Out[86]: Timestamp('2013-01-01 09:01:02.000000123')
    
  • A new method, isin for DataFrames, which plays nicely with boolean indexing. The argument to isin, what we're comparing the DataFrame to, can be a DataFrame, a Series, a dict, or an array of values. See the docs for more.

    To get the rows where any of the conditions are met:

    In [87]: dfi = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['a', 'b', 'f', 'n']})
    
    In [88]: dfi
    Out[88]: 
       A  B
    0  1  a
    1  2  b
    2  3  f
    3  4  n
    
    In [89]: other = pd.DataFrame({'A': [1, 3, 3, 7], 'B': ['e', 'f', 'f', 'e']})
    
    In [90]: mask = dfi.isin(other)
    
    In [91]: mask
    Out[91]: 
           A      B
    0   True  False
    1  False  False
    2   True   True
    3  False  False
    
    In [92]: dfi[mask.any(axis=1)]
    Out[92]: 
       A  B
    0  1  a
    2  3  f
    
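To instead keep only the rows where every column matches, the same mask can be combined with all(axis=1); a small sketch restating the example data:

```python
import pandas as pd

dfi = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['a', 'b', 'f', 'n']})
other = pd.DataFrame({'A': [1, 3, 3, 7], 'B': ['e', 'f', 'f', 'e']})
mask = dfi.isin(other)

# keep only rows where every column matched
print(dfi[mask.all(axis=1)])
```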
  • Series now supports a to_frame method to convert it to a single-column DataFrame (GH5164)
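A minimal sketch (the Series name becoming the column label is the expected behavior):

```python
import pandas as pd

# Hypothetical named Series; the name becomes the column label
s = pd.Series([1, 2, 3], name='vals')
df = s.to_frame()
print(list(df.columns))  # ['vals']
```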

  • All R datasets listed here http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html can now be loaded into pandas objects

    # note that pandas.rpy was deprecated in v0.16.0
    import pandas.rpy.common as com
    com.load_data('Titanic')
    
  • tz_localize can infer a fall daylight savings transition based on the structure of the unlocalized data (GH4230), see the docs

  • DatetimeIndex is now in the API documentation, see the docs

  • json_normalize() is a new method to allow you to create a flat table from semi-structured JSON data. See the docs (GH1067)

  • Added PySide support for the qtpandas DataFrameModel and DataFrameWidget.

  • The Python csv parser now supports usecols (GH4335)

  • Frequencies gained several new offsets:

    • LastWeekOfMonth (GH4637)

    • FY5253, and FY5253Quarter (GH4511)

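As a small illustration of the new offsets (assumed usage, not from the original notes), LastWeekOfMonth rolls a date forward to the last given weekday of the month:

```python
import pandas as pd
from pandas.tseries.offsets import LastWeekOfMonth

# weekday=0 is Monday; adding the offset moves to the last Monday of the month
ts = pd.Timestamp('2013-01-01') + LastWeekOfMonth(weekday=0)
print(ts)  # 2013-01-28 00:00:00
```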
  • DataFrame has a new interpolate method, similar to that of Series (GH4434, GH1892)

    In [93]: df = pd.DataFrame({'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
       ....:                   'B': [.25, np.nan, np.nan, 4, 12.2, 14.4]})
       ....: 
    
    In [94]: df.interpolate()
    Out[94]: 
         A      B
    0  1.0   0.25
    1  2.1   1.50
    2  3.4   2.75
    3  4.7   4.00
    4  5.6  12.20
    5  6.8  14.40
    

    Additionally, the method argument to interpolate has been expanded to include 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'barycentric', 'krogh', 'piecewise_polynomial', 'pchip', 'polynomial', 'spline'. The new methods require scipy. Consult the scipy reference guide and documentation for more information about when the various methods are appropriate. See the docs.

    Interpolate now also accepts a limit keyword argument. This works similar to fillna's limit:

    In [95]: ser = pd.Series([1, 3, np.nan, np.nan, np.nan, 11])
    
    In [96]: ser.interpolate(limit=2)
    Out[96]: 
    0     1.0
    1     3.0
    2     5.0
    3     7.0
    4     NaN
    5    11.0
    dtype: float64
    
  • Added wide_to_long panel data convenience function. See the docs.

    In [97]: np.random.seed(123)
    
    In [98]: df = pd.DataFrame({"A1970" : {0 : "a", 1 : "b", 2 : "c"},
       ....:                    "A1980" : {0 : "d", 1 : "e", 2 : "f"},
       ....:                    "B1970" : {0 : 2.5, 1 : 1.2, 2 : .7},
       ....:                    "B1980" : {0 : 3.2, 1 : 1.3, 2 : .1},
       ....:                    "X"     : dict(zip(range(3), np.random.randn(3)))
       ....:                   })
       ....: 
    
    In [99]: df["id"] = df.index
    
    In [100]: df
    Out[100]: 
      A1970 A1980  B1970  B1980         X  id
    0     a     d    2.5    3.2 -1.085631   0
    1     b     e    1.2    1.3  0.997345   1
    2     c     f    0.7    0.1  0.282978   2
    
    In [101]: pd.wide_to_long(df, ["A", "B"], i="id", j="year")
    Out[101]: 
                    X  A    B
    id year                  
    0  1970 -1.085631  a  2.5
    1  1970  0.997345  b  1.2
    2  1970  0.282978  c  0.7
    0  1980 -1.085631  d  3.2
    1  1980  0.997345  e  1.3
    2  1980  0.282978  f  0.1
    
  • to_csv now takes a date_format keyword argument that specifies how output datetime objects should be formatted. Datetimes encountered in the index, columns, and values will all have this formatting applied. (GH4313)
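A short sketch with a hypothetical frame:

```python
import pandas as pd

# Hypothetical frame with a datetime column
df = pd.DataFrame({'when': pd.to_datetime(['2013-01-01', '2013-06-15'])})
csv = df.to_csv(date_format='%Y%m%d', index=False)
print(csv)  # dates rendered as 20130101 and 20130615
```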

  • DataFrame.plot will scatter plot x versus y by passing kind='scatter' (GH2215)

  • Added support for Google Analytics v3 API segment IDs that also supports v2 IDs. (GH5271)

Experimental#

  • The new eval() function implements expression evaluation using numexpr behind the scenes. This results in large speedups for complicated expressions involving large DataFrames/Series. For example,

    In [102]: nrows, ncols = 20000, 100
    
    In [103]: df1, df2, df3, df4 = [pd.DataFrame(np.random.randn(nrows, ncols))
       .....:                       for _ in range(4)]
       .....: 
    
    # eval with NumExpr backend
    In [104]: %timeit pd.eval('df1 + df2 + df3 + df4')
    7.63 ms +- 41.2 us per loop (mean +- std. dev. of 7 runs, 100 loops each)
    
    # pure Python evaluation
    In [105]: %timeit df1 + df2 + df3 + df4
    6.99 ms +- 49.7 us per loop (mean +- std. dev. of 7 runs, 100 loops each)
    

    For more details, see the docs.

  • Similar to pandas.eval, DataFrame has a new DataFrame.eval method that evaluates an expression in the context of the DataFrame. For example,

    In [106]: df = pd.DataFrame(np.random.randn(10, 2), columns=['a', 'b'])
    
    In [107]: df.eval('a + b')
    Out[107]: 
    0   -0.685204
    1    1.589745
    2    0.325441
    3   -1.784153
    4   -0.432893
    5    0.171850
    6    1.895919
    7    3.065587
    8   -0.092759
    9    1.391365
    dtype: float64
    
  • A query() method has been added that allows you to select elements of a DataFrame using a natural query syntax nearly identical to Python syntax. For example,

    In [108]: n = 20
    
    In [109]: df = pd.DataFrame(np.random.randint(n, size=(n, 3)), columns=['a', 'b', 'c'])
    
    In [110]: df.query('a < b < c')
    Out[110]: 
        a   b   c
    11  1   5   8
    15  8  16  19
    

    selects all the rows of df where a < b < c evaluates to True. For more details see the docs.

  • pd.read_msgpack() and pd.to_msgpack() are now a supported method of serialization of arbitrary pandas (and python objects) in a lightweight portable binary format. See the docs

    Warning

    Since this is an EXPERIMENTAL LIBRARY, the storage format may not be stable until a future release.

    df = pd.DataFrame(np.random.rand(5, 2), columns=list('AB'))
    df.to_msgpack('foo.msg')
    pd.read_msgpack('foo.msg')
    
    s = pd.Series(np.random.rand(5), index=pd.date_range('20130101', periods=5))
    pd.to_msgpack('foo.msg', df, s)
    pd.read_msgpack('foo.msg')
    

    You can pass iterator=True to iterate over the unpacked results

    for o in pd.read_msgpack('foo.msg', iterator=True):
        print(o)
    
  • pandas.io.gbq provides a simple way to extract from, and load data into, Google's BigQuery Data Sets by way of pandas DataFrames. BigQuery is a high performance SQL-like database service, useful for performing ad-hoc queries against extremely large datasets. See the docs

    from pandas.io import gbq
    
    # A query to select the average monthly temperatures
    # in the year 2000 across the USA. The dataset,
    # publicdata:samples.gsod, is available on all BigQuery accounts,
    # and is based on NOAA gsod data.
    
    query = """SELECT station_number as STATION,
    month as MONTH, AVG(mean_temp) as MEAN_TEMP
    FROM publicdata:samples.gsod
    WHERE YEAR = 2000
    GROUP BY STATION, MONTH
    ORDER BY STATION, MONTH ASC"""
    
    # Fetch the result set for this query
    
    # Your Google BigQuery Project ID
    # To find this, see your dashboard:
    # https://console.developers.google.com/iam-admin/projects?authuser=0
    projectid = 'xxxxxxxxx'
    df = gbq.read_gbq(query, project_id=projectid)
    
    # Use pandas to process and reshape the dataset
    
    df2 = df.pivot(index='STATION', columns='MONTH', values='MEAN_TEMP')
    df3 = pd.concat([df2.min(), df2.mean(), df2.max()],
                    axis=1, keys=["Min Tem", "Mean Temp", "Max Temp"])
    

    The resulting DataFrame is::

    > df3
                Min Tem  Mean Temp    Max Temp
     MONTH
     1     -53.336667  39.827892   89.770968
     2     -49.837500  43.685219   93.437932
     3     -77.926087  48.708355   96.099998
     4     -82.892858  55.070087   97.317240
     5     -92.378261  61.428117  102.042856
     6     -77.703334  65.858888  102.900000
     7     -87.821428  68.169663  106.510714
     8     -89.431999  68.614215  105.500000
     9     -86.611112  63.436935  107.142856
     10    -78.209677  56.880838   92.103333
     11    -50.125000  48.861228   94.996428
     12    -50.332258  42.286879   94.396774
    

    Warning

    To use this module, you will need a BigQuery account. See <https://cloud.google.com/products/big-query> for details.

    As of 10/10/13, there is a bug in Google's API that prevents result sets from being larger than 100,000 rows. A patch is scheduled for the week of 10/14/13.

Internal refactoring#

In 0.13.0 there is a major refactor primarily to subclass Series from NDFrame, which is the base class currently for DataFrame and Panel, to unify methods and behaviors. Series formerly subclassed directly from ndarray. (GH4080, GH3862, GH816)

Warning

There are two potential incompatibilities from < 0.13.0

  • Using certain numpy functions would previously return a Series if passed a Series as an argument. This seems only to affect np.ones_like, np.empty_like, np.diff and np.where. These now return ndarrays.

    In [111]: s = pd.Series([1, 2, 3, 4])
    

    Numpy usage

    In [112]: np.ones_like(s)
    Out[112]: array([1, 1, 1, 1])
    
    In [113]: np.diff(s)
    Out[113]: array([1, 1, 1])
    
    In [114]: np.where(s > 1, s, np.nan)
    Out[114]: array([nan,  2.,  3.,  4.])
    

    Pandonic usage

    In [115]: pd.Series(1, index=s.index)
    Out[115]: 
    0    1
    1    1
    2    1
    3    1
    dtype: int64
    
    In [116]: s.diff()
    Out[116]: 
    0    NaN
    1    1.0
    2    1.0
    3    1.0
    dtype: float64
    
    In [117]: s.where(s > 1)
    Out[117]: 
    0    NaN
    1    2.0
    2    3.0
    3    4.0
    dtype: float64
    
  • Passing a Series directly to a cython function expecting an ndarray type will no longer work directly, you must pass Series.values, see Enhancing Performance

  • Series(0.5) would previously return the scalar 0.5, instead this will return a 1-element Series
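A one-line sketch of the new behavior:

```python
import pandas as pd

# Under the new behavior a scalar constructs a 1-element Series
s = pd.Series(0.5)
print(len(s), s.iloc[0])  # 1 0.5
```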

  • This change breaks rpy2<=2.3.8. An issue has been opened against rpy2 and a workaround is detailed in GH5698. Thanks @JanSchulz.

  • Pickle compatibility is preserved for pickles created prior to 0.13. These must be unpickled with pd.read_pickle, see Pickling.

  • Refactor of series.py/frame.py/panel.py to move common code to generic.py

    • Added _setup_axes to create generic NDFrame structures

    • Moved methods

      • from_axes, _wrap_array, axes, ix, loc, iloc, shape, empty, swapaxes, transpose, pop

      • __iter__, keys, __contains__, __len__, __neg__, __invert__

      • convert_objects, as_blocks, as_matrix, values

      • __getstate__, __setstate__ (compat remains in frame/panel)

      • __getattr__, __setattr__

      • _indexed_same, reindex_like, align, where, mask

      • fillna, replace (Series replace is now consistent with DataFrame)

      • filter (also added axis argument to selectively filter on a different axis)

      • reindex, reindex_axis, take

      • truncate (moved to become part of NDFrame)

  • These are API changes which make Panel more consistent with DataFrame

    • swapaxes on a Panel with the same axes specified now returns a copy

    • support attribute access for setting

    • filter supports the same API as the original DataFrame filter

  • Reindex called with no arguments will now return a copy of the input object

  • TimeSeries is now an alias for Series. The property is_time_series can be used to distinguish (if desired)

  • Refactor of Sparse objects to use BlockManager

    • Created a new block type in internals, SparseBlock, which can hold multi-dtypes and is non-consolidatable. SparseSeries and SparseDataFrame now inherit more methods from their hierarchy (Series/DataFrame), and no longer inherit from SparseArray (which instead is the object of the SparseBlock)

    • The Sparse suite now supports integration with non-sparse data. Non-float sparse data is supportable (partially implemented)

    • Operations on sparse structures within DataFrames should preserve sparseness; merging-type operations will convert to dense (and back to sparse), so may be somewhat inefficient

    • Enable setitem on SparseSeries for boolean/integer/slices

    • SparsePanels implementation is unchanged (e.g. not using BlockManager, needs work)

  • Added ftypes method to Series/DataFrame, similar to dtypes, but indicates if the underlying is sparse/dense (as well as the dtype)

  • All NDFrame objects can now use __finalize__() to specify various values to propagate to new objects from an existing one (e.g. name in Series will now follow more automatically)
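For instance, the name attribute now survives common operations (a minimal sketch, not from the original notes):

```python
import pandas as pd

# Metadata such as a Series name propagates through operations
s = pd.Series([1, 2, 3], name='speed')
print((s + 1).name, s[:2].name)  # speed speed
```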

  • Internal type checking is now done via a suite of generated classes, allowing isinstance(value, klass) without having to directly import the klass, courtesy of @jtratner

  • Bug in Series update where the parent frame was not updating its cache based on changes (GH4080) or types (GH3217), fillna (GH3386)

  • Fixed indexing with dtype conversions (GH4463, GH4204)

  • Refactor Series.reindex to core/generic.py (GH4604, GH4618), allow method= in reindexing on a Series to work

  • Series.copy no longer accepts the order parameter and is now consistent with NDFrame copy

  • Refactor rename methods to core/generic.py; fixes Series.rename for (GH4605), and adds rename with the same signature for Panel

  • Refactor clip methods to core/generic.py (GH4798)

  • Refactor of _get_numeric_data/_get_bool_data to core/generic.py, allowing Series/Panel functionality

  • Series (for index) / Panel (for items) now allow attribute access on their elements (GH1903)

    In [118]: s = pd.Series([1, 2, 3], index=list('abc'))
    
    In [119]: s.b
    Out[119]: 2
    
    In [120]: s.a = 5
    
    In [121]: s
    Out[121]: 
    a    5
    b    2
    c    3
    dtype: int64
    

Bug fixes#

  • HDFStore

    • raising an invalid TypeError rather than ValueError when appending with a different block ordering (GH4096)

    • read_hdf was not respecting as passed mode (GH4504)

    • appending a 0-len table will work correctly (GH4273)

    • to_hdf was raising when passing both arguments append and table (GH4584)

    • reading from a store with duplicate columns across dtypes would raise (GH4767)

    • Fixed a bug where ValueError wasn't correctly raised when column names weren't strings (GH4956)

    • A zero length series written in Fixed format was not deserialized properly. (GH4708)

    • Fixed decoding performance issue on pyt3 (GH5441)

    • Validate levels in a MultiIndex before storing (GH5527)

    • Correctly handle data_columns with a Panel (GH5717)

  • Fixed bug in tslib.tz_convert(vals, tz1, tz2): it could raise IndexError exception while trying to access trans[pos + 1] (GH4496)

  • The by argument now works correctly with the layout argument (GH4102, GH4014) in *.hist plotting methods

  • Fixed bug in PeriodIndex.map where using str would return the str representation of the index (GH4136)

  • Fixed test failure test_time_series_plot_color_with_empty_kwargs when using custom matplotlib default colors (GH4345)

  • Fix running of stata IO tests. Now uses temporary files to write (GH4353)

  • Fixed an issue where DataFrame.sum was slower than DataFrame.mean for integer valued frames (GH4365)

  • read_html tests now work with Python 2.6 (GH4351)

  • Fixed bug where network testing was throwing NameError because a local variable was undefined (GH4381)

  • In to_json, raise if a passed orient would cause loss of data because of a duplicate index (GH4359)

  • In to_json, fix date handling so milliseconds are the default timestamp as the docstring says (GH4362).

  • as_index is no longer ignored when doing groupby apply (GH4648, GH3417)

  • JSON NaT handling fixed, NaTs are now serialized to null (GH4498)

  • Fixed JSON handling of escapable characters in JSON object keys (GH4593)

  • Fixed passing keep_default_na=False when na_values=None (GH4318)

  • Fixed bug with values raising an error on a DataFrame with duplicate columns and mixed dtypes, surfaced in (GH4377)

  • Fixed bug with duplicate columns and type conversion in read_json when orient='split' (GH4377)

  • Fixed JSON bug where locales with decimal separators other than '.' threw exceptions when encoding / decoding certain values. (GH4918)

  • Fix .iat indexing with a PeriodIndex (GH4390)

  • Fixed an issue where PeriodIndex joining with self was returning a new instance rather than the same instance (GH4379); also adds a test for this for the other index types

  • Fixed a bug with all the dtypes being converted to object when using the CSV cparser with the usecols parameter (GH3192)

  • Fix an issue in merging blocks where the resulting DataFrame had partially set _ref_locs (GH4403)

  • Fixed an issue where hist subplots were being overwritten when they were called using the top level matplotlib API (GH4408)

  • Fixed a bug where calling Series.astype(str) would truncate the string (GH4405, GH4437)

  • Fixed a py3 compat issue where bytes were being repr'd as tuples (GH4455)

  • Fixed Panel attribute naming conflict if an item is named 'a' (GH3440)

  • Fixed an issue where duplicate indexes were raising when plotting (GH4486)

  • Fixed an issue where cumsum and cumprod didn't work with bool dtypes (GH4170, GH4440)

  • Fixed Panel slicing issue in xs that was returning an incorrectly dimmed object (GH4016)

  • Fix resampling bug where a custom reduce function was not used if only one group (GH3849, GH4494)

  • Fixed Panel assignment with a transposed frame (GH3830)

  • Raise on set indexing with a Panel and a Panel as a value which needs alignment (GH3777)

  • frozenset objects now raise in the Series constructor (GH4482, GH4480)

  • Fixed issue with sorting a duplicate MultiIndex that has multiple dtypes (GH4516)

  • Fixed bug in DataFrame.set_values which was causing the name attribute to be lost when the index was expanded. (GH3742, GH4039)

  • Fixed issue where individual names, levels and labels could be set on a MultiIndex without validation (GH3714, GH4039)

  • Fixed (GH3334). Margins did not compute if values is the index.

  • Fix bug in having a rhs of np.timedelta64 or np.offsets.DateOffset when operating with datetimes (GH4532)

  • Fix arithmetic with a series/datetimeindex and np.timedelta64 not working the same (GH4134) and buggy timedelta in numpy 1.6 (GH4135)

  • Fix bug in pd.read_clipboard on windows with PY3 (GH4561); was not decoding properly

  • tslib.get_period_field() and tslib.get_period_field_arr() now raise if the code argument is out of range (GH4519, GH4520)

  • Fix boolean indexing on an empty series losing index names (GH4235); infer_dtype works with empty arrays.

  • Fix reindexing with multiple axes; if an axes match was not replacing the current axes, it could cause a lazy frequency inference issue (GH3317)

  • Fixed an issue where DataFrame.apply was reraising exceptions incorrectly (causing the original stack trace to be truncated).

  • Fix selection with ix/loc and non-unique selectors (GH4619)

  • Fix assignment with iloc/loc involving a dtype change in an existing column (GH4312, GH5702); has internal setitem_with_indexer in core/indexing to use Block.setitem

  • Fixed bug where the thousands operator was not handled correctly for floating point numbers in csv_import (GH4322)

  • Fix an issue with CacheableOffset not properly being used for many DateOffsets; this prevented DateOffsets from being cached (GH4609)

  • Fix boolean comparison with a DataFrame on the lhs and a list/tuple on the rhs (GH4576)

  • Fix error/dtype conversion with setitem of None on Series/DataFrame (GH4667)

  • Fix decoding based on a passed in non-default encoding in pd.read_stata (GH4626)

  • Fixed DataFrame.from_records with a plain-vanilla ndarray. (GH4727)

  • Fixed some inconsistencies with Index.rename and MultiIndex.rename, etc. (GH4718, GH4628)

  • Bug in using iloc/loc with a cross-sectional and duplicate indices (GH4726)

  • Bug with using QUOTE_NONE with to_csv causing Exception. (GH4328)

  • Bug with Series indexing not raising an error when the right-hand-side has an incorrect length (GH2702)

  • Bug in MultiIndexing with a partial string selection as one part of a MultiIndex (GH4758)

  • Bug with reindexing on the index with a non-unique index will now raise ValueError (GH4746)

  • Bug in setting with loc/ix a single indexer with a MultiIndex axis and a numpy array, related to (GH3777)

  • Bug in concatenation with duplicate columns across dtypes not merging with axis=0 (GH4771, GH4975)

  • Bug in iloc with a slice index failing (GH4771)

  • Incorrect error message with no colspecs or width in read_fwf. (GH4774)

  • Fix bugs in indexing in a Series with a duplicate index (GH4548, GH4550)

  • Fixed bug with reading compressed files with read_fwf in Python 3. (GH3963)

  • Fixed an issue with a duplicate index and assignment with a dtype change (GH4686)

  • Fixed bug with reading compressed files in as bytes rather than str in Python 3. Simplifies bytes-producing file-handling in Python 3 (GH3963, GH4785).

  • Fixed an issue related to ticklocs/ticklabels with log scale bar plots across different versions of matplotlib (GH4789)

  • Suppressed DeprecationWarning associated with internal calls issued by repr() (GH4391)

  • Fixed an issue with a duplicate index and duplicate selector with .loc (GH4825)

  • Fixed an issue with DataFrame.sort_index where, when sorting by a single column and passing a list for ascending, the argument for ascending was being interpreted as True (GH4839, GH4846)

  • Fixed Panel.tshift not working. Added freq support to Panel.shift (GH4853)

  • Fix an issue in TextFileReader with a thousands != ',' (GH4596)

  • Bug in getitem with a duplicate index when using where (GH4879)

  • Fix type inference code that coerces a float column to datetime (GH4601)

  • Fixed _ensure_numeric not checking for complex numbers (GH4902)

  • Fixed a bug in Series.hist where two figures were being created when the by argument was passed (GH4112, GH4113).

  • Fixed a bug in convert_objects for > 2 ndims (GH4937)

  • Fixed a bug in DataFrame/Panel cache insertion and subsequent indexing (GH4939, GH5424)

  • Fixed string methods for FrozenNDArray and FrozenList (GH4929)

  • Fixed a bug in setting invalid or out-of-range values in indexing enlargement scenarios (GH4940)

  • Tests for fillna on an empty Series (GH4346), thanks @immerrr

  • Fixed copy() to shallow copy axes/indexes as well and thereby keep separate metadata. (GH4202, GH4830)

  • Fixed the skiprows option in the Python parser for read_csv (GH4382)

  • Fixed bug preventing cut from working with np.inf levels without explicitly passing labels (GH3415)

  • Fixed wrong check for overlapping in DatetimeIndex.union (GH4564)

  • Fixed conflict between the thousands separator and the date parser in csv_parser (GH4678)

  • Fix appending when dtypes are not the same (error showing mixing float/np.datetime64) (GH4993)

  • Fix repr for DateOffset. No longer show duplicate entries in kwds. Removed unused offset fields. (GH4638)

  • Fixed wrong index name during read_csv if using usecols. Applies to the c parser only. (GH4201)

  • Timestamp objects can now appear on the left hand side of a comparison operation with a Series or DataFrame object (GH4982).

  • Fix a bug when indexing with np.nan via iloc/loc (GH5016)

  • Fixed a bug where the low-memory c parser could create different types in different chunks of the same file. Now coerces to numerical type or raises warning. (GH3866)

  • Fix a bug where reshaping a Series to its own shape raised TypeError (GH4554) and other reshaping issues.

  • Bug in setting with ix/loc and a mixed int/string index (GH4544)

  • Make sure series-series boolean comparisons are label based (GH4947)

  • Bug in multi-level indexing with a Timestamp partial indexer (GH4294)

  • Tests/fix for MultiIndex construction of an all-nan frame (GH4078)

  • Fixed a bug where read_html() wasn't correctly inferring values of tables with commas (GH5029)

  • Fixed a bug where read_html() wasn't providing a stable ordering of returned tables (GH4770, GH5029).

  • Fixed a bug where read_html() was incorrectly parsing when passed index_col=0 (GH5066).

  • Fixed a bug where read_html() was incorrectly inferring the type of headers (GH5048).

  • Fixed a bug where DatetimeIndex joined with PeriodIndex was causing a stack overflow (GH3899).

  • Fixed a bug where groupby objects didn't allow plots (GH5102).

  • Fixed a bug where groupby objects weren't tab-completing column names (GH5102).

  • Fixed a bug where groupby.plot() and friends were duplicating figures multiple times (GH5102).

  • Provide automatic conversion of object dtypes on fillna, related (GH5103)

  • Fixed a bug where default options were being overwritten in the option parser cleaning (GH5121).

  • Treat a list/ndarray identically for iloc indexing with list-like (GH5006)

  • Fix MultiIndex.get_level_values() with missing values (GH5074)

  • Fix bound checking for Timestamp() with datetime64 input (GH4065)

  • Fix a bug where TestReadHtml wasn't calling the correct read_html() function (GH5150).

  • Fix a bug with NDFrame.replace() which made replacement appear as though it was (incorrectly) using regular expressions (GH5143).

  • Fix better error message for to_datetime (GH4928)

  • Made sure different locales are tested on travis-ci (GH4918). Also adds a couple of utilities for getting locales and setting locales with a context manager.

  • Fixed segfault on isnull(MultiIndex) (now raises an error instead) (GH5123, GH5125)

  • Allow duplicate indices when performing operations that align (GH5185, GH5639)

  • Compound dtypes in a constructor raise NotImplementedError (GH5191)

  • Bug in comparing duplicate frames (GH4421) related

  • Bug in describe on duplicate frames

  • Bug in to_datetime with a format and coerce=True not raising (GH5195)

  • Bug in loc setting with multiple indexers and a rhs of a Series that needs broadcasting (GH5206)

  • Fixed a bug where in-place setting of levels or labels on a MultiIndex would not clear the cached values attribute and therefore return wrong values. (GH5215)

  • Fixed a bug where filtering a grouped DataFrame or Series did not maintain the original ordering (GH4621).

  • Fixed Period with a business date freq to always roll forward if on a non-business date. (GH5203)

  • Fixed a bug in the Excel writers where frames with duplicate column names weren't written correctly. (GH5235)

  • Fixed an issue with drop and a non-unique index on Series (GH5248)

  • Fixed a segfault in the C parser caused by passing more names than columns in the file. (GH5156)

  • Fix Series.isin with date/time-like dtypes (GH5021)
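
    For illustration, a sketch of the fixed isin behavior with datetime64 data (assumes pandas 0.13 or later):

    ```python
    import pandas as pd

    s = pd.Series(pd.date_range("2013-01-01", periods=3))

    # isin now compares datetime-like values correctly
    mask = s.isin([pd.Timestamp("2013-01-02")])
    ```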

  • The C and Python parsers can now handle the more common MultiIndex column format which doesn't have a row for index names (GH4702)
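
    A small sketch of that format: two header rows and no extra row naming the index levels (illustrative; pandas 0.13+):

    ```python
    import io

    import pandas as pd

    # Two header rows form the MultiIndex columns; there is no
    # separate row carrying index names.
    data = "a,a,b\nx,y,z\n1,2,3\n4,5,6\n"
    df = pd.read_csv(io.StringIO(data), header=[0, 1])
    ```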

  • Bug when trying to use an out-of-bounds date as an object dtype (GH5312)

  • Bug when trying to display an embedded PandasObject (GH5324)

  • Allow operating on Timestamps to return a datetime if the result is out-of-bounds, related (GH5312)

  • Fix return value/type signature of initObjToJSON() to be compatible with numpy's import_array() (GH5334, GH5326)

  • Bug in renaming then setting the index on a DataFrame (GH5344)

  • Test suite no longer leaves around temporary files when testing graphics. (GH5347) (thanks for catching this @yarikoptic!)

  • Fixed html tests on win32. (GH4580)

  • Make sure that head/tail are iloc based, (GH5370)
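
    In other words, head/tail select by position regardless of the index labels; a minimal illustration:

    ```python
    import pandas as pd

    # Non-monotonic integer index; head/tail are positional (iloc based)
    s = pd.Series([10, 20, 30], index=[2, 0, 1])
    first_two = s.head(2)
    last_one = s.tail(1)
    ```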

  • Fixed a bug in the string representation of a PeriodIndex with 1 or 2 elements. (GH5372)

  • The GroupBy methods transform and filter can be used on Series and DataFrames that have repeated (non-unique) indices. (GH4620)
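
    A brief sketch of transform on a non-unique index (illustrative):

    ```python
    import pandas as pd

    # Repeated (non-unique) index labels
    s = pd.Series([1, 2, 3, 4], index=[0, 0, 1, 1])

    # transform aligns the per-group result back to the original positions
    sums = s.groupby(level=0).transform("sum")
    ```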

  • Fixed an empty Series not printing its name in repr (GH4651)

  • Make tests create temp files in a temp directory by default. (GH5419)

  • pd.to_timedelta of a scalar returns a scalar. (GH5410)

  • pd.to_timedelta accepts NaN and NaT, returning NaT instead of raising (GH5437)
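
    These two to_timedelta changes in a minimal sketch (illustrative; written against a recent pandas):

    ```python
    import numpy as np
    import pandas as pd

    # scalar in, scalar out
    one_day = pd.to_timedelta("1 days")

    # NaN/NaT inputs now return NaT instead of raising
    missing = pd.to_timedelta(np.nan)
    ```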

  • Performance improvements in isnull on larger-size pandas objects

  • Fixed various setitem bugs with a 1d ndarray that does not have a matching length to the indexer (GH5508)

  • Bug in getitem with a MultiIndex and iloc (GH5528)

  • Bug in delitem on a Series (GH5542)

  • Bug fix in apply when using a custom function and objects are not mutated (GH5545)

  • Bug in selecting from a non-unique index with loc (GH5553)

  • Bug in groupby returning non-consistent types when a user function returns a None, (GH5592)

  • Work around regression in numpy 1.7.0 which erroneously raises IndexError from ndarray.item (GH5666)

  • Bug in repeated indexing of an object with a resultant non-unique index (GH5678)

  • Bug in fillna with a Series and a passed series/dict (GH5703)

  • Bug in groupby transform with a datetime-like grouper (GH5712)

  • Bug in MultiIndex selection in PY3 when using certain keys (GH5725)

  • Row-wise concat of differing dtypes failing in certain cases (GH5754)

Contributors#

A total of 77 people contributed patches to this release. People with a "+" by their name contributed a patch for the first time.

  • Agustín Herranz +

  • Alex Gaudio +

  • Alex Rothberg +

  • Andreas Klostermann +

  • Andreas Würl +

  • Andy Hayden

  • Ben Alex +

  • Benedikt Sauer +

  • Brad Buran

  • Caleb Epstein +

  • Chang She

  • Christopher Whelan

  • DSM +

  • Dale Jung +

  • Dan Birken

  • David Rasch +

  • Dieter Vandenbussche

  • Gabi Davar +

  • Garrett Drapala

  • Goyo +

  • Greg Reda +

  • Ivan Smirnov +

  • Jack Kelly +

  • Jacob Schaer +

  • Jan Schulz +

  • Jeff Tratner

  • Jeffrey Tratner

  • John McNamara +

  • John W. O'Brien +

  • Joris Van den Bossche

  • Justin Bozonier +

  • Kelsey Jordahl

  • Kevin Stone

  • Kieran O'Mahony

  • Kyle Hausmann +

  • Kyle Kelley +

  • Kyle Meyer

  • Mike Kelly

  • Mortada Mehyar +

  • Nick Foti +

  • Olivier Harris +

  • Ondřej Čertík +

  • PKEuS

  • Phillip Cloud

  • Pierre Haessig +

  • Richard T. Guy +

  • Roman Pekar +

  • Roy Hyunjin Han

  • Skipper Seabold

  • Sten +

  • Thomas A Caswell +

  • Thomas Kluyver

  • Tiago Requeijo +

  • TomAugspurger

  • Trent Hauck

  • Valentin Haenel +

  • Viktor Kerkez +

  • Vincent Arel-Bundock

  • Wes McKinney

  • Wes Turner +

  • Weston Renoud +

  • Yaroslav Halchenko

  • Zach Dwiel +

  • chapman siu +

  • chappers +

  • d10genes +

  • danielballan

  • daydreamt +

  • engstrom +

  • jreback

  • monicaBee +

  • prossahl +

  • rockg +

  • unutbu +

  • westurner +

  • y-p

  • zach powers