Version 0.13.0 (January 3, 2014)#

This is a major release from 0.12.0 and includes a number of API changes, several new features and enhancements, and a large number of bug fixes.

Highlights include:

  • support for a new index type Float64Index, and other indexing enhancements

  • HDFStore has a new string-based syntax for query specification

  • support for new methods of interpolation

  • updated timedelta operations

  • a new string manipulation method extract

  • nanosecond support for offsets

  • isin for DataFrames

Several experimental features are added, including:

  • new eval/query methods for expression evaluation

  • support for msgpack serialization

  • an i/o interface to Google's BigQuery

There are several new or updated docs sections, including:

Warning

In 0.13.0 Series has internally been refactored to no longer subclass ndarray but instead subclass NDFrame, like the rest of the pandas containers. This should be a transparent change with only very limited API implications. See Internal Refactoring
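As a quick check of the refactoring, a Series is no longer an ndarray instance, while its underlying data is still reachable as one:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])

# Since 0.13.0, Series subclasses NDFrame rather than ndarray
print(isinstance(s, np.ndarray))  # False

# The underlying ndarray remains accessible via .values
print(type(s.values))
```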

API changes#

  • read_excel now supports an integer in its sheetname argument giving the index of the sheet to read in (GH4301).

  • Text parser now treats anything that looks like inf ("inf", "Inf", "-Inf", "iNf", etc.) as infinity (GH4220, GH4219), affecting read_table, read_csv, etc.
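For example, the variant spellings below all parse to floating point infinities (the column name and data are made up for illustration):

```python
import io

import pandas as pd

# All of these spellings are recognized as +/- infinity by the parser
data = "value\ninf\nInf\n-Inf\niNf\n"

df = pd.read_csv(io.StringIO(data))
print(df["value"].dtype)  # float64
print(df["value"].tolist())
```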

  • pandas is now Python 2/3 compatible without the need for 2to3, thanks to @jtratner. As a result, pandas now uses iterators more heavily, and substantive parts of Benjamin Peterson's six library have been brought into pandas compat. (GH4384, GH4375, GH4372)

  • pandas.util.compat and pandas.util.py3compat have been merged into pandas.compat. pandas.compat now includes many functions allowing 2/3 compatibility. It contains both list and iterator versions of range, filter, map and zip, plus other necessary elements for Python 3 compatibility. lmap, lzip, lrange and lfilter all produce lists instead of iterators, for compatibility with numpy, subscripting and pandas constructors. (GH4384, GH4375, GH4372)
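A minimal sketch of what those list-producing helpers do (illustrative definitions, not the actual pandas.compat source):

```python
# Illustrative stand-ins for lmap/lzip/lrange/lfilter: each wraps the
# iterator-returning Python 3 builtin and materializes the result as a list.
def lmap(*args):
    return list(map(*args))


def lzip(*args):
    return list(zip(*args))


def lrange(*args):
    return list(range(*args))


def lfilter(*args):
    return list(filter(*args))


print(lmap(abs, [-1, 2, -3]))    # [1, 2, 3]
print(lzip([1, 2], ["a", "b"]))  # [(1, 'a'), (2, 'b')]
print(lrange(3))                 # [0, 1, 2]
```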

  • Series.get with negative indexers now returns the same as [] (GH4390)
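For reference, get behaves like dict.get, looking a key up by label and returning an optional default when it is missing (example data is illustrative):

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])

# Label lookup, with dict.get-style default handling
print(s.get("b"))      # 2
print(s.get("z"))      # None
print(s.get("z", -1))  # -1
```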

  • Index and MultiIndex changed the way metadata is handled (levels, labels, and names) (GH4039):

    # previously, you would have set levels or labels directly
    >>> pd.index.levels = [[1, 2, 3, 4], [1, 2, 4, 4]]
    
    # now, you use the set_levels or set_labels methods
    >>> index = pd.index.set_levels([[1, 2, 3, 4], [1, 2, 4, 4]])
    
    # similarly, for names, you can rename the object
    # but setting names is not deprecated
    >>> index = pd.index.set_names(["bob", "cranberry"])
    
    # and all methods take an inplace kwarg - but return None
    >>> pd.index.set_names(["bob", "cranberry"], inplace=True)
    
  • All division with NDFrame objects is now true division, regardless of the future import. This means that operating on pandas objects will by default use floating point division and return a floating point dtype. You can use // and floordiv to do integer division.

    Integer division

    In [3]: arr = np.array([1, 2, 3, 4])
    
    In [4]: arr2 = np.array([5, 3, 2, 1])
    
    In [5]: arr / arr2
    Out[5]: array([0, 0, 1, 4])
    
    In [6]: pd.Series(arr) // pd.Series(arr2)
    Out[6]:
    0    0
    1    0
    2    1
    3    4
    dtype: int64
    

    True division

    In [7]: pd.Series(arr) / pd.Series(arr2)  # no future import required
    Out[7]:
    0    0.200000
    1    0.666667
    2    1.500000
    3    4.000000
    dtype: float64
    
  • Infer and downcast dtype if downcast='infer' is passed to fillna/ffill/bfill (GH4604)

  • __nonzero__ for all NDFrame objects will now raise a ValueError; this reverts back to the (GH1073, GH4633) behavior. See gotchas for a more detailed discussion.

    This prevents evaluating the truth value of an entire pandas object, which is inherently ambiguous. These will all raise a ValueError.

    >>> df = pd.DataFrame({'A': np.random.randn(10),
    ...                    'B': np.random.randn(10),
    ...                    'C': pd.date_range('20130101', periods=10)
    ...                    })
    ...
    >>> if df:
    ...     pass
    ...
    Traceback (most recent call last):
        ...
    ValueError: The truth value of a DataFrame is ambiguous.  Use a.empty,
    a.bool(), a.item(), a.any() or a.all().
    
    >>> df1 = df
    >>> df2 = df
    >>> df1 and df2
    Traceback (most recent call last):
        ...
    ValueError: The truth value of a DataFrame is ambiguous.  Use a.empty,
    a.bool(), a.item(), a.any() or a.all().
    
    >>> d = [1, 2, 3]
    >>> s1 = pd.Series(d)
    >>> s2 = pd.Series(d)
    >>> s1 and s2
    Traceback (most recent call last):
        ...
    ValueError: The truth value of a Series is ambiguous.  Use a.empty,
    a.bool(), a.item(), a.any() or a.all().
    

    Added a .bool() method to NDFrame objects to facilitate evaluating single-element boolean Series:

    In [1]: pd.Series([True]).bool()
    Out[1]: True
    
    In [2]: pd.Series([False]).bool()
    Out[2]: False
    
    In [3]: pd.DataFrame([[True]]).bool()
    Out[3]: True
    
    In [4]: pd.DataFrame([[False]]).bool()
    Out[4]: False
    
  • All non-Index NDFrames (Series, DataFrame, Panel, Panel4D, SparsePanel, etc.) now support the entire set of arithmetic operators and arithmetic flex methods (add, sub, mul, etc.). SparsePanel does not support pow or mod with non-scalars. (GH3765)
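The flex methods mirror the operators while exposing alignment options; a small sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
s = pd.Series([10, 20, 30])

# add/sub/mul/div mirror +, -, *, / but allow choosing the alignment
# axis, e.g. broadcasting a Series down the rows with axis=0
result = df.add(s, axis=0)
print(result)
```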

  • SeriesDataFrame 现在有一个 mode() 按AXIS/Series计算统计模式的方法。 (GH5367 )

  • Chained assignment will now by default warn if the user is assigning to a copy. This can be changed with the option mode.chained_assignment; allowed options are raise/warn/None. See the docs.

    In [5]: dfc = pd.DataFrame({'A': ['aaa', 'bbb', 'ccc'], 'B': [1, 2, 3]})
    
    In [6]: pd.set_option('chained_assignment', 'warn')
    

    The following warning / exception will show if this is attempted.

    In [7]: dfc.loc[0]['A'] = 1111
    
    Traceback (most recent call last)
       ...
    SettingWithCopyWarning:
       A value is trying to be set on a copy of a slice from a DataFrame.
       Try using .loc[row_index,col_indexer] = value instead
    

    Here is the correct method of assignment.

    In [8]: dfc.loc[0, 'A'] = 11
    
    In [9]: dfc
    Out[9]: 
         A  B
    0   11  1
    1  bbb  2
    2  ccc  3
    
  • Panel.reindex has the following call signature Panel.reindex(items=None, major_axis=None, minor_axis=None, **kwargs)

    to conform with other NDFrame objects. See Internal Refactoring for more information.

  • Series.argmin and Series.argmax are now aliased to Series.idxmin and Series.idxmax. These return the index of the min or max element respectively. Prior to 0.13.0 these would return the position of the min/max element. (GH6214)
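A small illustration of the distinction (note that in much later pandas versions argmin/argmax went back to returning positions, so idxmin/idxmax are the reliable way to get labels):

```python
import pandas as pd

s = pd.Series([30, 10, 20], index=["a", "b", "c"])

# idxmin/idxmax return the index *label* of the extreme value
print(s.idxmin())  # 'b'
print(s.idxmax())  # 'a'
```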

Prior version deprecations/changes#

These are changes announced in 0.12 or prior that take effect as of 0.13.0.

  • Remove deprecated Factor (GH3650)

  • Remove deprecated set_printoptions/reset_printoptions (GH3046)

  • Remove deprecated _verbose_info (GH3215)

  • Remove deprecated read_clipboard/to_clipboard/ExcelFile/ExcelWriter from pandas.io.parsers (GH3717) These are available as functions in the main pandas namespace (e.g. pd.read_clipboard)

  • The default for tupleize_cols is now False for both to_csv and read_csv. Fair warning in 0.12 (GH3604).

  • The default for display.max_seq_len is now 100 rather than None. This activates truncated display ("...") of long sequences in various places. (GH3391)

Deprecations#

Deprecated in 0.13.0

  • deprecated iterkv, which will be removed in a future release (this was an alias of iteritems used to bypass 2to3's changes). (GH4384, GH4375, GH4372)

  • deprecated the string method match, whose role is now performed more idiomatically by extract. In a future release the default behavior of match will change to become analogous to contains, which returns a boolean indexer. (Their distinction is strictness: match relies on re.match while contains relies on re.search.) In this release, the deprecated behavior is the default, but the new behavior is available through the keyword argument as_indexer=True.
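A short comparison of the two string methods with made-up data:

```python
import pandas as pd

s = pd.Series(["a1", "b2", "c3"])

# extract pulls regex groups into columns; rows that don't match get NaN
extracted = s.str.extract(r"([ab])(\d)")
print(extracted)

# contains relies on re.search and returns a boolean indexer
mask = s.str.contains(r"\d")
print(mask.tolist())  # [True, True, True]
```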

Indexing API changes#

Prior to 0.13, it was impossible to use a label indexer (.loc/.ix) to set a value that was not contained in the index of a particular axis (GH2578). See the docs

In the Series case this is effectively an appending operation.

In [10]: s = pd.Series([1, 2, 3])

In [11]: s
Out[11]: 
0    1
1    2
2    3
dtype: int64

In [12]: s[5] = 5.

In [13]: s
Out[13]: 
0    1.0
1    2.0
2    3.0
5    5.0
dtype: float64

In [14]: dfi = pd.DataFrame(np.arange(6).reshape(3, 2),
   ....:                    columns=['A', 'B'])
   ....: 

In [15]: dfi
Out[15]: 
   A  B
0  0  1
1  2  3
2  4  5

Previously this would raise a KeyError.

In [16]: dfi.loc[:, 'C'] = dfi.loc[:, 'A']

In [17]: dfi
Out[17]: 
   A  B  C
0  0  1  0
1  2  3  2
2  4  5  4

This is like an append operation.

In [18]: dfi.loc[3] = 5

In [19]: dfi
Out[19]: 
   A  B  C
0  0  1  0
1  2  3  2
2  4  5  4
3  5  5  5

A Panel setting operation on an arbitrary axis aligns the input to the Panel.

In [20]: p = pd.Panel(np.arange(16).reshape(2, 4, 2),
   ....:              items=['Item1', 'Item2'],
   ....:              major_axis=pd.date_range('2001/1/12', periods=4),
   ....:              minor_axis=['A', 'B'], dtype='float64')
   ....:

In [21]: p
Out[21]:
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 4 (major_axis) x 2 (minor_axis)
Items axis: Item1 to Item2
Major_axis axis: 2001-01-12 00:00:00 to 2001-01-15 00:00:00
Minor_axis axis: A to B

In [22]: p.loc[:, :, 'C'] = pd.Series([30, 32], index=p.items)

In [23]: p
Out[23]:
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 4 (major_axis) x 3 (minor_axis)
Items axis: Item1 to Item2
Major_axis axis: 2001-01-12 00:00:00 to 2001-01-15 00:00:00
Minor_axis axis: A to C

In [24]: p.loc[:, :, 'C']
Out[24]:
            Item1  Item2
2001-01-12   30.0   32.0
2001-01-13   30.0   32.0
2001-01-14   30.0   32.0
2001-01-15   30.0   32.0

Float64Index API change#

  • Added a new index type, Float64Index. This will be automatically created when passing floating point values in index creation. This enables a pure label-based slicing paradigm that makes [], ix, loc for scalar indexing and slicing work exactly the same. See the docs, (GH263)

    Construction is by default for floating type values.

    In [20]: index = pd.Index([1.5, 2, 3, 4.5, 5])
    
    In [21]: index
    Out[21]: Float64Index([1.5, 2.0, 3.0, 4.5, 5.0], dtype='float64')
    
    In [22]: s = pd.Series(range(5), index=index)
    
    In [23]: s
    Out[23]: 
    1.5    0
    2.0    1
    3.0    2
    4.5    3
    5.0    4
    dtype: int64
    

    Scalar selection for [], .ix, .loc will always be label based. An integer will match an equal float index (e.g. 3 is equivalent to 3.0).

    In [24]: s[3]
    Out[24]: 2
    
    In [25]: s.loc[3]
    Out[25]: 2
    

    The only positional indexing is via iloc.

    In [26]: s.iloc[3]
    Out[26]: 3
    

    A scalar index that is not found will raise a KeyError.

    Slicing is always on the values of the index for [], ix, loc, and always positional with iloc.

    In [27]: s[2:4]
    Out[27]: 
    2.0    1
    3.0    2
    dtype: int64
    
    In [28]: s.loc[2:4]
    Out[28]: 
    2.0    1
    3.0    2
    dtype: int64
    
    In [29]: s.iloc[2:4]
    Out[29]: 
    3.0    2
    4.5    3
    dtype: int64
    

    In float indexes, slicing using floats is allowed.

    In [30]: s[2.1:4.6]
    Out[30]: 
    3.0    2
    4.5    3
    dtype: int64
    
    In [31]: s.loc[2.1:4.6]
    Out[31]: 
    3.0    2
    4.5    3
    dtype: int64
    
  • Indexing on other index types is preserved (and positional fallback for [], ix), with the exception that floating point slicing on indexes of non-Float64Index will now raise a TypeError.

    In [1]: pd.Series(range(5))[3.5]
    TypeError: the label [3.5] is not a proper indexer for this index type (Int64Index)
    
    In [1]: pd.Series(range(5))[3.5:4.5]
    TypeError: the slice start [3.5] is not a proper indexer for this index type (Int64Index)
    

    Using a scalar float indexer will be deprecated in a future version, but is allowed for now.

    In [3]: pd.Series(range(5))[3.0]
    Out[3]: 3
    

HDFStore API changes#

  • Query format changes. A much more string-like query format is now supported. See the docs.

    In [32]: path = 'test.h5'
    
    In [33]: dfq = pd.DataFrame(np.random.randn(10, 4),
       ....:                    columns=list('ABCD'),
       ....:                    index=pd.date_range('20130101', periods=10))
       ....: 
    
    In [34]: dfq.to_hdf(path, 'dfq', format='table', data_columns=True)
    

    Use boolean expressions, with in-line function evaluation.

    In [35]: pd.read_hdf(path, 'dfq',
       ....:             where="index>Timestamp('20130104') & columns=['A', 'B']")
       ....: 
    

    Use an inline column reference.

    In [36]: pd.read_hdf(path, 'dfq',
       ....:             where="A>0 or C>0")
       ....: 
    
  • The format keyword now replaces the table keyword; allowed values are fixed(f) or table(t). The same defaults as prior to 0.13.0 remain, e.g. put implies fixed format and append implies table format. This default format can be set as an option by setting io.hdf.default_format.

    In [37]: path = 'test.h5'
    
    In [38]: df = pd.DataFrame(np.random.randn(10, 2))
    
    In [39]: df.to_hdf(path, 'df_table', format='table')
    
    In [40]: df.to_hdf(path, 'df_table2', append=True)
    
    In [41]: df.to_hdf(path, 'df_fixed')
    
    In [42]: with pd.HDFStore(path) as store:
       ....:     print(store)
       ....: 
    
  • Significantly improved table writing performance.

  • Handle a passed Series in table format (GH4330).

  • Can now serialize a timedelta64[ns] dtype in a table (GH3577), see the docs.

  • Added an is_open property to indicate if the underlying file handle is open; a closed store will now report "CLOSED" when viewing the store (rather than raising an error) (GH4409).

  • A close of a HDFStore now closes that instance of the HDFStore, but will only close the actual file if the reference count (kept by PyTables) w.r.t. all of the open handles is 0. Essentially you have a local instance of HDFStore referenced by a variable; once you close it, it will report closed. Other references (to the same file) will continue to operate until they themselves are closed. Performing an action on a closed file will raise ClosedFileError.

    In [43]: path = 'test.h5'
    
    In [44]: df = pd.DataFrame(np.random.randn(10, 2))
    
    In [45]: store1 = pd.HDFStore(path)
    
    In [46]: store2 = pd.HDFStore(path)
    ---------------------------------------------------------------------------
    ModuleNotFoundError                       Traceback (most recent call last)
    File /usr/local/lib/python3.10/dist-packages/pandas-1.5.0.dev0+697.gf9762d8f52-py3.10-linux-x86_64.egg/pandas/compat/_optional.py:139, in import_optional_dependency(name, extra, errors, min_version)
        138 try:
    --> 139     module = importlib.import_module(name)
        140 except ImportError:
    
    File /usr/lib/python3.10/importlib/__init__.py:126, in import_module(name, package)
        125         level += 1
    --> 126 return _bootstrap._gcd_import(name[level:], package, level)
    
    File <frozen importlib._bootstrap>:1050, in _gcd_import(name, package, level)
    
    File <frozen importlib._bootstrap>:1027, in _find_and_load(name, import_)
    
    File <frozen importlib._bootstrap>:1004, in _find_and_load_unlocked(name, import_)
    
    ModuleNotFoundError: No module named 'tables'
    
    During handling of the above exception, another exception occurred:
    
    ImportError                               Traceback (most recent call last)
    Input In [46], in <cell line: 1>()
    ----> 1 store2 = pd.HDFStore(path)
    
    File /usr/local/lib/python3.10/dist-packages/pandas-1.5.0.dev0+697.gf9762d8f52-py3.10-linux-x86_64.egg/pandas/io/pytables.py:573, in HDFStore.__init__(self, path, mode, complevel, complib, fletcher32, **kwargs)
        570 if "format" in kwargs:
        571     raise ValueError("format is not a defined argument for HDFStore")
    --> 573 tables = import_optional_dependency("tables")
        575 if complib is not None and complib not in tables.filters.all_complibs:
        576     raise ValueError(
        577         f"complib only supports {tables.filters.all_complibs} compression."
        578     )
    
    File /usr/local/lib/python3.10/dist-packages/pandas-1.5.0.dev0+697.gf9762d8f52-py3.10-linux-x86_64.egg/pandas/compat/_optional.py:142, in import_optional_dependency(name, extra, errors, min_version)
        140 except ImportError:
        141     if errors == "raise":
    --> 142         raise ImportError(msg)
        143     else:
        144         return None
    
    ImportError: Missing optional dependency 'pytables'.  Use pip or conda to install pytables.
    
    In [47]: store1.append('df', df)
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    Input In [47], in <cell line: 1>()
    ----> 1 store1.append('df', df)
    
    NameError: name 'store1' is not defined
    
    In [48]: store2.append('df2', df)
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    Input In [48], in <cell line: 1>()
    ----> 1 store2.append('df2', df)
    
    NameError: name 'store2' is not defined
    
    In [49]: store1
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    Input In [49], in <cell line: 1>()
    ----> 1 store1
    
    NameError: name 'store1' is not defined
    
    In [50]: store2
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    Input In [50], in <cell line: 1>()
    ----> 1 store2
    
    NameError: name 'store2' is not defined
    
    In [51]: store1.close()
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    Input In [51], in <cell line: 1>()
    ----> 1 store1.close()
    
    NameError: name 'store1' is not defined
    
    In [52]: store2
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    Input In [52], in <cell line: 1>()
    ----> 1 store2
    
    NameError: name 'store2' is not defined
    
    In [53]: store2.close()
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    Input In [53], in <cell line: 1>()
    ----> 1 store2.close()
    
    NameError: name 'store2' is not defined
    
    In [54]: store2
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    Input In [54], in <cell line: 1>()
    ----> 1 store2
    
    NameError: name 'store2' is not defined
    
  • Removed the _quiet attribute, replaced by a DuplicateWarning if retrieving duplicate rows from a table (GH4367)

  • Removed the warn argument from open. Instead a PossibleDataLossError exception will be raised if you try to use mode='w' with an OPEN file handle (GH4367)

  • Allow a passed locations array or mask as a where condition (GH4467). See the docs for an example.

  • add the keyword dropna=True to append to change whether ALL nan rows are not written to the store (default is True, ALL nan rows are NOT written), also settable via the option io.hdf.dropna_table (GH4625)

  • The store creation arguments are now passed through; can be used to support in-memory stores

DataFrame repr changes#

The HTML and plain text representations of DataFrame now show a truncated view of the table once it exceeds a certain size, rather than switching to the short info view (GH4886, GH5550). This makes the representation more consistent as small DataFrames get larger.

Truncated HTML representation of a DataFrame

To get the info view, call DataFrame.info(). If you prefer the info view as the repr for large DataFrames, you can set this by running set_option('display.large_repr', 'info').
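The option round-trip can be sketched as follows (a minimal illustration, not taken from the release notes):

```python
import pandas as pd

# Prefer the info view as the repr for large frames
pd.set_option('display.large_repr', 'info')
assert pd.get_option('display.large_repr') == 'info'

# Restore the default truncated view
pd.reset_option('display.large_repr')
print(pd.get_option('display.large_repr'))  # truncate
```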

Enhancements#

  • df.to_clipboard() learned a new excel keyword that lets you paste df data directly into Excel (enabled by default). (GH5070).

  • read_html now raises a URLError instead of catching and raising a ValueError (GH4303, GH4305)

  • Added a test for read_clipboard() and to_clipboard() (GH4282)

  • Clipboard functionality now works with PySide (GH4282)

  • Added a more informative error message when plot arguments contain overlapping color and style arguments (GH4402)

  • to_dict now takes records as a possible out-type. Returns an array of column-keyed dictionaries. (GH4936)
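For example (a small sketch with a hypothetical frame, not from the original notes):

```python
import pandas as pd

# Hypothetical frame for illustration
df = pd.DataFrame({'A': [1, 2], 'B': ['x', 'y']})

# orient='records' returns a list of row dicts keyed by column name
records = df.to_dict(orient='records')
print(records)  # [{'A': 1, 'B': 'x'}, {'A': 2, 'B': 'y'}]
```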

  • NaN handling in get_dummies (GH4446) with dummy_na

    # previously, nan was erroneously counted as 2 here
    # now it is not counted at all
    In [55]: pd.get_dummies([1, 2, np.nan])
    Out[55]: 
       1.0  2.0
    0    1    0
    1    0    1
    2    0    0
    
    # unless requested
    In [56]: pd.get_dummies([1, 2, np.nan], dummy_na=True)
    Out[56]: 
       1.0  2.0  NaN
    0    1    0    0
    1    0    1    0
    2    0    0    1
    
  • timedelta64[ns] operations. See the docs.

    Warning

    Most of these operations require numpy >= 1.7

    Using the new top-level to_timedelta, you can convert a scalar or array from the standard timedelta format (produced by to_csv) into a timedelta type (np.timedelta64 in nanoseconds).

    In [57]: pd.to_timedelta('1 days 06:05:01.00003')
    Out[57]: Timedelta('1 days 06:05:01.000030')
    
    In [58]: pd.to_timedelta('15.5us')
    Out[58]: Timedelta('0 days 00:00:00.000015500')
    
    In [59]: pd.to_timedelta(['1 days 06:05:01.00003', '15.5us', 'nan'])
    Out[59]: TimedeltaIndex(['1 days 06:05:01.000030', '0 days 00:00:00.000015500', NaT], dtype='timedelta64[ns]', freq=None)
    
    In [60]: pd.to_timedelta(np.arange(5), unit='s')
    Out[60]: 
    TimedeltaIndex(['0 days 00:00:00', '0 days 00:00:01', '0 days 00:00:02',
                    '0 days 00:00:03', '0 days 00:00:04'],
                   dtype='timedelta64[ns]', freq=None)
    
    In [61]: pd.to_timedelta(np.arange(5), unit='d')
    Out[61]: TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq=None)
    

    A Series of dtype timedelta64[ns] can now be divided by another timedelta64[ns] object, or astyped, to yield a float64 dtyped Series. This is frequency conversion. See the docs.

    In [62]: import datetime
    
    In [63]: td = pd.Series(pd.date_range('20130101', periods=4)) - pd.Series(
       ....:     pd.date_range('20121201', periods=4))
       ....: 
    
    In [64]: td[2] += np.timedelta64(datetime.timedelta(minutes=5, seconds=3))
    
    In [65]: td[3] = np.nan
    
    In [66]: td
    Out[66]: 
    0   31 days 00:00:00
    1   31 days 00:00:00
    2   31 days 00:05:03
    3                NaT
    dtype: timedelta64[ns]
    
    # to days
    In [67]: td / np.timedelta64(1, 'D')
    Out[67]: 
    0    31.000000
    1    31.000000
    2    31.003507
    3          NaN
    dtype: float64
    
    In [68]: td.astype('timedelta64[D]')
    Out[68]: 
    0    31.0
    1    31.0
    2    31.0
    3     NaN
    dtype: float64
    
    # to seconds
    In [69]: td / np.timedelta64(1, 's')
    Out[69]: 
    0    2678400.0
    1    2678400.0
    2    2678703.0
    3          NaN
    dtype: float64
    
    In [70]: td.astype('timedelta64[s]')
    Out[70]: 
    0    2678400.0
    1    2678400.0
    2    2678703.0
    3          NaN
    dtype: float64
    

    Division of or multiplication of a timedelta64[ns] Series by an integer or integer Series

    In [71]: td * -1
    Out[71]: 
    0   -31 days +00:00:00
    1   -31 days +00:00:00
    2   -32 days +23:54:57
    3                  NaT
    dtype: timedelta64[ns]
    
    In [72]: td * pd.Series([1, 2, 3, 4])
    Out[72]: 
    0   31 days 00:00:00
    1   62 days 00:00:00
    2   93 days 00:15:09
    3                NaT
    dtype: timedelta64[ns]
    

    Absolute DateOffset objects can act equivalently to timedeltas

    In [73]: from pandas import offsets
    
    In [74]: td + offsets.Minute(5) + offsets.Milli(5)
    Out[74]: 
    0   31 days 00:05:00.005000
    1   31 days 00:05:00.005000
    2   31 days 00:10:03.005000
    3                       NaT
    dtype: timedelta64[ns]
    

    fillna is now supported for timedeltas

    In [75]: td.fillna(pd.Timedelta(0))
    Out[75]: 
    0   31 days 00:00:00
    1   31 days 00:00:00
    2   31 days 00:05:03
    3    0 days 00:00:00
    dtype: timedelta64[ns]
    
    In [76]: td.fillna(datetime.timedelta(days=1, seconds=5))
    Out[76]: 
    0   31 days 00:00:00
    1   31 days 00:00:00
    2   31 days 00:05:03
    3    1 days 00:00:05
    dtype: timedelta64[ns]
    

    You can do numeric reduction operations on timedeltas.

    In [77]: td.mean()
    Out[77]: Timedelta('31 days 00:01:41')
    
    In [78]: td.quantile(.1)
    Out[78]: Timedelta('31 days 00:00:00')
    
  • plot(kind='kde') now accepts the optional parameters bw_method and ind, passed to scipy.stats.gaussian_kde() (for scipy >= 0.11.0) to set the bandwidth, and to gkde.evaluate() to specify the indices at which it is evaluated, respectively. See the scipy docs. (GH4298)

  • The DataFrame constructor now accepts a numpy masked record array (GH3478)

  • The new vectorized string method extract returns regular expression matches more conveniently.

    In [79]: pd.Series(['a1', 'b2', 'c3']).str.extract('[ab](\\d)')
    Out[79]: 
         0
    0    1
    1    2
    2  NaN
    

    Elements that do not match return NaN. Extracting a regular expression with more than one group returns a DataFrame with one column per group.

    In [80]: pd.Series(['a1', 'b2', 'c3']).str.extract('([ab])(\\d)')
    Out[80]: 
         0    1
    0    a    1
    1    b    2
    2  NaN  NaN
    

    Elements that do not match return a row of NaN. Thus, a Series of messy strings can be converted into a like-indexed Series or DataFrame of cleaned-up or more useful strings, without necessitating get() to access tuples or re.match objects.

    Named groups like

    In [81]: pd.Series(['a1', 'b2', 'c3']).str.extract(
       ....:     '(?P<letter>[ab])(?P<digit>\\d)')
       ....: 
    Out[81]: 
      letter digit
    0      a     1
    1      b     2
    2    NaN   NaN
    

    and optional groups can also be used.

    In [82]: pd.Series(['a1', 'b2', '3']).str.extract(
       ....:      '(?P<letter>[ab])?(?P<digit>\\d)')
       ....: 
    Out[82]: 
      letter digit
    0      a     1
    1      b     2
    2    NaN     3
    
  • read_stata now accepts Stata 13 format (GH4291)

  • read_fwf now infers the column specifications from the first 100 rows of the file if the data has correctly separated and properly aligned columns, using the delimiter provided to the function (GH4488).
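A minimal sketch of the inference, using hypothetical fixed-width data:

```python
import io
import pandas as pd

# Hypothetical aligned fixed-width data; the column specs are inferred
data = "name  value\nfoo      1\nbar     10\n"
df = pd.read_fwf(io.StringIO(data))
print(list(df.columns))  # ['name', 'value']
```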

  • support for nanosecond times as an offset

    Warning

    These operations require numpy >= 1.7

    Period conversions in the range of seconds and below were reworked and extended up to nanoseconds. Periods in the nanosecond range are now available.

    In [83]: pd.date_range('2013-01-01', periods=5, freq='5N')
    Out[83]: 
    DatetimeIndex([          '2013-01-01 00:00:00',
                   '2013-01-01 00:00:00.000000005',
                   '2013-01-01 00:00:00.000000010',
                   '2013-01-01 00:00:00.000000015',
                   '2013-01-01 00:00:00.000000020'],
                  dtype='datetime64[ns]', freq='5N')
    

    or with frequency as offset

    In [84]: pd.date_range('2013-01-01', periods=5, freq=pd.offsets.Nano(5))
    Out[84]: 
    DatetimeIndex([          '2013-01-01 00:00:00',
                   '2013-01-01 00:00:00.000000005',
                   '2013-01-01 00:00:00.000000010',
                   '2013-01-01 00:00:00.000000015',
                   '2013-01-01 00:00:00.000000020'],
                  dtype='datetime64[ns]', freq='5N')
    

    Timestamps can be modified in the nanosecond range

    In [85]: t = pd.Timestamp('20130101 09:01:02')
    
    In [86]: t + pd.tseries.offsets.Nano(123)
    Out[86]: Timestamp('2013-01-01 09:01:02.000000123')
    
  • A new method, isin for DataFrames, which plays nicely with boolean indexing. The argument to isin, what we're comparing the DataFrame to, can be a DataFrame, a Series, a dict, or an array of values. See the docs for more.

    To get the rows where any of the conditions are met:

    In [87]: dfi = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['a', 'b', 'f', 'n']})
    
    In [88]: dfi
    Out[88]: 
       A  B
    0  1  a
    1  2  b
    2  3  f
    3  4  n
    
    In [89]: other = pd.DataFrame({'A': [1, 3, 3, 7], 'B': ['e', 'f', 'f', 'e']})
    
    In [90]: mask = dfi.isin(other)
    
    In [91]: mask
    Out[91]: 
           A      B
    0   True  False
    1  False  False
    2   True   True
    3  False  False
    
    In [92]: dfi[mask.any(axis=1)]
    Out[92]: 
       A  B
    0  1  a
    2  3  f
    
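To instead keep only the rows where every column matches, the same mask can be combined with all(axis=1); a small sketch restating the example data:

```python
import pandas as pd

dfi = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['a', 'b', 'f', 'n']})
other = pd.DataFrame({'A': [1, 3, 3, 7], 'B': ['e', 'f', 'f', 'e']})
mask = dfi.isin(other)

# keep only rows where every column matched
print(dfi[mask.all(axis=1)])
```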
  • Series now supports a to_frame method to convert it to a single-column DataFrame (GH5164)
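A minimal sketch (the Series name becoming the column label is the expected behavior):

```python
import pandas as pd

# Hypothetical named Series; the name becomes the column label
s = pd.Series([1, 2, 3], name='vals')
df = s.to_frame()
print(list(df.columns))  # ['vals']
```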

  • All R datasets listed here http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html can now be loaded into pandas objects

    # note that pandas.rpy was deprecated in v0.16.0
    import pandas.rpy.common as com
    com.load_data('Titanic')
    
  • tz_localize can infer a fall daylight savings transition based on the structure of the unlocalized data (GH4230), see the docs

  • DatetimeIndex is now in the API documentation, see the docs

  • json_normalize() is a new method to allow you to create a flat table from semi-structured JSON data. See the docs (GH1067)

  • Added PySide support for the qtpandas DataFrameModel and DataFrameWidget.

  • The Python csv parser now supports usecols (GH4335)

  • Frequencies gained several new offsets:

    • LastWeekOfMonth (GH4637)

    • FY5253, and FY5253Quarter (GH4511)

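As a small illustration of the new offsets (assumed usage, not from the original notes), LastWeekOfMonth rolls a date forward to the last given weekday of the month:

```python
import pandas as pd
from pandas.tseries.offsets import LastWeekOfMonth

# weekday=0 is Monday; adding the offset moves to the last Monday of the month
ts = pd.Timestamp('2013-01-01') + LastWeekOfMonth(weekday=0)
print(ts)  # 2013-01-28 00:00:00
```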
  • DataFrame has a new interpolate method, similar to that of Series (GH4434, GH1892)

    In [93]: df = pd.DataFrame({'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
       ....:                   'B': [.25, np.nan, np.nan, 4, 12.2, 14.4]})
       ....: 
    
    In [94]: df.interpolate()
    Out[94]: 
         A      B
    0  1.0   0.25
    1  2.1   1.50
    2  3.4   2.75
    3  4.7   4.00
    4  5.6  12.20
    5  6.8  14.40
    

    Additionally, the method argument to interpolate has been expanded to include 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'barycentric', 'krogh', 'piecewise_polynomial', 'pchip', 'polynomial', 'spline'. The new methods require scipy. Consult the scipy reference guide and documentation for more information about when the various methods are appropriate. See the docs.

    Interpolate now also accepts a limit keyword argument. This works similar to fillna's limit:

    In [95]: ser = pd.Series([1, 3, np.nan, np.nan, np.nan, 11])
    
    In [96]: ser.interpolate(limit=2)
    Out[96]: 
    0     1.0
    1     3.0
    2     5.0
    3     7.0
    4     NaN
    5    11.0
    dtype: float64
    
  • Added wide_to_long panel data convenience function. See the docs.

    In [97]: np.random.seed(123)
    
    In [98]: df = pd.DataFrame({"A1970" : {0 : "a", 1 : "b", 2 : "c"},
       ....:                    "A1980" : {0 : "d", 1 : "e", 2 : "f"},
       ....:                    "B1970" : {0 : 2.5, 1 : 1.2, 2 : .7},
       ....:                    "B1980" : {0 : 3.2, 1 : 1.3, 2 : .1},
       ....:                    "X"     : dict(zip(range(3), np.random.randn(3)))
       ....:                   })
       ....: 
    
    In [99]: df["id"] = df.index
    
    In [100]: df
    Out[100]: 
      A1970 A1980  B1970  B1980         X  id
    0     a     d    2.5    3.2 -1.085631   0
    1     b     e    1.2    1.3  0.997345   1
    2     c     f    0.7    0.1  0.282978   2
    
    In [101]: pd.wide_to_long(df, ["A", "B"], i="id", j="year")
    Out[101]: 
                    X  A    B
    id year                  
    0  1970 -1.085631  a  2.5
    1  1970  0.997345  b  1.2
    2  1970  0.282978  c  0.7
    0  1980 -1.085631  d  3.2
    1  1980  0.997345  e  1.3
    2  1980  0.282978  f  0.1
    
  • to_csv now takes a date_format keyword argument that specifies how output datetime objects should be formatted. Datetimes encountered in the index, columns, and values will all have this formatting applied. (GH4313)
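A short sketch with a hypothetical frame:

```python
import pandas as pd

# Hypothetical frame with a datetime column
df = pd.DataFrame({'when': pd.to_datetime(['2013-01-01', '2013-06-15'])})
csv = df.to_csv(date_format='%Y%m%d', index=False)
print(csv)  # dates rendered as 20130101 and 20130615
```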

  • DataFrame.plot will scatter plot x versus y by passing kind='scatter' (GH2215)

  • Added support for Google Analytics v3 API segment IDs that also supports v2 IDs. (GH5271)

Experimental#

  • The new eval() function implements expression evaluation using numexpr behind the scenes. This results in large speedups for complicated expressions involving large DataFrames/Series. For example,

    In [102]: nrows, ncols = 20000, 100
    
    In [103]: df1, df2, df3, df4 = [pd.DataFrame(np.random.randn(nrows, ncols))
       .....:                       for _ in range(4)]
       .....: 
    
    # eval with NumExpr backend
    In [104]: %timeit pd.eval('df1 + df2 + df3 + df4')
    7.63 ms +- 41.2 us per loop (mean +- std. dev. of 7 runs, 100 loops each)
    
    # pure Python evaluation
    In [105]: %timeit df1 + df2 + df3 + df4
    6.99 ms +- 49.7 us per loop (mean +- std. dev. of 7 runs, 100 loops each)
    

    For more details, see the docs.

  • Similar to pandas.eval, DataFrame has a new DataFrame.eval method that evaluates an expression in the context of the DataFrame. For example,

    In [106]: df = pd.DataFrame(np.random.randn(10, 2), columns=['a', 'b'])
    
    In [107]: df.eval('a + b')
    Out[107]: 
    0   -0.685204
    1    1.589745
    2    0.325441
    3   -1.784153
    4   -0.432893
    5    0.171850
    6    1.895919
    7    3.065587
    8   -0.092759
    9    1.391365
    dtype: float64
    
  • A query() method has been added that allows you to select elements of a DataFrame using a natural query syntax nearly identical to Python syntax. For example,

    In [108]: n = 20
    
    In [109]: df = pd.DataFrame(np.random.randint(n, size=(n, 3)), columns=['a', 'b', 'c'])
    
    In [110]: df.query('a < b < c')
    Out[110]: 
        a   b   c
    11  1   5   8
    15  8  16  19
    

    selects all the rows of df where a < b < c evaluates to True. For more details see the docs.

  • pd.read_msgpack() and pd.to_msgpack() are now a supported method of serialization of arbitrary pandas (and python objects) in a lightweight portable binary format. See the docs

    Warning

    Since this is an EXPERIMENTAL LIBRARY, the storage format may not be stable until a future release.

    df = pd.DataFrame(np.random.rand(5, 2), columns=list('AB'))
    df.to_msgpack('foo.msg')
    pd.read_msgpack('foo.msg')
    
    s = pd.Series(np.random.rand(5), index=pd.date_range('20130101', periods=5))
    pd.to_msgpack('foo.msg', df, s)
    pd.read_msgpack('foo.msg')
    

    You can pass iterator=True to iterate over the unpacked results

    for o in pd.read_msgpack('foo.msg', iterator=True):
        print(o)
    
  • pandas.io.gbq provides a simple way to extract from, and load data into, Google's BigQuery Data Sets by way of pandas DataFrames. BigQuery is a high performance SQL-like database service, useful for performing ad-hoc queries against extremely large datasets. See the docs

    from pandas.io import gbq
    
    # A query to select the average monthly temperatures
    # in the year 2000 across the USA. The dataset,
    # publicdata:samples.gsod, is available on all BigQuery accounts,
    # and is based on NOAA gsod data.
    
    query = """SELECT station_number as STATION,
    month as MONTH, AVG(mean_temp) as MEAN_TEMP
    FROM publicdata:samples.gsod
    WHERE YEAR = 2000
    GROUP BY STATION, MONTH
    ORDER BY STATION, MONTH ASC"""
    
    # Fetch the result set for this query
    
    # Your Google BigQuery Project ID
    # To find this, see your dashboard:
    # https://console.developers.google.com/iam-admin/projects?authuser=0
    projectid = 'xxxxxxxxx'
    df = gbq.read_gbq(query, project_id=projectid)
    
    # Use pandas to process and reshape the dataset
    
    df2 = df.pivot(index='STATION', columns='MONTH', values='MEAN_TEMP')
    df3 = pd.concat([df2.min(), df2.mean(), df2.max()],
                    axis=1, keys=["Min Tem", "Mean Temp", "Max Temp"])
    

    The resulting DataFrame is::

    > df3
                Min Tem  Mean Temp    Max Temp
     MONTH
     1     -53.336667  39.827892   89.770968
     2     -49.837500  43.685219   93.437932
     3     -77.926087  48.708355   96.099998
     4     -82.892858  55.070087   97.317240
     5     -92.378261  61.428117  102.042856
     6     -77.703334  65.858888  102.900000
     7     -87.821428  68.169663  106.510714
     8     -89.431999  68.614215  105.500000
     9     -86.611112  63.436935  107.142856
     10    -78.209677  56.880838   92.103333
     11    -50.125000  48.861228   94.996428
     12    -50.332258  42.286879   94.396774
    

    Warning

    To use this module, you will need a BigQuery account. See <https://cloud.google.com/products/big-query> for details.

    As of 10/10/13, there is a bug in Google's API that prevents result sets from being larger than 100,000 rows. A patch is scheduled for the week of 10/14/13.

Internal refactoring#

In 0.13.0 there is a major refactor primarily to subclass Series from NDFrame, which is the base class currently for DataFrame and Panel, to unify methods and behaviors. Series formerly subclassed directly from ndarray. (GH4080, GH3862, GH816)

Warning

There are two potential incompatibilities from < 0.13.0

  • Using certain numpy functions would previously return a Series if passed a Series as an argument. This seems only to affect np.ones_like, np.empty_like, np.diff and np.where. These now return ndarrays.

    In [111]: s = pd.Series([1, 2, 3, 4])
    

    Numpy usage

    In [112]: np.ones_like(s)
    Out[112]: array([1, 1, 1, 1])
    
    In [113]: np.diff(s)
    Out[113]: array([1, 1, 1])
    
    In [114]: np.where(s > 1, s, np.nan)
    Out[114]: array([nan,  2.,  3.,  4.])
    

    Pandonic usage

    In [115]: pd.Series(1, index=s.index)
    Out[115]: 
    0    1
    1    1
    2    1
    3    1
    dtype: int64
    
    In [116]: s.diff()
    Out[116]: 
    0    NaN
    1    1.0
    2    1.0
    3    1.0
    dtype: float64
    
    In [117]: s.where(s > 1)
    Out[117]: 
    0    NaN
    1    2.0
    2    3.0
    3    4.0
    dtype: float64
    
  • Passing a Series directly to a cython function expecting an ndarray type will no longer work directly, you must pass Series.values, see Enhancing Performance

  • Series(0.5) would previously return the scalar 0.5, instead this will return a 1-element Series
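A one-line sketch of the new behavior:

```python
import pandas as pd

# Under the new behavior a scalar constructs a 1-element Series
s = pd.Series(0.5)
print(len(s), s.iloc[0])  # 1 0.5
```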

  • This change breaks rpy2<=2.3.8. An issue has been opened against rpy2 and a workaround is detailed in GH5698. Thanks @JanSchulz.

  • Pickle compatibility is preserved for pickles created prior to 0.13. These must be unpickled with pd.read_pickle, see Pickling.

  • Refactor of series.py/frame.py/panel.py to move common code to generic.py

    • Added _setup_axes to create generic NDFrame structures

    • Moved methods

      • from_axes, _wrap_array, axes, ix, loc, iloc, shape, empty, swapaxes, transpose, pop

      • __iter__, keys, __contains__, __len__, __neg__, __invert__

      • convert_objects, as_blocks, as_matrix, values

      • __getstate__, __setstate__ (compat remains in frame/panel)

      • __getattr__, __setattr__

      • _indexed_same, reindex_like, align, where, mask

      • fillna, replace (Series replace is now consistent with DataFrame)

      • filter (also added axis argument to selectively filter on a different axis)

      • reindex, reindex_axis, take

      • truncate (moved to become part of NDFrame)

  • These are API changes which make Panel more consistent with DataFrame

    • swapaxes on a Panel with the same axes specified now returns a copy

    • support attribute access for setting

    • filter supports the same API as the original DataFrame filter

  • Reindex called with no arguments will now return a copy of the input object

  • TimeSeries is now an alias for Series. The property is_time_series can be used to distinguish (if desired)

  • Refactor of Sparse objects to use BlockManager

    • Created a new block type in internals, SparseBlock, which can hold multi-dtypes and is non-consolidatable. SparseSeries and SparseDataFrame now inherit more methods from their hierarchy (Series/DataFrame), and no longer inherit from SparseArray (which instead is the object of the SparseBlock)

    • The Sparse suite now supports integration with non-sparse data. Non-float sparse data is supportable (partially implemented)

    • Operations on sparse structures within DataFrames should preserve sparseness; merging-type operations will convert to dense (and back to sparse), so may be somewhat inefficient

    • Enable setitem on SparseSeries for boolean/integer/slices

    • SparsePanels implementation is unchanged (e.g. not using BlockManager, needs work)

  • Added ftypes method to Series/DataFrame, similar to dtypes, but indicates if the underlying is sparse/dense (as well as the dtype)

  • All NDFrame objects can now use __finalize__() to specify various values to propagate to new objects from an existing one (e.g. name in Series will now follow more automatically)
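For instance, the name attribute now survives common operations (a minimal sketch, not from the original notes):

```python
import pandas as pd

# Metadata such as a Series name propagates through operations
s = pd.Series([1, 2, 3], name='speed')
print((s + 1).name, s[:2].name)  # speed speed
```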

  • Internal type checking is now done via a suite of generated classes, allowing isinstance(value, klass) without having to directly import the klass, courtesy of @jtratner

  • Bug in Series update where the parent frame was not updating its cache based on changes (GH4080) or types (GH3217), fillna (GH3386)

  • Fixed indexing with dtype conversions (GH4463, GH4204)

  • Refactor Series.reindex to core/generic.py (GH4604, GH4618), allow method= in reindexing on a Series to work

  • Series.copy no longer accepts the order parameter and is now consistent with NDFrame copy

  • Refactor rename methods to core/generic.py; fixes Series.rename for (GH4605), and adds rename with the same signature for Panel

  • Refactor clip methods to core/generic.py (GH4798)

  • Refactor of _get_numeric_data/_get_bool_data to core/generic.py, allowing Series/Panel functionality

  • Series (for index) / Panel (for items) now allow attribute access on their elements (GH1903)

    In [118]: s = pd.Series([1, 2, 3], index=list('abc'))
    
    In [119]: s.b
    Out[119]: 2
    
    In [120]: s.a = 5
    
    In [121]: s
    Out[121]: 
    a    5
    b    2
    c    3
    dtype: int64
    

Bug fixes#

  • HDFStore

    • raising an invalid TypeError rather than ValueError when appending with a different block ordering (GH4096)

    • read_hdf was not respecting as passed mode (GH4504)

    • appending a 0-len table will work correctly (GH4273)

    • to_hdf was raising when passing both arguments append and table (GH4584)

    • reading from a store with duplicate columns across dtypes would raise (GH4767)

    • Fixed a bug where ValueError wasn't correctly raised when column names weren't strings (GH4956)

    • A zero length series written in Fixed format was not deserialized properly. (GH4708)

    • Fixed decoding performance issue on pyt3 (GH5441)

    • Validate levels in a MultiIndex before storing (GH5527)

    • Correctly handle data_columns with a Panel (GH5717)

  • Fixed bug in tslib.tz_convert(vals, tz1, tz2): it could raise IndexError exception while trying to access trans[pos + 1] (GH4496)

  • The by argument now works correctly with the layout argument (GH4102, GH4014) in *.hist plotting methods

  • Fixed bug in PeriodIndex.map where using str would return the str representation of the index (GH4136)

  • Fixed test failure test_time_series_plot_color_with_empty_kwargs when using custom matplotlib default colors (GH4345)

  • Fix running of stata IO tests. Now uses temporary files to write (GH4353)

  • Fixed an issue where DataFrame.sum was slower than DataFrame.mean for integer valued frames (GH4365)

  • read_html tests now work with Python 2.6 (GH4351)

  • Fixed bug where network testing was throwing NameError because a local variable was undefined (GH4381)

  • In to_json, raise if a passed orient would cause loss of data because of a duplicate index (GH4359)

  • In to_json, fix date handling so milliseconds are the default timestamp as the docstring says (GH4362).

  • as_index is no longer ignored when doing groupby apply (GH4648, GH3417)

  • JSON NaT handling fixed, NaTs are now serialized to null (GH4498)

  • Fixed JSON handling of escapable characters in JSON object keys (GH4593)

  • Fixed passing keep_default_na=False when na_values=None (GH4318)

  • Fixed bug with values raising an error on a DataFrame with duplicate columns and mixed dtypes, surfaced in (GH4377)

  • Fixed bug with duplicate columns and type conversion in read_json when orient='split' (GH4377)

  • Fixed JSON bug where locales with decimal separators other than '.' threw exceptions when encoding / decoding certain values. (GH4918)

  • Fix .iat indexing with a PeriodIndex (GH4390)

  • Fixed an issue where PeriodIndex joining with self was returning a new instance rather than the same instance (GH4379); also adds a test for this for the other index types

  • Fixed a bug with all the dtypes being converted to object when using the CSV cparser with the usecols parameter (GH3192)

  • Fix an issue in merging blocks where the resulting DataFrame had partially set _ref_locs (GH4403)

  • Fixed an issue where hist subplots were being overwritten when they were called using the top level matplotlib API (GH4408)

  • Fixed a bug where calling Series.astype(str) would truncate the string (GH4405, GH4437)

  • Fixed a py3 compat issue where bytes were being repr'd as tuples (GH4455)

  • Fixed Panel attribute naming conflict if an item is named 'a' (GH3440)

  • Fixed an issue where duplicate indexes were raising when plotting (GH4486)

  • Fixed an issue where cumsum and cumprod didn't work with bool dtypes (GH4170, GH4440)

  • Fixed Panel slicing issue in xs that was returning an incorrectly dimmed object (GH4016)

  • Fix resampling bug where a custom reduce function was not used if only one group (GH3849, GH4494)

  • Fixed Panel assignment with a transposed frame (GH3830)

  • Raise on set indexing with a Panel and a Panel as a value which needs alignment (GH3777)

  • frozenset objects now raise in the Series constructor (GH4482, GH4480)

  • Fixed issue with sorting a duplicate MultiIndex that has multiple dtypes (GH4516)

  • Fixed bug in DataFrame.set_values which was causing the name attribute to be lost when the index was expanded. (GH3742, GH4039)

  • Fixed issue where individual names, levels and labels could be set on a MultiIndex without validation (GH3714, GH4039)

  • Fixed (GH3334). Margins did not compute if values is the index.

  • Fix bug in having a rhs of np.timedelta64 or np.offsets.DateOffset when operating with datetimes (GH4532)

  • Fix arithmetic with a series/datetimeindex and np.timedelta64 not working the same (GH4134) and buggy timedelta in numpy 1.6 (GH4135)

  • Fix bug in pd.read_clipboard on windows with PY3 (GH4561); was not decoding properly

  • tslib.get_period_field() and tslib.get_period_field_arr() now raise if the code argument is out of range (GH4519, GH4520)

  • Fix boolean indexing on an empty series losing index names (GH4235); infer_dtype works with empty arrays.

  • Fix reindexing with multiple axes; if an axes match was not replacing the current axes, it could cause a lazy frequency inference issue (GH3317)

  • Fixed an issue where DataFrame.apply was reraising exceptions incorrectly (causing the original stack trace to be truncated).

  • Fix selection with ix/loc and non-unique selectors (GH4619)

  • Fix assignment with iloc/loc involving a dtype change in an existing column (GH4312, GH5702); has internal setitem_with_indexer in core/indexing to use Block.setitem

  • Fixed bug where the thousands operator was not handled correctly for floating point numbers in csv_import (GH4322)

  • Fix an issue with CacheableOffset not properly being used for many DateOffsets; this prevented DateOffsets from being cached (GH4609)

  • Fix boolean comparison with a DataFrame on the lhs and a list/tuple on the rhs (GH4576)

  • Fix error/dtype conversion with setitem of None on Series/DataFrame (GH4667)

  • Fix decoding based on a passed in non-default encoding in pd.read_stata (GH4626)

  • Fixed DataFrame.from_records with a plain-vanilla ndarray. (GH4727)

  • Fixed some inconsistencies with Index.rename and MultiIndex.rename, etc. (GH4718, GH4628)

  • Bug in using iloc/loc with a cross-sectional and duplicate indices (GH4726)

  • Bug with using QUOTE_NONE with to_csv causing Exception. (GH4328)

  • Bug with Series indexing not raising an error when the right-hand-side has an incorrect length (GH2702)

  • Bug in MultiIndexing with a partial string selection as one part of a MultiIndex (GH4758)

  • Bug with reindexing on the index with a non-unique index will now raise ValueError (GH4746)

  • Bug in setting with loc/ix a single indexer with a MultiIndex axis and a numpy array, related to (GH3777)

  • Bug in concatenation with duplicate columns across dtypes not merging with axis=0 (GH4771, GH4975)

  • Bug in iloc with a slice index failing (GH4771)

  • Incorrect error message with no colspecs or width in read_fwf. (GH4774)

  • Fix bugs in indexing in a Series with a duplicate index (GH4548, GH4550)

  • Fixed bug with reading compressed files with read_fwf in Python 3. (GH3963)

  • Fixed an issue with a duplicate index and assignment with a dtype change (GH4686)

  • Fixed bug with reading compressed files in as bytes rather than str in Python 3. Simplifies bytes-producing file-handling in Python 3 (GH3963, GH4785).

  • Fixed an issue related to ticklocs/ticklabels with log scale bar plots across different versions of matplotlib (GH4789)

  • Suppressed DeprecationWarning associated with internal calls issued by repr() (GH4391)

  • Fixed an issue with a duplicate index and duplicate selector with .loc (GH4825)

  • Fixed an issue with DataFrame.sort_index where, when sorting by a single column and passing a list for ascending, the argument for ascending was being interpreted as True (GH4839, GH4846)

  • Fixed Panel.tshift not working. Added freq support to Panel.shift (GH4853)

  • Fix an issue in TextFileReader with a thousands != ',' (GH4596)

  • Bug in getitem with a duplicate index when using where (GH4879)

  • Fix type inference code that coerces a float column to datetime (GH4601)

  • Fixed _ensure_numeric not checking for complex numbers (GH4902)

  • Fixed a bug in Series.hist where two figures were being created when the by argument was passed (GH4112, GH4113).

  • Fixed a bug in convert_objects for > 2 ndims (GH4937)

  • Fixed a bug in DataFrame/Panel cache insertion and subsequent indexing (GH4939, GH5424)

  • Fixed string methods for FrozenNDArray and FrozenList (GH4929)

  • Fixed a bug in setting invalid or out-of-range values in indexing enlargement scenarios (GH4940)

  • Tests for fillna on an empty Series (GH4346), thanks @immerrr

  • Fixed copy() to shallow copy axes/indexes as well and thereby keep separate metadata. (GH4202, GH4830)

  • Fixed the skiprows option in the Python parser for read_csv (GH4382)

  • Fixed bug preventing cut from working with np.inf levels without explicitly passing labels (GH3415)

  • Fixed wrong check for overlapping in DatetimeIndex.union (GH4564)

  • Fixed conflict between the thousands separator and the date parser in csv_parser (GH4678)

  • Fix appending when dtypes are not the same (error showing mixing float/np.datetime64) (GH4993)

  • Fix repr for DateOffset. No longer show duplicate entries in kwds. Removed unused offset fields. (GH4638)

  • Fixed wrong index name during read_csv if using usecols. Applies to the c parser only. (GH4201)

  • Timestamp objects can now appear on the left hand side of a comparison operation with a Series or DataFrame object (GH4982).

  • Fix a bug when indexing with np.nan via iloc/loc (GH5016)

  • Fixed a bug where the low-memory c parser could create different types in different chunks of the same file. Now coerces to numerical type or raises warning. (GH3866)

  • Fix a bug where reshaping a Series to its own shape raised TypeError (GH4554) and other reshaping issues.

  • Bug in setting with ix/loc and a mixed int/string index (GH4544)

  • Make sure series-series boolean comparisons are label based (GH4947)

  • Bug in multi-level indexing with a Timestamp partial indexer (GH4294)

  • Tests/fix for MultiIndex construction of an all-nan frame (GH4078)

  • Fixed a bug where read_html() wasn't correctly inferring values of tables with commas (GH5029)

  • Fixed a bug where read_html() wasn't providing a stable ordering of returned tables (GH4770, GH5029).

  • Fixed a bug where read_html() was incorrectly parsing when passed index_col=0 (GH5066).

  • Fixed a bug where read_html() was incorrectly inferring the type of headers (GH5048).

  • Fixed a bug where DatetimeIndex joined with PeriodIndex was causing a stack overflow (GH3899).

  • Fixed a bug where groupby objects didn't allow plots (GH5102).

  • Fixed a bug where groupby objects weren't tab-completing column names (GH5102).

  • Fixed a bug where groupby.plot() and friends were duplicating figures multiple times (GH5102).

  • Provide automatic conversion of object dtypes on fillna, related (GH5103)

  • Fixed a bug where default options were being overwritten in the option parser cleaning (GH5121).

  • Treat a list/ndarray identically for iloc indexing with list-like (GH5006)

  • Fix MultiIndex.get_level_values() with missing values (GH5074)

  • Fix bound checking for Timestamp() with datetime64 input (GH4065)

  • Fix a bug where TestReadHtml wasn't calling the correct read_html() function (GH5150).

  • Fix a bug with NDFrame.replace() which made replacement appear as though it was (incorrectly) using regular expressions (GH5143).

  • Fix better error message for to_datetime (GH4928)

  • Made sure different locales are tested on travis-ci (GH4918). Also adds a couple of utilities for getting locales and setting locales with a context manager.

  • Fixed segfault on isnull(MultiIndex) (now raises an error instead) (GH5123, GH5125)

  • Allow duplicate indices when performing operations that align (GH5185, GH5639)

  • Compound dtypes in a constructor raise NotImplementedError (GH5191)

  • Bug in comparing duplicate frames (GH4421) related

  • Bug in describe on duplicate frames

  • Bug in to_datetime with a format and coerce=True not raising (GH5195)

  • Bug in loc setting with multiple indexers and a rhs of a Series that needs broadcasting (GH5206)

  • Fixed a bug where in-place setting of levels or labels on a MultiIndex would not clear the cached values attribute and therefore return wrong values. (GH5215)

  • Fixed a bug where filtering a grouped DataFrame or Series did not maintain the original ordering (GH4621).

  • Fixed Period with a business date freq to always roll forward if on a non-business date. (GH5203)

  • Fixed a bug in the Excel writers where frames with duplicate column names weren't written correctly. (GH5235)

  • Fixed an issue with drop and a non-unique index on Series (GH5248)

  • Fixed a segfault in the C parser caused by passing more names than columns in the file. (GH5156)

  • Fix Series.isin with date/time-like dtypes (GH5021)
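
    For illustration, a sketch of the fixed isin behavior with datetime64 data (assumes pandas 0.13 or later):

    ```python
    import pandas as pd

    s = pd.Series(pd.date_range("2013-01-01", periods=3))

    # isin now compares datetime-like values correctly
    mask = s.isin([pd.Timestamp("2013-01-02")])
    ```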

  • The C and Python parsers can now handle the more common MultiIndex column format which doesn't have a row for index names (GH4702)
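
    A small sketch of that format: two header rows and no extra row naming the index levels (illustrative; pandas 0.13+):

    ```python
    import io

    import pandas as pd

    # Two header rows form the MultiIndex columns; there is no
    # separate row carrying index names.
    data = "a,a,b\nx,y,z\n1,2,3\n4,5,6\n"
    df = pd.read_csv(io.StringIO(data), header=[0, 1])
    ```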

  • Bug when trying to use an out-of-bounds date as an object dtype (GH5312)

  • Bug when trying to display an embedded PandasObject (GH5324)

  • Allow operating on Timestamps to return a datetime if the result is out-of-bounds, related (GH5312)

  • Fix return value/type signature of initObjToJSON() to be compatible with numpy's import_array() (GH5334, GH5326)

  • Bug in renaming then setting the index on a DataFrame (GH5344)

  • Test suite no longer leaves around temporary files when testing graphics. (GH5347) (thanks for catching this @yarikoptic!)

  • Fixed html tests on win32. (GH4580)

  • Make sure that head/tail are iloc based, (GH5370)
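
    In other words, head/tail select by position regardless of the index labels; a minimal illustration:

    ```python
    import pandas as pd

    # Non-monotonic integer index; head/tail are positional (iloc based)
    s = pd.Series([10, 20, 30], index=[2, 0, 1])
    first_two = s.head(2)
    last_one = s.tail(1)
    ```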

  • Fixed a bug in the string representation of a PeriodIndex with 1 or 2 elements. (GH5372)

  • The GroupBy methods transform and filter can be used on Series and DataFrames that have repeated (non-unique) indices. (GH4620)
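
    A brief sketch of transform on a non-unique index (illustrative):

    ```python
    import pandas as pd

    # Repeated (non-unique) index labels
    s = pd.Series([1, 2, 3, 4], index=[0, 0, 1, 1])

    # transform aligns the per-group result back to the original positions
    sums = s.groupby(level=0).transform("sum")
    ```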

  • Fixed an empty Series not printing its name in repr (GH4651)

  • Make tests create temp files in a temp directory by default. (GH5419)

  • pd.to_timedelta of a scalar returns a scalar. (GH5410)

  • pd.to_timedelta accepts NaN and NaT, returning NaT instead of raising (GH5437)
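
    These two to_timedelta changes in a minimal sketch (illustrative; written against a recent pandas):

    ```python
    import numpy as np
    import pandas as pd

    # scalar in, scalar out
    one_day = pd.to_timedelta("1 days")

    # NaN/NaT inputs now return NaT instead of raising
    missing = pd.to_timedelta(np.nan)
    ```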

  • Performance improvements in isnull on larger-size pandas objects

  • Fixed various setitem bugs with a 1d ndarray that does not have a matching length to the indexer (GH5508)

  • Bug in getitem with a MultiIndex and iloc (GH5528)

  • Bug in delitem on a Series (GH5542)

  • Bug fix in apply when using a custom function and objects are not mutated (GH5545)

  • Bug in selecting from a non-unique index with loc (GH5553)

  • Bug in groupby returning non-consistent types when a user function returns a None, (GH5592)

  • Work around regression in numpy 1.7.0 which erroneously raises IndexError from ndarray.item (GH5666)

  • Bug in repeated indexing of an object with a resultant non-unique index (GH5678)

  • Bug in fillna with a Series and a passed series/dict (GH5703)

  • Bug in groupby transform with a datetime-like grouper (GH5712)

  • Bug in MultiIndex selection in PY3 when using certain keys (GH5725)

  • Row-wise concat of differing dtypes failing in certain cases (GH5754)

Contributors#

A total of 77 people contributed patches to this release. People with a "+" by their name contributed a patch for the first time.

  • Agustín Herranz +

  • Alex Gaudio +

  • Alex Rothberg +

  • Andreas Klostermann +

  • Andreas Würl +

  • Andy Hayden

  • Ben Alex +

  • Benedikt Sauer +

  • Brad Buran

  • Caleb Epstein +

  • Chang She

  • Christopher Whelan

  • DSM +

  • Dale Jung +

  • Dan Birken

  • David Rasch +

  • Dieter Vandenbussche

  • Gabi Davar +

  • Garrett Drapala

  • Goyo +

  • Greg Reda +

  • Ivan Smirnov +

  • Jack Kelly +

  • Jacob Schaer +

  • Jan Schulz +

  • Jeff Tratner

  • Jeffrey Tratner

  • John McNamara +

  • John W. O'Brien +

  • Joris Van den Bossche

  • Justin Bozonier +

  • Kelsey Jordahl

  • Kevin Stone

  • Kieran O'Mahony

  • Kyle Hausmann +

  • Kyle Kelley +

  • Kyle Meyer

  • Mike Kelly

  • Mortada Mehyar +

  • Nick Foti +

  • Olivier Harris +

  • Ondřej Čertík +

  • PKEuS

  • Phillip Cloud

  • Pierre Haessig +

  • Richard T. Guy +

  • Roman Pekar +

  • Roy Hyunjin Han

  • Skipper Seabold

  • Sten +

  • Thomas A Caswell +

  • Thomas Kluyver

  • Tiago Requeijo +

  • TomAugspurger

  • Trent Hauck

  • Valentin Haenel +

  • Viktor Kerkez +

  • Vincent Arel-Bundock

  • Wes McKinney

  • Wes Turner +

  • Weston Renoud +

  • Yaroslav Halchenko

  • Zach Dwiel +

  • chapman siu +

  • chappers +

  • d10genes +

  • danielballan

  • daydreamt +

  • engstrom +

  • jreback

  • monicaBee +

  • prossahl +

  • rockg +

  • unutbu +

  • westurner +

  • y-p

  • zach powers