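What's new in 1.0.0 (January 29, 2020)#

These are the changes in pandas 1.0.0. See Release notes for a full changelog including other versions of pandas.

Note

The pandas 1.0 release removed a lot of functionality that was deprecated in previous releases (see below for an overview). It is recommended to first upgrade to pandas 0.25 and to ensure your code is working without warnings before upgrading to pandas 1.0.

New deprecation policy#

Starting with pandas 1.0.0, pandas will adopt a variant of SemVer to version releases. Briefly,

Deprecations will be introduced in minor releases (e.g. 1.1.0, 1.2.0, 2.1.0, ...)
Deprecations will be enforced in major releases (e.g. 1.0.0, 2.0.0, 3.0.0, ...)
API-breaking changes will be made only in major releases (except for experimental features)

See Version policy for more.

Enhancements#

Using Numba in `rolling.apply` and `expanding.apply`#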

We've added an engine keyword to rolling.apply() and expanding.apply() that allows the user to execute the routine using Numba instead of Cython. Using the Numba engine can yield significant performance gains if the apply function can operate on NumPy arrays and the data set is larger (1 million rows or greater). For more details, see the rolling apply documentation (GH28987, GH30936).
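A minimal sketch of how the engine keyword might be used. The function name mean_of_window and the sample data are illustrative assumptions. engine="numba" needs raw=True and the optional numba package, so the sketch falls back to the default engine when numba cannot be imported.

```python
import numpy as np
import pandas as pd


def mean_of_window(values):
    # With raw=True, each window arrives as a plain NumPy array.
    return np.mean(values)


s = pd.Series(np.arange(10, dtype="float64"))

try:
    # Request the Numba engine; this needs raw=True and numba installed.
    result = s.rolling(3).apply(mean_of_window, raw=True, engine="numba")
except ImportError:
    # Fall back to the default (Cython) engine when numba is unavailable.
    result = s.rolling(3).apply(mean_of_window, raw=True)

print(result)
```

On a series this small the Numba engine is unlikely to pay off; the compile cost is only amortised on large inputs.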

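Defining custom windows for rolling operations#

We've added a pandas.api.indexers.BaseIndexer() class that allows users to define how window bounds are created during rolling operations. Users can define their own get_window_bounds method on a pandas.api.indexers.BaseIndexer() subclass that will generate the start and end indices used for each window during the rolling aggregation. For more details and example usage, see the custom window rolling documentation.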
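A minimal sketch of a BaseIndexer subclass. ForwardWindowIndexer is a hypothetical name, and the method signature below includes the step parameter used by recent pandas releases (an assumption about the installed version; the 1.0 signature had only the first four parameters).

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import BaseIndexer


class ForwardWindowIndexer(BaseIndexer):
    """Hypothetical indexer: each window covers the current row and the
    following window_size - 1 rows."""

    def get_window_bounds(self, num_values=0, min_periods=None,
                          center=None, closed=None, step=None):
        # Window i starts at row i ...
        start = np.arange(num_values, dtype=np.int64)
        # ... and ends window_size rows later, clipped to the data length.
        end = np.minimum(start + self.window_size, num_values).astype(np.int64)
        return start, end


s = pd.Series([1.0, 2.0, 3.0, 4.0])
indexer = ForwardWindowIndexer(window_size=2)
result = s.rolling(indexer, min_periods=1).sum()
print(result.tolist())
```

min_periods=1 is passed so that the last, shorter window still produces a value.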

Converting to markdown#

We've added to_markdown() for creating a markdown table (GH11052)

In [1]: df = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=['a', 'a', 'b'])

In [2]: print(df.to_markdown())
ImportError: Missing optional dependency 'tabulate'.  Use pip or conda to install tabulate.

to_markdown() requires the optional tabulate package; without it, the call raises the ImportError shown above.

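Experimental new features#

Experimental `NA` scalar to denote missing values#

A new pd.NA value (singleton) is introduced to represent scalar missing values. Up to now, pandas used several values to represent missing data: np.nan for float data, np.nan or None for object-dtype data, and pd.NaT for datetime-like data. The goal of pd.NA is to provide a "missing" indicator that can be used consistently across data types. pd.NA is currently used by the nullable integer and boolean data types and the new string data type (GH28095).

Warning

Experimental: the behaviour of pd.NA can still change without warning.

For example, creating a Series using the nullable integer dtype: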

In [3]: s = pd.Series([1, 2, None], dtype="Int64")

In [4]: s
Out[4]: 
0       1
1       2
2    <NA>
Length: 3, dtype: Int64

In [5]: s[2]
Out[5]: <NA>

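Compared to np.nan, pd.NA behaves differently in certain operations. In addition to arithmetic operations, pd.NA also propagates as "missing" or "unknown" in comparison operations: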

In [6]: np.nan > 1
Out[6]: False

In [7]: pd.NA > 1
Out[7]: <NA>

For logical operations, pd.NA follows the rules of three-valued logic (or Kleene logic). For example:

In [8]: pd.NA | True
Out[8]: True

For more, see the NA section in the user guide on missing data.
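The three-valued logic described above can be sketched as follows. The variable names are illustrative only; the idea is that a result is definite only when the known operand alone decides it.

```python
import pandas as pd

# pd.NA stands for "unknown".
or_true = pd.NA | True     # True: the result is True whatever the unknown is
or_false = pd.NA | False   # <NA>: the result depends on the unknown value
and_false = pd.NA & False  # False: the result is False whatever the unknown is
and_true = pd.NA & True    # <NA>: the result depends on the unknown value

# The truth value of pd.NA is ambiguous, so a boolean context raises.
try:
    bool(pd.NA)
    raised = False
except TypeError:
    raised = True

print(or_true, or_false, and_false, and_true, raised)
```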

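Dedicated string data type#

We've added StringDtype, an extension type dedicated to string data. Previously, strings were typically stored in object-dtype NumPy arrays. (GH29975)

Warning

StringDtype is currently considered experimental. The implementation and parts of the API may change without warning.

The 'string' extension type solves several issues with object-dtype NumPy arrays:

You can accidentally store a mixture of strings and non-strings in an object dtype array. A StringArray can only store strings.
object dtype breaks dtype-specific operations like DataFrame.select_dtypes(). There isn't a clear way to select just text while excluding non-text, but still object-dtype, columns.
When reading code, the contents of an object dtype array is less clear than string.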

In [9]: pd.Series(['abc', None, 'def'], dtype=pd.StringDtype())
Out[9]: 
0     abc
1    <NA>
2     def
Length: 3, dtype: string

You can use the alias "string" as well.

In [10]: s = pd.Series(['abc', None, 'def'], dtype="string")

In [11]: s
Out[11]: 
0     abc
1    <NA>
2     def
Length: 3, dtype: string
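As a small illustration of the first issue listed above (mixed contents), the sketch below contrasts an object-dtype Series with a string-dtype array. The exact exception type raised on assignment is treated as an assumption, so both TypeError and ValueError are caught.

```python
import pandas as pd

# object dtype silently accepts a mixture of strings and non-strings.
mixed = pd.Series(["a", 1, None])

# A string-dtype array only stores strings (or missing values).
arr = pd.array(["a", "b", None], dtype="string")
try:
    arr[0] = 1  # not a string
    rejected = False
except (TypeError, ValueError):
    rejected = True

print(mixed.dtype, rejected)
```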

The usual string accessor methods work. Where appropriate, the return type of the Series or columns of a DataFrame will also have string dtype.

In [12]: s.str.upper()
Out[12]: 
0     ABC
1    <NA>
2     DEF
Length: 3, dtype: string

In [13]: s.str.split('b', expand=True).dtypes
Out[13]: 
0    string
1    string
Length: 2, dtype: object

String accessor methods returning integers will return a value with Int64Dtype

In [14]: s.str.count("a")
Out[14]: 
0       1
1    <NA>
2       0
Length: 3, dtype: Int64

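We recommend explicitly using the string data type when working with strings. See Text data types for more.

Boolean data type with missing values support#

We've added BooleanDtype / BooleanArray, an extension type dedicated to boolean data that can hold missing values. The default bool data type is based on a bool-dtype NumPy array, so a column can only hold True or False, and not missing values. The new BooleanArray can also store missing values by keeping track of them in a separate mask. (GH29555, GH30095, GH31131)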

In [15]: pd.Series([True, False, None], dtype=pd.BooleanDtype())
Out[15]: 
0     True
1    False
2     <NA>
Length: 3, dtype: boolean

You can use the alias "boolean" as well.

In [16]: s = pd.Series([True, False, None], dtype="boolean")

In [17]: s
Out[17]: 
0     True
1    False
2     <NA>
Length: 3, dtype: boolean

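`convert_dtypes` method to ease use of supported extension dtypes#

In order to encourage use of the extension dtypes StringDtype, BooleanDtype, Int64Dtype, Int32Dtype, etc., that support pd.NA, the methods DataFrame.convert_dtypes() and Series.convert_dtypes() have been introduced. (GH29752) (GH30929)

Example: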

In [18]: df = pd.DataFrame({'x': ['abc', None, 'def'],
   ....:                    'y': [1, 2, np.nan],
   ....:                    'z': [True, False, True]})
   ....: 

In [19]: df
Out[19]: 
      x    y      z
0   abc  1.0   True
1  None  2.0  False
2   def  NaN   True

[3 rows x 3 columns]

In [20]: df.dtypes
Out[20]: 
x     object
y    float64
z       bool
Length: 3, dtype: object

In [21]: converted = df.convert_dtypes()

In [22]: converted
Out[22]: 
      x     y      z
0   abc     1   True
1  <NA>     2  False
2   def  <NA>   True

[3 rows x 3 columns]

In [23]: converted.dtypes
Out[23]: 
x     string
y      Int64
z    boolean
Length: 3, dtype: object

This is especially useful after reading in data using readers such as read_csv() and read_excel(). See here for a description.
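A minimal sketch of that reader workflow, using an in-memory CSV. The column names and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content with a missing name and a missing quantity.
csv_text = "name,qty,flag\nabc,1,True\n,2,False\ndef,,True\n"

raw = pd.read_csv(io.StringIO(csv_text))
converted = raw.convert_dtypes()

# The float column with a hole becomes nullable Int64, and the bool
# column becomes the nullable boolean dtype.
print(raw.dtypes)
print(converted.dtypes)
```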

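Other enhancements#

DataFrame.to_string() added the max_colwidth parameter to control when wide columns are truncated (GH9784)
Added the na_value argument to Series.to_numpy(), Index.to_numpy() and DataFrame.to_numpy() to control the value used for missing data (GH30322)
MultiIndex.from_product() infers level names from inputs if not explicitly provided (GH27292)
DataFrame.to_latex() now accepts caption and label arguments (GH25436)
DataFrames with nullable integer, the new string dtype and period data type can now be converted to pyarrow (>=0.15.0), which means that they are supported when writing to the Parquet file format with the pyarrow engine (GH28368). Full roundtrip to parquet (writing and reading back in with to_parquet() / read_parquet()) is supported starting with pyarrow >= 0.16 (GH20612).
to_parquet() now appropriately handles the schema argument for user defined schemas in the pyarrow engine. (GH30270)
DataFrame.to_json() now accepts an indent integer argument to enable pretty printing of JSON output (GH12004)
read_stata() can read Stata 119 dta files. (GH28250)
Implemented pandas.core.window.Window.var() and pandas.core.window.Window.std() functions (GH26597)
Added encoding argument to DataFrame.to_string() for non-ascii text (GH28766)
Added encoding argument to DataFrame.to_html() for non-ascii text (GH28663)
Styler.background_gradient() now accepts vmin and vmax arguments (GH12145)
Styler.format() added the na_rep parameter to help format the missing values (GH21527, GH28358)
read_excel() now can read binary Excel (.xlsb) files by passing engine='pyxlsb'. For more details and example usage, see the Binary Excel files documentation. Closes GH8540.
The partition_cols argument in DataFrame.to_parquet() now accepts a string (GH27117)
pandas.read_json() now parses NaN, Infinity and -Infinity (GH12213)
The DataFrame constructor preserves the ExtensionArray dtype when given an ExtensionArray (GH11363)
DataFrame.sort_values() and Series.sort_values() have gained the ignore_index keyword to be able to reset the index after sorting (GH30114)
DataFrame.sort_index() and Series.sort_index() have gained the ignore_index keyword to reset the index (GH30114)
DataFrame.drop_duplicates() has gained the ignore_index keyword to reset the index (GH30114)
Added a new writer for exporting Stata dta files in versions 118 and 119, StataWriterUTF8. These file formats support exporting strings containing Unicode characters. Format 119 supports data sets with more than 32,767 variables (GH23573, GH30959)
Series.map() now accepts collections.abc.Mapping subclasses as a mapper (GH29733)
Added an experimental attrs for storing global metadata about a dataset (GH29062)
Timestamp.fromisocalendar() is now compatible with Python 3.8 and above (GH28115)
DataFrame.to_pickle() and read_pickle() now accept URLs (GH30163)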
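A short sketch exercising three of the enhancements listed above (ignore_index, na_value and indent). The data is illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"k": [3, 1, 2]}, index=["c", "a", "b"])

# ignore_index=True discards the old labels and renumbers from 0.
sorted_df = df.sort_values("k", ignore_index=True)

# na_value chooses what missing entries become in the NumPy array.
s = pd.Series([1, 2, None], dtype="Int64")
arr = s.to_numpy(dtype="float64", na_value=np.nan)

# indent turns on pretty-printed JSON output.
pretty = df.to_json(indent=2)

print(sorted_df)
print(arr)
print(pretty)
```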

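Backwards incompatible API changes#

Avoid using names from `MultiIndex.levels`#

As part of a larger refactor to MultiIndex, the level names are now stored separately from the levels (GH27242). We recommend using MultiIndex.names to access the names, and Index.set_names() to update the names.

For backwards compatibility, you can still access the names via the levels.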

In [24]: mi = pd.MultiIndex.from_product([[1, 2], ['a', 'b']], names=['x', 'y'])

In [25]: mi.levels[0].name
Out[25]: 'x'

However, it is no longer possible to update the names of the MultiIndex via the level.

In [26]: mi.levels[0].name = "new name"
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [26], in <cell line: 1>()
----> 1 mi.levels[0].name = "new name"

File /usr/local/lib/python3.10/dist-packages/pandas-1.5.0.dev0+697.gf9762d8f52-py3.10-linux-x86_64.egg/pandas/core/indexes/base.py:1734, in Index.name(self, value)
   1730 @name.setter
   1731 def name(self, value: Hashable):
   1732     if self._no_setting_name:
   1733         # Used in MultiIndex.levels to avoid silently ignoring name updates.
-> 1734         raise RuntimeError(
   1735             "Cannot set name on a level of a MultiIndex. Use "
   1736             "'MultiIndex.set_names' instead."
   1737         )
   1738     maybe_extract_name(value, None, type(self))
   1739     self._name = value

RuntimeError: Cannot set name on a level of a MultiIndex. Use 'MultiIndex.set_names' instead.

In [27]: mi.names
Out[27]: FrozenList(['x', 'y'])

To update, use MultiIndex.set_names, which returns a new MultiIndex.

In [28]: mi2 = mi.set_names("new name", level=0)

In [29]: mi2.names
Out[29]: FrozenList(['new name', 'y'])

New repr for `IntervalArray`#

pandas.arrays.IntervalArray adopts a new __repr__ in accordance with other array classes (GH25022)

pandas 0.25.x

In [1]: pd.arrays.IntervalArray.from_tuples([(0, 1), (2, 3)])
Out[1]:
IntervalArray([(0, 1], (2, 3]],
              closed='right',
              dtype='interval[int64]')

pandas 1.0.0

In [30]: pd.arrays.IntervalArray.from_tuples([(0, 1), (2, 3)])
Out[30]: 
<IntervalArray>
[(0, 1], (2, 3]]
Length: 2, dtype: interval[int64, right]

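`DataFrame.rename` now only accepts one positional argument#

DataFrame.rename() would previously accept positional arguments that would lead to ambiguous or undefined behavior. From pandas 1.0, only the very first argument, which maps labels to their new names along the default axis, is allowed to be passed by position (GH29136).

pandas 0.25.x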

In [1]: df = pd.DataFrame([[1]])
In [2]: df.rename({0: 1}, {0: 2})
Out[2]:
FutureWarning: ...Use named arguments to resolve ambiguity...
   2
1  1

pandas 1.0.0

In [3]: df.rename({0: 1}, {0: 2})
Traceback (most recent call last):
...
TypeError: rename() takes from 1 to 2 positional arguments but 3 were given

Note that errors will now be raised when conflicting or potentially ambiguous arguments are provided.

pandas 0.25.x

In [4]: df.rename({0: 1}, index={0: 2})
Out[4]:
   0
1  1

In [5]: df.rename(mapper={0: 1}, index={0: 2})
Out[5]:
   0
2  1

pandas 1.0.0

In [6]: df.rename({0: 1}, index={0: 2})
Traceback (most recent call last):
...
TypeError: Cannot specify both 'mapper' and any of 'index' or 'columns'

In [7]: df.rename(mapper={0: 1}, index={0: 2})
Traceback (most recent call last):
...
TypeError: Cannot specify both 'mapper' and any of 'index' or 'columns'

You can still change the axis to which the first positional argument is applied by supplying the axis keyword argument.

In [31]: df.rename({0: 1})
Out[31]: 
   0
1  1

[1 rows x 1 columns]

In [32]: df.rename({0: 1}, axis=1)
Out[32]: 
   1
0  1

[1 rows x 1 columns]

If you would like to update both the index and column labels, be sure to use the respective keywords.

In [33]: df.rename(index={0: 1}, columns={0: 2})
Out[33]: 
   2
1  1

[1 rows x 1 columns]

Extended verbose info output for `DataFrame`#

DataFrame.info() now shows line numbers for the columns summary (GH17304)

pandas 0.25.x

In [1]: df = pd.DataFrame({"int_col": [1, 2, 3],
...                    "text_col": ["a", "b", "c"],
...                    "float_col": [0.0, 0.1, 0.2]})
In [2]: df.info(verbose=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
int_col      3 non-null int64
text_col     3 non-null object
float_col    3 non-null float64
dtypes: float64(1), int64(1), object(1)
memory usage: 152.0+ bytes

pandas 1.0.0

In [34]: df = pd.DataFrame({"int_col": [1, 2, 3],
   ....:                    "text_col": ["a", "b", "c"],
   ....:                    "float_col": [0.0, 0.1, 0.2]})
   ....: 

In [35]: df.info(verbose=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   int_col    3 non-null      int64  
 1   text_col   3 non-null      object 
 2   float_col  3 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 200.0+ bytes

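`pandas.array()` inference changes#

pandas.array() now infers pandas' new extension types in several cases (GH29791):

String data (including missing values) now returns an arrays.StringArray.
Integer data (including missing values) now returns an arrays.IntegerArray.
Boolean data (including missing values) now returns the new arrays.BooleanArray.

pandas 0.25.x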

In [1]: pd.array(["a", None])
Out[1]:
<PandasArray>
['a', None]
Length: 2, dtype: object

In [2]: pd.array([1, None])
Out[2]:
<PandasArray>
[1, None]
Length: 2, dtype: object

pandas 1.0.0

In [36]: pd.array(["a", None])
Out[36]: 
<StringArray>
['a', <NA>]
Length: 2, dtype: string

In [37]: pd.array([1, None])
Out[37]: 
<IntegerArray>
[1, <NA>]
Length: 2, dtype: Int64

As a reminder, you can specify the dtype to disable all inference.
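A minimal sketch of the difference between inferred and explicit dtypes. The variable names are illustrative.

```python
import pandas as pd

# Without a dtype, pandas.array() infers a nullable extension type.
inferred_int = pd.array([1, None])
inferred_bool = pd.array([True, None])

# Passing dtype explicitly disables the inference; None is kept as-is.
explicit = pd.array([1, None], dtype=object)

print(inferred_int.dtype, inferred_bool.dtype, explicit.dtype)
```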

`arrays.IntegerArray` now uses `pandas.NA`#

arrays.IntegerArray now uses pandas.NA rather than numpy.nan as its missing value marker (GH29964).

pandas 0.25.x

In [1]: a = pd.array([1, 2, None], dtype="Int64")
In [2]: a
Out[2]:
<IntegerArray>
[1, 2, NaN]
Length: 3, dtype: Int64

In [3]: a[2]
Out[3]:
nan

pandas 1.0.0

In [38]: a = pd.array([1, 2, None], dtype="Int64")

In [39]: a
Out[39]: 
<IntegerArray>
[1, 2, <NA>]
Length: 3, dtype: Int64

In [40]: a[2]
Out[40]: <NA>

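This has a few API-breaking consequences.

Converting to a NumPy ndarray

When converting to a NumPy array, missing values will be pd.NA, which cannot be converted to a float. So calling np.asarray(integer_array, dtype="float") will now raise.

pandas 0.25.x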

In [1]: np.asarray(a, dtype="float")
Out[1]:
array([ 1.,  2., nan])

pandas 1.0.0

In [41]: np.asarray(a, dtype="float")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [41], in <cell line: 1>()
----> 1 np.asarray(a, dtype="float")

File /usr/local/lib/python3.10/dist-packages/pandas-1.5.0.dev0+697.gf9762d8f52-py3.10-linux-x86_64.egg/pandas/core/arrays/masked.py:489, in BaseMaskedArray.__array__(self, dtype)
    484 def __array__(self, dtype: NpDtype | None = None) -> np.ndarray:
    485     """
    486     the array interface, return my values
    487     We return an object array here to preserve our scalar values
    488     """
--> 489     return self.to_numpy(dtype=dtype)

File /usr/local/lib/python3.10/dist-packages/pandas-1.5.0.dev0+697.gf9762d8f52-py3.10-linux-x86_64.egg/pandas/core/arrays/masked.py:411, in BaseMaskedArray.to_numpy(self, dtype, copy, na_value)
    405 if self._hasna:
    406     if (
    407         not is_object_dtype(dtype)
    408         and not is_string_dtype(dtype)
    409         and na_value is libmissing.NA
    410     ):
--> 411         raise ValueError(
    412             f"cannot convert to '{dtype}'-dtype NumPy array "
    413             "with missing values. Specify an appropriate 'na_value' "
    414             "for this dtype."
    415         )
    416     # don't pass copy to astype -> always need a copy since we are mutating
    417     data = self._data.astype(dtype)

ValueError: cannot convert to 'float64'-dtype NumPy array with missing values. Specify an appropriate 'na_value' for this dtype.

Use arrays.IntegerArray.to_numpy() with an explicit na_value instead.

In [42]: a.to_numpy(dtype="float", na_value=np.nan)
Out[42]: array([ 1.,  2., nan])

Reductions can return pd.NA

When performing a reduction such as a sum with skipna=False, the result will now be pd.NA instead of np.nan in the presence of missing values (GH30958).

pandas 0.25.x

In [1]: pd.Series(a).sum(skipna=False)
Out[1]:
nan

pandas 1.0.0

In [43]: pd.Series(a).sum(skipna=False)
Out[43]: <NA>

value_counts returns a nullable integer dtype

Series.value_counts() with a nullable integer dtype now returns a nullable integer dtype for the values.

pandas 0.25.x

In [1]: pd.Series([2, 1, 1, None], dtype="Int64").value_counts().dtype
Out[1]:
dtype('int64')

pandas 1.0.0

In [44]: pd.Series([2, 1, 1, None], dtype="Int64").value_counts().dtype
Out[44]: Int64Dtype()

See Experimental NA scalar to denote missing values for more on the differences between pandas.NA and numpy.nan.

`arrays.IntegerArray` comparisons return `arrays.BooleanArray`#

Comparison operations on an arrays.IntegerArray now return an arrays.BooleanArray rather than a NumPy array (GH29964).

pandas 0.25.x

In [1]: a = pd.array([1, 2, None], dtype="Int64")
In [2]: a
Out[2]:
<IntegerArray>
[1, 2, NaN]
Length: 3, dtype: Int64

In [3]: a > 1
Out[3]:
array([False,  True, False])

pandas 1.0.0

In [45]: a = pd.array([1, 2, None], dtype="Int64")

In [46]: a > 1
Out[46]: 
<BooleanArray>
[False, True, <NA>]
Length: 3, dtype: boolean

Note that missing values now propagate, rather than always comparing unequal like numpy.nan. See Experimental NA scalar to denote missing values for more.
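Because the comparison result can now contain missing values, one way to use it as a filter is to decide explicitly how the unknown entries should be treated. The sketch below treats them as False, which is a design choice for this example rather than a rule stated in these notes.

```python
import pandas as pd

s = pd.Series([1, 2, None], dtype="Int64")

# The comparison propagates the missing value into the mask.
mask = s > 1

# Decide how to treat "unknown" before filtering; here it counts as False.
selected = s[mask.fillna(False)]

print(mask)
print(selected)
```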

By default `Categorical.min()` now returns the minimum instead of np.nan#

When a Categorical contains np.nan, Categorical.min() no longer returns np.nan by default (skipna=True) (GH25303)

pandas 0.25.x

In [1]: pd.Categorical([1, 2, np.nan], ordered=True).min()
Out[1]: nan

pandas 1.0.0

In [47]: pd.Categorical([1, 2, np.nan], ordered=True).min()
Out[47]: 1

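Default dtype of empty `pandas.Series`#

Initialising an empty pandas.Series without specifying a dtype will now raise a DeprecationWarning (GH17261). The default dtype will change from float64 to object in future releases so that it is consistent with the behaviour of DataFrame and Index.

pandas 1.0.0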

In [1]: pd.Series()
Out[1]:
DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
Series([], dtype: float64)
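As the warning message suggests, passing a dtype explicitly avoids the warning. A minimal sketch, with warnings escalated to errors so that any warning would be visible:

```python
import warnings

import pandas as pd

with warnings.catch_warnings():
    # Turn any warning into an exception for the duration of the block.
    warnings.simplefilter("error")
    empty_float = pd.Series(dtype="float64")
    empty_object = pd.Series(dtype=object)

print(empty_float.dtype, empty_object.dtype)
```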

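Result dtype inference changes for resample operations#

The rules for the result dtype in DataFrame.resample() aggregations have changed for extension types (GH31359). Previously, pandas would attempt to convert the result back to the original dtype, falling back to the usual inference rules if that was not possible. Now, pandas will only return a result of the original dtype if the scalar values in the result are instances of the extension dtype's scalar type.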

In [48]: df = pd.DataFrame({"A": ['a', 'b']}, dtype='category',
   ....:                   index=pd.date_range('2000', periods=2))
   ....: 

In [49]: df
Out[49]: 
            A
2000-01-01  a
2000-01-02  b

[2 rows x 1 columns]

pandas 0.25.x

In [1]: df.resample("2D").agg(lambda x: 'a').A.dtype
Out[1]:
CategoricalDtype(categories=['a', 'b'], ordered=False)

pandas 1.0.0

In [50]: df.resample("2D").agg(lambda x: 'a').A.dtype
Out[50]: dtype('O')

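This fixes an inconsistency between resample and groupby. This also fixes a potential bug, where the values of the result might change depending on how the results are cast back to the original dtype.

pandas 0.25.x

In [1]: df.resample("2D").agg(lambda x: 'c')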
Out[1]:

     A
0  NaN

pandas 1.0.0

In [51]: df.resample("2D").agg(lambda x: 'c')
Out[51]: 
            A
2000-01-01  c

[1 rows x 1 columns]

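Increased minimum version for Python#

pandas 1.0.0 supports Python 3.6.1 and higher (GH29212).

Increased minimum versions for dependencies#

Some minimum supported versions of dependencies were updated (GH29766, GH29723). If installed, we now require: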

Package	Minimum Version	Required
numpy	1.13.3	X
pytz	2015.4	X
python-dateutil	2.6.1	X
bottleneck	1.2.1
numexpr	2.6.2
pytest (dev)	4.0.2

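For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.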

Package	Minimum Version	Changed
beautifulsoup4	4.6.0
fastparquet	0.3.2	X
gcsfs	0.2.2
lxml	3.8.0
matplotlib	2.2.2
numba	0.46.0	X
openpyxl	2.5.7	X
pyarrow	0.13.0	X
pymysql	0.7.1
pytables	3.4.2
s3fs	0.3.0	X
scipy	0.19.0
sqlalchemy	1.1.4
xarray	0.8.2
xlrd	1.1.0
xlsxwriter	0.9.8
xlwt	1.2.0

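See Dependencies and Optional dependencies for more.

Build changes#

pandas has added a pyproject.toml file and will no longer include cythonized files in the source distribution uploaded to PyPI (GH28341, GH20775). If you're installing a built distribution (wheel) or via conda, this shouldn't have any effect on you. If you're building pandas from source, you should no longer need to install Cython into your build environment before calling pip install pandas.

Other API changes#

core.groupby.GroupBy.transform now raises on invalid operation names (GH27489)
pandas.api.types.infer_dtype() will now return "integer-na" for a mix of integers and np.nan (GH27283)
MultiIndex.from_arrays() will no longer infer names from arrays if names=None is explicitly provided (GH27292)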
In order to improve tab-completion, pandas does not include most deprecated attributes when introspecting a pandas object using dir (e.g. dir(df)). To see which attributes are excluded, see an object's _deprecations attribute, for example pd.DataFrame._deprecations (GH28805).
The returned dtype of unique() now matches the input dtype. (GH27874)
Changed the default configuration value for options.plotting.matplotlib.register_converters from True to "auto" (GH18720). Now, pandas custom formatters will only be applied to plots created by pandas, through plot(). Previously, pandas' formatters would be applied to all plots created after a plot(). See units registration for more.
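Series.dropna() has dropped its **kwargs argument in favor of a single how parameter. Supplying anything other than how to **kwargs raised a TypeError previously (GH29388)
When testing pandas, the new minimum required version of pytest is 5.0.1 (GH29664)
Series.str.__iter__() was deprecated and will be removed in future releases (GH28277).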
Added <NA> to the list of default NA values for read_csv() (GH30821)
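As a small illustration of the last item, the sketch below reads an in-memory CSV containing the literal text <NA>. The column names are made up for the example.

```python
import io

import pandas as pd

# "<NA>" is now among the default NA markers, so it is parsed as missing.
csv_text = "a,b\n1,x\n<NA>,y\n"
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df["a"].isna().tolist())
```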

Documentation improvements#

Added new section on Scaling to large datasets (GH28315).
Added sub-section on Query MultiIndex for HDF5 datasets (GH28791).

Deprecations#

Series.item() and Index.item() have been _undeprecated_ (GH29250)
Index.set_value has been deprecated. For a given index idx, array arr, value in idx of idx_val and a new value of val, idx.set_value(arr, idx_val, val) is equivalent to arr[idx.get_loc(idx_val)] = val, which should be used instead (GH28621).
is_extension_type() is deprecated, is_extension_array_dtype() should be used instead (GH29457)
eval() keyword argument "truediv" is deprecated and will be removed in a future version (GH29812)
DateOffset.isAnchored() and DateOffset.onOffset() are deprecated and will be removed in a future version, use DateOffset.is_anchored() and DateOffset.is_on_offset() instead (GH30340)
pandas.tseries.frequencies.get_offset is deprecated and will be removed in a future version, use pandas.tseries.frequencies.to_offset instead (GH4205)
Categorical.take_nd() and CategoricalIndex.take_nd() are deprecated, use Categorical.take() and CategoricalIndex.take() instead (GH27745)
The parameter numeric_only of Categorical.min() and Categorical.max() is deprecated and replaced with skipna (GH25303)
The parameter label in lreshape() has been deprecated and will be removed in a future version (GH29742)
pandas.core.index has been deprecated and will be removed in a future version, the public classes are available in the top-level namespace (GH19711)
pandas.json_normalize() is now exposed in the top-level namespace. Usage of json_normalize as pandas.io.json.json_normalize is now deprecated and it is recommended to use json_normalize as pandas.json_normalize() instead (GH27586).
The numpy argument of pandas.read_json() is deprecated (GH28512).
DataFrame.to_stata(), DataFrame.to_feather(), and DataFrame.to_parquet() argument "fname" is deprecated, use "path" instead (GH23574)
The deprecated internal attributes _start, _stop and _step of RangeIndex now raise a FutureWarning instead of a DeprecationWarning (GH26581)
The pandas.util.testing module has been deprecated. Use the public API in pandas.testing documented at Assertion functions (GH16232).
pandas.SparseArray has been deprecated. Use pandas.arrays.SparseArray (arrays.SparseArray) instead. (GH30642)
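The parameter is_copy of Series.take() and DataFrame.take() has been deprecated and will be removed in a future version. (GH27357)
Support for multi-dimensional indexing (e.g. index[:, None]) on an Index is deprecated and will be removed in a future version, convert to a NumPy array before indexing instead (GH30588)
The pandas.np submodule is now deprecated. Import numpy directly instead (GH30296)
The pandas.datetime class is now deprecated. Import from datetime instead (GH30610)
diff will raise a TypeError rather than implicitly losing the dtype of extension types in the future. Convert to the correct dtype before calling diff instead (GH31025)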
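Two of the replacements named in the list above can be sketched as follows. The sample index, array and record are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Replacement for the deprecated idx.set_value(arr, idx_val, val):
idx = pd.Index(["a", "b", "c"])
arr = np.array([1, 2, 3])
arr[idx.get_loc("b")] = 20

# json_normalize is available from the top-level namespace.
records = [{"id": 1, "info": {"name": "x"}}]
flat = pd.json_normalize(records)

print(arr)
print(flat)
```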

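Selecting columns from a grouped DataFrame

When selecting columns from a DataFrameGroupBy object, passing individual keys (or a tuple of keys) inside single brackets is deprecated; a list of items should be used instead. (GH23566) For example: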

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": np.random.randn(8),
    "C": np.random.randn(8),
})
g = df.groupby('A')

# single key, returns SeriesGroupBy
g['B']

# tuple of single key, returns SeriesGroupBy
g[('B',)]

# tuple of multiple keys, returns DataFrameGroupBy, raises FutureWarning
g[('B', 'C')]

# multiple keys passed directly, returns DataFrameGroupBy, raises FutureWarning
# (implicitly converts the passed strings into a single tuple)
g['B', 'C']

# proper way, returns DataFrameGroupBy
g[['B', 'C']]

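Removal of prior version deprecations/changes#

Removed SparseSeries and SparseDataFrame

SparseSeries, SparseDataFrame and the DataFrame.to_sparse method have been removed (GH28425). We recommend using a Series or DataFrame with sparse values instead. See Migrating for help with migrating existing code.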
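A minimal sketch of a Series holding sparse values, the recommended replacement. The data and the choice of fill_value=0 are illustrative.

```python
import pandas as pd

dense = pd.Series([0, 0, 1, 0])

# Store the same data with a sparse dtype; zeros are the fill value.
sparse = dense.astype(pd.SparseDtype("int64", fill_value=0))

# The .sparse accessor exposes sparse-specific information.
print(sparse.dtype)
print(sparse.sparse.density)
print(sparse.sparse.to_dense().tolist())
```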

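Matplotlib unit registration

Previously, pandas would register converters with matplotlib as a side effect of importing pandas (GH18720). This changed the output of plots made via matplotlib after pandas was imported, even if you were using matplotlib directly rather than plot().

To use pandas formatters with a matplotlib plot, specify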

In [1]: import pandas as pd
In [2]: pd.options.plotting.matplotlib.register_converters = True

Note that plots created by DataFrame.plot() and Series.plot() do register the converters automatically. The only behavior change is when plotting a date-like object via matplotlib.pyplot.plot or matplotlib.Axes.plot. See Custom formatters for timeseries plots for more.

Other removals

Removed the previously deprecated keyword "index" from read_stata(), StataReader, and StataReader.read(), use "index_col" instead (GH17328)
Removed StataReader.data method, use StataReader.read() instead (GH9493)
Removed pandas.plotting._matplotlib.tsplot, use Series.plot() instead (GH19980)
pandas.tseries.converter.register has been moved to pandas.plotting.register_matplotlib_converters() (GH18307)
Series.plot() no longer accepts positional arguments, pass keyword arguments instead (GH30003)
DataFrame.hist() and Series.hist() no longer allow figsize="default", specify figure size by passing a tuple instead (GH30003)
Floordiv of integer-dtyped array by Timedelta now raises TypeError (GH21036)
TimedeltaIndex and DatetimeIndex no longer accept non-nanosecond dtype strings like "timedelta64" or "datetime64", use "timedelta64[ns]" and "datetime64[ns]" instead (GH24806)
Changed the default "skipna" argument in pandas.api.types.infer_dtype() from False to True (GH24050)
Removed Series.ix and DataFrame.ix (GH26438)
Removed Index.summary (GH18217)
Removed the previously deprecated keyword "fastpath" from the Index constructor (GH23110)
Removed Series.get_value, Series.set_value, DataFrame.get_value, DataFrame.set_value (GH17739)
Removed Series.compound and DataFrame.compound (GH26405)
Changed the default "inplace" argument in DataFrame.set_index() and Series.set_axis() from None to False (GH27600)
Removed Series.cat.categorical, Series.cat.index, Series.cat.name (GH24751)
Removed the previously deprecated keyword "box" from to_datetime() and to_timedelta(); in addition these now always return DatetimeIndex, TimedeltaIndex, Index, Series, or DataFrame (GH24486)
to_timedelta(), Timedelta, and TimedeltaIndex no longer allow "M", "y", or "Y" for the "unit" argument (GH23264)
Removed the previously deprecated keyword "time_rule" from (non-public) offsets.generate_range, which has been moved to core.arrays._ranges.generate_range() (GH24157)
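DataFrame.loc() or Series.loc() with listlike indexers and missing labels will no longer reindex (GH17295)
DataFrame.to_excel() and Series.to_excel() with non-existent columns will no longer reindex (GH17295)
Removed the previously deprecated keyword "join_axes" from concat(); use reindex_like on the result instead (GH22318)
Removed the previously deprecated keyword "by" from DataFrame.sort_index(), use DataFrame.sort_values() instead (GH10726)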
Removed support for nested renaming in DataFrame.aggregate(), Series.aggregate(), core.groupby.DataFrameGroupBy.aggregate(), core.groupby.SeriesGroupBy.aggregate(), core.window.rolling.Rolling.aggregate() (GH18529)
Passing datetime64 data to TimedeltaIndex or timedelta64 data to DatetimeIndex now raises TypeError (GH23539, GH23937)
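Passing int64 values to DatetimeIndex and a timezone now interprets the values as nanosecond timestamps in UTC, not wall times in the given timezone (GH24559)
A tuple passed to DataFrame.groupby() is now exclusively treated as a single key (GH18314)
Removed Index.contains, use key in index instead (GH30103)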
Addition and subtraction of int or integer-arrays is no longer allowed in Timestamp, DatetimeIndex, TimedeltaIndex, use obj + n * obj.freq instead of obj + n (GH22535)
Removed Series.ptp (GH21614)
Removed Series.from_array (GH18258)
Removed DataFrame.from_items (GH18458)
Removed DataFrame.as_matrix, Series.as_matrix (GH18458)
Removed Series.asobject (GH18477)
Removed DataFrame.as_blocks, Series.as_blocks, DataFrame.blocks, Series.blocks (GH17656)
pandas.Series.str.cat() now defaults to aligning others, using join='left' (GH27611)
pandas.Series.str.cat() does not accept list-likes within list-likes anymore (GH27611)
Series.where() with Categorical dtype (or DataFrame.where() with Categorical column) no longer allows setting new categories (GH24114)
Removed the previously deprecated keywords "start", "end", and "periods" from the DatetimeIndex, TimedeltaIndex, and PeriodIndex constructors; use date_range(), timedelta_range(), and period_range() instead (GH23919)
Removed the previously deprecated keyword "verify_integrity" from the DatetimeIndex and TimedeltaIndex constructors (GH23919)
Removed the previously deprecated keyword "fastpath" from pandas.core.internals.blocks.make_block (GH19265)
Removed the previously deprecated keyword "dtype" from Block.make_block_same_class() (GH19434)
Removed ExtensionArray._formatting_values. Use ExtensionArray._formatter instead. (GH23601)
Removed MultiIndex.to_hierarchical (GH21613)
Removed MultiIndex.labels, use MultiIndex.codes instead (GH23752)
Removed the previously deprecated keyword "labels" from the MultiIndex constructor, use "codes" instead (GH23752)
Removed MultiIndex.set_labels, use MultiIndex.set_codes() instead (GH23752)
Removed the previously deprecated keyword "labels" from MultiIndex.set_codes(), MultiIndex.copy(), MultiIndex.drop(), use "codes" instead (GH23752)
Removed support for legacy HDF5 formats (GH29787)
Passing a dtype alias (e.g. 'datetime64[ns, UTC]') to DatetimeTZDtype is no longer allowed, use DatetimeTZDtype.construct_from_string() instead (GH23990)
Removed the previously deprecated keyword "skip_footer" from read_excel(); use "skipfooter" instead (GH18836)
read_excel() no longer allows an integer value for the parameter usecols, instead pass a list of integers from 0 to usecols inclusive (GH23635)
Removed the previously deprecated keyword "convert_datetime64" from DataFrame.to_records() (GH18902)
Removed IntervalIndex.from_intervals in favor of the IntervalIndex constructor (GH19263)
Changed the default "keep_tz" argument in DatetimeIndex.to_series() from None to True (GH23739)
Removed api.types.is_period and api.types.is_datetimetz (GH23917)
The ability to read pickles containing Categorical instances created with pre-0.16 versions of pandas has been removed (GH27538)
Removed pandas.tseries.plotting.tsplot (GH18627)
Removed the previously deprecated keywords "reduce" and "broadcast" from DataFrame.apply() (GH18577)
Removed the previously deprecated assert_raises_regex function in pandas._testing (GH29174)
Removed the previously deprecated FrozenNDArray class in pandas.core.indexes.frozen (GH29335)
Removed the previously deprecated keyword "nthreads" from read_feather(), use "use_threads" instead (GH23053)
Removed Index.is_lexsorted_for_tuple (GH29305)
Removed support for nested renaming in DataFrame.aggregate(), Series.aggregate(), core.groupby.DataFrameGroupBy.aggregate(), core.groupby.SeriesGroupBy.aggregate(), core.window.rolling.Rolling.aggregate() (GH29608)
Removed Series.valid; use Series.dropna() instead (GH18800)
Removed DataFrame.is_copy, Series.is_copy (GH18812)
Removed DataFrame.get_ftype_counts, Series.get_ftype_counts (GH18243)
Removed DataFrame.ftypes, Series.ftypes, Series.ftype (GH26744)
Removed Index.get_duplicates, use idx[idx.duplicated()].unique() instead (GH20239)
Removed Series.clip_upper, Series.clip_lower, DataFrame.clip_upper, DataFrame.clip_lower (GH24203)
Removed the ability to alter DatetimeIndex.freq, TimedeltaIndex.freq, or PeriodIndex.freq (GH20772)
Removed DatetimeIndex.offset (GH20730)
Removed DatetimeIndex.asobject, TimedeltaIndex.asobject, PeriodIndex.asobject, use astype(object) instead (GH29801)
Removed the previously deprecated keyword "order" from factorize() (GH19751)
Removed the previously deprecated keyword "encoding" from read_stata() and DataFrame.to_stata() (GH21400)
Changed the default "sort" argument in concat() from None to False (GH20613)
Removed the previously deprecated keyword "raise_conflict" from DataFrame.update(), use "errors" instead (GH23585)
Removed the previously deprecated keyword "n" from DatetimeIndex.shift(), TimedeltaIndex.shift(), PeriodIndex.shift(), use "periods" instead (GH22458)
Removed the previously deprecated keywords "how", "fill_method", and "limit" from DataFrame.resample() (GH30139)
Passing an integer to Series.fillna() or DataFrame.fillna() with timedelta64[ns] dtype now raises TypeError (GH24694)
Passing multiple axes to DataFrame.dropna() is no longer supported (GH20995)
Removed Series.nonzero, use to_numpy().nonzero() instead (GH24048)
Passing floating dtype codes to Categorical.from_codes() is no longer supported, pass codes.astype(np.int64) instead (GH21775)
Removed the previously deprecated keyword "pat" from Series.str.partition() and Series.str.rpartition(), use "sep" instead (GH23767)
Removed Series.put (GH27106)
Removed Series.real, Series.imag (GH27106)
Removed Series.to_dense, DataFrame.to_dense (GH26684)
Removed Index.dtype_str, use str(index.dtype) instead (GH27106)
Categorical.ravel() returns a Categorical instead of a ndarray (GH27199)
The 'outer' method on Numpy ufuncs, e.g. np.subtract.outer operating on Series objects is no longer supported, and will raise NotImplementedError (GH27198)
Removed Series.get_dtype_counts and DataFrame.get_dtype_counts (GH27145)
Changed the default "fill_value" argument in Categorical.take() from True to False (GH20841)
Changed the default value for the raw argument in Series.rolling().apply(), DataFrame.rolling().apply(), Series.expanding().apply(), and DataFrame.expanding().apply() from None to False (GH20584)
Removed the deprecated behavior of Series.argmin() and Series.argmax(), use Series.idxmin() and Series.idxmax() for the old behavior (GH16955)
Passing a tz-aware datetime.datetime or Timestamp into the Timestamp constructor with the tz argument now raises a ValueError (GH23621)
Removed Series.base, Index.base, Categorical.base, Series.flags, Index.flags, PeriodArray.flags, Series.strides, Index.strides, Series.itemsize, Index.itemsize, Series.data, Index.data (GH20721)
Changed Timedelta.resolution() to match the behavior of the standard library datetime.timedelta.resolution, for the old behavior, use Timedelta.resolution_string() (GH26839)
Removed Timestamp.weekday_name, DatetimeIndex.weekday_name, and Series.dt.weekday_name (GH18164)
Removed the previously deprecated keyword "errors" in Timestamp.tz_localize(), DatetimeIndex.tz_localize(), and Series.tz_localize() (GH22644)
Changed the default "ordered" argument in CategoricalDtype from None to False (GH26336)
Series.set_axis() and DataFrame.set_axis() now require "labels" as the first argument and "axis" as an optional named parameter (GH30089)
Removed to_msgpack, read_msgpack, DataFrame.to_msgpack, Series.to_msgpack (GH27103)
Removed Series.compress (GH21930)
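Removed the previously deprecated keyword "fill_value" from Categorical.fillna(), use "value" instead (GH19269)
Removed the previously deprecated keyword "data" from andrews_curves(), use "frame" instead (GH6956)
Removed the previously deprecated keyword "data" from parallel_coordinates(), use "frame" instead (GH6956)
Removed the previously deprecated keyword "colors" from parallel_coordinates(), use "color" instead (GH6956)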
Removed the previously deprecated keywords "verbose" and "private_key" from read_gbq() (GH30200)
Calling np.array and np.asarray on tz-aware Series and DatetimeIndex will now return an object array of tz-aware Timestamp (GH24596)
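
Several of the removals above name a direct replacement. The following is a minimal sketch of those replacements, assuming a small example Series; the variable names are illustrative and not part of the release notes.

```python
import numpy as np
import pandas as pd

ser = pd.Series([0, 3, 0, 5], index=["a", "b", "c", "d"])

# Series.nonzero was removed; go through NumPy instead.
nonzero_positions = ser.to_numpy().nonzero()[0]

# Index.dtype_str was removed; str() on the dtype gives the same information.
dtype_name = str(ser.index.dtype)

# Label-based lookups of extrema use idxmin() / idxmax().
label_of_max = ser.idxmax()

# Float codes are rejected by Categorical.from_codes; cast to integers first.
float_codes = np.array([0.0, 1.0, 0.0])
cat = pd.Categorical.from_codes(float_codes.astype(np.int64), categories=["x", "y"])

# set_axis takes the labels first and the axis as a named argument.
relabeled = ser.set_axis(["w", "x", "y", "z"], axis=0)
```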

Performance improvements#

Performance improvement in DataFrame arithmetic and comparison operations with scalars (GH24990, GH29853)
Performance improvement in indexing with a non-unique IntervalIndex (GH27489)
Performance improvement in MultiIndex.is_monotonic (GH27495)
Performance improvement in cut() when bins is an IntervalIndex (GH27668)
Performance improvement when initializing a DataFrame using a range (GH30171)
Performance improvement in DataFrame.corr() when method is "spearman" (GH28139)
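Performance improvement in DataFrame.replace() when provided a list of values to replace (GH28099)
Performance improvement in DataFrame.select_dtypes() by using vectorization instead of iterating over a loop (GH28317)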
Performance improvement in Categorical.searchsorted() and CategoricalIndex.searchsorted() (GH28795)
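Performance improvement when comparing a Categorical with a scalar and the scalar is not found in the categories (GH29750)
Performance improvement when checking if values in a Categorical are equal to, greater than or equal to, or greater than a given scalar. The improvement is not present when checking if the Categorical is less than, or less than or equal to, the scalar (GH29820)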
Performance improvement in Index.equals() and MultiIndex.equals() (GH29134)
Performance improvement in infer_dtype() when skipna is True (GH28814)
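
As an illustration of one of the code paths listed above, this sketch calls cut() with an IntervalIndex as bins. It only shows the call shape, not a benchmark, and the sample values are assumptions.

```python
import pandas as pd

# Bins given as an IntervalIndex (right-closed by default): (0, 10], (10, 20], (20, 30]
bins = pd.IntervalIndex.from_breaks([0, 10, 20, 30])
values = pd.Series([5, 15, 25, 40])

# Values outside every interval (here 40) become missing.
binned = pd.cut(values, bins=bins)
```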

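Bug fixes#

Categorical#

Added test to assert that fillna() raises the correct ValueError message when the value isn't a value from categories (GH13628)
Bug in Categorical.astype() where NaN values were handled incorrectly when casting to int (GH28406)
DataFrame.reindex() with a CategoricalIndex would fail when the targets contained duplicates, and wouldn't fail if the source contained duplicates (GH28107)
Bug in Categorical.astype() not allowing for casting to extension dtypes (GH28668)
Bug where merge() was unable to join on categorical and extension dtype columns (GH28668)
Categorical.searchsorted() and CategoricalIndex.searchsorted() now work on unordered categoricals also (GH21667)
Added test to assert that roundtripping to parquet with DataFrame.to_parquet() or read_parquet() will preserve Categorical dtypes for string types (GH27955)
Changed the error message in Categorical.remove_categories() to always show the invalid removals as a set (GH28669)
Using date accessors on a categorical dtyped Series was not returning an object of the same type as if one used the .str / .dt accessors on a Series of that type. For example, when accessing Series.dt.tz_localize() on a Categorical with duplicate entries, the accessor was skipping duplicates (GH27952)
Bug in DataFrame.replace() and Series.replace() that would give incorrect results on categorical data (GH26988)
Bug where calling Categorical.min() or Categorical.max() would raise a NumPy exception (GH30227)
The following methods now also correctly output values for unobserved categories when called through groupby(..., observed=False) (GH17605): core.groupby.SeriesGroupBy.count(), core.groupby.SeriesGroupBy.size(), core.groupby.SeriesGroupBy.nunique(), core.groupby.SeriesGroupBy.nth()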
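
The observed=False behavior in the last entry can be sketched as follows, assuming a categorical key with one category ("c") that never occurs in the data; the names are illustrative only.

```python
import pandas as pd

values = pd.Series([1, 2, 3])
# "c" is a declared category that never appears in the data.
key = pd.Categorical(["a", "a", "b"], categories=["a", "b", "c"])

grouped = values.groupby(key, observed=False)

# Unobserved categories still get a row in the result.
counts = grouped.count()
sizes = grouped.size()
```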

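Datetimelike#

Bug in Series.__setitem__() incorrectly casting np.timedelta64("NaT") to np.datetime64("NaT") when inserting into a Series with datetime64 dtype (GH27311)
Bug in Series.dt() property lookups when the underlying data is read-only (GH27529)
Bug in HDFStore.__getitem__ incorrectly reading tz attribute created in Python 2 (GH26443)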
Bug in to_datetime() where passing arrays of malformed str with errors="coerce" could incorrectly lead to raising ValueError (GH28299)
Bug in core.groupby.SeriesGroupBy.nunique() where NaT values were interfering with the count of unique values (GH27951)
Bug in Timestamp subtraction when subtracting a Timestamp from a np.datetime64 object incorrectly raising TypeError (GH28286)
Addition and subtraction of integer or integer-dtype arrays with Timestamp will now raise NullFrequencyError instead of ValueError (GH28268)
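Bug in Series and DataFrame with integer dtype failing to raise TypeError when adding or subtracting a np.datetime64 object (GH28080)
Bug in Series.astype(), Index.astype(), and DataFrame.astype() failing to handle NaT when casting to an integer dtype (GH28492)
Bug in Week with weekday incorrectly raising AttributeError instead of TypeError when adding or subtracting an invalid type (GH28530)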
Bug in DataFrame arithmetic operations when operating with a Series with dtype 'timedelta64[ns]' (GH28049)
Bug in core.groupby.generic.SeriesGroupBy.apply() raising ValueError when a column in the original DataFrame is a datetime and the column labels are not standard integers (GH28247)
Bug in pandas._config.localization.get_locales() where the locales -a encodes the locales list as windows-1252 (GH23638, GH24760, GH27368)
Bug in Series.var() failing to raise TypeError when called with timedelta64[ns] dtype (GH28289)
Bug in DatetimeIndex.strftime() and Series.dt.strftime() where NaT was converted to the string 'NaT' instead of np.nan (GH29578)
Bug in masking datetime-like arrays with a boolean mask of an incorrect length not raising an IndexError (GH30308)
Bug in Timestamp.resolution being a property instead of a class attribute (GH29910)
Bug in pandas.to_datetime() when called with None raising TypeError instead of returning NaT (GH30011)
Bug in pandas.to_datetime() failing for deques when using cache=True (the default) (GH29403)
Bug in Series.item() with datetime64 or timedelta64 dtype, DatetimeIndex.item(), and TimedeltaIndex.item() returning an integer instead of a Timestamp or Timedelta (GH30175)
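Bug in DatetimeIndex addition when adding a non-optimized DateOffset incorrectly dropping timezone information (GH30336)
Bug in DataFrame.drop() where attempting to drop non-existent values from a DatetimeIndex would yield a confusing error message (GH30399)
Bug in DataFrame.append() that would remove the timezone-awareness of new data (GH30238)
Bug in Series.cummin() and Series.cummax() with timezone-aware dtype incorrectly dropping its timezone (GH15553)
Bug in DatetimeArray, TimedeltaArray, and PeriodArray where inplace addition and subtraction did not actually operate inplace (GH24115)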
Bug in pandas.to_datetime() when called with Series storing IntegerArray raising TypeError instead of returning Series (GH30050)
Bug in date_range() with custom business hours as freq and given number of periods (GH30593)
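Bug in PeriodIndex comparisons incorrectly casting integers to Period objects, inconsistent with the Period comparison behavior (GH30722)
Bug in DatetimeIndex.insert() raising a ValueError instead of a TypeError when trying to insert a timezone-aware Timestamp into a timezone-naive DatetimeIndex, or vice-versa (GH30806)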

Timedelta#

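Bug in subtracting a TimedeltaIndex or TimedeltaArray from a np.datetime64 object (GH29558)

Timezones#

Numeric#

Bug in DataFrame.quantile() with zero-column DataFrame incorrectly raising (GH23925)
DataFrame flex inequality comparison methods (DataFrame.lt(), DataFrame.le(), DataFrame.gt(), DataFrame.ge()) with object-dtype and complex entries failing to raise TypeError like their Series counterparts (GH28079)
Bug in DataFrame logical operations (&, |, ^) not matching Series behavior by filling NA values (GH28741)
Bug in DataFrame.interpolate() where specifying axis by name references variable before it is assigned (GH29142)
Bug in Series.var() not computing the right value with a nullable integer dtype series not passing through ddof argument (GH29128)
Improved error message when using frac > 1 and replace = False (GH27451)
Bug in numeric indexes resulted in it being possible to instantiate an Int64Index, UInt64Index, or Float64Index with an invalid dtype (e.g. datetime-like) (GH29539)
Bug in UInt64Index precision loss while constructing from a list with values in the np.uint64 range (GH29526)
Bug in NumericIndex construction that caused indexing to fail when integers in the np.uint64 range were used (GH28023)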
Bug in NumericIndex construction that caused UInt64Index to be casted to Float64Index when integers in the np.uint64 range were used to index a DataFrame (GH28279)
Bug in Series.interpolate() when using method=`index` with an unsorted index, which would previously return incorrect results (GH21037)
Bug in DataFrame.round() where a DataFrame with a CategoricalIndex of IntervalIndex columns would incorrectly raise a TypeError (GH30063)
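Bug in Series.pct_change() and DataFrame.pct_change() when there are duplicated indices (GH30463)
Bug in DataFrame cumulative operations (e.g. cumsum, cummax) incorrectly casting to object-dtype (GH19296)
Bug in diff losing the dtype for extension types (GH30889)
Bug in DataFrame.diff raising an IndexError when one of the columns was a nullable integer dtype (GH30967)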
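
Two of the numeric entries above (the ddof argument for nullable integers, and frac > 1 with replace=False) can be sketched like this. The data is an assumed toy example, and the sketch assumes that the frac > 1 entry refers to sample().

```python
import pandas as pd

nullable = pd.Series([1, 2, 3, 4], dtype="Int64")

# ddof is honored for nullable integer data.
population_var = nullable.var(ddof=0)  # sum of squared deviations / n
sample_var = nullable.var()            # default ddof=1

small = pd.Series([1, 2, 3])

# Upsampling (frac > 1) needs replace=True ...
upsampled = small.sample(frac=2, replace=True, random_state=0)

# ... and raises a ValueError otherwise.
try:
    small.sample(frac=2, replace=False)
    raised = False
except ValueError:
    raised = True
```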

Conversion#

Strings#

Calling Series.str.isalnum() (and other "ismethods") on an empty Series would return an object dtype instead of bool (GH29624)

Interval#

Bug in IntervalIndex.get_indexer() where a Categorical or CategoricalIndex target would incorrectly raise a TypeError (GH30063)
Bug in pandas.core.dtypes.cast.infer_dtype_from_scalar where passing pandas_dtype=True did not infer IntervalDtype (GH30337)
Bug in Series constructor where constructing a Series from a list of Interval objects resulted in object dtype instead of IntervalDtype (GH23563)
Bug in IntervalDtype where the kind attribute was incorrectly set as None instead of "O" (GH30568)
Bug in IntervalIndex, IntervalArray, and Series with interval data where equality comparisons were incorrect (GH24112)
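
The constructor inference and equality comparison described in this section can be checked with a short sketch; the interval values are assumptions chosen for illustration.

```python
import pandas as pd

# A list of Interval objects is inferred as interval dtype, not object.
intervals = pd.Series([pd.Interval(0, 1), pd.Interval(1, 2)])
is_interval = isinstance(intervals.dtype, pd.IntervalDtype)

# Equality against a scalar Interval is evaluated element-wise.
matches = intervals == pd.Interval(0, 1)
```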

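Indexing#

Bug in assignment using a reverse slicer (GH26939)
Bug in DataFrame.explode() that would duplicate the frame in the presence of duplicates in the index (GH28010)
Bug in reindexing a PeriodIndex() with another type of index that contained a Period (GH28323, GH28337)
Fix assignment of column via .loc with NumPy non-ns datetime type (GH27395)
Bug in Float64Index.astype() where np.inf was not handled properly when casting to an integer dtype (GH28475)
Index.union() could fail when the left contained duplicates (GH28257)
Bug when indexing with .loc where the index was a CategoricalIndex with non-string categories didn't work (GH17569, GH30225)
Index.get_indexer_non_unique() could fail with TypeError in some cases, such as when searching for ints in a string index (GH28257)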
Bug in Float64Index.get_loc() incorrectly raising TypeError instead of KeyError (GH29189)
Bug in DataFrame.loc() with incorrect dtype when setting Categorical value for a 1-row DataFrame (GH25495)
MultiIndex.get_loc() can't find missing values when input includes missing values (GH19132)
Bug in Series.__setitem__() incorrectly assigning values with boolean indexer when the length of new data matches the number of True values and new data is not a Series or an np.array (GH30567)
Bug in indexing with a PeriodIndex incorrectly accepting integers representing years, use e.g. ser.loc["2007"] instead of ser.loc[2007] (GH30763)
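
The last two entries describe usage that can be sketched briefly: selecting a year from a PeriodIndex with a string, and assigning a plain list through a boolean indexer. The series below are assumed examples.

```python
import pandas as pd

# Monthly periods covering 2007 and 2008.
ser = pd.Series(range(24), index=pd.period_range("2007-01", periods=24, freq="M"))

# Select a whole year with a string, not the integer 2007.
in_2007 = ser.loc["2007"]

# A list whose length equals the number of True values is assigned in order.
values = pd.Series([1, 2, 3, 4])
mask = values > 2
values[mask] = [30, 40]
```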

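Missing#

MultiIndex#

Constructor for MultiIndex verifies that the given sortorder is compatible with the actual lexsort_depth if the verify_integrity parameter is True (the default) (GH28735)
Series and MultiIndex .drop with MultiIndex raise an exception if the labels are not given in the level (GH8594)

IO#

read_csv() now accepts binary mode file buffers when using the Python csv engine (GH23779)
Bug in DataFrame.to_json() where using a tuple as a column or index value and using orient="columns" or orient="index" would produce invalid JSON (GH20500)
Improve infinity parsing. read_csv() now interprets Infinity, +Infinity, -Infinity as floating point values (GH10065)
Bug in DataFrame.to_csv() where values were truncated when the length of na_rep was shorter than the text input data (GH25099)
Bug in DataFrame.to_string() where values were truncated using display options instead of outputting the full content (GH9784)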
Bug in DataFrame.to_json() where a datetime column label would not be written out in ISO format with orient="table" (GH28130)
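Bug in DataFrame.to_parquet() where writing to GCS would fail with engine='fastparquet' if the file did not already exist (GH28326)
Bug in read_hdf() closing stores that it didn't open when exceptions are raised (GH28699)
Bug in read_json() where using orient="index" would not maintain the order (GH28557)
Bug in DataFrame.to_html() where the length of the formatters argument was not verified (GH28469)
Bug in read_excel() with engine='ods' when the sheet_name argument references a non-existent sheet (GH27676)
Bug in pandas.io.formats.style.Styler() formatting for floating values not displaying decimals correctly (GH13257)
Bug in DataFrame.to_html() when using formatters=<list> and max_cols together (GH25955)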
Bug in Styler.background_gradient() not able to work with dtype Int64 (GH28869)
Bug in DataFrame.to_clipboard() which did not work reliably in IPython (GH22707)
Bug in read_json() where default encoding was not set to utf-8 (GH29565)
Bug in PythonParser where str and bytes were being mixed when dealing with the decimal field (GH29650)
read_gbq() now accepts progress_bar_type to display a progress bar while the data downloads (GH29857)
Bug in pandas.io.json.json_normalize() where a missing value in the location specified by record_path would raise a TypeError (GH30148)
read_excel() now accepts binary data (GH15914)
Bug in read_csv() in which encoding handling was limited to just the string utf-16 for the C engine (GH24130)
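
Two of the read_csv() entries above (infinity parsing and binary buffers with the Python engine) can be sketched with in-memory buffers. The CSV contents here are made up for illustration.

```python
import io

import numpy as np
import pandas as pd

# Infinity, -Infinity and +Infinity are parsed as floating point values.
csv_text = "x\nInfinity\n-Infinity\n+Infinity\n1.5\n"
df = pd.read_csv(io.StringIO(csv_text))

# The Python engine accepts a binary-mode buffer.
from_bytes = pd.read_csv(io.BytesIO(b"a,b\n1,2\n"), engine="python")
```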

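Plotting#

Bug in Series.plot() not able to plot boolean values (GH23719)
Bug in DataFrame.plot() not able to plot when there are no rows (GH27758)
Bug in DataFrame.plot() producing incorrect legend markers when plotting multiple series on the same axis (GH18222)
Bug in DataFrame.plot() when kind='box' and the data contains datetime or timedelta data. These types are now automatically dropped (GH22799)
Bug in DataFrame.plot.line() and DataFrame.plot.area() producing the wrong xlim on the x-axis (GH27686, GH25160, GH24784)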
Bug where DataFrame.boxplot() would not accept a color parameter like DataFrame.plot.box() (GH26214)
Bug in the xticks argument being ignored for DataFrame.plot.bar() (GH14119)
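set_option() now validates that the plot backend provided to 'plotting.backend' implements the backend when the option is set, rather than when a plot is created (GH28163)
DataFrame.plot() now allows a backend keyword argument to allow changing between backends in one session (GH28619).
Bug in color validation incorrectly raising for non-color styles (GH29122).
Allow DataFrame.plot.scatter() to plot objects and datetime type data (GH18755, GH30391)
Bug in DataFrame.hist(), xrot=0 does not work with by and subplots (GH30288).

GroupBy/resample/rolling#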

Bug in core.groupby.DataFrameGroupBy.apply() only showing output from a single group when function returns an Index (GH28652)
Bug in DataFrame.groupby() with multiple groups where an IndexError would be raised if any group contained all NA values (GH20519)
Bug in pandas.core.resample.Resampler.size() and pandas.core.resample.Resampler.count() returning wrong dtype when used with an empty Series or DataFrame (GH28427)
Bug in DataFrame.rolling() not allowing for rolling over datetimes when axis=1 (GH28192)
Bug in DataFrame.rolling() not allowing rolling on multi-index levels (GH15584).
Bug in DataFrame.rolling() not allowing rolling on monotonic decreasing time indexes (GH19248).
Bug in DataFrame.groupby() not offering selection by column name when axis=1 (GH27614)
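Bug in core.groupby.DataFrameGroupby.agg() not able to use lambda function with named aggregation (GH27519)
Bug in DataFrame.groupby() losing column name information when grouping by a categorical column (GH28787)
Remove error raised due to duplicated input functions in named aggregation in DataFrame.groupby() and Series.groupby(). Previously an error was raised if the same function was applied to the same column; this is now allowed if the newly assigned names are different. (GH28426)
core.groupby.SeriesGroupBy.value_counts() will be able to handle the case even when the Grouper makes empty groups (GH28479)
Bug in core.window.rolling.Rolling.quantile() ignoring the interpolation keyword argument when used within a groupby (GH28779)
Bug in DataFrame.groupby() where any, all, nunique and transform functions would incorrectly handle duplicate column labels (GH21668)
Bug in core.groupby.DataFrameGroupBy.agg() with timezone-aware datetime64 column incorrectly casting results to the original dtype (GH29641)
Bug in DataFrame.groupby() when using axis=1 and having a single level columns index (GH30208)
Bug in DataFrame.groupby() when using nunique on axis=1 (GH30253)
Bug in GroupBy.quantile() with multiple list-like q value and integer column names (GH30289)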
Bug in GroupBy.pct_change() and core.groupby.SeriesGroupBy.pct_change() causes TypeError when fill_method is None (GH30463)
Bug in Rolling.count() and Expanding.count() where the min_periods argument was ignored (GH26996)
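
The named-aggregation entries above (GH27519, GH28426) can be sketched together: the same function applied twice to one column under different output names, plus a lambda. The frame and the output names are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 5]})

result = df.groupby("key").agg(
    total=("val", "sum"),
    also_total=("val", "sum"),  # same column and function, different name
    spread=("val", lambda s: s.max() - s.min()),  # lambda in named aggregation
)
```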

Reshaping#

Bug in DataFrame.apply() that caused incorrect output with empty DataFrame (GH28202, GH21959)
Bug in DataFrame.stack() not handling non-unique indexes correctly when creating MultiIndex (GH28301)
Bug in pivot_table() not returning correct type float when margins=True and aggfunc='mean' (GH24893)
Bug where merge_asof() could not use datetime.timedelta for the tolerance kwarg (GH28098)
Bug in merge(), did not append suffixes correctly with MultiIndex (GH28518)
qcut() and cut() now handle boolean input (GH20303)
Fix to ensure all int dtypes can be used in merge_asof() when using a tolerance value. Previously every non-int64 type would raise an erroneous MergeError (GH28870).
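Better error message in get_dummies() when columns isn't a list-like value (GH28383)
Bug in Index.join() that caused infinite recursion error for mismatched MultiIndex name orders. (GH25760, GH28956)
Bug in Series.pct_change() where supplying an anchored frequency would throw a ValueError (GH28664)
Bug where DataFrame.equals() returned True incorrectly in some cases when two DataFrames had the same columns in different orders (GH28839)
Bug in DataFrame.replace() that caused a non-numeric replacer's dtype not to be respected (GH26632)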
Bug in melt() where supplying mixed strings and numeric values for id_vars or value_vars would incorrectly raise a ValueError (GH29718)
Dtypes are now preserved when transposing a DataFrame where each column is the same extension dtype (GH30091)
Bug in merge_asof() merging on a tz-aware left_index and right_on a tz-aware column (GH29864)
Improved error message and docstring in cut() and qcut() when labels=True (GH13318)
Bug in missing fill_na parameter to DataFrame.unstack() with list of levels (GH30740)
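
Two reshaping entries above lend themselves to a short sketch: merge_asof() with a datetime.timedelta tolerance (GH28098) and dtype preservation when transposing a frame of one extension dtype (GH30091). The frames and column names are assumed for illustration.

```python
import datetime

import pandas as pd

left = pd.DataFrame({"t": pd.to_datetime(["2020-01-01 00:00:05"]), "l": [1]})
right = pd.DataFrame({"t": pd.to_datetime(["2020-01-01 00:00:03"]), "r": [10]})

# The nearest earlier key is 2 seconds back, so a 3-second tolerance matches ...
merged = pd.merge_asof(left, right, on="t", tolerance=datetime.timedelta(seconds=3))
# ... while a 1-second tolerance leaves the right-hand column missing.
tight = pd.merge_asof(left, right, on="t", tolerance=datetime.timedelta(seconds=1))

# Every column shares the nullable Int64 dtype, so the transpose keeps it.
wide = pd.DataFrame(
    {"a": pd.array([1, 2], dtype="Int64"), "b": pd.array([3, None], dtype="Int64")}
)
transposed = wide.T
```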

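Sparse#

Bug in SparseDataFrame arithmetic operations incorrectly casting inputs to float (GH28107)
Bug in DataFrame.sparse returning a Series when there was a column named sparse rather than the accessor (GH30758)
Fixed operator.xor() with a boolean-dtype SparseArray. Now returns a sparse result, rather than object dtype (GH31025)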
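
The operator.xor() fix can be illustrated with two small boolean SparseArray objects; the values are assumptions chosen so that each element pair differs in a visible way.

```python
import operator

import numpy as np
import pandas as pd

a = pd.arrays.SparseArray([True, False, True])
b = pd.arrays.SparseArray([True, True, False])

# Element-wise exclusive or; the result stays sparse.
result = operator.xor(a, b)
dense = np.asarray(result)
```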

ExtensionArray#

Bug in arrays.PandasArray when setting a scalar string (GH28118, GH28150).
Bug where nullable integers could not be compared to strings (GH28930)
Bug where DataFrame constructor raised ValueError with list-like data and dtype specified (GH30280)

Other#

Trying to set the display.precision, display.max_rows or display.max_columns using set_option() to anything but a None or a positive int will raise a ValueError (GH23348)
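Using DataFrame.replace() with overlapping keys in a nested dictionary will no longer raise, now matching the behavior of a flat dictionary (GH27660)
DataFrame.to_csv() and Series.to_csv() now support dicts as the compression argument, with key 'method' being the compression method and the others as additional compression options when the compression method is 'zip'. (GH26023)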
Bug in Series.diff() where a boolean series would incorrectly raise a TypeError (GH17294)
Series.append() will no longer raise a TypeError when passed a tuple of Series (GH28410)
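Fixed corrupted error message when calling pandas.libs._json.encode() on a 0d array (GH18878)
Backtick quoting in DataFrame.query() and DataFrame.eval() can now also be used for otherwise invalid identifiers, such as names that start with a digit, are Python keywords, or use single-character operators. (GH27017)
Bug in pd.core.util.hashing.hash_pandas_object where arrays containing tuples were incorrectly treated as non-hashable (GH28969)
Bug in DataFrame.append() that raised IndexError when appending with an empty list (GH28769)
Fix AbstractHolidayCalendar to return correct results for years after 2030 (now goes up to 2200) (GH27790)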
Fixed IntegerArray returning inf rather than NaN for operations dividing by 0 (GH27398)
Fixed pow operations for IntegerArray when the other value is 0 or 1 (GH29997)
Bug in Series.count() raising if use_inf_as_na is enabled (GH29478)
Bug in Index where a non-hashable name could be set without raising TypeError (GH29069)
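Bug in DataFrame constructor when passing a 2D ndarray and an extension dtype (GH12513)
Bug in DataFrame.to_csv() when supplied a series with a dtype="string" and a na_rep, the na_rep was being truncated to 2 characters. (GH29975)
Bug where DataFrame.itertuples() would incorrectly determine whether or not namedtuples could be used for dataframes of 255 columns (GH28282)
Handle nested NumPy object arrays in testing.assert_series_equal() for ExtensionArray implementations (GH30841)
Bug in Index constructor incorrectly allowing 2-dimensional input arrays (GH13601, GH27125)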
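
Two entries in this section describe user-facing behavior that is easy to sketch: backtick quoting in DataFrame.query() (GH27017) and validation of display options in set_option() (GH23348). The column names below are deliberately awkward and purely illustrative.

```python
import pandas as pd

# Column names that are not valid Python identifiers: one starts with a
# digit, the other is a keyword.
df = pd.DataFrame({"1st": [1, 2, 3], "class": [4, 5, 6]})
picked = df.query("`1st` > 1 and `class` < 6")

# A negative precision is rejected when the option is set.
try:
    pd.set_option("display.precision", -1)
    rejected = False
except ValueError:
    rejected = True
```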

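Contributors#

A total of 308 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.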

Aaditya Panikath +
Abdullah İhsan Seçer
Abhijeet Krishnan +
Adam J. Stewart
Adam Klaum +
Addison Lynch
Aivengoe +
Alastair James +
Albert Villanova del Moral
Alex Kirko +
Alfredo Granja +
Allen Downey
Alp Arıbal +
Andreas Buhr +
Andrew Munch +
Andy
Angela Ambroz +
Aniruddha Bhattacharjee +
Ankit Dhankhar +
Antonio Andraues Jr +
Arda Kosar +
Asish Mahapatra +
Austin Hackett +
Avi Kelman +
AyowoleT +
Bas Nijholt +
Ben Thayer
Bharat Raghunathan
Bhavani Ravi
Bhuvana KA +
Big Head
Blake Hawkins +
Bobae Kim +
Brett Naul
Brian Wignall
Bruno P. Kinoshita +
Bryant Moscon +
Cesar H +
Chris Stadler
Chris Zimmerman +
Christopher Whelan
Clemens Brunner
Clemens Tolboom +
Connor Charles +
Daniel Hähnke +
Daniel Saxton
Darin Plutchok +
Dave Hughes
David Stansby
DavidRosen +
Dean +
Deepan Das +
Deepyaman Datta
DorAmram +
Dorothy Kabarozi +
Drew Heenan +
Eliza Mae Saret +
Elle +
Endre Mark Borza +
Eric Brassell +
Eric Wong +
Eunseop Jeong +
Eyden Villanueva +
Felix Divo
ForTimeBeing +
Francesco Truzzi +
Gabriel Corona +
Gabriel Monteiro +
Galuh Sahid +
Georgi Baychev +
Gina
GiuPassarelli +
Grigorios Giannakopoulos +
Guilherme Leite +
Guilherme Salomé +
Gyeongjae Choi +
Harshavardhan Bachina +
Harutaka Kawamura +
Hassan Kibirige
Hielke Walinga
Hubert
Hugh Kelley +
Ian Eaves +
Ignacio Santolin +
Igor Filippov +
Irv Lustig
Isaac Virshup +
Ivan Bessarabov +
JMBurley +
Jack Bicknell +
Jacob Buckheit +
Jan Koch
Jan Pipek +
Jan Škoda +
Jan-Philip Gehrcke
Jasper J.F. van den Bosch +
Javad +
Jeff Reback
Jeremy Schendel
Jeroen Kant +
Jesse Pardue +
Jethro Cao +
Jiang Yue
Jiaxiang +
Jihyung Moon +
Jimmy Callin
Jinyang Zhou +
Joao Victor Martinelli +
Joaq Almirante +
John G Evans +
John Ward +
Jonathan Larkin +
Joris Van den Bossche
Josh Dimarsky +
Joshua Smith +
Josiah Baker +
Julia Signell +
Jung Dong Ho +
Justin Cole +
Justin Zheng
Kaiqi Dong
Karthigeyan +
Katherine Younglove +
Katrin Leinweber
Kee Chong Tan +
Keith Kraus +
Kevin Nguyen +
Kevin Sheppard
Kisekka David +
Koushik +
Kyle Boone +
Kyle McCahill +
Laura Collard, PhD +
LiuSeeker +
Louis Huynh +
Lucas Scarlato Astur +
Luiz Gustavo +
Luke +
Luke Shepard +
MKhalusova +
Mabel Villalba
Maciej J +
Mak Sze Chun
Manu NALEPA +
Marc
Marc Garcia
Marco Gorelli +
Marco Neumann +
Martin Winkel +
Martina G. Vilas +
Mateusz +
Matthew Roeschke
Matthew Tan +
Max Bolingbroke
Max Chen +
MeeseeksMachine
Miguel +
MinGyo Jung +
Mohamed Amine ZGHAL +
Mohit Anand +
MomIsBestFriend +
Naomi Bonnin +
Nathan Abel +
Nico Cernek +
Nigel Markey +
Noritada Kobayashi +
Oktay Sabak +
Oliver Hofkens +
Oluokun Adedayo +
Osman +
Oğuzhan Öğreden +
Pandas Development Team +
Patrik Hlobil +
Paul Lee +
Paul Siegel +
Petr Baev +
Pietro Battiston
Prakhar Pandey +
Puneeth K +
Raghav +
Rajat +
Rajhans Jadhao +
Rajiv Bharadwaj +
Rik-de-Kort +
Roei.r
Rohit Sanjay +
Ronan Lamy +
Roshni +
Roymprog +
Rushabh Vasani +
Ryan Grout +
Ryan Nazareth
Samesh Lakhotia +
Samuel Sinayoko
Samyak Jain +
Sarah Donehower +
Sarah Masud +
Saul Shanabrook +
Scott Cole +
SdgJlbl +
Seb +
Sergei Ivko +
Shadi Akiki
Shorokhov Sergey
Siddhesh Poyarekar +
Sidharthan Nair +
Simon Gibbons
Simon Hawkins
Simon-Martin Schröder +
Sofiane Mahiou +
Sourav kumar +
Souvik Mandal +
Soyoun Kim +
Sparkle Russell-Puleri +
Srinivas Reddy Thatiparthy (శ్రీనివాస్ రెడ్డి తాటిపర్తి)
Stuart Berg +
Sumanau Sareen
Szymon Bednarek +
Tambe Tabitha Achere +
Tan Tran
Tang Heyi +
Tanmay Daripa +
Tanya Jain
Terji Petersen
Thomas Li +
Tirth Jain +
Tola A +
Tom Augspurger
Tommy Lynch +
Tomoyuki Suzuki +
Tony Lorenzo
Unprocessable +
Uwe L. Korn
Vaibhav Vishal
Victoria Zdanovskaya +
Vijayant +
Vishwak Srinivasan +
WANG Aiyong
Wenhuan
Wes McKinney
Will Ayd
Will Holmgren
William Ayd
William Blan +
Wouter Overmeire
Wuraola Oyewusi +
YaOzI +
Yash Shukla +
Yu Wang +
Yusei Tahara +
alexander135 +
alimcmaster1
avelineg +
bganglia +
bolkedebruin
bravech +
chinhwee +
cruzzoe +
dalgarno +
daniellebrown +
danielplawrence
est271 +
francisco souza +
ganevgv +
garanews +
gfyoung
h-vetinari
hasnain2808 +
ianzur +
jalbritt +
jbrockmendel
jeschwar +
jlamborn324 +
joy-rosie +
kernc
killerontherun1
krey +
lexy-lixinyu +
lucyleeow +
lukasbk +
maheshbapatu +
mck619 +
nathalier
naveenkaushik2504 +
nlepleux +
nrebena
ohad83 +
pilkibun
pqzx +
proost +
pv8493013j +
qudade +
rhstanton +
rmunjal29 +
sangarshanan +
sardonick +
saskakarsi +
shaido987 +
ssikdar1
steveayers124 +
tadashigaki +
timcera +
tlaytongoogle +
tobycheese
tonywu1999 +
tsvikas +
yogendrasoni +
zys5945 +
