What's new in 1.4.0 (January 22, 2022)#

These are the changes in pandas 1.4.0. See the release notes for a full changelog including other versions of pandas.

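Enhancements#

Improved warning messages#

Previously, warning messages may have pointed to lines within the pandas library. Consider running the script setting_with_copy_warning.py: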

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})
df[:2].loc[:, 'a'] = 5

With pandas 1.3, this resulted in:

.../site-packages/pandas/core/indexing.py:1951: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.

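This made it difficult to determine where the warning was being generated from. Now pandas inspects the call stack and reports the first line outside of the pandas library that gave rise to the warning. The output of the above script is now: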

setting_with_copy_warning.py:4: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.

Index can hold arbitrary ExtensionArrays#

Until now, passing a custom ExtensionArray to pd.Index would cast the array to object dtype. Now Index can directly hold arbitrary ExtensionArrays (GH43930).

Previous behavior:

In [1]: arr = pd.array([1, 2, pd.NA])

In [2]: idx = pd.Index(arr)

In the old behavior, idx would be object-dtype:

Previous behavior:

In [1]: idx
Out[1]: Index([1, 2, <NA>], dtype='object')

With the new behavior, we keep the original dtype:

New behavior:

In [3]: idx
Out[3]: Index([1, 2, <NA>], dtype='Int64')

One exception to this is SparseArray, which will continue to cast to a numpy dtype until pandas 2.0. At that point it will retain its dtype like other ExtensionArrays.
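As a minimal sketch of the behavior described above (assuming pandas 1.4 or later), the following builds an Index from a nullable integer array and checks that the extension dtype and the missing value survive. The variable names are illustrative only.

```python
import pandas as pd

# A nullable integer ExtensionArray containing a missing value.
arr = pd.array([1, 2, pd.NA])

# With pandas >= 1.4 the Index keeps the extension dtype instead of casting to object.
idx = pd.Index(arr)

print(idx.dtype)   # the nullable integer dtype of the original array
print(idx.isna())  # boolean mask marking the pd.NA entry
```

Code that previously relied on such an index being object-dtype may want to check `idx.dtype` explicitly.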

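Styler#

Styler has been further developed in 1.4.0. The following general enhancements have been made:

Styling and formatting of indexes has been added, with Styler.apply_index(), Styler.applymap_index() and Styler.format_index(). These mirror the signature of the methods already used to style and format data values, and work with HTML, LaTeX and Excel formats (GH41893, GH43101, GH41993, GH41995)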

The new method Styler.hide() deprecates Styler.hide_index() and Styler.hide_columns() (GH43758)

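The keyword arguments level and names have been added to Styler.hide() (and implicitly to the deprecated methods Styler.hide_index() and Styler.hide_columns()) for additional control of visibility of MultiIndexes and of Index names (GH25475, GH43404, GH43346)

Styler.export() and Styler.use() have been updated to address all of the functionality added in v1.2.0 and v1.3.0 (GH40675)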

Global options under the category pd.options.styler have been extended to configure default Styler properties which address formatting, encoding, and HTML and LaTeX rendering. Note that formerly Styler relied on display.html.use_mathjax, which has now been replaced by styler.html.mathjax (GH41395)

Validation of certain keyword arguments, e.g. caption (GH43368)

Various bug fixes as recorded below

Additionally, there are specific enhancements to the HTML-specific rendering:

Styler.bar() introduces additional arguments to control alignment and display (GH26070, GH36419), and it also validates the input arguments width and height (GH42511)

Styler.to_html() introduces keyword arguments sparse_index, sparse_columns, bold_headers, caption, max_rows and max_columns (GH41946, GH43149, GH42972)

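Styler.to_html() omits CSSStyle rules for hidden table elements as a performance enhancement (GH43619)

Custom CSS classes can now be directly specified without string replacement (GH43686)

Ability to render hyperlinks automatically via a new hyperlinks formatting keyword argument (GH45058)

There are also some LaTeX-specific enhancements:

Styler.to_latex() introduces the keyword argument environment, which also allows a specific "longtable" entry through a separate jinja2 template (GH41866)

Naive sparsification is now possible for LaTeX without the necessity of including the multirow package (GH43369)

cline support has been added for MultiIndex row sparsification through a keyword argument (GH45138)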

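Multi-threaded CSV reading with a new CSV engine based on pyarrow#

pandas.read_csv() now accepts engine="pyarrow" (requires at least pyarrow 1.0.1) as an argument, allowing for faster CSV parsing on multicore machines with pyarrow installed. See the I/O docs for more info. (GH23697, GH43706)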
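The following is a small sketch of opting into the pyarrow engine while staying usable on machines where pyarrow is missing. The helper name `read_csv_prefer_pyarrow` is hypothetical and not part of pandas; it assumes that pandas raises ImportError when the optional dependency cannot be imported.

```python
import io

import pandas as pd

CSV_BYTES = b"a,b\n1,2\n3,4\n"


def read_csv_prefer_pyarrow(raw: bytes) -> pd.DataFrame:
    """Hypothetical helper: try engine="pyarrow", fall back to the C engine."""
    try:
        return pd.read_csv(io.BytesIO(raw), engine="pyarrow")
    except ImportError:
        # pyarrow is not installed (or too old); use the default C parser instead.
        return pd.read_csv(io.BytesIO(raw), engine="c")


df = read_csv_prefer_pyarrow(CSV_BYTES)
print(df)
```

A fresh BytesIO buffer is created for each attempt so that a failed first attempt cannot leave a partially consumed stream behind.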

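Rank function for rolling and expanding windows#

Added a rank function to Rolling and Expanding. The new function supports the method, ascending, and pct flags of DataFrame.rank(). The method argument supports the min, max, and average ranking methods. Example: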

In [4]: s = pd.Series([1, 4, 2, 3, 5, 3])

In [5]: s.rolling(3).rank()
Out[5]: 
0    NaN
1    NaN
2    2.0
3    2.0
4    3.0
5    1.5
Length: 6, dtype: float64

In [6]: s.rolling(3).rank(method="max")
Out[6]: 
0    NaN
1    NaN
2    2.0
3    2.0
4    3.0
5    2.0
Length: 6, dtype: float64
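To make the semantics concrete, here is a sketch (assuming pandas 1.4 or later) that cross-checks the windowed rank against a slower reference: ranking each window with Series.rank() and taking the rank of the window's last element. The reference computation is an illustration, not how pandas implements it.

```python
import pandas as pd

s = pd.Series([1, 4, 2, 3, 5, 3])

# Built-in windowed rank with the "min" tie-breaking method.
ranked = s.rolling(3).rank(method="min")

# Reference: rank every full window, keep the rank of its newest element.
reference = s.rolling(3).apply(lambda w: w.rank(method="min").iloc[-1], raw=False)

# Expanding windows work the same way; pct=True returns ranks as fractions.
pct_ranked = s.expanding().rank(pct=True)

print(ranked)
print(pct_ranked)
```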

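Groupby positional indexing#

It is now possible to specify positional ranges relative to the ends of each group.

Negative arguments for GroupBy.head() and GroupBy.tail() now work correctly and result in ranges relative to the end and start of each group, respectively. Previously, negative arguments returned empty frames.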

In [7]: df = pd.DataFrame([["g", "g0"], ["g", "g1"], ["g", "g2"], ["g", "g3"],
   ...:                    ["h", "h0"], ["h", "h1"]], columns=["A", "B"])
   ...: 

In [8]: df.groupby("A").head(-1)
Out[8]: 
   A   B
0  g  g0
1  g  g1
2  g  g2
4  h  h0

[4 rows x 2 columns]
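As a complementary sketch (assuming pandas 1.4 or later), the snippet below applies both head(-1) and tail(-1) to the same data: the first drops the last row of each group, the second drops the first row of each group.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["g", "g", "g", "g", "h", "h"],
        "B": ["g0", "g1", "g2", "g3", "h0", "h1"],
    }
)

# head(-1): everything except the last row of each group.
all_but_last = df.groupby("A").head(-1)

# tail(-1): everything except the first row of each group.
all_but_first = df.groupby("A").tail(-1)

print(all_but_last)
print(all_but_first)
```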

GroupBy.nth() now accepts a slice or a list of integers and slices.

In [9]: df.groupby("A").nth(slice(1, -1))
Out[9]: 
    B
A    
g  g1
g  g2

[2 rows x 1 columns]

In [10]: df.groupby("A").nth([slice(None, 1), slice(-1, None)])
Out[10]: 
    B
A    
g  g0
g  g3
h  h0
h  h1

[4 rows x 1 columns]

GroupBy.nth() now accepts index notation.

In [11]: df.groupby("A").nth[1, -1]
Out[11]: 
    B
A    
g  g1
g  g3
h  h1

[3 rows x 1 columns]

In [12]: df.groupby("A").nth[1:-1]
Out[12]: 
    B
A    
g  g1
g  g2

[2 rows x 1 columns]

In [13]: df.groupby("A").nth[:1, -1:]
Out[13]: 
    B
A    
g  g0
g  g3
h  h0
h  h1

[4 rows x 1 columns]

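DataFrame.from_dict and DataFrame.to_dict have new 'tight' option#

A new 'tight' dictionary format that preserves MultiIndex entries and names is now available with the DataFrame.from_dict() and DataFrame.to_dict() methods and can be used with the standard json library to produce a tight representation of DataFrame objects (GH4889).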

In [14]: df = pd.DataFrame.from_records(
   ....:     [[1, 3], [2, 4]],
   ....:     index=pd.MultiIndex.from_tuples([("a", "b"), ("a", "c")],
   ....:                                     names=["n1", "n2"]),
   ....:     columns=pd.MultiIndex.from_tuples([("x", 1), ("y", 2)],
   ....:                                       names=["z1", "z2"]),
   ....: )
   ....: 

In [15]: df
Out[15]: 
z1     x  y
z2     1  2
n1 n2      
a  b   1  3
   c   2  4

[2 rows x 2 columns]

In [16]: df.to_dict(orient='tight')
Out[16]: 
{'index': [('a', 'b'), ('a', 'c')],
 'columns': [('x', 1), ('y', 2)],
 'data': [[1, 3], [2, 4]],
 'index_names': ['n1', 'n2'],
 'column_names': ['z1', 'z2']}
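The sketch below (assuming pandas 1.4 or later) rebuilds the frame from its 'tight' dictionary to show that MultiIndex entries and names on both axes survive the round trip. It deliberately stays in memory and does not go through JSON.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 3], [2, 4]],
    index=pd.MultiIndex.from_tuples([("a", "b"), ("a", "c")], names=["n1", "n2"]),
    columns=pd.MultiIndex.from_tuples([("x", 1), ("y", 2)], names=["z1", "z2"]),
)

# Export to the 'tight' layout: index, columns, data, and the names of both axes.
tight = df.to_dict(orient="tight")

# Rebuild the frame from the same dictionary.
restored = pd.DataFrame.from_dict(tight, orient="tight")

print(sorted(tight))
print(restored)
```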

Other enhancements#

concat() will preserve the attrs when it is the same for all objects and discard the attrs when they are different (GH41828)
DataFrameGroupBy operations with as_index=False now correctly retain ExtensionDtype dtypes for columns being grouped on (GH41373)
Add support for assigning values to by argument in DataFrame.plot.hist() and DataFrame.plot.box() (GH15079)
Series.sample(), DataFrame.sample(), and GroupBy.sample() now accept a np.random.Generator as input to random_state. A generator will be more performant, especially with replace=False (GH38100)
Series.ewm() and DataFrame.ewm() now support a method argument with a 'table' option that performs the windowing operation over an entire DataFrame. See Window Overview for performance and functional benefits (GH42273)
GroupBy.cummin() and GroupBy.cummax() now support the argument skipna (GH34047)
read_table() now supports the argument storage_options (GH39167)
DataFrame.to_stata() and StataWriter() now accept the keyword-only argument value_labels to save labels for non-categorical columns (GH38454)
Methods that rely on hashmap-based algorithms, such as DataFrameGroupBy.value_counts(), DataFrameGroupBy.count() and factorize(), no longer ignore the imaginary component of complex numbers (GH17927)
Add Series.str.removeprefix() and Series.str.removesuffix() introduced in Python 3.9 to remove pre-/suffixes from string-type Series (GH36944)
Attempting to write into a file in a missing parent directory with DataFrame.to_csv(), DataFrame.to_html(), DataFrame.to_excel(), DataFrame.to_feather(), DataFrame.to_parquet(), DataFrame.to_stata(), DataFrame.to_json(), DataFrame.to_pickle(), and DataFrame.to_xml() now explicitly mentions the missing parent directory; the same is true for the Series counterparts (GH24306)
Indexing with .loc and .iloc now supports Ellipsis (GH37750)
IntegerArray.all(), IntegerArray.any(), FloatingArray.any(), and FloatingArray.all() use Kleene logic (GH41967)
Added support for nullable boolean and integer types in DataFrame.to_stata(), StataWriter, StataWriter117, and StataWriterUTF8 (GH40855)
DataFrame.__pos__() and DataFrame.__neg__() now retain ExtensionDtype dtypes (GH43883)
The error raised when an optional dependency can't be imported now includes the original exception, for easier investigation (GH43882)
Added ExponentialMovingWindow.sum() (GH13297)
Series.str.split() now supports a regex argument that explicitly specifies whether the pattern is a regular expression. Default is None (GH43563, GH32835, GH25549)
DataFrame.dropna() now accepts a single label as subset along with array-like (GH41021)
Added DataFrameGroupBy.value_counts() (GH43564)
read_csv() now accepts a callable function in on_bad_lines when engine="python" for custom handling of bad lines (GH5686)
ExcelWriter argument if_sheet_exists="overlay" option added (GH40231)
read_excel() now accepts a decimal argument that allows the user to specify the decimal point when parsing string columns to numeric (GH14403)
GroupBy.mean(), GroupBy.std(), GroupBy.var(), and GroupBy.sum() now support Numba execution with the engine keyword (GH43731, GH44862, GH44939)
Timestamp.isoformat() now handles the timespec argument from the base datetime class (GH26131)
NaT.to_numpy() dtype argument is now respected, so np.timedelta64 can be returned (GH44460)
New option display.max_dir_items customizes the number of columns added to Dataframe.__dir__() and suggested for tab completion (GH37996)
Added "Juneteenth National Independence Day" to USFederalHolidayCalendar (GH44574)
Rolling.var(), Expanding.var(), Rolling.std(), and Expanding.std() now support Numba execution with the engine keyword (GH44461)
Series.info() has been added, for compatibility with DataFrame.info() (GH5167)
Implemented IntervalArray.min() and IntervalArray.max(), as a result of which min and max now work for IntervalIndex, Series and DataFrame with IntervalDtype (GH44746)
UInt64Index.map() now retains dtype where possible (GH44609)
read_json() can now parse unsigned long long integers (GH26068)
DataFrame.take() now raises a TypeError when passed a scalar for the indexer (GH42875)
is_list_like() now identifies duck-arrays as list-like unless .ndim == 0 (GH35131)
ExtensionDtype and ExtensionArray are now (de)serialized when exporting a DataFrame with DataFrame.to_json() using orient='table' (GH20612, GH44705)
Add support for Zstandard compression to DataFrame.to_pickle()/read_pickle() and friends (GH43925)
DataFrame.to_sql() now returns an int of the number of written rows (GH23998)
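A few of the enhancements listed above can be sketched together, assuming pandas 1.4 or later. The sample data and the callback name `keep_first_two` are made up for illustration. The callback receives the split fields of a row that has too many fields and returns the fields to keep.

```python
import io

import pandas as pd

# Series.str.removeprefix(): strip a prefix only where it is present.
names = pd.Series(["str_foo", "str_bar", "no_prefix"])
stripped = names.str.removeprefix("str_")

# DataFrame.dropna(): subset may now be a single label instead of a list.
frame = pd.DataFrame({"a": [1.0, None, 3.0], "b": [None, 2.0, 3.0]})
kept = frame.dropna(subset="a")


# read_csv(): a callable for on_bad_lines (python engine only).
def keep_first_two(bad_line):
    """Illustrative handler: truncate an over-long row to two fields."""
    return bad_line[:2]


parsed = pd.read_csv(
    io.StringIO("a,b\n1,2\n3,4,5\n"),
    engine="python",
    on_bad_lines=keep_first_two,
)

print(stripped.tolist())
print(kept)
print(parsed)
```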

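Notable bug fixes#

These are bug fixes that might have notable behavior changes.

Inconsistent date string parsing#

The dayfirst option of to_datetime() isn't strict, and this can lead to surprising behavior: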

In [17]: pd.to_datetime(["31-12-2021"], dayfirst=False)
Out[17]: DatetimeIndex(['2021-12-31'], dtype='datetime64[ns]', freq=None)

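Now, a warning is raised if a date string cannot be parsed in accordance with the given dayfirst value when the value is a delimited date string (e.g. 31-12-2012).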
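One way to avoid the ambiguity altogether is to state the intended layout explicitly. The sketch below records any warnings from the ambiguous call (without assuming their category or wording, which may differ between pandas versions) and then shows two unambiguous alternatives.

```python
import warnings

import pandas as pd

# Ambiguous call: the string cannot be month-first, so pandas falls back to day-first.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ambiguous = pd.to_datetime(["31-12-2021"], dayfirst=False)

# Unambiguous alternatives: declare day-first parsing or give the exact format.
explicit = pd.to_datetime(["31-12-2021"], dayfirst=True)
by_format = pd.to_datetime(["31-12-2021"], format="%d-%m-%Y")

print(len(caught), ambiguous[0], explicit[0], by_format[0])
```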

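Ignoring dtypes in concat with empty or all-NA columns#

When using concat() to concatenate two or more DataFrame objects, if one of the DataFrames was empty or had all-NA values, its dtype was sometimes ignored when finding the concatenated dtype. These are now consistently not ignored (GH43507).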

In [18]: df1 = pd.DataFrame({"bar": [pd.Timestamp("2013-01-01")]}, index=range(1))

In [19]: df2 = pd.DataFrame({"bar": np.nan}, index=range(1, 2))

In [20]: res = pd.concat([df1, df2])

Previously, the float dtype in df2 would be ignored, so the result dtype would be datetime64[ns]. As a result, the np.nan would be cast to NaT.

Previous behavior:

In [4]: res
Out[4]:
         bar
0 2013-01-01
1        NaT

Now the float dtype is respected. Since the common dtype for these DataFrames is object, the np.nan is retained.

New behavior:

In [21]: res
Out[21]: 
                   bar
0  2013-01-01 00:00:00
1                  NaN

[2 rows x 1 columns]

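Null-values are no longer coerced to NaN-value in value_counts and mode#

Series.value_counts() and Series.mode() no longer coerce None, NaT and other null-values to a NaN-value for np.object-dtype. This behavior is now consistent with unique, isin and others (GH42688).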

In [22]: s = pd.Series([True, None, pd.NaT, None, pd.NaT, None])

In [23]: res = s.value_counts(dropna=False)

Previously, all null-values were replaced by a NaN-value.

Previous behavior:

In [3]: res
Out[3]:
NaN     5
True    1
dtype: int64

Now null-values are no longer mangled.

New behavior:

In [24]: res
Out[24]: 
None    3
NaT     2
True    1
Length: 3, dtype: int64

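mangle_dupe_cols in read_csv no longer renames unique columns conflicting with target names#

read_csv() no longer renames unique column labels that conflict with the target names of duplicated columns. Already existing columns are skipped, i.e. the next available index is used for the target column name (GH14704).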

In [25]: import io

In [26]: data = "a,a,a.1\n1,2,3"

In [27]: res = pd.read_csv(io.StringIO(data))

Previously, the second column was called a.1, while the third column was also renamed, to a.1.1.

Previous behavior:

In [3]: res
Out[3]:
    a  a.1  a.1.1
0   1    2      3

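Now the renaming checks whether a.1 already exists when changing the name of the second column and skips this index. The second column is instead renamed to a.2.

New behavior: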

In [28]: res
Out[28]: 
   a  a.2  a.1
0  1    2    3

[1 rows x 3 columns]

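unstack and pivot_table no longer raise ValueError for a result that would exceed the int32 limit#

Previously, DataFrame.pivot_table() and DataFrame.unstack() would raise a ValueError if the operation could produce a result with more than 2**31 - 1 elements. This operation now raises an errors.PerformanceWarning instead (GH26314).

Previous behavior: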

In [3]: df = pd.DataFrame({"ind1": np.arange(2 ** 16), "ind2": np.arange(2 ** 16), "count": 0})
In [4]: df.pivot_table(index="ind1", columns="ind2", values="count", aggfunc="count")
ValueError: Unstacked DataFrame is too big, causing int32 overflow

New behavior:

In [4]: df.pivot_table(index="ind1", columns="ind2", values="count", aggfunc="count")
PerformanceWarning: The following operation may generate 4294967296 cells in the resulting pandas object.

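groupby.apply consistent transform detection#

GroupBy.apply() is designed to be flexible, allowing users to perform aggregations, transformations, filters, and use it with user-defined functions that might not fall into any of these categories. As part of this, apply will attempt to detect when an operation is a transform, and in such a case, the result will have the same index as the input. In order to determine if the operation is a transform, pandas compares the input's index to the result's and determines if it has been mutated. Previously in pandas 1.3, different code paths used different definitions of "mutated": some would use Python's is whereas others would test only up to equality.

This inconsistency has been removed; pandas now tests up to equality.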

In [29]: def func(x):
   ....:     return x.copy()
   ....: 

In [30]: df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

In [31]: df
Out[31]: 
   a  b  c
0  1  3  5
1  2  4  6

[2 rows x 3 columns]

Previous behavior:

In [3]: df.groupby(['a']).apply(func)
Out[3]:
     a  b  c
a
1 0  1  3  5
2 1  2  4  6

In [4]: df.set_index(['a', 'b']).groupby(['a']).apply(func)
Out[4]:
     c
a b
1 3  5
2 4  6

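In the examples above, the first uses a code path where pandas uses is and determines that func is not a transform, whereas the second tests up to equality and determines that func is a transform. In the first case, the result's index is not the same as the input's.

New behavior: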

In [5]: df.groupby(['a']).apply(func)
Out[5]:
   a  b  c
0  1  3  5
1  2  4  6

In [6]: df.set_index(['a', 'b']).groupby(['a']).apply(func)
Out[6]:
     c
a b
1 3  5
2 4  6

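Now in both cases it is determined that func is a transform. In each case, the result has the same index as the input.

Backwards incompatible API changes#

Increased minimum version for Python#

pandas 1.4.0 supports Python 3.8 and higher.

Increased minimum versions for dependencies#

Some minimum supported versions of dependencies were updated. If installed, we now require: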

Package	Minimum Version	Required	Changed
numpy	1.18.5	X	X
pytz	2020.1	X	X
python-dateutil	2.8.1	X	X
bottleneck	1.3.1		X
numexpr	2.7.1		X
pytest (dev)	6.0
mypy (dev)	0.930		X

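For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.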

Package	Minimum Version	Changed
beautifulsoup4	4.8.2	X
fastparquet	0.4.0
fsspec	0.7.4
gcsfs	0.6.0
lxml	4.5.0	X
matplotlib	3.3.2	X
numba	0.50.1	X
openpyxl	3.0.3	X
pandas-gbq	0.14.0	X
pyarrow	1.0.1	X
pymysql	0.10.1	X
pytables	3.6.1	X
s3fs	0.4.0
scipy	1.4.1	X
sqlalchemy	1.4.0	X
tabulate	0.8.7
xarray	0.15.1	X
xlrd	2.0.1	X
xlsxwriter	1.2.2	X
xlwt	1.3.0

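See Dependencies and Optional dependencies for more.

Other API changes#

Index.get_indexer_for() no longer accepts keyword arguments (other than target); in the past these would be silently ignored if the index was not unique (GH42310)
Change in the position of the min_rows argument in DataFrame.to_string() due to a change in the docstring (GH44304)
Reduction operations for DataFrame or Series now raise a ValueError when None is passed for skipna (GH44178)
read_csv() and read_html() no longer raise an error when one of the header rows consists only of Unnamed: columns (GH13054)
Changed the name attribute of several holidays in USFederalHolidayCalendar to match official federal holiday names. Specifically:
- "New Year's Day" gains the possessive apostrophe
- "Presidents Day" becomes "Washington's Birthday"
- "Martin Luther King Jr. Day" is now "Birthday of Martin Luther King, Jr."
- "July 4th" is now "Independence Day"
- "Thanksgiving" is now "Thanksgiving Day"
- "Christmas" is now "Christmas Day"
- Added "Juneteenth National Independence Day"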
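Code that looks holidays up by name may need updating after this rename. A quick way to see which names the installed pandas uses is to list the calendar's rules, as in this sketch:

```python
from pandas.tseries.holiday import USFederalHolidayCalendar

# Each rule in the calendar is a Holiday object carrying a name attribute.
calendar = USFederalHolidayCalendar()
holiday_names = [rule.name for rule in calendar.rules]

for name in holiday_names:
    print(name)
```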

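Deprecations#

Deprecated Int64Index, UInt64Index & Float64Index#

Int64Index, UInt64Index and Float64Index have been deprecated in favor of the base Index class and will be removed in pandas 2.0 (GH43028).

For constructing a numeric index, you can use the base Index class instead, specifying the data type (which will also work on older pandas releases):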

# replace
pd.Int64Index([1, 2, 3])
# with
pd.Index([1, 2, 3], dtype="int64")

For checking the data type of an index object, you can replace isinstance checks with checking the dtype:

# replace
isinstance(idx, pd.Int64Index)
# with
idx.dtype == "int64"

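Currently, in order to maintain backward compatibility, calls to Index will continue to return Int64Index, UInt64Index and Float64Index when given numeric data, but in the future, an Index will be returned.

Current behavior: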

In [1]: pd.Index([1, 2, 3], dtype="int32")
Out [1]: Int64Index([1, 2, 3], dtype='int64')
In [1]: pd.Index([1, 2, 3], dtype="uint64")
Out [1]: UInt64Index([1, 2, 3], dtype='uint64')

Future behavior:

In [3]: pd.Index([1, 2, 3], dtype="int32")
Out [3]: Index([1, 2, 3], dtype='int32')
In [4]: pd.Index([1, 2, 3], dtype="uint64")
Out [4]: Index([1, 2, 3], dtype='uint64')
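As a sketch of version-independent checks, the dtype helpers in pandas.api.types can stand in for isinstance tests against the deprecated classes. The variable names below are illustrative only.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype

# Construct numeric indexes through the base class, stating the dtype where it matters.
int_idx = pd.Index([1, 2, 3], dtype="int64")
float_idx = pd.Index([1.5, 2.5])

# Check dtypes instead of the concrete Index subclass.
print(int_idx.dtype == "int64")
print(is_integer_dtype(int_idx), is_float_dtype(float_idx))
```

These checks give the same answer whether the installed pandas returns the deprecated subclasses or the base Index.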

Deprecated DataFrame.append and Series.append#

DataFrame.append() and Series.append() have been deprecated and will be removed in a future version. Use pandas.concat() instead (GH35407).

Deprecated syntax

In [1]: pd.Series([1, 2]).append(pd.Series([3, 4]))
Out [1]:
<stdin>:1: FutureWarning: The series.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
0    1
1    2
0    3
1    4
dtype: int64

In [2]: df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
In [3]: df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
In [4]: df1.append(df2)
Out [4]:
<stdin>:1: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
   A  B
0  1  2
1  3  4
0  5  6
1  7  8

Recommended syntax

In [32]: pd.concat([pd.Series([1, 2]), pd.Series([3, 4])])
Out[32]: 
0    1
1    2
0    3
1    4
Length: 4, dtype: int64

In [33]: df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))

In [34]: df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))

In [35]: pd.concat([df1, df2])
Out[35]: 
   A  B
0  1  2
1  3  4
0  5  6
1  7  8

[4 rows x 2 columns]
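A common pattern affected by this deprecation is appending inside a loop. One possible migration, sketched here with made-up data, is to collect the pieces in a list and concatenate once at the end, which also avoids copying the accumulated frame on every iteration.

```python
import pandas as pd

# Collect the partial frames instead of calling DataFrame.append() repeatedly.
chunks = []
for start in range(0, 6, 2):
    chunks.append(pd.DataFrame({"A": [start, start + 1]}))

# A single concat builds the final frame; ignore_index renumbers the rows.
combined = pd.concat(chunks, ignore_index=True)

print(combined)
```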

Other Deprecations#

Deprecated Index.is_type_compatible() (GH42113)
Deprecated the method argument in Index.get_loc(), use index.get_indexer([label], method=...) instead (GH42269)
Deprecated treating integer keys in Series.__setitem__() as positional when the index is a Float64Index not containing the key, an IntervalIndex with no entries containing the key, or a MultiIndex with a leading Float64Index level not containing the key (GH33469)
Deprecated treating numpy.datetime64 objects as UTC times when passed to the Timestamp constructor along with a timezone. In a future version, these will be treated as wall-times. To retain the old behavior, use Timestamp(dt64).tz_localize("UTC").tz_convert(tz) (GH24559)
Deprecated ignoring missing labels when indexing with a sequence of labels on a level of a MultiIndex (GH42351)
Creating an empty Series without a dtype will now raise a more visible FutureWarning instead of a DeprecationWarning (GH30017)
Deprecated the kind argument in Index.get_slice_bound(), Index.slice_indexer(), and Index.slice_locs(); in a future version passing kind will raise (GH42857)
Deprecated dropping of nuisance columns in Rolling, Expanding, and EWM aggregations (GH42738)
Deprecated Index.reindex() with a non-unique Index (GH42568)
Deprecated Styler.render() in favor of Styler.to_html() (GH42140)
Deprecated Styler.hide_index() and Styler.hide_columns() in favor of Styler.hide() (GH43758)
Deprecated passing in a string column label into times in DataFrame.ewm() (GH43265)
Deprecated the include_start and include_end arguments in DataFrame.between_time(); in a future version passing include_start or include_end will raise (GH40245)
Deprecated the squeeze argument to read_csv(), read_table(), and read_excel(). Users should squeeze the DataFrame afterwards with .squeeze("columns") instead (GH43242)
Deprecated the index argument to SparseArray construction (GH23089)
Deprecated the closed argument in date_range() and bdate_range() in favor of the inclusive argument; in a future version passing closed will raise (GH40245)
Deprecated Rolling.validate(), Expanding.validate(), and ExponentialMovingWindow.validate() (GH43665)
Deprecated silent dropping of columns that raised a TypeError in Series.transform and DataFrame.transform when used with a dictionary (GH43740)
Deprecated silent dropping of columns that raised a TypeError, DataError, and some cases of ValueError in Series.aggregate(), DataFrame.aggregate(), Series.groupby.aggregate(), and DataFrame.groupby.aggregate() when used with a list (GH43740)
Deprecated casting behavior when setting timezone-aware value(s) into a timezone-aware Series or DataFrame column when the timezones do not match. Previously this cast to object dtype. In a future version, the values being inserted will be converted to the series or column's existing timezone (GH37605)
Deprecated casting behavior when passing an item with mismatched timezone to DatetimeIndex.insert(), DatetimeIndex.putmask(), DatetimeIndex.where(), DatetimeIndex.fillna(), Series.mask(), Series.where(), Series.fillna(), Series.shift(), Series.replace(), Series.reindex() (and DataFrame column analogues). In the past this has cast to object dtype. In a future version, these will cast the passed item to the index or series's timezone (GH37605, GH44940)
Deprecated the prefix keyword argument in read_csv() and read_table(); in a future version the argument will be removed (GH43396)
Deprecated passing non boolean argument to sort in concat() (GH41518)
Deprecated passing arguments as positional for read_fwf() other than filepath_or_buffer (GH41485)
Deprecated passing arguments as positional for read_xml() other than path_or_buffer (GH45133)
Deprecated passing skipna=None for DataFrame.mad() and Series.mad(), pass skipna=True instead (GH44580)
Deprecated the behavior of to_datetime() with the string "now" with utc=False; in a future version this will match Timestamp("now"), which in turn matches Timestamp.now() returning the local time (GH18705)
Deprecated DateOffset.apply(), use offset + other instead (GH44522)
Deprecated parameter names in Index.copy() (GH44916)
A deprecation warning is now shown for DataFrame.to_latex() indicating that the arguments signature may change and emulate more closely the arguments to Styler.to_latex() in future versions (GH44411)
Deprecated behavior of concat() between objects with bool-dtype and numeric-dtypes; in a future version these will cast to object dtype instead of coercing bools to numeric values (GH39817)
Deprecated Categorical.replace(), use Series.replace() instead (GH44929)
Deprecated passing set or dict as indexer for DataFrame.loc.__setitem__(), DataFrame.loc.__getitem__(), Series.loc.__setitem__(), Series.loc.__getitem__(), DataFrame.__getitem__(), Series.__getitem__() and Series.__setitem__() (GH42825)
Deprecated Index.__getitem__() with a bool key; use index.values[key] to get the old behavior (GH44051)
Deprecated downcasting column-by-column in DataFrame.where() with integer-dtypes (GH44597)
Deprecated DatetimeIndex.union_many(), use DatetimeIndex.union() instead (GH44091)
Deprecated Groupby.pad() in favor of Groupby.ffill() (GH33396)
Deprecated Groupby.backfill() in favor of Groupby.bfill() (GH33396)
Deprecated Resample.pad() in favor of Resample.ffill() (GH33396)
Deprecated Resample.backfill() in favor of Resample.bfill() (GH33396)
Deprecated numeric_only=None in DataFrame.rank(); in a future version numeric_only must be either True or False (the default) (GH45036)
Deprecated the behavior of Timestamp.utcfromtimestamp(), in the future it will return a timezone-aware UTC Timestamp (GH22451)
Deprecated NaT.freq() (GH45071)
Deprecated behavior of Series and DataFrame construction when passed float-dtype data containing NaN and an integer dtype, ignoring the dtype argument; in a future version this will raise (GH40110)
Deprecated the behaviour of Series.to_frame() and Index.to_frame() to ignore the name argument when name=None. Currently, this means to preserve the existing name, but in the future explicitly passing name=None will set None as the name of the column in the resulting DataFrame (GH44212)
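Several of the replacements named in the list above can be sketched side by side, assuming pandas 1.4 or later. The inputs are invented for illustration, and each line uses only the replacement API mentioned in the corresponding entry.

```python
import io

import pandas as pd

# read_csv(squeeze=True)  ->  squeeze the single-column frame afterwards.
column = pd.read_csv(io.StringIO("x\n1\n2\n")).squeeze("columns")

# DateOffset.apply(other)  ->  plain addition.
shifted = pd.Timestamp("2022-01-01") + pd.DateOffset(days=1)

# Index.get_loc(label, method=...)  ->  get_indexer([label], method=...).
float_index = pd.Index([1.0, 2.0, 3.0])
position = float_index.get_indexer([2.4], method="nearest")

# date_range(closed=...)  ->  date_range(inclusive=...).
days = pd.date_range("2022-01-01", "2022-01-03", inclusive="left")

# Groupby.pad()  ->  Groupby.ffill().
frame = pd.DataFrame({"k": ["a", "a"], "v": [1.0, None]})
filled = frame.groupby("k")["v"].ffill()

print(column.tolist(), shifted, position, list(days), filled.tolist())
```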

Performance improvements#

Performance improvement in GroupBy.sample(), especially when the weights argument is provided (GH34483)
Performance improvement when converting non-string arrays to string arrays (GH34483)
Performance improvement in GroupBy.transform() for user-defined functions (GH41598)
Performance improvement in constructing DataFrame objects (GH42631, GH43142, GH43147, GH43307, GH43144, GH44826)
Performance improvement in GroupBy.shift() when the fill_value argument is provided (GH26615)
Performance improvement in DataFrame.corr() for method=pearson on data without missing values (GH40956)
Performance improvement in some GroupBy.apply() operations (GH42992, GH43578)
Performance improvement in read_stata() (GH43059, GH43227)
Performance improvement in read_sas() (GH43333)
Performance improvement in to_datetime() with uint dtypes (GH42606)
Performance improvement in to_datetime() with infer_datetime_format set to True (GH43901)
Performance improvement in Series.sparse.to_coo() (GH42880)
Performance improvement in indexing with a UInt64Index (GH43862)
Performance improvement in indexing with a Float64Index (GH43705)
Performance improvement in indexing with a non-unique Index (GH43792)
Performance improvement in indexing with a listlike indexer on a MultiIndex (GH43370)
Performance improvement in indexing with a MultiIndex indexer on another MultiIndex (GH43370)
Performance improvement in GroupBy.quantile() (GH43469, GH43725)
Performance improvement in GroupBy.count() (GH43730, GH43694)
Performance improvement in GroupBy.any() and GroupBy.all() (GH43675, GH42841)
Performance improvement in GroupBy.std() (GH43115, GH43576)
Performance improvement in GroupBy.cumsum() (GH43309)
SparseArray.min() and SparseArray.max() no longer require converting to a dense array (GH43526)
Indexing into a SparseArray with a slice with step=1 no longer requires converting to a dense array (GH43777)
Performance improvement in SparseArray.take() with allow_fill=False (GH43654)
Performance improvement in Rolling.mean(), Expanding.mean(), Rolling.sum(), Expanding.sum(), Rolling.max(), Expanding.max(), Rolling.min() and Expanding.min() with engine="numba" (GH43612, GH44176, GH45170)
Improved performance of pandas.read_csv() with memory_map=True when file encoding is UTF-8 (GH43787)
Performance improvement in RangeIndex.sort_values() overriding Index.sort_values() (GH43666)
Performance improvement in RangeIndex.insert() (GH43988)
Performance improvement in Index.insert() (GH43953)
Performance improvement in DatetimeIndex.tolist() (GH43823)
Performance improvement in DatetimeIndex.union() (GH42353)
Performance improvement in Series.nsmallest() (GH43696)
Performance improvement in DataFrame.insert() (GH42998)
Performance improvement in DataFrame.dropna() (GH43683)
Performance improvement in DataFrame.fillna() (GH43316)
Performance improvement in DataFrame.values() (GH43160)
Performance improvement in DataFrame.select_dtypes() (GH42611)
Performance improvement in DataFrame reductions (GH43185, GH43243, GH43311, GH43609)
Performance improvement in Series.unstack() and DataFrame.unstack() (GH43335, GH43352, GH42704, GH43025)
Performance improvement in Series.to_frame() (GH43558)
Performance improvement in Series.mad() (GH43010)
Performance improvement in merge() (GH43332)
Performance improvement in to_csv() when the index column is a datetime and is formatted (GH39413)
Performance improvement in to_csv() when a MultiIndex contains a lot of unused levels (GH37484)
Performance improvement in read_csv() when index_col was set with a numeric column (GH44158)
Performance improvement in concat() (GH43354)
Performance improvement in SparseArray.__getitem__() (GH23122)
Performance improvement in constructing a DataFrame from array-like objects like a Pytorch tensor (GH44616)

Bug fixes#

Categorical#

Bug in setting dtype-incompatible values into a Categorical (or Series or DataFrame backed by Categorical) raising ValueError instead of TypeError (GH41919)
Bug in Categorical.searchsorted() when passing a dtype-incompatible value raising KeyError instead of TypeError (GH41919)
Bug in Categorical.astype() casting datetimes and Timestamp to int for dtype object (GH44930)
Bug in Series.where() with CategoricalDtype when passing a dtype-incompatible value raising ValueError instead of TypeError (GH41919)
Bug in Categorical.fillna() when passing a dtype-incompatible value raising ValueError instead of TypeError (GH41919)
Bug in Categorical.fillna() with a tuple-like category raising ValueError instead of TypeError when filling with a non-category tuple (GH41919)

Datetimelike#

Bug in DataFrame constructor unnecessarily copying non-datetimelike 2D object arrays (GH39272)
Bug in to_datetime() with format and pandas.NA was raising ValueError (GH42957)
to_datetime() would silently swap MM/DD/YYYY and DD/MM/YYYY formats if the given dayfirst option could not be respected - now, a warning is raised in the case of delimited date strings (e.g. 31-12-2012) (GH12585)
Bug in date_range() and bdate_range() not returning the right bound when start = end and the set is closed on one side (GH43394)
Bug in inplace addition and subtraction of DatetimeIndex or TimedeltaIndex with DatetimeArray or TimedeltaArray (GH43904)
Bug in calling np.isnan, np.isfinite, or np.isinf on a timezone-aware DatetimeIndex incorrectly raising TypeError (GH43917)
Bug in constructing a Series from datetime-like strings with mixed timezones incorrectly partially-inferring datetime values (GH40111)
Bug in addition of a Tick object and a np.timedelta64 object incorrectly raising instead of returning Timedelta (GH44474)
np.maximum.reduce and np.minimum.reduce now correctly return Timestamp and Timedelta objects when operating on Series, DataFrame, or Index with datetime64[ns] or timedelta64[ns] dtype (GH43923)
Bug in adding a np.timedelta64 object to a BusinessDay or CustomBusinessDay object incorrectly raising (GH44532)
Bug in Index.insert() for inserting np.datetime64, np.timedelta64 or tuple into Index with dtype='object' with negative loc adding None and replacing existing value (GH44509)
Bug in Timestamp.to_pydatetime() failing to retain the fold attribute (GH45087)
Bug in Series.mode() with DatetimeTZDtype incorrectly returning timezone-naive and PeriodDtype incorrectly raising (GH41927)
Fixed regression in reindex() raising an error when using an incompatible fill value with a datetime-like dtype (or not raising a deprecation warning for using a datetime.date as fill value) (GH42921)
Bug in DateOffset addition with Timestamp where offset.nanoseconds would not be included in the result (GH43968, GH36589)
Bug in Timestamp.fromtimestamp() not supporting the tz argument (GH45083)
Bug in DataFrame construction from a dict of Series with mismatched index dtypes sometimes raising depending on the ordering of the passed dict (GH44091)
Bug in Timestamp hashing during some DST transitions caused a segmentation fault (GH33931 and GH40817)
Timedelta#

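Bug in division of all-NaT TimeDeltaIndex, Series or DataFrame column with an object-dtype array-like of numbers failing to infer the result as timedelta64-dtype (GH39750)
Bug in floor division of timedelta64[ns] data with a scalar returning garbage values (GH44466)
Bug in Timedelta now properly taking into account any nanoseconds contribution of any kwarg (GH43764, GH45227)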

Timezones#

Bug in to_datetime() with infer_datetime_format=True failing to parse zero UTC offset (Z) correctly (GH41047)
Bug in Series.dt.tz_convert() resetting index in a Series with CategoricalIndex (GH43080)
Bug in Timestamp and DatetimeIndex incorrectly raising a TypeError when subtracting two timezone-aware objects with mismatched timezones (GH31793)

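Numeric#

Bug in floor-dividing a list or tuple of integers by a Series incorrectly raising (GH44674)
Bug in DataFrame.rank() raising ValueError with object columns and method="first" (GH41931)
Bug in DataFrame.rank() treating missing values and extreme values as equal (for example np.nan and np.inf), causing incorrect results when na_option="bottom" or na_option="top" was used (GH41931)
Bug in numexpr engine still being used when the option compute.use_numexpr is set to False (GH32556)
Bug in DataFrame arithmetic ops with a subclass whose _constructor() attribute is a callable other than the subclass itself (GH43201)
Bug in arithmetic operations involving RangeIndex where the result would have the incorrect name (GH43962)
Bug in arithmetic operations involving Series where the result could have the incorrect name when the operands have matching NA or matching tuple names (GH44459)
Bug in division with IntegerDtype or BooleanDtype array and NA scalar incorrectly raising (GH44685)
Bug in multiplying a Series with FloatingDtype with a timedelta-like scalar incorrectly raising (GH44772)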

Conversion#

Bug in UInt64Index constructor when passing a list containing both positive integers small enough to cast to int64 and integers too large to hold in int64 (GH42201)
Bug in Series constructor returning 0 for missing values with dtype int64 and False for dtype bool (GH43017, GH43018)
Bug in constructing a DataFrame from a PandasArray containing Series objects behaving differently than an equivalent np.ndarray (GH43986)
Bug in IntegerDtype not allowing coercion from string dtype (GH25472)
Bug in to_datetime() with arg:xr.DataArray and unit="ns" specified raises TypeError (GH44053)
Bug in DataFrame.convert_dtypes() not returning the correct type when a subclass does not overload _constructor_sliced() (GH43201)
Bug in DataFrame.astype() not propagating attrs from the original DataFrame (GH44414)
Bug in DataFrame.convert_dtypes() result losing columns.names (GH41435)
Bug in constructing an IntegerArray from pyarrow data failing to validate dtypes (GH44891)
Bug in Series.astype() not allowing converting from a PeriodDtype to datetime64 dtype, inconsistent with the PeriodIndex behavior (GH45038)

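Strings#

Bug in checking for string[pyarrow] dtype incorrectly raising an ImportError when pyarrow is not installed (GH44276)

Interval#

Bug in Series.where() with IntervalDtype incorrectly raising when the where call should not replace anything (GH44181)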

Indexing#

Bug in Series.rename() with MultiIndex and level provided (GH43659)
Bug in DataFrame.truncate() and Series.truncate() when the object's Index has a length greater than one but only one unique value (GH42365)
Bug in Series.loc() and DataFrame.loc() with a MultiIndex when indexing with a tuple in which one of the levels is also a tuple (GH27591)
Bug in Series.loc() with a MultiIndex whose first level contains only np.nan values (GH42055)
Bug in indexing on a Series or DataFrame with a DatetimeIndex when passing a string, the return type depended on whether the index was monotonic (GH24892)
Bug in indexing on a MultiIndex failing to drop scalar levels when the indexer is a tuple containing a datetime-like string (GH42476)
Bug in DataFrame.sort_values() and Series.sort_values() when passing an ascending value, failed to raise or incorrectly raising ValueError (GH41634)
Bug in updating values of pandas.Series using boolean index, created by using pandas.DataFrame.pop() (GH42530)
Bug in Index.get_indexer_non_unique() when index contains multiple np.nan (GH35392)
Bug in DataFrame.query() did not handle the degree sign in a backticked column name, such as `Temp(°C)`, used in an expression to query a DataFrame (GH42826)
Bug in DataFrame.drop() where the error message did not show missing labels with commas when raising KeyError (GH42881)
Bug in DataFrame.query() where method calls in query strings led to errors when the numexpr package was installed (GH22435)
Bug in DataFrame.nlargest() and Series.nlargest() where sorted result did not count indexes containing np.nan (GH28984)
Bug in indexing on a non-unique object-dtype Index with an NA scalar (e.g. np.nan) (GH43711)
Bug in DataFrame.__setitem__() incorrectly writing into an existing column's array rather than setting a new array when the new dtype and the old dtype match (GH43406)
Bug in setting floating-dtype values into a Series with integer dtype failing to set inplace when those values can be losslessly converted to integers (GH44316)
Bug in Series.__setitem__() with object dtype when setting an array with matching size and dtype='datetime64[ns]' or dtype='timedelta64[ns]' incorrectly converting the datetime/timedeltas to integers (GH43868)
Bug in DataFrame.sort_index() where ignore_index=True was not being respected when the index was already sorted (GH43591)
Bug in Index.get_indexer_non_unique() when index contains multiple np.datetime64("NaT") and np.timedelta64("NaT") (GH43869)
Bug in setting a scalar Interval value into a Series with IntervalDtype when the scalar's sides are floats and the values' sides are integers (GH44201)
Bug when setting string-backed Categorical values that can be parsed to datetimes into a DatetimeArray or Series or DataFrame column backed by DatetimeArray failing to parse these strings (GH44236)
Bug in Series.__setitem__() with an integer dtype other than int64 setting with a range object unnecessarily upcasting to int64 (GH44261)
Bug in Series.__setitem__() with a boolean mask indexer setting a listlike value of length 1 incorrectly broadcasting that value (GH44265)
Bug in Series.reset_index() not ignoring name argument when drop and inplace are set to True (GH44575)
Bug in DataFrame.loc.__setitem__() and DataFrame.iloc.__setitem__() with mixed dtypes sometimes failing to operate in-place (GH44345)
Bug in DataFrame.loc.__getitem__() incorrectly raising KeyError when selecting a single column with a boolean key (GH44322)
Bug in setting DataFrame.iloc() with a single ExtensionDtype column and setting 2D values, e.g. df.iloc[:] = df.values, incorrectly raising (GH44514)
Bug in setting values with DataFrame.iloc() with a single ExtensionDtype column and a tuple of arrays as the indexer (GH44703)
Bug in indexing on columns with loc or iloc using a slice with a negative step with ExtensionDtype columns incorrectly raising (GH44551)
Bug in DataFrame.loc.__setitem__() changing dtype when indexer was completely False (GH37550)
Bug in IntervalIndex.get_indexer_non_unique() returning a boolean mask instead of an array of integers for a non-unique and non-monotonic index (GH44084)
Bug in IntervalIndex.get_indexer_non_unique() not handling targets of dtype 'object' with NaNs correctly (GH44482)
Fixed regression where a single column np.matrix was no longer coerced to a 1d np.ndarray when added to a DataFrame (GH42376)
Bug in Series.__getitem__() with a CategoricalIndex of integers treating lists of integers as positional indexers, inconsistent with the behavior with a single scalar integer (GH15470, GH14865)
Bug in Series.__setitem__() when setting floats or integers into an integer-dtype Series failing to upcast when necessary to retain precision (GH45121)
Bug in DataFrame.iloc.__setitem__() ignoring the axis argument (GH45032)

Missing#

Bug in DataFrame.fillna() with limit and no method ignores axis='columns' or axis = 1 (GH40989, GH17399)
Bug in DataFrame.fillna() not replacing missing values when using a dict-like value and duplicate column names (GH43476)
Bug in constructing a DataFrame with a dictionary np.datetime64 as a value and dtype='timedelta64[ns]', or vice-versa, incorrectly casting instead of raising (GH44428)
Bug in Series.interpolate() and DataFrame.interpolate() with inplace=True not writing to the underlying array(s) in-place (GH44749)
Bug in Index.fillna() incorrectly returning an unfilled Index when NA values are present and the downcast argument is specified. This now raises NotImplementedError instead; do not pass the downcast argument (GH44873)
Bug in DataFrame.dropna() changing Index even if no entries were dropped (GH41965)
Bug in Series.fillna() with an object-dtype incorrectly ignoring downcast="infer" (GH44241)

MultiIndex#

Bug in MultiIndex.get_loc() where the first level is a DatetimeIndex and a string key is passed (GH42465)
Bug in MultiIndex.reindex() when passing a level that corresponds to an ExtensionDtype level (GH42043)
Bug in MultiIndex.get_loc() raising TypeError instead of KeyError on nested tuple (GH42440)
Bug in MultiIndex.union() setting wrong sortorder causing errors in subsequent indexing operations with slices (GH44752)
Bug in MultiIndex.putmask() where the other value was also a MultiIndex (GH43212)
Bug in MultiIndex.dtypes() where duplicate level names returned only one dtype per name (GH45174)

I/O#

窃听 read_excel() 正在尝试从.xlsx文件中读取图表 (GH41448 )
窃听 json_normalize() 哪里 errors=ignore 可能无法忽略缺少的 meta 什么时候 record_path 的长度大于1 (GH41876 )
窃听 read_csv() 将多标头输入和引用列名的参数作为元组 (GH42446 )
Bug in read_fwf(), where difference in lengths of colspecs and names was not raising ValueError (GH40830)
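Bug in Series.to_json() and DataFrame.to_json() where some attributes were skipped when serializing plain Python objects to JSON (GH42768, GH33043)
Column headers are dropped when constructing a DataFrame from a sqlalchemy Row object (GH40682)
Bug in unpickling an Index with object dtype incorrectly inferring numeric dtypes (GH43188)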
Bug in read_csv() where reading multi-header input with unequal lengths incorrectly raised IndexError (GH43102)
Bug in read_csv() raising ParserError when reading file in chunks and some chunk blocks have fewer columns than header for engine="c" (GH21211)
Bug in read_csv(), changed exception class when expecting a file path name or file-like object from OSError to TypeError (GH43366)
Bug in read_csv() and read_fwf() ignoring all skiprows except first when nrows is specified for engine='python' (GH44021, GH10261)
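Bug in read_csv() keeping the original column in object format when keep_date_col=True is set (GH13378)
Bug in read_json() not handling non-NumPy dtypes correctly (especially category) (GH21892, GH33205)
Bug in json_normalize() where a multi-character sep parameter is incorrectly prefixed to every key (GH43831)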
Bug in json_normalize() where reading data with missing multi-level metadata would not respect errors="ignore" (GH44312)
Bug in read_csv() used second row to guess implicit index if header was set to None for engine="python" (GH22144)
Bug in read_csv() not recognizing bad lines when names were given for engine="c" (GH22144)
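Bug in read_csv() with float_precision="round_trip" which did not skip initial/trailing whitespace (GH43713)
Bug when Python is built without the lzma module: a warning was raised at pandas import time, even if the lzma capability isn't used (GH43495)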
Bug in read_csv() not applying dtype for index_col (GH9435)
Bug in dumping/loading a DataFrame with yaml.dump(frame) (GH42748)
Bug in read_csv() raising ValueError when names was longer than header but equal to data rows for engine="python" (GH38453)
Bug in ExcelWriter, where engine_kwargs were not passed through to all engines (GH43442)
Bug in read_csv() raising ValueError when parse_dates was used with MultiIndex columns (GH8991)
Bug in read_csv() not raising a ValueError when \n was specified as delimiter or sep which conflicts with lineterminator (GH43528)
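Bug in to_csv() converting datetimes in categorical Series to integers (GH40754)
Bug in read_csv() where converting columns to numeric after date parsing failed (GH11019)
Bug in read_csv() not replacing NaN values with np.nan before attempting date conversion (GH26203)
Bug in read_csv() raising AttributeError when attempting to read a .csv file and infer the index column dtype from a nullable integer type (GH44079)
Bug in to_csv() always coercing datetime columns with different formats to the same format (GH21734)
DataFrame.to_csv() and Series.to_csv() with compression set to 'zip' no longer create a zip file containing a file ending with ".zip". Instead, they try to infer the inner file name more smartly (GH39465)
Bug in read_csv() where reading a mixed column of booleans and missing values to a float type results in the missing values becoming 1.0 rather than NaN (GH42808, GH34120)
Bug in to_xml() raising an error for pd.NA with extension array dtype (GH43903)
Bug in read_csv() when passing simultaneously a parser in date_parser and parse_dates=False, the parsing was still called (GH44366)
Bug in read_csv() not setting the name of MultiIndex columns correctly when index_col is not the first column (GH38549)
Bug in read_csv() silently ignoring errors when failing to create a memory-mapped file (GH44766)
Bug in read_csv() when passing a tempfile.SpooledTemporaryFile opened in binary mode (GH44748)
Bug in read_json() raising ValueError when attempting to parse JSON strings containing "://" (GH36271)
Bug in read_csv() when engine="c" and encoding_errors=None which caused a segfault (GH45180)
Bug in read_csv() where an invalid value of usecols led to an unclosed file handle (GH45384)
Bug in DataFrame.to_json() fixing a memory leak (GH43877)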
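The to_csv() entry on compression='zip' (GH39465) can be checked with a small round trip. This is a sketch under the assumption of pandas 1.4 or later; the file name data.csv.zip and the frame contents are hypothetical and chosen only for the example.

```python
import tempfile
import zipfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "data.csv.zip"  # hypothetical output path
    df.to_csv(archive, index=False, compression="zip")

    # List the member(s) stored inside the archive.
    with zipfile.ZipFile(archive) as zf:
        inner_names = zf.namelist()

    # read_csv infers zip compression from the file extension.
    round_trip = pd.read_csv(archive)

print(inner_names)  # the inner name is expected to drop the ".zip" suffix
```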

Period#

Bug in adding a Period object to a np.timedelta64 object incorrectly raising TypeError (GH44182)
Bug in PeriodIndex.to_timestamp() when the index has freq="B" inferring freq="D" for its result instead of freq="B" (GH44105)
Bug in Period constructor incorrectly allowing np.timedelta64("NaT") (GH44507)
Bug in PeriodIndex.to_timestamp() giving incorrect values for indexes with non-contiguous data (GH44100)
Bug in Series.where() with PeriodDtype incorrectly raising when the where call should not replace anything (GH45135)

Plotting#

When given non-numeric data, DataFrame.boxplot() now raises a ValueError rather than a cryptic KeyError or ZeroDivisionError, in line with other plotting functions like DataFrame.hist() (GH43480)

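Groupby/resample/rolling#

Bug in SeriesGroupBy.apply() where passing an unrecognized string argument failed to raise TypeError when the underlying Series is empty (GH42021)
Bug in Series.rolling.apply(), DataFrame.rolling.apply(), Series.expanding.apply() and DataFrame.expanding.apply() with engine="numba" where *args were being cached with the user-passed function (GH42287)
Bug in GroupBy.max() and GroupBy.min() with nullable integer dtypes losing precision (GH41743)
Bug in DataFrame.groupby.rolling.var() calculating the rolling variance only on the first group (GH42442)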
Bug in GroupBy.shift() that would return the grouping columns if fill_value was not None (GH41556)
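Bug in SeriesGroupBy.nlargest() and SeriesGroupBy.nsmallest() producing an inconsistent index when the input Series was sorted and n was greater than or equal to all group sizes (GH15272, GH16345, GH29129)
Bug in pandas.DataFrame.ewm(), where non-float64 dtypes were silently failing (GH42452)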
Bug in pandas.DataFrame.rolling() operation along rows (axis=1) incorrectly omits columns containing float16 and float32 (GH41779)
Bug in Resampler.aggregate() not allowing the use of named aggregation (GH32803)
Bug in Series.rolling() when the Series dtype was Int64 (GH43016)
Bug in DataFrame.rolling.corr() when the DataFrame columns was a MultiIndex (GH21157)
Bug in DataFrame.groupby.rolling() where specifying on and then calling __getitem__ would subsequently return incorrect results (GH43355)
Bug in GroupBy.apply() with time-based Grouper objects incorrectly raising ValueError in corner cases where the grouping vector contains a NaT (GH43500, GH43515)
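Bug in GroupBy.mean() failing with complex dtype (GH43701)
Bug in Series.rolling() and DataFrame.rolling() not calculating window bounds correctly for the first row when center=True and the index is decreasing (GH43927)
Bug in Series.rolling() and DataFrame.rolling() for centered datetimelike windows with uneven nanoseconds (GH43997)
Bug in GroupBy.mean() raising KeyError when a column was selected at least twice (GH44924)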
Bug in GroupBy.nth() failing on axis=1 (GH43926)
Bug in Series.rolling() and DataFrame.rolling() not respecting the right bound on centered datetime-like windows if the index contains duplicates (GH3944)
Bug in Series.rolling() and DataFrame.rolling() when using a pandas.api.indexers.BaseIndexer subclass that returned unequal start and end arrays would segfault instead of raising a ValueError (GH44470)
Bug in Groupby.nunique() not respecting observed=True for categorical grouping columns (GH45128)
Bug in GroupBy.head() and GroupBy.tail() not dropping groups with NaN when dropna=True (GH45089)
Bug in GroupBy.__iter__() after selecting a subset of columns in a GroupBy object, which returned all columns instead of the chosen subset (GH44821)
Bug in Groupby.rolling() when non-monotonic data passed, fails to correctly raise ValueError (GH43909)
Bug where grouping by a Series that has a categorical data type and length unequal to the axis of grouping raised ValueError (GH44179)
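To illustrate the GroupBy.__iter__() entry (GH44821), the sketch below selects one column from a grouped frame and then iterates over the groups. Column names and values are invented for the example, and the comment assumes pandas 1.4 or later.

```python
import pandas as pd

df = pd.DataFrame(
    {"key": ["x", "x", "y"], "a": [1, 2, 3], "b": [4, 5, 6]}
)

# Select a subset of columns on the GroupBy object, then iterate.
selected = df.groupby("key")[["a"]]

# Each yielded group should contain only the selected column "a".
group_columns = {name: list(group.columns) for name, group in selected}
print(group_columns)
```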

Reshaping#

Improved error message when creating a DataFrame column from a multi-dimensional numpy.ndarray (GH42463)
Bug in concat() creating a MultiIndex with duplicate level entries when concatenating a DataFrame with duplicates in Index and multiple keys (GH42651)
Bug in pandas.cut() on Series with duplicate indices and non-exact pandas.CategoricalIndex() (GH42185, GH42425)
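Bug in DataFrame.append() failing to retain dtypes when appended columns do not match (GH43392)
Bug in concat() of bool and boolean dtypes resulting in object dtype instead of boolean dtype (GH42800)
Bug in crosstab() when inputs are categorical Series, there are categories that are not present in one or both of the Series, and margins=True. Previously the margin value for missing categories was NaN. It is now correctly reported as 0 (GH43505)
Bug in concat() failing when the objs argument all had the same index and the keys argument contained duplicates (GH43595)
Bug in concat() which ignored the sort parameter (GH43375)
Bug in merge() with a MultiIndex as column index for the on argument returning an error when assigning a column internally (GH43734)
Bug in crosstab() failing when inputs are lists or tuples (GH44076)
Bug in DataFrame.append() failing to retain index.name when appending a list of Series objects (GH44109)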
Fixed metadata propagation in Dataframe.apply() method, consequently fixing the same issue for Dataframe.transform(), Dataframe.nunique() and Dataframe.mode() (GH28283)
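Bug in concat() casting levels of MultiIndex to float if all levels only consist of missing values (GH44900)
Bug in DataFrame.stack() with ExtensionDtype columns incorrectly raising (GH43561)
Bug in merge() raising KeyError when joining over differently named indexes with the on keyword (GH45094)
Bug in Series.unstack() with object dtype doing unwanted type inference on resulting columns (GH44595)
Bug in MultiIndex.join() with overlapping IntervalIndex levels (GH44096)
Bug in DataFrame.replace() and Series.replace() producing a different dtype depending on the regex parameter (GH44864)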
Bug in DataFrame.pivot() with index=None when the DataFrame index was a MultiIndex (GH23955)
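The concat() entry for bool and boolean dtypes (GH42800) can be sketched as below. The two Series are illustrative; on pandas 1.4 or later the combined result is expected to use the nullable boolean dtype rather than object.

```python
import pandas as pd

plain = pd.Series([True, False])                       # NumPy bool dtype
nullable = pd.Series([True, pd.NA], dtype="boolean")   # nullable boolean dtype

# Concatenating the two should keep the nullable boolean dtype.
combined = pd.concat([plain, nullable], ignore_index=True)
print(combined.dtype)
```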

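Sparse#

Bug in DataFrame.sparse.to_coo() raising AttributeError when column names are not unique (GH29564)
Bug in SparseArray.max() and SparseArray.min() raising ValueError for arrays with 0 non-null elements (GH43527)
Bug in DataFrame.sparse.to_coo() silently converting non-zero fill values to zero (GH24817)
Bug in SparseArray comparison methods with an array-like operand of mismatched length raising AssertionError or an unclear ValueError depending on the input (GH43863)
Bug in SparseArray arithmetic methods floordiv and mod where the behavior when dividing by zero did not match the non-sparse Series behavior (GH38172)
Bug in SparseArray unary methods as well as SparseArray.isna() not recalculating indexes (GH44955)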

ExtensionArray#

Bug in array() failing to preserve PandasArray (GH43887)
NumPy ufuncs np.abs, np.positive, np.negative now correctly preserve dtype when called on ExtensionArrays that implement __abs__, __pos__, __neg__, respectively. In particular this is fixed for TimedeltaArray (GH43899, GH23316)
NumPy ufuncs np.minimum.reduce, np.maximum.reduce, np.add.reduce, and np.prod.reduce now work correctly instead of raising NotImplementedError on Series with IntegerDtype or FloatDtype (GH43923, GH44793)
NumPy ufuncs with out keyword are now supported by arrays with IntegerDtype and FloatingDtype (GH45122)
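Avoid raising PerformanceWarning about a fragmented DataFrame when using many columns with an extension dtype (GH44098)
Bug in IntegerArray and FloatingArray construction incorrectly coercing mismatched NA values (e.g. np.timedelta64("NaT")) to numeric NA (GH44514)
Bug in BooleanArray.__eq__() and BooleanArray.__ne__() raising TypeError on comparison with an incompatible type (like a string). This caused DataFrame.replace() to sometimes raise a TypeError if a nullable boolean column was included (GH44499)
Bug in array() incorrectly raising when passed a ndarray with float16 dtype (GH44715)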
Bug in calling np.sqrt on BooleanArray returning a malformed FloatingArray (GH44715)
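Bug in Series.where() with ExtensionDtype when other is a NA scalar incompatible with the Series dtype (e.g. NaT with a numeric dtype) incorrectly casting to a compatible NA value (GH44697)
Bug in Series.replace() where explicitly passing value=None is treated as if no value was passed, and None not being in the result (GH36984, GH19998)
Bug in Series.replace() with unwanted downcasting being done in no-op replacements (GH44498)
Bug in Series.replace() with FloatDtype, string[python], or string[pyarrow] dtype not being preserved when possible (GH33484, GH40732, GH31644, GH41215, GH25438)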
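As a sketch of the ufunc reduction entry (GH43923, GH44793), the example below calls np.add.reduce and np.maximum.reduce on a Series with a nullable integer dtype. The values are illustrative and contain no missing entries; the comments assume pandas 1.4 or later.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 2, 3], dtype="Int64")

# The reductions are dispatched to the matching Series methods
# instead of raising NotImplementedError.
total = np.add.reduce(ser)
largest = np.maximum.reduce(ser)
print(total, largest)
```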

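Styler#

Bug in Styler where the uuid at initialization maintained a floating underscore (GH43037)
Bug in Styler.to_html() where the Styler object was updated if the to_html method was called with some args (GH43034)
Bug in Styler.copy() where uuid was not previously copied (GH40675)
Bug in Styler.apply() where functions which returned Series objects were not correctly handled in terms of aligning their index labels (GH13657, GH42014)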
Bug when rendering an empty DataFrame with a named Index (GH43305)
Bug when rendering a single level MultiIndex (GH43383)
Bug when combining non-sparse rendering and Styler.hide_columns() or Styler.hide_index() (GH43464)
Bug setting a table style when using multiple selectors in Styler (GH44011)
Bug where row trimming and column trimming failed to reflect hidden rows (GH43703, GH44247)

Other#

Bug in DataFrame.astype() with non-unique columns and a Series dtype argument (GH44417)
Bug in CustomBusinessMonthBegin.__add__() (CustomBusinessMonthEnd.__add__()) not applying the extra offset parameter when beginning (end) of the target month is already a business day (GH41356)
Bug in RangeIndex.union() with another RangeIndex with matching (even) step and starts differing by strictly less than step / 2 (GH44019)
Bug in RangeIndex.difference() with sort=None and step<0 failing to sort (GH44085)
Bug in Series.replace() and DataFrame.replace() with value=None and ExtensionDtypes (GH44270, GH37899)
Bug in FloatingArray.equals() failing to consider two arrays equal if they contain np.nan values (GH44382)
Bug in DataFrame.shift() with axis=1 and ExtensionDtype columns incorrectly raising when an incompatible fill_value is passed (GH44564)
Bug in DataFrame.shift() with axis=1 and periods larger than len(frame.columns) producing an invalid DataFrame (GH44978)
Bug in DataFrame.diff() when passing a NumPy integer object instead of an int object (GH44572)
Bug in Series.replace() raising ValueError when using regex=True with a Series containing np.nan values (GH43344)
Bug in DataFrame.to_records() where an incorrect n was used when missing names were replaced by level_n (GH44818)
Bug in DataFrame.eval() where the resolvers argument was overriding the default resolvers (GH34966)
Series.__repr__() and DataFrame.__repr__() no longer replace all null-values in indexes with "NaN" but use their real string-representations. "NaN" is used only for float("nan") (GH45263)
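The DataFrame.shift() entry for axis=1 with periods larger than the number of columns (GH44978) can be sketched as follows. The two-column frame is illustrative; on pandas 1.4 or later the result is expected to be a well-formed frame of missing values with the original column labels.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Shift by more columns than the frame has: every value moves out of range.
shifted = df.shift(periods=3, axis=1)
print(shifted)
```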

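Contributors#

A total of 275 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.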

Abhishek R
Albert Villanova del Moral
Alessandro Bisiani +
Alex Lim
Alex-Gregory-1 +
Alexander Gorodetsky
Alexander Regueiro +
Alexey Györi
Alexis Mignon
Aleš Erjavec
Ali McMaster
Alibi +
Andrei Batomunkuev +
Andrew Eckart +
Andrew Hawyrluk
Andrew Wood
Anton Lodder +
Armin Berres +
Arushi Sharma +
Benedikt Heidrich +
Beni Bienz +
Benoît Vinot
Bert Palm +
Boris Rumyantsev +
Brian Hulette
Brock
Bruno Costa +
Bryan Racic +
Caleb Epstein
Calvin Ho
ChristofKaufmann +
Christopher Yeh +
Chuliang Xiao +
ClaudiaSilver +
DSM
Daniel Coll +
Daniel Schmidt +
Dare Adewumi
David +
David Sanders +
David Wales +
Derzan Chiang +
DeviousLab +
Dhruv B Shetty +
Digres45 +
Dominik Kutra +
Drew Levitt +
DriesS
EdAbati
Elle
Elliot Rampono
Endre Mark Borza
Erfan Nariman
Evgeny Naumov +
Ewout ter Hoeven +
Fangchen Li
Felix Divo
Felix Dulys +
Francesco Andreuzzi +
Francois Dion +
Frans Larsson +
Fred Reiss
GYvan
Gabriel Di Pardi Arruda +
Gesa Stupperich
Giacomo Caria +
Greg Siano +
Griffin Ansel
Hiroaki Ogasawara +
Horace +
Horace Lai +
Irv Lustig
Isaac Virshup
JHM Darbyshire (MBP)
JHM Darbyshire (iMac)
JHM Darbyshire +
Jack Liu
Jacob Skwirsk +
Jaime Di Cristina +
James Holcombe +
Janosh Riebesell +
Jarrod Millman
Jason Bian +
Jeff Reback
Jernej Makovsek +
Jim Bradley +
Joel Gibson +
Joeperdefloep +
Johannes Mueller +
John S Bogaardt +
John Zangwill +
Jon Haitz Legarreta Gorroño +
Jon Wiggins +
Jonas Haag +
Joris Van den Bossche
Josh Friedlander
José Duarte +
Julian Fleischer +
Julien de la Bruère-T
Justin McOmie
Kadatatlu Kishore +
Kaiqi Dong
Kashif Khan +
Kavya9986 +
Kendall +
Kevin Sheppard
Kiley Hewitt
Koen Roelofs +
Krishna Chivukula
KrishnaSai2020
Leonardo Freua +
Leonardus Chen
Liang-Chi Hsieh +
Loic Diridollou +
Lorenzo Maffioli +
Luke Manley +
LunarLanding +
Marc Garcia
Marcel Bittar +
Marcel Gerber +
Marco Edward Gorelli
Marco Gorelli
MarcoGorelli
Marvin +
Mateusz Piotrowski +
Mathias Hauser +
Matt Richards +
Matthew Davis +
Matthew Roeschke
Matthew Zeitlin
Matthias Bussonnier
Matti Picus
Mauro Silberberg +
Maxim Ivanov
Maximilian Carr +
MeeseeksMachine
Michael Sarrazin +
Michael Wang +
Michał Górny +
Mike Phung +
Mike Taves +
Mohamad Hussein Rkein +
NJOKU OKECHUKWU VALENTINE +
Neal McBurnett +
Nick Anderson +
Nikita Sobolev +
Olivier Cavadenti +
PApostol +
Pandas Development Team
Patrick Hoefler
Peter
Peter Tillmann +
Prabha Arivalagan +
Pradyumna Rahul
Prerana Chakraborty
Prithvijit +
Rahul Gaikwad +
Ray Bell
Ricardo Martins +
Richard Shadrach
Robbert-jan 't Hoen +
Robert Voyer +
Robin Raymond +
Rohan Sharma +
Rohan Sirohia +
Roman Yurchak
Ruan Pretorius +
Sam James +
Scott Talbert
Shashwat Sharma +
Sheogorath27 +
Shiv Gupta
Shoham Debnath
Simon Hawkins
Soumya +
Stan West +
Stefanie Molin +
Stefano Alberto Russo +
Stephan Heßelmann
Stephen
Suyash Gupta +
Sven
Swanand01 +
Sylvain Marié +
TLouf
Tania Allard +
Terji Petersen
TheDerivator +
Thomas Dickson
Thomas Kastl +
Thomas Kluyver
Thomas Li
Thomas Smith
Tim Swast
Tim Tran +
Tobias McNulty +
Tobias Pitters
Tomoki Nakagawa +
Tony Hirst +
Torsten Wörtwein
V.I. Wood +
Vaibhav K +
Valentin Oliver Loftsson +
Varun Shrivastava +
Vivek Thazhathattil +
Vyom Pathak
Wenjun Si
William Andrea +
William Bradley +
Wojciech Sadowski +
Yao-Ching Huang +
Yash Gupta +
Yiannis Hadjicharalambous +
Yoshiki Vázquez Baeza
Yuanhao Geng
Yury Mikhaylov
Yvan Gatete +
Yves Delley +
Zach Rait
Zbyszek Królikowski +
Zero +
Zheyuan
Zhiyi Wu +
aiudirog
ali sayyah +
aneesh98 +
aptalca
arw2019 +
attack68
brendandrury +
bubblingoak +
calvinsomething +
claws +
deponovo +
dicristina
el-g-1 +
evensure +
fotino21 +
fshi01 +
gfkang +
github-actions[bot]
i-aki-y
jbrockmendel
jreback
juliandwain +
jxb4892 +
kendall smith +
lmcindewar +
lrepiton
maximilianaccardo +
michal-gh
neelmraman
partev
phofl +
pratyushsharan +
quantumalaviya +
rafael +
realead
rocabrera +
rosagold
saehuihwang +
salomondush +
shubham11941140 +
srinivasan +
stphnlyd
suoniq
trevorkask +
tushushu
tyuyoshi +
usersblock +
vernetya +
vrserpa +
willie3838 +
zeitlinv +
zhangxiaoxing +
