What's new in 1.5.0 (??)#

These are the changes in pandas 1.5.0. See Release notes for a full changelog including other versions of pandas.

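Enhancements#

Styler#

New method Styler.to_string() for alternative customisable output methods (GH44502)

Added the ability to render border and border-{{side}} CSS properties in Excel (GH42276)

Added a new method Styler.concat() which allows adding customised footer rows to visualise additional calculations on the data, e.g. totals and counts (GH43875, GH46186)

Styler.highlight_null() now accepts color, consistent with other builtin methods, and deprecates null_color, although this remains backwards compatible (GH45907)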

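Control of index with `group_keys` in `DataFrame.resample()`#

The argument group_keys has been added to the method DataFrame.resample(). As with DataFrame.groupby(), this argument controls whether each group is added to the index in the resample when Resampler.apply() is used.

Warning

Not specifying the group_keys argument will retain the previous behavior and emit a warning if the result will change by specifying group_keys=False. In a future version of pandas, not specifying group_keys will default to the same behavior as group_keys=False.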

In [1]: df = pd.DataFrame(
   ...:     {'a': range(6)},
   ...:     index=pd.date_range("2021-01-01", periods=6, freq="8H")
   ...: )
   ...: 

In [2]: df.resample("D", group_keys=True).apply(lambda x: x)
Out[2]: 
                                a
2021-01-01 2021-01-01 00:00:00  0
           2021-01-01 08:00:00  1
           2021-01-01 16:00:00  2
2021-01-02 2021-01-02 00:00:00  3
           2021-01-02 08:00:00  4
           2021-01-02 16:00:00  5

[6 rows x 1 columns]

In [3]: df.resample("D", group_keys=False).apply(lambda x: x)
Out[3]: 
                     a
2021-01-01 00:00:00  0
2021-01-01 08:00:00  1
2021-01-01 16:00:00  2
2021-01-02 00:00:00  3
2021-01-02 08:00:00  4
2021-01-02 16:00:00  5

[6 rows x 1 columns]

Previously, the resulting index would depend upon the values returned by apply, as seen in the following example.

In [1]: # pandas 1.3
In [2]: df.resample("D").apply(lambda x: x)
Out[2]:
                     a
2021-01-01 00:00:00  0
2021-01-01 08:00:00  1
2021-01-01 16:00:00  2
2021-01-02 00:00:00  3
2021-01-02 08:00:00  4
2021-01-02 16:00:00  5

In [3]: df.resample("D").apply(lambda x: x.reset_index())
Out[3]:
                           index  a
2021-01-01 0 2021-01-01 00:00:00  0
           1 2021-01-01 08:00:00  1
           2 2021-01-01 16:00:00  2
2021-01-02 0 2021-01-02 00:00:00  3
           1 2021-01-02 08:00:00  4
           2 2021-01-02 16:00:00  5

Other enhancements#

MultiIndex.to_frame() now supports the argument allow_duplicates and raises on duplicate labels if it is missing or False (GH45245)
StringArray now accepts array-likes containing nan-likes (None, np.nan) for the values parameter in its constructor, in addition to strings and pandas.NA (GH40839)
Improved the rendering of categories in CategoricalIndex (GH45218)
to_numeric() now preserves float64 arrays when downcasting would generate values not representable in float32 (GH43693)
Series.reset_index() and DataFrame.reset_index() now support the argument allow_duplicates (GH44410)
GroupBy.min() and GroupBy.max() now support Numba execution with the engine keyword (GH45428)
read_csv() now supports defaultdict as a dtype parameter (GH41574)
DataFrame.rolling() and Series.rolling() now support a step parameter with fixed-length windows (GH15354)
Implemented a bool-dtype Index, passing a bool-dtype array-like to pd.Index will now retain bool dtype instead of casting to object (GH45061)
Implemented a complex-dtype Index, passing a complex-dtype array-like to pd.Index will now retain complex dtype instead of casting to object (GH45845)
Series and DataFrame with IntegerDtype now support bitwise operations (GH34463)
Add milliseconds field support for DateOffset (GH43371)
DataFrame.reset_index() now accepts a names argument which renames the index names (GH6878)
pd.concat() now raises when levels is given but keys is None (GH46653)
pd.concat() now raises when levels contains duplicate values (GH46653)
Added numeric_only argument to DataFrame.corr(), DataFrame.corrwith(), and DataFrame.cov() (GH46560)
An errors.PerformanceWarning is now raised when using string[pyarrow] dtype with methods that don't dispatch to pyarrow.compute methods (GH42613)
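
A few of the enhancements listed above can be tried in a short session. The following is an illustrative sketch rather than part of the release notes: the sample data and variable names are made up for the example, and it assumes pandas 1.5.0 or later is installed.

```python
import pandas as pd

# rolling() with a fixed-length window and step=2: the window is
# evaluated at every second row only.
ser = pd.Series(range(6))
stepped = ser.rolling(window=2, step=2).sum()

# reset_index() with names= to choose the name of the new column.
df = pd.DataFrame({"value": [1.5, 2.5]}, index=[10, 20])
flat = df.reset_index(names="row_id")

# A bool-dtype array-like passed to pd.Index keeps bool dtype.
bool_index = pd.Index([True, False, True])

# Bitwise operations on IntegerDtype data.
left = pd.Series([1, 2, 3], dtype="Int64")
right = pd.Series([1, 1, 1], dtype="Int64")
bitwise_and = left & right

print(stepped)
print(flat)
print(bool_index.dtype)
print(bitwise_and)
```

With step=2 the rolling result keeps only rows 0, 2 and 4 of the input, which can be cheaper than computing every window and slicing afterwards.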

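Notable bug fixes#

These are bug fixes that might have notable behavior changes.

Styler#

Fixed a bug in CSSToExcelConverter leading to a TypeError when a border color is provided without a border style for the xlsxwriter engine (GH42276)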

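Using `dropna=True` with `groupby` transforms#

A transform is an operation whose result has the same size as its input. When the result is a DataFrame or Series, it is also required that the index of the result matches that of the input. In pandas 1.4, using DataFrameGroupBy.transform() or SeriesGroupBy.transform() with null values in the groups and dropna=True gave incorrect results. As the examples below demonstrate, the incorrect results either contained incorrect values, or the result did not have the same index as the input.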

In [4]: df = pd.DataFrame({'a': [1, 1, np.nan], 'b': [2, 3, 4]})

Old behavior:

In [3]: # Value in the last row should be np.nan
        df.groupby('a', dropna=True).transform('sum')
Out[3]:
   b
0  5
1  5
2  5

In [3]: # Should have one additional row with the value np.nan
        df.groupby('a', dropna=True).transform(lambda x: x.sum())
Out[3]:
   b
0  5
1  5

In [3]: # The value in the last row is np.nan interpreted as an integer
        df.groupby('a', dropna=True).transform('ffill')
Out[3]:
                     b
0                    2
1                    3
2 -9223372036854775808

In [3]: # Should have one additional row with the value np.nan
        df.groupby('a', dropna=True).transform(lambda x: x)
Out[3]:
   b
0  2
1  3

New behavior:

In [5]: df.groupby('a', dropna=True).transform('sum')
Out[5]: 
     b
0  5.0
1  5.0
2  NaN

[3 rows x 1 columns]

In [6]: df.groupby('a', dropna=True).transform(lambda x: x.sum())
Out[6]: 
     b
0  5.0
1  5.0
2  NaN

[3 rows x 1 columns]

In [7]: df.groupby('a', dropna=True).transform('ffill')
Out[7]: 
     b
0  2.0
1  3.0
2  NaN

[3 rows x 1 columns]

In [8]: df.groupby('a', dropna=True).transform(lambda x: x)
Out[8]: 
     b
0  2.0
1  3.0
2  NaN

[3 rows x 1 columns]

Styler#

Fixed showing "None" as ylabel in Series.plot() when ylabel is not set (GH46129)

notable_bug_fix2#

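Backwards incompatible API changes#

read_xml now supports `dtype`, `converters`, and `parse_dates`#

Similar to other IO methods, pandas.read_xml() now supports assigning specific dtypes to columns, applying converter methods, and parsing dates (GH43567).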

In [9]: xml_dates = """<?xml version='1.0' encoding='utf-8'?>
   ...: <data>
   ...:   <row>
   ...:     <shape>square</shape>
   ...:     <degrees>00360</degrees>
   ...:     <sides>4.0</sides>
   ...:     <date>2020-01-01</date>
   ...:    </row>
   ...:   <row>
   ...:     <shape>circle</shape>
   ...:     <degrees>00360</degrees>
   ...:     <sides/>
   ...:     <date>2021-01-01</date>
   ...:   </row>
   ...:   <row>
   ...:     <shape>triangle</shape>
   ...:     <degrees>00180</degrees>
   ...:     <sides>3.0</sides>
   ...:     <date>2022-01-01</date>
   ...:   </row>
   ...: </data>"""
   ...: 

In [10]: df = pd.read_xml(
   ....:     xml_dates,
   ....:     dtype={'sides': 'Int64'},
   ....:     converters={'degrees': str},
   ....:     parse_dates=['date']
   ....: )
   ....: 

In [11]: df
Out[11]: 
      shape degrees  sides       date
0    square   00360      4 2020-01-01
1    circle   00360   <NA> 2021-01-01
2  triangle   00180      3 2022-01-01

[3 rows x 4 columns]

In [12]: df.dtypes
Out[12]: 
shape              object
degrees            object
sides               Int64
date       datetime64[ns]
Length: 4, dtype: object

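read_xml now supports large XML using `iterparse`#

For very large XML files that can range from hundreds of megabytes to gigabytes, pandas.read_xml() now supports parsing such sizeable files using lxml's iterparse and etree's iterparse, which are memory-efficient methods to iterate through XML trees and extract specific elements and attributes without holding the entire tree in memory (GH45442).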

In [1]: df = pd.read_xml(
...      "/path/to/downloaded/enwikisource-latest-pages-articles.xml",
...      iterparse={"page": ["title", "ns", "id"]}
...  )
df
Out[2]:
                                                     title   ns        id
0                                       Gettysburg Address    0     21450
1                                                Main Page    0     42950
2                            Declaration by United Nations    0      8435
3             Constitution of the United States of America    0      8435
4                     Declaration of Independence (Israel)    0     17858
...                                                    ...  ...       ...
3578760               Page:Black cat 1897 07 v2 n10.pdf/17  104    219649
3578761               Page:Black cat 1897 07 v2 n10.pdf/43  104    219649
3578762               Page:Black cat 1897 07 v2 n10.pdf/44  104    219649
3578763      The History of Tom Jones, a Foundling/Book IX    0  12084291
3578764  Page:Shakespeare of Stratford (1926) Yale.djvu/91  104     21450

[3578765 rows x 3 columns]
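
The same call can be tried on a small file without downloading a large dump. This is a minimal sketch under stated assumptions: the XML content, the file name shapes.xml and the variable names are invented for illustration, the sample is written to a temporary file because iterparse is meant for files on local disk, and parser="etree" is chosen only so that lxml is not required.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data, standing in for a very large XML file.
xml_text = """<?xml version='1.0' encoding='utf-8'?>
<data>
  <row>
    <shape>square</shape>
    <degrees>360</degrees>
  </row>
  <row>
    <shape>triangle</shape>
    <degrees>180</degrees>
  </row>
</data>"""

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write the sample to disk first, then pass the path as a string.
    xml_path = os.path.join(tmp_dir, "shapes.xml")
    with open(xml_path, "w", encoding="utf-8") as handle:
        handle.write(xml_text)

    # Key: the repeating element; value: the descendants to extract.
    shapes = pd.read_xml(
        xml_path,
        iterparse={"row": ["shape", "degrees"]},
        parser="etree",  # standard-library parser
    )

print(shapes)
```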

api_breaking_change2#

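Increased minimum versions for dependencies#

Some minimum supported versions of dependencies were updated. If installed, we now require:

Package	Minimum Version	Required	Changed
mypy (dev)	0.941		X

For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

Package	Minimum Version	Changed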
		X

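See Dependencies and Optional dependencies for more.

Other API changes#

BigQuery I/O methods read_gbq() and DataFrame.to_gbq() default to auth_local_webserver = True. Google has deprecated the auth_local_webserver = False "out of band" (copy-paste) flow. The auth_local_webserver = False option is planned to stop working in October 2022. (GH46312)

Deprecations#

In a future version, integer slicing on a Series with an Int64Index or RangeIndex will be treated as label-based, not positional. This will make the behavior consistent with other Series.__getitem__() and Series.__setitem__() behaviors (GH45162).

For example: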

In [13]: ser = pd.Series([1, 2, 3, 4, 5], index=[2, 3, 5, 7, 11])

In the old behavior, ser[2:4] treats the slice as positional:

Old behavior:

In [3]: ser[2:4]
Out[3]:
5    3
7    4
dtype: int64

In a future version, this will be treated as label-based:

Future behavior:

In [4]: ser.loc[2:4]
Out[4]:
2    1
3    2
dtype: int64

To retain the old behavior, use series.iloc[i:j]. To get the future behavior, use series.loc[i:j].

Slicing on a DataFrame will not be affected.
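
Using the explicit accessors makes the intent unambiguous regardless of the pandas version. The sketch below reuses the Series from the example above; the variable names positional and by_label are chosen only for illustration.

```python
import pandas as pd

ser = pd.Series([1, 2, 3, 4, 5], index=[2, 3, 5, 7, 11])

# Positional slice: the rows at positions 2 and 3.
positional = ser.iloc[2:4]

# Label-based slice: labels from 2 through 4, both ends inclusive.
by_label = ser.loc[2:4]

print(positional)
print(by_label)
```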

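`ExcelWriter` attributes#

All attributes of ExcelWriter were previously documented as not public. However, some third-party Excel engines documented accessing ExcelWriter.book or ExcelWriter.sheets, and users were utilizing these and possibly other attributes. Previously these attributes were not safe to use; e.g. modifications to ExcelWriter.book would not update ExcelWriter.sheets and conversely. In order to support this, pandas has made some attributes public and improved their implementations so that they may now be safely used. (GH45572)

The following attributes are now public and considered safe to access.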

book

check_extension

close

date_format

datetime_format

engine

if_sheet_exists

sheets

supported_extensions

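The following attributes have been deprecated. They now raise a FutureWarning when accessed and will be removed in a future version. Users should be aware that their usage is considered unsafe, and can lead to unexpected results.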

cur_sheet

handles

path

save

write_cells

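See the documentation of ExcelWriter for further details.

Using `group_keys` with transformers in `GroupBy.apply()`#

In previous versions of pandas, if it was inferred that the function passed to GroupBy.apply() was a transformer (i.e. the resulting index was equal to the input index), the group_keys argument of DataFrame.groupby() and Series.groupby() was ignored and the group keys would never be added to the index of the result. Going forward, the group keys will be added to the index when the user specifies group_keys=True.

As group_keys=True is the default value of DataFrame.groupby() and Series.groupby(), not specifying group_keys with a transformer will raise a FutureWarning. This can be silenced and the previous behavior retained by specifying group_keys=False.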
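
Passing group_keys explicitly avoids depending on the default. The following is a small sketch of the two settings with a transformer; the frame, the helper name double and the result names are made up for illustration, and a single column is selected so that only the values of b are passed to the function.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": [10, 20, 30]})


def double(values):
    # A transformer: the result has the same index as its input.
    return values * 2


# Group keys are added as an outer index level.
with_keys = df.groupby("a", group_keys=True)["b"].apply(double)

# The original index is kept as-is.
without_keys = df.groupby("a", group_keys=False)["b"].apply(double)

print(with_keys)
print(without_keys)
```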

Other Deprecations#

Deprecated the keyword line_terminator in DataFrame.to_csv() and Series.to_csv(), use lineterminator instead; this is for consistency with read_csv() and the standard library 'csv' module (GH9568)
Deprecated behavior of SparseArray.astype(), Series.astype(), and DataFrame.astype() with SparseDtype when passing a non-sparse dtype. In a future version, this will cast to that non-sparse dtype instead of wrapping it in a SparseDtype (GH34457)
Deprecated behavior of DatetimeIndex.intersection() and DatetimeIndex.symmetric_difference() (union behavior was already deprecated in version 1.3.0) with mixed time zones; in a future version both will be cast to UTC instead of object dtype (GH39328, GH45357)
Deprecated DataFrame.iteritems(), Series.iteritems(), HDFStore.iteritems() in favor of DataFrame.items(), Series.items(), HDFStore.items() (GH45321)
Deprecated Series.is_monotonic() and Index.is_monotonic() in favor of Series.is_monotonic_increasing() and Index.is_monotonic_increasing() (GH45422, GH21335)
Deprecated behavior of DatetimeIndex.astype(), TimedeltaIndex.astype(), PeriodIndex.astype() when converting to an integer dtype other than int64. In a future version, these will convert to exactly the specified dtype (instead of always int64) and will raise if the conversion overflows (GH45034)
Deprecated the __array_wrap__ method of DataFrame and Series, rely on standard numpy ufuncs instead (GH45451)
Deprecated treating float-dtype data as wall-times when passed with a timezone to Series or DatetimeIndex (GH45573)
Deprecated the behavior of Series.fillna() and DataFrame.fillna() with timedelta64[ns] dtype and an incompatible fill value; in a future version this will cast to a common dtype (usually object) instead of raising, matching the behavior of other dtypes (GH45746)
Deprecated the warn parameter in infer_freq() (GH45947)
Deprecated allowing non-keyword arguments in ExtensionArray.argsort() (GH46134)
Deprecated treating all-bool object-dtype columns as bool-like in DataFrame.any() and DataFrame.all() with bool_only=True, explicitly cast to bool instead (GH46188)
Deprecated the behavior of DataFrame.quantile(): in a future version numeric_only will default to False, so datetime/timedelta columns will be included in the result (GH7308)
Deprecated Timedelta.freq and Timedelta.is_populated (GH46430)
Deprecated Timedelta.delta (GH46476)
Deprecated passing arguments as positional in DataFrame.any() and Series.any() (GH44802)
Deprecated the closed argument in interval_range() in favor of the inclusive argument; in a future version passing closed will raise (GH40245)
Deprecated the methods DataFrame.mad(), Series.mad(), and the corresponding groupby methods (GH11787)
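
Several of the deprecations above have a direct replacement. The sketch below shows a few of them side by side, assuming pandas 1.5.0 or later; the data is made up, and the expression used in place of mad() is one possible substitute written out by hand, not an API named in these notes.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4]})
ser = df["a"]

# lineterminator replaces the deprecated line_terminator keyword.
csv_text = df.to_csv(index=False, lineterminator="\n")

# items() replaces the deprecated iteritems().
pairs = list(ser.items())

# is_monotonic_increasing replaces the deprecated is_monotonic.
increasing = ser.is_monotonic_increasing

# One possible stand-in for the deprecated mad(): the mean absolute
# deviation around the mean, computed explicitly.
mean_abs_dev = (ser - ser.mean()).abs().mean()

print(repr(csv_text))
print(pairs)
print(increasing)
print(mean_abs_dev)
```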

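Performance improvements#

Performance improvement in DataFrame.corrwith() for column-wise (axis=0) Pearson and Spearman correlation when other is a Series (GH46174)
Performance improvement in GroupBy.transform() for some user-defined DataFrame -> Series functions (GH45387)
Performance improvement in DataFrame.duplicated() when subset consists of only one column (GH45236)
Performance improvement in GroupBy.diff() (GH16706)
Performance improvement in GroupBy.transform() when broadcasting values for user-defined functions (GH45708)
Performance improvement in GroupBy.transform() for user-defined functions when only a single group exists (GH44977)
Performance improvement in DataFrame.loc() and Series.loc() for tuple-based indexing of a MultiIndex (GH45681, GH46040, GH46330)
Performance improvement in MultiIndex.values when the MultiIndex contains levels of type DatetimeIndex, TimedeltaIndex or ExtensionDtypes (GH46288)
Performance improvement in merge() when left and/or right are empty (GH45838)
Performance improvement in DataFrame.join() when left and/or right are empty (GH46015)
Performance improvement in DataFrame.reindex() and Series.reindex() when target is a MultiIndex (GH46235)
Performance improvement when setting values in a pyarrow backed string array (GH46400)
Performance improvement in factorize() (GH46109)
Performance improvement in DataFrame and Series constructors for extension dtype scalars (GH45854)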

Bug fixes#

Categorical#

Bug in Categorical.view() not accepting integer dtypes (GH25464)
Bug in CategoricalIndex.union() when the index's categories are integer-dtype and the index contains NaN values incorrectly raising instead of casting to float64 (GH45362)

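Datetimelike#

Bug in DataFrame.quantile() with datetime-like dtypes and no rows incorrectly returning float64 dtype instead of retaining datetime-like dtype (GH41544)
Bug in to_datetime() with sequences of np.str_ objects incorrectly raising (GH32264)
Bug in Timestamp construction when passing datetime components as positional arguments and tzinfo as a keyword argument incorrectly raising (GH31929)
Bug in Index.astype() when casting from object dtype to timedelta64[ns] dtype incorrectly casting np.datetime64("NaT") values to np.timedelta64("NaT") instead of raising (GH45722)
Bug in SeriesGroupBy.value_counts() index when passing a categorical column (GH44324)
Bug in DatetimeIndex.tz_localize() localizing to UTC failing to make a copy of the underlying data (GH46460)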

Timedelta#

Bug in astype_nansafe() where astype("timedelta64[ns]") fails when np.nan is included (GH45798)
Bug in constructing a Timedelta with a np.timedelta64 object and a unit sometimes silently overflowing and returning incorrect results instead of raising OutOfBoundsTimedelta (GH46827)

Timezones#

Bug in Timestamp when passed a ZoneInfo tzinfo object (GH46425)

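Numeric#

Bug in operations with array-likes with dtype="boolean" and NA incorrectly altering the array in-place (GH45421)
Bug in division, pow and mod operations on array-likes with dtype="boolean" not being like their np.bool_ counterparts (GH46063)
Bug in multiplying a Series with IntegerDtype or FloatingDtype by an array-like with timedelta64[ns] dtype incorrectly raising (GH45622)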

Conversion#

Bug in DataFrame.astype() not preserving subclasses (GH40810)
Bug in constructing a Series from a float-containing list or a floating-dtype ndarray-like (e.g. dask.Array) and an integer dtype raising instead of casting like we would with an np.ndarray (GH40110)
Bug in Float64Index.astype() to unsigned integer dtype incorrectly casting to np.int64 dtype (GH45309)
Bug in Series.astype() and DataFrame.astype() from floating dtype to unsigned integer dtype failing to raise in the presence of negative values (GH45151)
Bug in array() with FloatingDtype and values containing float-castable strings incorrectly raising (GH45424)
Bug when comparing string and datetime64ns objects causing an OverflowError exception (GH45506)
Bug in metaclass of generic abstract dtypes causing DataFrame.apply() and Series.apply() to raise for the built-in function type (GH46684)
Bug in DataFrame.to_dict() for orient="list" or orient="index" not returning native types (GH46751)
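
As a quick check of the last item, the values returned by DataFrame.to_dict() can be inspected directly. This is an illustrative sketch with made-up column names, assuming pandas 1.5.0 or later, where the values are expected to be plain Python objects rather than NumPy scalars.

```python
import pandas as pd

df = pd.DataFrame({"count": [1, 2], "ratio": [0.5, 1.5]})

as_lists = df.to_dict(orient="list")
by_index = df.to_dict(orient="index")

# Inspect the types of individual values in each result.
print(type(as_lists["count"][0]))
print(type(by_index[0]["ratio"]))
```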

Strings#

Bug in str.startswith() and str.endswith() when using other series as parameter _pat_. Now raises TypeError (GH3485)

Interval#

Bug in IntervalArray.__setitem__() when setting np.nan into an integer-backed array raising ValueError instead of TypeError (GH45484)

Indexing#

Bug in loc.__getitem__() with a list of keys causing an internal inconsistency that could lead to a disconnect between frame.at[x, y] vs frame[y].loc[x] (GH22372)
Bug in DataFrame.iloc() where indexing a single row on a DataFrame with a single ExtensionDtype column gave a copy instead of a view on the underlying data (GH45241)
Bug in Series.align() not creating a MultiIndex with the union of levels when both MultiIndexes' intersections are identical (GH45224)
Bug in setting a NA value (None or np.nan) into a Series with int-based IntervalDtype incorrectly casting to object dtype instead of a float-based IntervalDtype (GH45568)
Bug in setting values into an ExtensionDtype column with df.iloc[:, i] = values with values having the same dtype as df.iloc[:, i] incorrectly inserting a new array instead of setting in-place (GH33457)
Bug in Series.__setitem__() with a non-integer Index when using an integer key to set a value that cannot be set in-place, where a ValueError was raised instead of casting to a common dtype (GH45070)
Bug in Series.__setitem__() when setting incompatible values into a PeriodDtype or IntervalDtype Series raising when indexing with a boolean mask but coercing when indexing with otherwise-equivalent indexers; these now consistently coerce, along with Series.mask() and Series.where() (GH45768)
Bug in DataFrame.where() with multiple columns with datetime-like dtypes failing to downcast results consistent with other dtypes (GH45837)
Bug in Series.loc.__setitem__() and Series.loc.__getitem__() not raising when using multiple keys without using a MultiIndex (GH13831)
Bug in Index.reindex() raising AssertionError when level was specified but no MultiIndex was given; level is ignored now (GH35132)
Bug when setting a value too large for a Series dtype failing to coerce to a common type (GH26049, GH32878)
Bug in loc.__setitem__() treating range keys as positional instead of label-based (GH45479)
Bug in Series.__setitem__() when setting boolean dtype values containing NA incorrectly raising instead of casting to boolean dtype (GH45462)
Bug in Series.__setitem__() where setting NA into a numeric-dtype Series would incorrectly upcast to object-dtype rather than treating the value as np.nan (GH44199)
Bug in Series.__setitem__() with datetime64[ns] dtype, an all-False boolean mask, and an incompatible value incorrectly casting to object instead of retaining datetime64[ns] dtype (GH45967)
Bug in Index.__getitem__() raising ValueError when indexer is from boolean dtype with NA (GH45806)
Bug in Series.mask() with inplace=True or setting values with a boolean mask with small integer dtypes incorrectly raising (GH45750)
Bug in DataFrame.mask() with inplace=True and ExtensionDtype columns incorrectly raising (GH45577)
Bug in getting a column from a DataFrame with an object-dtype row index with datetime-like values: the resulting Series now preserves the exact object-dtype Index from the parent DataFrame (GH42950)
Bug in DataFrame.__getattribute__() raising AttributeError if columns have "string" dtype (GH46185)
Bug in indexing on a DatetimeIndex with a np.str_ key incorrectly raising (GH45580)
Bug in CategoricalIndex.get_indexer() when the index contains NaN values, resulting in elements that are in the target but not present in the index being mapped to the index of the NaN element, instead of -1 (GH45361)
Bug in setting large integer values into a Series with float32 or float16 dtype incorrectly altering these values instead of coercing to float64 dtype (GH45844)
Bug in Series.asof() and DataFrame.asof() incorrectly casting bool-dtype results to float64 dtype (GH16063)

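Missing#

Bug in Series.fillna() and DataFrame.fillna() with the downcast keyword not being respected in some cases where there are no NA values present (GH45423)
Bug in Series.fillna() and DataFrame.fillna() with IntervalDtype and an incompatible value raising instead of casting to a common (usually object) dtype (GH45796)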
Bug in DataFrame.interpolate() with object-dtype column not returning a copy with inplace=False (GH45791)

MultiIndex#

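Bug in DataFrame.loc() returning an empty result when slicing a MultiIndex with a negative step size and non-null start/stop values (GH46156)
Bug in DataFrame.loc() raising when slicing a MultiIndex with a negative step size other than -1 (GH46156)
Bug in DataFrame.loc() raising when slicing a MultiIndex with a negative step size and slicing a non-int labeled index level (GH46156)
Bug in Series.to_numpy() where a multiindexed Series could not be converted to a numpy array when an na_value was supplied (GH45774)
Bug in MultiIndex.equals not being commutative when only one side has extension array dtype (GH46026)
Bug in MultiIndex.from_tuples() not being able to construct an Index of empty tuples (GH45608)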

I/O#

Bug in DataFrame.to_stata() where no error is raised if the DataFrame contains -np.inf (GH45350)
Bug in read_excel() resulting in an infinite loop with certain skiprows callables (GH45585)
Bug in DataFrame.info() where a new line at the end of the output is omitted when called on an empty DataFrame (GH45494)
Bug in read_csv() not recognizing line break for on_bad_lines="warn" for engine="c" (GH41710)
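Bug in DataFrame.to_csv() not respecting float_format for Float64 dtype (GH45991)
Bug in read_csv() not respecting a specified converter to index columns in all cases (GH40589)
Bug in read_parquet() when engine="pyarrow" which caused a partial write to disk when a column of unsupported datatype was passed (GH44914)
Bug in DataFrame.to_excel() and ExcelWriter raising when writing an empty DataFrame to a .ods file (GH45793)
Bug in read_html() where elements surrounding <br> were joined without a space between them (GH29528)
Bug in Parquet roundtrip for Interval dtype with datetime64[ns] subtype (GH45881)
Bug in read_excel() when reading a .ods file with newlines between xml elements (GH45598)
Bug in read_parquet() when engine="fastparquet" where the file was not closed on error (GH46555)
to_html() now excludes the border attribute from <table> elements when the border keyword is set to False.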

Period#

Bug in subtraction of Period from PeriodArray returning wrong results (GH45999)
Bug in Period.strftime() and PeriodIndex.strftime() where the directives %l and %u were giving wrong results (GH46252)

Plotting#

Bug in DataFrame.plot.barh() that prevented labeling the x-axis and xlabel updating the y-axis label (GH45144)
Bug in DataFrame.plot.box() that prevented labeling the x-axis (GH45463)
Bug in DataFrame.boxplot() that prevented passing in xlabel and ylabel (GH45463)
Bug in DataFrame.boxplot() that prevented specifying vert=False (GH36918)
Bug in DataFrame.plot.scatter() that prevented specifying norm (GH45809)
The function DataFrame.plot.scatter() now accepts color as an alias for c and size as an alias for s for consistency with other plotting functions (GH44670)

Groupby/resample/rolling#

Bug in DataFrame.resample() ignoring closed="right" on TimedeltaIndex (GH45414)
Bug in DataFrameGroupBy.transform() failing when func="size" and the input DataFrame has multiple columns (GH27469)
Bug in DataFrameGroupBy.size() and DataFrameGroupBy.transform() with func="size" produced incorrect results when axis=1 (GH45715)
Bug in ExponentialMovingWindow.mean() with axis=1 and engine='numba' when the DataFrame has more columns than rows (GH46086)
Bug when using engine="numba" would return the same jitted function when modifying engine_kwargs (GH46086)
Bug in DataFrameGroupby.transform() fails when axis=1 and func is "first" or "last" (GH45986)
Bug in DataFrameGroupby.cumsum() with skipna=False giving incorrect results (GH46216)
Bug in GroupBy.cumsum() with timedelta64[ns] dtype failing to recognize NaT as a null value (GH46216)
Bug in GroupBy.cummin() and GroupBy.cummax() with nullable dtypes incorrectly altering the original data in place (GH46220)
Bug in GroupBy.cummax() with int64 dtype with the leading value being the smallest possible int64 (GH46382)
Bug in GroupBy.max() with empty groups and uint64 dtype incorrectly raising RuntimeError (GH46408)
Bug in GroupBy.apply() failing when func was a string and args or kwargs were supplied (GH46479)
Bug in SeriesGroupBy.apply() incorrectly naming its result when there was a unique group (GH46369)
Bug in Rolling.var() and Rolling.std() giving a non-zero result for a window of identical values (GH42064)
Bug in Rolling.var() causing a segfault when calculating weighted variance with a window size larger than the data size (GH46760)
Bug in Grouper.__repr__() where dropna was not included. Now it is (GH46754)

Reshaping#

Bug in concat() between a Series with integer dtype and another with CategoricalDtype with integer categories and containing NaN values casting to object dtype instead of float64 (GH45359)
Bug in get_dummies() that selected object and categorical dtypes but not string (GH44965)
Bug in DataFrame.align() when aligning a MultiIndex to a Series with another MultiIndex (GH46001)
Bug in merging with IntegerDtype or FloatingDtype arrays where the resulting dtype did not mirror the behavior of the non-nullable dtypes (GH46379)
Bug in concat() with identical key leads to error when indexing MultiIndex (GH46519)
Bug in DataFrame.join() with a list when using suffixes to join DataFrames with duplicate column names (GH46396)

Sparse#

Bug in Series.where() and DataFrame.where() with SparseDtype failing to retain the array's fill_value (GH45691)

ExtensionArray#

Bug in IntegerArray.searchsorted() and FloatingArray.searchsorted() returning inconsistent results when acting on np.nan (GH45255)

Styler#

Bug when attempting to apply styling functions to an empty DataFrame subset (GH45313)

Metadata#

Fixed metadata propagation in DataFrame.melt() (GH28283)
Fixed metadata propagation in DataFrame.explode() (GH28283)

Other#

Contributors#
