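What's new in 1.3.0 (July 2, 2021)#

These are the changes in pandas 1.3.0. See Release notes for a full changelog including other versions of pandas.

Warning

When reading new Excel 2007+ (.xlsx) files, the default argument engine=None to read_excel() will now result in using the openpyxl engine in all cases when the option io.excel.xlsx.reader is set to "auto". Previously, some cases would use the xlrd engine instead. See What's new 1.2.0 for background on this change.

Enhancements#

Custom HTTP headers when reading CSV or JSON files#

When reading from a remote URL that is not handled by fsspec (e.g. HTTP and HTTPS), the dictionary passed to storage_options will be used to create the headers included in the request. This can be used to control the User-Agent header or send other custom headers (GH36688). For example: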

In [1]: headers = {"User-Agent": "pandas"}

In [2]: df = pd.read_csv(
   ...:     "https://download.bls.gov/pub/time.series/cu/cu.item",
   ...:     sep="\t",
   ...:     storage_options=headers
   ...: )
   ...: 
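
To see what such a dictionary turns into without sending anything over the network, the sketch below builds a request object with the standard library only. It assumes, as a simplification, that the headers end up on a urllib.request.Request; the request is constructed but never opened, so no server response is involved.

```python
import urllib.request

# The same dictionary that would be passed as storage_options.
headers = {"User-Agent": "pandas"}
url = "https://download.bls.gov/pub/time.series/cu/cu.item"

# Build (but do not send) a request carrying the custom headers.
request = urllib.request.Request(url, headers=headers)

# urllib normalizes header names with str.capitalize(), hence "User-agent".
print(request.get_header("User-agent"))
print(request.header_items())
```

Whether a given server accepts the request once a custom User-Agent is supplied depends entirely on that server.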

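Read and write XML documents#

We added I/O support to read and render shallow versions of XML documents with read_xml() and DataFrame.to_xml(). Using lxml as the parser, both XPath 1.0 and XSLT 1.0 are available. (GH27554)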

In [1]: xml = """<?xml version='1.0' encoding='utf-8'?>
   ...: <data>
   ...:  <row>
   ...:     <shape>square</shape>
   ...:     <degrees>360</degrees>
   ...:     <sides>4.0</sides>
   ...:  </row>
   ...:  <row>
   ...:     <shape>circle</shape>
   ...:     <degrees>360</degrees>
   ...:     <sides/>
   ...:  </row>
   ...:  <row>
   ...:     <shape>triangle</shape>
   ...:     <degrees>180</degrees>
   ...:     <sides>3.0</sides>
   ...:  </row>
   ...:  </data>"""

In [2]: df = pd.read_xml(xml)
In [3]: df
Out[3]:
      shape  degrees  sides
0    square      360    4.0
1    circle      360    NaN
2  triangle      180    3.0

In [4]: df.to_xml()
Out[4]:
<?xml version='1.0' encoding='utf-8'?>
<data>
  <row>
    <index>0</index>
    <shape>square</shape>
    <degrees>360</degrees>
    <sides>4.0</sides>
  </row>
  <row>
    <index>1</index>
    <shape>circle</shape>
    <degrees>360</degrees>
    <sides/>
  </row>
  <row>
    <index>2</index>
    <shape>triangle</shape>
    <degrees>180</degrees>
    <sides>3.0</sides>
  </row>
</data>

For more, see Writing XML in the user guide on IO tools.

Styler enhancements#

We provided some focused development on Styler. See also the Styler documentation, which has been revised and improved (GH39720, GH39317, GH40493).

The method Styler.set_table_styles() can now accept more natural CSS language for arguments, such as 'color:red;' instead of [('color', 'red')] (GH39563)

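The methods Styler.highlight_null(), Styler.highlight_min(), and Styler.highlight_max() now allow custom CSS highlighting instead of the default background coloring (GH40242)

Styler.apply() now accepts functions that return an ndarray when axis=None, making it consistent with the axis=0 and axis=1 behavior (GH39359)

When incorrectly formatted CSS is given via Styler.apply() or Styler.applymap(), an error is now raised upon rendering (GH39660)

Styler.format() now accepts the keyword argument escape for optional HTML and LaTeX escaping (GH40388, GH41619)

Styler.background_gradient() has gained the argument gmap to supply a specific gradient map for shading (GH22727)

Styler.clear() now clears Styler.hidden_index and Styler.hidden_columns as well (GH40484)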

Added the method Styler.highlight_between() (GH39821)

Added the method Styler.highlight_quantile() (GH40926)

Added the method Styler.text_gradient() (GH41098)

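Added the method Styler.set_tooltips(), which allows hover tooltips; this can be used to enhance interactive displays (GH21266, GH40284)

Added the parameter precision to the method Styler.format() to control the display of floating point numbers (GH40134)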

Styler rendered HTML output now follows the w3 HTML Style Guide (GH39626)

Many features of the Styler class are now either partially or fully usable on a DataFrame with a non-unique index or columns (GH41143)

One has greater control of the display through separate sparsification of the index or columns using the new styler options, which are also usable via option_context() (GH41142)

Added the option styler.render.max_elements to avoid browser overload when styling large DataFrames (GH40712)

Added the method Styler.to_latex() (GH21673, GH42320), which also allows some limited CSS conversion (GH40731)

Added the method Styler.to_html() (GH13379)

Added the method Styler.set_sticky(), which makes index and column headers permanently visible in scrolling HTML frames (GH29072)

DataFrame constructor honors `copy=False` with dict#

When passing a dictionary to DataFrame with copy=False, a copy will no longer be made (GH32960).

In [3]: arr = np.array([1, 2, 3])

In [4]: df = pd.DataFrame({"A": arr, "B": arr.copy()}, copy=False)

In [5]: df
Out[5]: 
   A  B
0  1  1
1  2  2
2  3  3

[3 rows x 2 columns]

df["A"] remains a view on arr:

In [6]: arr[0] = 0

In [7]: assert df.iloc[0, 0] == 0

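The default behavior when not passing copy will remain unchanged, i.e. a copy will be made.

PyArrow backed string data type#

We've enhanced the StringDtype, an extension type dedicated to string data. (GH39908)

It is now possible to specify a storage keyword option to StringDtype. Use pandas options or specify the dtype using dtype='string[pyarrow]' to allow the StringArray to be backed by a PyArrow array instead of a NumPy array of Python objects.

The PyArrow backed StringArray requires pyarrow 1.0.0 or greater to be installed.

Warning

string[pyarrow] is currently considered experimental. The implementation and parts of the API may change without warning.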

In [8]: pd.Series(['abc', None, 'def'], dtype=pd.StringDtype(storage="pyarrow"))
Out[8]: 
0     abc
1    <NA>
2     def
Length: 3, dtype: string

You can use the alias "string[pyarrow]" as well.

In [9]: s = pd.Series(['abc', None, 'def'], dtype="string[pyarrow]")

In [10]: s
Out[10]: 
0     abc
1    <NA>
2     def
Length: 3, dtype: string

You can also create a PyArrow backed string array using pandas options.

In [11]: with pd.option_context("string_storage", "pyarrow"):
   ....:     s = pd.Series(['abc', None, 'def'], dtype="string")
   ....: 

In [12]: s
Out[12]: 
0     abc
1    <NA>
2     def
Length: 3, dtype: string

The usual string accessor methods work. Where appropriate, the return type of the Series or columns of a DataFrame will also have string dtype.

In [13]: s.str.upper()
Out[13]: 
0     ABC
1    <NA>
2     DEF
Length: 3, dtype: string

In [14]: s.str.split('b', expand=True).dtypes
Out[14]: 
0    string
1    string
Length: 2, dtype: object

String accessor methods returning integers will return a value with Int64Dtype

In [15]: s.str.count("a")
Out[15]: 
0       1
1    <NA>
2       0
Length: 3, dtype: Int64

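Centered datetime-like rolling windows#

When performing rolling calculations on DataFrame and Series objects with a datetime-like index, a centered datetime-like window can now be used (GH38780). For example: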

In [16]: df = pd.DataFrame(
   ....:     {"A": [0, 1, 2, 3, 4]}, index=pd.date_range("2020", periods=5, freq="1D")
   ....: )
   ....: 

In [17]: df
Out[17]: 
            A
2020-01-01  0
2020-01-02  1
2020-01-03  2
2020-01-04  3
2020-01-05  4

[5 rows x 1 columns]

In [18]: df.rolling("2D", center=True).mean()
Out[18]: 
              A
2020-01-01  0.5
2020-01-02  1.5
2020-01-03  2.5
2020-01-04  3.5
2020-01-05  4.0

[5 rows x 1 columns]
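
One way to read this result: with center=True, each "2D" window appears to cover the interval (t - 1 day, t + 1 day] around its label t. Those bounds are inferred from the printed output rather than taken from the implementation. The sketch below recomputes the means by hand under that assumption and compares them with the rolling result.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [0, 1, 2, 3, 4]}, index=pd.date_range("2020", periods=5, freq="1D")
)

# Assumed window: half the window length on each side of the label,
# open on the left and closed on the right.
half = pd.Timedelta("1D")
manual = [
    df.loc[(df.index > t - half) & (df.index <= t + half), "A"].mean()
    for t in df.index
]

rolled = df.rolling("2D", center=True).mean()["A"].tolist()
print(manual)
print(rolled)
```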

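Other enhancements#

DataFrame.rolling(), Series.rolling(), DataFrame.expanding(), and Series.expanding() now support a method argument with a 'table' option that performs the windowing operation over an entire DataFrame. See Window Overview for performance and functional benefits (GH15095, GH38995)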
ExponentialMovingWindow now supports an online method that can perform mean calculations in an online fashion. See Window Overview (GH41673)
Added MultiIndex.dtypes() (GH37062)
Added end and end_day options for the origin argument in DataFrame.resample() (GH37804)
Improved error message when usecols and names do not match for read_csv() and engine="c" (GH29042)
Improved consistency of error messages when passing an invalid win_type argument in Window methods (GH15969)
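read_sql_query() now accepts a dtype argument to cast the columnar data from the SQL database based on user input (GH10285)
read_csv() now raises a ParserWarning if the length of the header or given names does not match the length of the data when usecols is not specified (GH21768)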
Improved integer type mapping from pandas to SQLAlchemy when using DataFrame.to_sql() (GH35076)
to_numeric() now supports downcasting of nullable ExtensionDtype objects (GH33013)
Added support for dict-like names in MultiIndex.set_names and MultiIndex.rename (GH20421)
read_excel() can now auto-detect .xlsb files and older .xls files (GH35416, GH41225)
ExcelWriter now accepts an if_sheet_exists parameter to control the behavior of append mode when writing to existing sheets (GH40230)
Rolling.sum(), Expanding.sum(), Rolling.mean(), Expanding.mean(), ExponentialMovingWindow.mean(), Rolling.median(), Expanding.median(), Rolling.max(), Expanding.max(), Rolling.min(), and Expanding.min() now support Numba execution with the engine keyword (GH38895, GH41267)
DataFrame.apply() can now accept NumPy unary operators as strings, e.g. df.apply("sqrt"), which was already the case for Series.apply() (GH39116)
DataFrame.apply() can now accept non-callable DataFrame properties as strings, e.g. df.apply("size"), which was already the case for Series.apply() (GH39116)
DataFrame.applymap() can now accept kwargs to pass on to the user-provided func (GH39987)
Passing a DataFrame indexer to iloc is now disallowed for Series.__getitem__() and DataFrame.__getitem__() (GH39004)
Series.apply() can now accept list-like or dictionary-like arguments that aren't lists or dictionaries, e.g. ser.apply(np.array(["sum", "mean"])), which was already the case for DataFrame.apply() (GH39140)
DataFrame.plot.scatter() can now accept a categorical column for the argument c (GH12380, GH31357)
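Series.loc() now raises a helpful error message when the Series has a MultiIndex and the indexer has too many dimensions (GH35349)
read_stata() now supports reading data from compressed files (GH26599)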
Added support for parsing ISO 8601-like timestamps with negative signs to Timedelta (GH37172)
Added support for unary operators in FloatingArray (GH38749)
RangeIndex can now be constructed by passing a range object directly e.g. pd.RangeIndex(range(3)) (GH12067)
Series.round() and DataFrame.round() now work with nullable integer and floating dtypes (GH38844)
read_csv() and read_json() expose the argument encoding_errors to control how encoding errors are handled (GH39450)
GroupBy.any() and GroupBy.all() use Kleene logic with nullable data types (GH37506)
GroupBy.any() and GroupBy.all() return a BooleanDtype for columns with nullable data types (GH33449)
GroupBy.any() and GroupBy.all() raising with object data containing pd.NA even when skipna=True (GH37501)
GroupBy.rank() now supports object-dtype data (GH38278)
Constructing a DataFrame or Series with the data argument being a Python iterable that is not a NumPy ndarray consisting of NumPy scalars will now result in a dtype with a precision the maximum of the NumPy scalars; this was already the case when data is a NumPy ndarray (GH40908)
Added the keyword sort to pivot_table() to allow non-sorting of the result (GH39143)
Added the keyword dropna to DataFrame.value_counts() to allow counting rows that include NA values (GH41325)
Series.replace() will now cast results to PeriodDtype where possible instead of object dtype (GH41526)
Improved error message in corr and cov methods on Rolling, Expanding, and ExponentialMovingWindow when other is not a DataFrame or Series (GH41741)
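Series.between() can now accept left or right as arguments to inclusive to include only the left or right boundary (GH40245)
DataFrame.explode() now supports exploding multiple columns. Its column argument now also accepts a list of str or tuples for exploding on multiple columns at the same time (GH39240)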
DataFrame.sample() now accepts the ignore_index argument to reset the index after sampling, similar to DataFrame.drop_duplicates() and DataFrame.sort_values() (GH38581)
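
Several of the smaller enhancements in this list are easiest to see side by side. The following is an illustrative sketch with made-up data, not an exhaustive tour of the list.

```python
import pandas as pd

# Series.between(): include only the left boundary (2 <= x < 4).
ser = pd.Series([1, 2, 3, 4])
left_only = ser.between(2, 4, inclusive="left")

# DataFrame.explode(): explode several list-valued columns together.
# The lists in each row must have matching lengths across the columns.
df = pd.DataFrame({"id": [1, 2], "a": [[1, 2], [3]], "b": [["x", "y"], ["z"]]})
exploded = df.explode(["a", "b"])

# DataFrame.sample(): reset the index of the sampled rows.
sampled = df.sample(frac=1.0, random_state=0, ignore_index=True)

# RangeIndex can be built directly from a range object.
idx = pd.RangeIndex(range(3))

print(left_only.tolist())
print(exploded)
print(sampled.index.tolist(), list(idx))
```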

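Notable bug fixes#

These are bug fixes that might have notable behavior changes.

`Categorical.unique` now always maintains same dtype as original#

Previously, when calling Categorical.unique() with categorical data, unused categories in the new array would be removed, making the dtype of the new array different from the original (GH18291)

As an example of this, given: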

In [19]: dtype = pd.CategoricalDtype(['bad', 'neutral', 'good'], ordered=True)

In [20]: cat = pd.Categorical(['good', 'good', 'bad', 'bad'], dtype=dtype)

In [21]: original = pd.Series(cat)

In [22]: unique = original.unique()

Previous behavior:

In [1]: unique
['good', 'bad']
Categories (2, object): ['bad' < 'good']
In [2]: original.dtype == unique.dtype
False

New behavior:

In [23]: unique
Out[23]: 
['good', 'bad']
Categories (3, object): ['bad' < 'neutral' < 'good']

In [24]: original.dtype == unique.dtype
Out[24]: True

Preserve dtypes in `DataFrame.combine_first()`#

DataFrame.combine_first() will now preserve dtypes (GH7509)

In [25]: df1 = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=[0, 1, 2])

In [26]: df1
Out[26]: 
   A  B
0  1  1
1  2  2
2  3  3

[3 rows x 2 columns]

In [27]: df2 = pd.DataFrame({"B": [4, 5, 6], "C": [1, 2, 3]}, index=[2, 3, 4])

In [28]: df2
Out[28]: 
   B  C
2  4  1
3  5  2
4  6  3

[3 rows x 2 columns]

In [29]: combined = df1.combine_first(df2)

Previous behavior:

In [1]: combined.dtypes
Out[1]:
A    float64
B    float64
C    float64
dtype: object

New behavior:

In [30]: combined.dtypes
Out[30]: 
A    float64
B      int64
C    float64
Length: 3, dtype: object

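Groupby methods agg and transform no longer change return dtype for callables#

Previously the methods DataFrameGroupBy.aggregate(), SeriesGroupBy.aggregate(), DataFrameGroupBy.transform(), and SeriesGroupBy.transform() might cast the result dtype when the argument func is callable, possibly leading to undesirable results (GH21240). The cast would occur if the result is numeric and casting back to the input dtype does not change any values as measured by np.allclose. Now no such casting occurs.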

In [31]: df = pd.DataFrame({'key': [1, 1], 'a': [True, False], 'b': [True, True]})

In [32]: df
Out[32]: 
   key      a     b
0    1   True  True
1    1  False  True

[2 rows x 3 columns]

Previous behavior:

In [5]: df.groupby('key').agg(lambda x: x.sum())
Out[5]:
        a  b
key
1    True  2

New behavior:

In [33]: df.groupby('key').agg(lambda x: x.sum())
Out[33]: 
     a  b
key      
1    1  2

[1 rows x 2 columns]

`float` result for `GroupBy.mean()`, `GroupBy.median()`, and `GroupBy.var()`#

Previously, these methods could result in different dtypes depending on the input values. Now, these methods will always return a float dtype. (GH41137)

In [34]: df = pd.DataFrame({'a': [True], 'b': [1], 'c': [1.0]})

Previous behavior:

In [5]: df.groupby(df.index).mean()
Out[5]:
        a  b    c
0    True  1  1.0

New behavior:

In [35]: df.groupby(df.index).mean()
Out[35]: 
     a    b    c
0  1.0  1.0  1.0

[1 rows x 3 columns]

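Try operating inplace when setting values with `loc` and `iloc`#

When setting an entire column using loc or iloc, pandas will try to insert the values into the existing data rather than create an entirely new array.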

In [36]: df = pd.DataFrame(range(3), columns=["A"], dtype="float64")

In [37]: values = df.values

In [38]: new = np.array([5, 6, 7], dtype="int64")

In [39]: df.loc[[0, 1, 2], "A"] = new

In both the new and old behavior, the data in values is overwritten, but in the old behavior the dtype of df["A"] changed to int64.

Previous behavior:

In [1]: df.dtypes
Out[1]:
A    int64
dtype: object
In [2]: np.shares_memory(df["A"].values, new)
Out[2]: False
In [3]: np.shares_memory(df["A"].values, values)
Out[3]: False

In pandas 1.3.0, df continues to share data with values

New behavior:

In [40]: df.dtypes
Out[40]: 
A    float64
Length: 1, dtype: object

In [41]: np.shares_memory(df["A"], new)
Out[41]: False

In [42]: np.shares_memory(df["A"], values)
Out[42]: True

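Never operate inplace when setting `frame[keys] = values`#

When setting multiple columns using frame[keys] = values, new arrays will replace the pre-existing arrays for these keys, which will not be overwritten (GH39510). As a result, the columns will retain the dtype(s) of values, never casting to the dtypes of the existing arrays.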

In [43]: df = pd.DataFrame(range(3), columns=["A"], dtype="float64")

In [44]: df[["A"]] = 5

In the old behavior, 5 was cast to float64 and inserted into the existing array backing df:

Previous behavior:

In [1]: df.dtypes
Out[1]:
A    float64

In the new behavior, we get a new array, and retain an integer-dtyped 5:

New behavior:

In [45]: df.dtypes
Out[45]: 
A    int64
Length: 1, dtype: object

Consistent casting with setting into Boolean Series#

Setting non-boolean values into a Series with dtype=bool now consistently casts to dtype=object (GH38709)

In [46]: orig = pd.Series([True, False])

In [47]: ser = orig.copy()

In [48]: ser.iloc[1] = np.nan

In [49]: ser2 = orig.copy()

In [50]: ser2.iloc[1] = 2.0

Previous behavior:

In [1]: ser
Out[1]:
0    1.0
1    NaN
dtype: float64

In [2]: ser2
Out[2]:
0    True
1     2.0
dtype: object

New behavior:

In [51]: ser
Out[51]: 
0    True
1     NaN
Length: 2, dtype: object

In [52]: ser2
Out[52]: 
0    True
1     2.0
Length: 2, dtype: object

GroupBy.rolling no longer returns grouped-by column in values#

The group-by column will now be dropped from the result of a groupby.rolling operation (GH32262)

In [53]: df = pd.DataFrame({"A": [1, 1, 2, 3], "B": [0, 1, 2, 3]})

In [54]: df
Out[54]: 
   A  B
0  1  0
1  1  1
2  2  2
3  3  3

[4 rows x 2 columns]

Previous behavior:

In [1]: df.groupby("A").rolling(2).sum()
Out[1]:
       A    B
A
1 0  NaN  NaN
1    2.0  1.0
2 2  NaN  NaN
3 3  NaN  NaN

New behavior:

In [55]: df.groupby("A").rolling(2).sum()
Out[55]: 
       B
A       
1 0  NaN
  1  1.0
2 2  NaN
3 3  NaN

[4 rows x 1 columns]

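Removed artificial truncation in rolling variance and standard deviation#

Rolling.std() and Rolling.var() will no longer artificially truncate results that are less than ~1e-8 and ~1e-15 respectively to zero (GH37051, GH40448, GH39872).

However, floating point artifacts may now exist in the results when rolling over larger values.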

In [56]: s = pd.Series([7, 5, 5, 5])

In [57]: s.rolling(3).var()
Out[57]: 
0         NaN
1         NaN
2    1.333333
3    0.000000
Length: 4, dtype: float64

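GroupBy.rolling with MultiIndex no longer drops levels in the result#

GroupBy.rolling() will no longer drop levels of a DataFrame with a MultiIndex in the result. This can lead to a perceived duplication of levels in the resulting MultiIndex, but this change restores the behavior that was present in version 1.1.3 (GH38787, GH38523).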

In [58]: index = pd.MultiIndex.from_tuples([('idx1', 'idx2')], names=['label1', 'label2'])

In [59]: df = pd.DataFrame({'a': [1], 'b': [2]}, index=index)

In [60]: df
Out[60]: 
               a  b
label1 label2      
idx1   idx2    1  2

[1 rows x 2 columns]

Previous behavior:

In [1]: df.groupby('label1').rolling(1).sum()
Out[1]:
          a    b
label1
idx1    1.0  2.0

New behavior:

In [61]: df.groupby('label1').rolling(1).sum()
Out[61]: 
                        a    b
label1 label1 label2          
idx1   idx1   idx2    1.0  2.0

[1 rows x 2 columns]

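Backwards incompatible API changes#

Increased minimum versions for dependencies#

Some minimum supported versions of dependencies were updated. If installed, we now require:

Package	Minimum Version	Required	Changed
numpy	1.17.3	X	X
pytz	2017.3	X
python-dateutil	2.7.3	X
bottleneck	1.2.1
numexpr	2.7.0		X
pytest (dev)	6.0		X
mypy (dev)	0.812		X
setuptools	38.6.0		X

For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

Package	Minimum Version	Changed
beautifulsoup4	4.6.0
fastparquet	0.4.0	X
fsspec	0.7.4
gcsfs	0.6.0
lxml	4.3.0
matplotlib	2.2.3
numba	0.46.0
openpyxl	3.0.0	X
pyarrow	0.17.0	X
pymysql	0.8.1	X
pytables	3.5.1
s3fs	0.4.0
scipy	1.2.0
sqlalchemy	1.3.0	X
tabulate	0.8.7	X
xarray	0.12.0
xlrd	1.2.0
xlsxwriter	1.0.2
xlwt	1.3.0
pandas-gbq	0.12.0

See Dependencies and Optional dependencies for more.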
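
A quick way to compare an environment against minimums like these is sketched below using only the standard library. The helper names (parse_version, at_least, check_minimums) are hypothetical, the dictionary holds just a subset of the tables, and the version parsing is deliberately naive; a dedicated version-parsing library would be more robust.

```python
import re
from importlib import metadata

# A subset of the minimum versions listed above, for illustration only.
MINIMUM_VERSIONS = {
    "numpy": "1.17.3",
    "pytz": "2017.3",
    "python-dateutil": "2.7.3",
    "openpyxl": "3.0.0",
    "pyarrow": "0.17.0",
}


def parse_version(text):
    """Turn '1.17.3' into (1, 17, 3); parsing stops at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def at_least(installed, minimum):
    """Compare two version strings after padding them to equal length."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_minimums(minimums=MINIMUM_VERSIONS):
    """Return {package: 'ok' | 'too old' | 'not installed'}."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        report[name] = "ok" if at_least(installed, minimum) else "too old"
    return report


if __name__ == "__main__":
    for package, status in check_minimums().items():
        print(f"{package}: {status}")
```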

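Other API changes#

Partially initialized CategoricalDtype objects (i.e. those with categories=None) will no longer compare as equal to fully initialized dtype objects (GH38516)
Accessing _constructor_expanddim on a DataFrame and _constructor_sliced on a Series now raises an AttributeError. Previously a NotImplementedError was raised (GH38782)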
Added new engine and **engine_kwargs parameters to DataFrame.to_sql() to support other future "SQL engines". Currently we still only use SQLAlchemy under the hood, but more engines are planned to be supported such as turbodbc (GH36893)
Removed redundant freq from PeriodIndex string representation (GH41653)
ExtensionDtype.construct_array_type() is now a required method instead of an optional one for ExtensionDtype subclasses (GH24860)
Calling hash on non-hashable pandas objects will now raise TypeError with the built-in error message (e.g. unhashable type: 'Series'). Previously it would raise a custom message such as 'Series' objects are mutable, thus they cannot be hashed. Furthermore, isinstance(<Series>, collections.abc.Hashable) will now return False (GH40013)
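Styler.from_custom_template() now has two new arguments for template names, and the old name argument has been removed, because template inheritance was introduced for better parsing (GH42053). Subclassing modifications to Styler attributes are also needed.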
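
Two of the changes above are easy to check interactively. The snippet is a small sketch; the exact error text may differ between pandas versions, so it is only printed here.

```python
from collections.abc import Hashable

import pandas as pd

ser = pd.Series([1, 2, 3])

# Hashing a Series raises TypeError.
try:
    hash(ser)
    message = None
except TypeError as err:
    message = str(err)

# A Series is no longer reported as an instance of Hashable.
is_hashable = isinstance(ser, Hashable)

# A partially initialized CategoricalDtype (categories=None) does not
# compare equal to a fully initialized one.
partial = pd.CategoricalDtype()
full = pd.CategoricalDtype(["a", "b"])
dtypes_equal = partial == full

print(message, is_hashable, dtypes_equal)
```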

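Build#

Documentation in .pptx and .pdf formats is no longer included in wheels or source distributions. (GH30741)

Deprecations#

Deprecated dropping nuisance columns in DataFrame reductions and DataFrameGroupBy operations#

When calling a reduction (e.g. .min, .max, .sum) on a DataFrame with numeric_only=None (the default), columns where the reduction raises a TypeError are silently ignored and dropped from the result.

This behavior is deprecated. In a future version, the TypeError will be raised, and users will need to select only valid columns before calling the function.

For example: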

In [62]: df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})

In [63]: df
Out[63]: 
   A          B
0  1 2016-01-01
1  2 2016-01-02
2  3 2016-01-03
3  4 2016-01-04

[4 rows x 2 columns]

Old behavior:

In [3]: df.prod()
Out[3]:
A    24
dtype: int64

Future behavior:

In [4]: df.prod()
...
TypeError: 'DatetimeArray' does not implement reduction 'prod'

In [5]: df[["A"]].prod()
Out[5]:
A    24
dtype: int64

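Similarly, when applying a function to DataFrameGroupBy, columns on which the function raises TypeError are currently silently ignored and dropped from the result.

This behavior is deprecated. In a future version, the TypeError will be raised, and users will need to select only valid columns before calling the function.

For example: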

In [64]: df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})

In [65]: gb = df.groupby([1, 1, 2, 2])

Old behavior:

In [4]: gb.prod(numeric_only=False)
Out[4]:
A
1   2
2  12

Future behavior:

In [5]: gb.prod(numeric_only=False)
...
TypeError: datetime64 type does not support prod operations

In [6]: gb[["A"]].prod(numeric_only=False)
Out[6]:
    A
1   2
2  12
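
Code that should behave the same before and after this deprecation can select the valid columns up front. The sketch below uses select_dtypes for the plain reduction and explicit column selection for the grouped one; choosing columns by dtype is one possible approach, not the only one.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)}
)

# Keep only the numeric columns before reducing, so the datetime
# column never reaches prod().
product = df.select_dtypes(include="number").prod()

# Same idea for a grouped reduction: pick the valid columns first.
grouped_product = df.groupby([1, 1, 2, 2])[["A"]].prod()

print(product)
print(grouped_product)
```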

Other Deprecations#

Deprecated allowing scalars to be passed to the Categorical constructor (GH38433)
Deprecated constructing CategoricalIndex without passing list-like data (GH38944)
Deprecated allowing subclass-specific keyword arguments in the Index constructor, use the specific subclass directly instead (GH14093, GH21311, GH22315, GH26974)
Deprecated the astype() method of datetimelike (timedelta64[ns], datetime64[ns], Datetime64TZDtype, PeriodDtype) objects to convert to integer dtypes, use values.view(...) instead (GH38544). This deprecation was later reverted in pandas 1.4.0.
Deprecated MultiIndex.is_lexsorted() and MultiIndex.lexsort_depth(), use MultiIndex.is_monotonic_increasing() instead (GH32259)
Deprecated keyword try_cast in Series.where(), Series.mask(), DataFrame.where(), DataFrame.mask(); cast results manually if desired (GH38836)
Deprecated comparison of Timestamp objects with datetime.date objects. Instead of e.g. ts <= mydate use ts <= pd.Timestamp(mydate) or ts.date() <= mydate (GH36131)
Deprecated Rolling.win_type returning "freq" (GH38963)
Deprecated Rolling.is_datetimelike (GH38963)
Deprecated DataFrame indexer for Series.__setitem__() and DataFrame.__setitem__() (GH39004)
Deprecated ExponentialMovingWindow.vol() (GH39220)
Using .astype to convert between datetime64[ns] dtype and DatetimeTZDtype is deprecated and will raise in a future version, use obj.tz_localize or obj.dt.tz_localize instead (GH38622)
Deprecated casting datetime.date objects to datetime64 when used as fill_value in DataFrame.unstack(), DataFrame.shift(), Series.shift(), and DataFrame.reindex(), pass pd.Timestamp(dateobj) instead (GH39767)
Deprecated Styler.set_na_rep() and Styler.set_precision() in favor of Styler.format() with na_rep and precision as existing and new input arguments respectively (GH40134, GH40425)
Deprecated Styler.where() in favor of using an alternative formulation with Styler.applymap() (GH40821)
Deprecated allowing partial failure in Series.transform() and DataFrame.transform() when func is list-like or dict-like and raises anything but TypeError; func raising anything but a TypeError will raise in a future version (GH40211)
Deprecated arguments error_bad_lines and warn_bad_lines in read_csv() and read_table() in favor of argument on_bad_lines (GH15122)
Deprecated support for np.ma.mrecords.MaskedRecords in the DataFrame constructor, pass {name: data[name] for name in data.dtype.names} instead (GH40363)
Deprecated using merge(), DataFrame.merge(), and DataFrame.join() on a different number of levels (GH34862)
Deprecated the use of **kwargs in ExcelWriter; use the keyword argument engine_kwargs instead (GH40430)
Deprecated the level keyword for DataFrame and Series aggregations; use groupby instead (GH39983)
Deprecated the inplace parameter of Categorical.remove_categories(), Categorical.add_categories(), Categorical.reorder_categories(), Categorical.rename_categories(), Categorical.set_categories(); it will be removed in a future version (GH37643)
Deprecated merge() producing duplicated columns through the suffixes keyword and already existing columns (GH22818)
Deprecated setting Categorical._codes, create a new Categorical with the desired codes instead (GH40606)
Deprecated the convert_float optional argument in read_excel() and ExcelFile.parse() (GH41127)
Deprecated behavior of DatetimeIndex.union() with mixed timezones; in a future version both will be cast to UTC instead of object dtype (GH39328)
Deprecated using usecols with out of bounds indices for read_csv() with engine="c" (GH25623)
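Deprecated special treatment of lists with first element a Categorical in the DataFrame constructor; pass as pd.DataFrame({col: categorical, ...}) instead (GH38845)
Deprecated behavior of the DataFrame constructor when a dtype is passed and the data cannot be cast to that dtype. In a future version, this will raise instead of being silently ignored (GH24435)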
Deprecated the Timestamp.freq attribute. For the properties that use it (is_month_start, is_month_end, is_quarter_start, is_quarter_end, is_year_start, is_year_end), when you have a freq, use e.g. freq.is_month_start(ts) (GH15146)
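Deprecated construction of Series or DataFrame with DatetimeTZDtype data and datetime64[ns] dtype. Use Series(data).dt.tz_localize(None) instead (GH41555, GH33401)
Deprecated behavior of Series construction with large-integer values and small-integer dtype silently overflowing; use Series(data).astype(dtype) instead (GH41734)
Deprecated behavior of DataFrame construction with floating data and integer dtype casting even when lossy; in a future version this will remain floating, matching Series behavior (GH41770)
Deprecated inference of timedelta64[ns], datetime64[ns], or DatetimeTZDtype dtypes in Series construction when data containing strings is passed and no dtype is passed (GH33558)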
In a future version, constructing Series or DataFrame with datetime64[ns] data and DatetimeTZDtype will treat the data as wall-times instead of as UTC times (matching DatetimeIndex behavior). To treat the data as UTC times, use pd.Series(data).dt.tz_localize("UTC").dt.tz_convert(dtype.tz) or pd.Series(data.view("int64"), dtype=dtype) (GH33401)
Deprecated passing lists as key to DataFrame.xs() and Series.xs() (GH41760)
Deprecated boolean parameter inclusive in Series.between() to have {"left", "right", "neither", "both"} as standard argument values (GH40628)
Deprecated passing arguments as positional for all of the following, with exceptions noted (GH41485):
- concat() (other than objs)
- read_csv() (other than filepath_or_buffer)
- read_table() (other than filepath_or_buffer)
- DataFrame.clip() and Series.clip() (other than upper and lower)
- DataFrame.drop_duplicates() (except for subset), Series.drop_duplicates(), Index.drop_duplicates() and MultiIndex.drop_duplicates()
- DataFrame.drop() (other than labels) and Series.drop()
- DataFrame.dropna() and Series.dropna()
- DataFrame.ffill(), Series.ffill(), DataFrame.bfill(), and Series.bfill()
- DataFrame.fillna() and Series.fillna() (other than value)
- DataFrame.interpolate() and Series.interpolate() (other than method)
- DataFrame.mask() and Series.mask() (other than cond and other)
- DataFrame.reset_index() (other than level) and Series.reset_index()
- DataFrame.set_axis() and Series.set_axis() (other than labels)
- DataFrame.set_index() (other than keys)
- DataFrame.sort_index() and Series.sort_index()
- DataFrame.sort_values() (other than by) and Series.sort_values()
- DataFrame.where() and Series.where() (other than cond and other)
- Index.set_names() and MultiIndex.set_names() (other than names)
- MultiIndex.codes() (other than codes)
- MultiIndex.set_levels() (other than levels)
- Resampler.interpolate() (other than method)
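
The replacements suggested by several of these deprecations can be sketched as follows. The data is made up for illustration, and the snippet only covers a handful of the entries above.

```python
import datetime as dt
import io

import pandas as pd

# Compare a Timestamp with a date by converting one side explicitly.
ts = pd.Timestamp("2021-07-02 12:00")
mydate = dt.date(2021, 7, 2)
after_midnight = ts >= pd.Timestamp(mydate)
same_day = ts.date() == mydate

# on_bad_lines replaces error_bad_lines / warn_bad_lines in read_csv().
# The row "3,4,5" has too many fields and is skipped.
raw = "a,b\n1,2\n3,4,5\n6,7\n"
clean = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")

# Pass everything except the exempted arguments by keyword.
stacked = pd.concat([clean, clean], axis=0, ignore_index=True)
deduped = stacked.drop_duplicates(subset=["a"], keep="first")

# Use a string for inclusive in Series.between() instead of a boolean.
inside = pd.Series([1, 2, 3]).between(1, 3, inclusive="neither")

print(after_midnight, same_day)
print(deduped)
print(inside.tolist())
```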

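Performance improvements#

Performance improvement in IntervalIndex.isin() (GH38353)
Performance improvement in Series.mean() for nullable data types (GH34814)
Performance improvement in Series.isin() for nullable data types (GH38340)
Performance improvement in DataFrame.fillna() with method="pad" or method="backfill" for nullable floating and nullable integer dtypes (GH39953)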
Performance improvement in DataFrame.corr() for method=kendall (GH28329)
Performance improvement in DataFrame.corr() for method=spearman (GH40956, GH41885)
Performance improvement in Rolling.corr() and Rolling.cov() (GH39388)
Performance improvement in RollingGroupby.corr(), RollingGroupby.cov(), ExpandingGroupby.corr() and ExpandingGroupby.cov() (GH39591)
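Performance improvement in unique() for object data type (GH37615)
Performance improvement in json_normalize() for basic cases (including separators) (GH40035, GH15621)
Performance improvement in ExpandingGroupby aggregation methods (GH39664)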
Performance improvement in Styler where render times are more than 50% reduced and now matches DataFrame.to_html() (GH39972 GH39952, GH40425)
The method Styler.set_td_classes() is now as performant as Styler.apply() and Styler.applymap(), and even more so in some cases (GH40453)
Performance improvement in ExponentialMovingWindow.mean() with times (GH39784)
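Performance improvement in GroupBy.apply() when requiring the Python fallback implementation (GH40176)
Performance improvement in the conversion of a PyArrow Boolean array to a pandas nullable Boolean array (GH41051)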
Performance improvement for concatenation of data with type CategoricalDtype (GH40193)
Performance improvement in GroupBy.cummin() and GroupBy.cummax() with nullable data types (GH37493)
Performance improvement in Series.nunique() with NaN values (GH40865)
Performance improvement in DataFrame.transpose(), Series.unstack() with DatetimeTZDtype (GH40149)
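Performance improvement in Series.plot() and DataFrame.plot() with entry point lazy loading (GH41492)

Bug fixes#

Categorical#

Bug in CategoricalIndex incorrectly failing to raise TypeError when scalar data is passed (GH38614)
Bug in CategoricalIndex.reindex failing when the Index passed was not categorical but its values were all labels in the category (GH28690)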
Bug where constructing a Categorical from an object-dtype array of date objects did not round-trip correctly with astype (GH38552)
Bug in constructing a DataFrame from an ndarray and a CategoricalDtype (GH38857)
Bug in setting categorical values into an object-dtype column in a DataFrame (GH39136)
Bug in DataFrame.reindex() was raising an IndexError when the new index contained duplicates and the old index was a CategoricalIndex (GH38906)
Bug in Categorical.fillna() with a tuple-like category raising NotImplementedError instead of ValueError when filling with a non-category tuple (GH41914)
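
As a quick illustration of the first item in this list, the sketch below passes a scalar to CategoricalIndex and checks that a TypeError is raised. It assumes pandas 1.3 or later; the helper name raises_type_error is hypothetical and only exists for this example.

```python
import pandas as pd


def raises_type_error(data):
    """Return True if building a CategoricalIndex from `data` raises TypeError."""
    try:
        pd.CategoricalIndex(data)
    except TypeError:
        return True
    return False


# A bare scalar is not a collection, so construction should be rejected.
scalar_rejected = raises_type_error("a")
# List-like data is the supported input and should be accepted.
list_rejected = raises_type_error(["a", "b"])
```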

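Datetimelike#

Bug in DataFrame and Series constructors sometimes dropping nanoseconds from Timestamp (resp. Timedelta) data, with dtype=datetime64[ns] (resp. timedelta64[ns]) (GH38032)
Bug in DataFrame.first() and Series.first() with an offset of one month returning an incorrect result when the first day is the last day of a month (GH29623)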
Bug in constructing a DataFrame or Series with mismatched datetime64 data and timedelta64 dtype, or vice-versa, failing to raise a TypeError (GH38575, GH38764, GH38792)
Bug in constructing a Series or DataFrame with a datetime object out of bounds for datetime64[ns] dtype or a timedelta object out of bounds for timedelta64[ns] dtype (GH38792, GH38965)
Bug in DatetimeIndex.intersection(), DatetimeIndex.symmetric_difference(), PeriodIndex.intersection(), PeriodIndex.symmetric_difference() always returning object-dtype when operating with CategoricalIndex (GH38741)
Bug in DatetimeIndex.intersection() giving incorrect results with non-Tick frequencies with n != 1 (GH42104)
Bug in Series.where() incorrectly casting datetime64 values to int64 (GH37682)
Bug in Categorical incorrectly typecasting datetime object to Timestamp (GH38878)
Bug in comparisons between Timestamp object and datetime64 objects just outside the implementation bounds for nanosecond datetime64 (GH39221)
Bug in Timestamp.round(), Timestamp.floor(), Timestamp.ceil() for values near the implementation bounds of Timestamp (GH39244)
Bug in Timedelta.round(), Timedelta.floor(), Timedelta.ceil() for values near the implementation bounds of Timedelta (GH38964)
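Bug in date_range() incorrectly creating a DatetimeIndex containing NaT instead of raising OutOfBoundsDatetime in corner cases (GH24124)
Bug in infer_freq() incorrectly failing to infer the 'H' frequency of a DatetimeIndex if the latter has a timezone and crosses DST boundaries (GH39556)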
Bug in Series backed by DatetimeArray or TimedeltaArray sometimes failing to set the array's freq to None (GH41425)
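
The Series.where() item above (GH37682) can be checked with a short sketch. It assumes pandas 1.3 or later; the sample dates are made up for illustration. Masked-out entries should become NaT while the dtype stays datetime-like rather than turning into int64.

```python
import pandas as pd

# Three illustrative timestamps; the values themselves are arbitrary.
ser = pd.Series(pd.to_datetime(["2021-07-01", "2021-07-02", "2021-07-03"]))
mask = pd.Series([True, False, True])

# Entries where the mask is False are replaced by the missing value (NaT).
result = ser.where(mask)

dtype_kind = result.dtype.kind          # "M" means a datetime64 dtype
missing_flags = result.isna().tolist()  # which entries were masked out
```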

Timedelta#

Bug in constructing Timedelta from np.timedelta64 objects with non-nanosecond units that are out of bounds for timedelta64[ns] (GH38965)
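Bug in constructing a TimedeltaIndex incorrectly accepting np.datetime64("NaT") objects (GH39462)
Bug in constructing Timedelta from an input string with only symbols and no digits failing to raise an error (GH39710)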
Bug in TimedeltaIndex and to_timedelta() failing to raise when passed non-nanosecond timedelta64 arrays that overflow when converting to timedelta64[ns] (GH40008)
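
A minimal sketch of the sign-only string case (GH39710), assuming pandas 1.3 or later. The helper timedelta_is_rejected is hypothetical; it only reports whether parsing raised a ValueError.

```python
import pandas as pd


def timedelta_is_rejected(text):
    """Return True if pd.Timedelta(text) raises ValueError."""
    try:
        pd.Timedelta(text)
    except ValueError:
        return True
    return False


# A string made only of a sign carries no number and should be rejected.
sign_only_rejected = timedelta_is_rejected("+")
# A well-formed string still parses.
valid_rejected = timedelta_is_rejected("1 day")
```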

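Timezones#

Bug in different tzinfo objects representing UTC not being treated as equivalent (GH39216)
Bug in dateutil.tz.gettz("UTC") not being recognized as equivalent to other UTC-representing tzinfos (GH39276)

Numeric#

Bug in DataFrame.quantile(), DataFrame.sort_values() causing incorrect subsequent indexing behavior (GH38351)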
Bug in DataFrame.sort_values() raising an IndexError for empty by (GH40258)
Bug in DataFrame.select_dtypes() with include=np.number dropping numeric ExtensionDtype columns (GH35340)
Bug in DataFrame.mode() and Series.mode() not keeping a consistent integer Index for empty input (GH33321)
Bug in DataFrame.rank() when the DataFrame contained np.inf (GH32593)
Bug in DataFrame.rank() with axis=0 and columns holding incomparable types raising an IndexError (GH38932)
Bug in Series.rank(), DataFrame.rank(), and GroupBy.rank() treating the most negative int64 value as missing (GH32859)
Bug in DataFrame.select_dtypes() different behavior between Windows and Linux with include="int" (GH36596)
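Bug in DataFrame.apply() and DataFrame.agg() when passed the argument func="size" operating on the entire DataFrame instead of rows or columns (GH39934)
Bug in DataFrame.transform() raising a SpecificationError when passed a dictionary and columns were missing; will now raise a KeyError instead (GH40004)
Bug in GroupBy.rank() giving incorrect results with pct=True and equal values between consecutive groups (GH40518)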
Bug in Series.count() would result in an int32 result on 32-bit platforms when argument level=None (GH40908)
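Bug in Series and DataFrame reductions with methods any and all not returning Boolean results for object data (GH12863, GH35450, GH27709)
Bug in Series.clip() failing if the Series contains NA values and has a nullable data type (GH40851)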
Bug in UInt64Index.where() and UInt64Index.putmask() with an np.int64 dtype other incorrectly raising TypeError (GH41974)
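Bug in DataFrame.agg() not sorting the aggregated axis in the order of the provided aggregation functions when one or more aggregation functions fail to produce results (GH33634)
Bug in DataFrame.clip() not interpreting missing values as no threshold (GH40420)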
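
To make the select_dtypes() item (GH35340) concrete, here is a small sketch assuming pandas 1.3 or later. The column names are invented for the example. A nullable Int64 column is backed by an ExtensionDtype and should be kept alongside the plain NumPy integer column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "plain_int": [1, 2],                                 # NumPy int64
        "nullable_int": pd.array([1, None], dtype="Int64"),  # ExtensionDtype
        "label": ["x", "y"],                                 # non-numeric
    }
)

# Both numeric columns are selected; the string column is dropped.
numeric_columns = df.select_dtypes(include=np.number).columns.tolist()
```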

Conversion#

Bug in Series.to_dict() with orient='records' now returns Python native types (GH25969)
Bug in Series.view() and Index.view() when converting between datetime-like (datetime64[ns], datetime64[ns, tz], timedelta64, period) dtypes (GH39788)
Bug in creating a DataFrame from an empty np.recarray not retaining the original dtypes (GH40121)
Bug in DataFrame failing to raise a TypeError when constructing from a frozenset (GH40163)
Bug in Index construction silently ignoring a passed dtype when the data cannot be cast to that dtype (GH21311)
Bug in StringArray.astype() falling back to NumPy and raising when converting to dtype='categorical' (GH40450)
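Bug in factorize() where, when given an array with a numeric NumPy dtype lower than int64, uint64 and float64, the unique values did not keep their original dtype (GH41132)
Bug in DataFrame construction with a dictionary containing an array-like with ExtensionDtype and copy=True failing to make a copy (GH38939)
Bug in qcut() raising an error when taking Float64DType as input (GH40730)
Bug in DataFrame and Series construction with datetime64[ns] data and dtype=object resulting in datetime objects instead of Timestamp objects (GH41599)
Bug in DataFrame and Series construction with timedelta64[ns] data and dtype=object resulting in np.timedelta64 objects instead of Timedelta objects (GH41599)
Bug in DataFrame construction when given a two-dimensional object-dtype np.ndarray of Period or Interval objects failing to cast to PeriodDtype or IntervalDtype, respectively (GH41812)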
Bug in constructing a Series from a list and a PandasDtype (GH39357)
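Bug in creating a Series from a range object that does not fit in the bounds of int64 dtype (GH30173)
Bug in creating a Series from a dict with all-tuple keys and an Index that requires reindexing (GH41707)
Bug in infer_dtype() not recognizing Series, Index, or array with a Period dtype (GH23553)
Bug in infer_dtype() raising an error for general ExtensionArray objects. It will now return "unknown-array" instead of raising (GH37367)
Bug in DataFrame.convert_dtypes() incorrectly raising a ValueError when called on an empty DataFrame (GH40393)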

Strings#

Bug in the conversion from pyarrow.ChunkedArray to StringArray when the original had zero chunks (GH41040)
Bug in Series.replace() and DataFrame.replace() ignoring replacements with regex=True for StringDType data (GH41333, GH35977)
Bug in Series.str.extract() with StringArray returning object dtype for an empty DataFrame (GH41441)
Bug in Series.str.replace() where the case argument was ignored when regex=False (GH41602)
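
The case item above (GH41602) can be sketched as follows, assuming pandas 1.3 or later; the sample strings are made up. With regex=False the pattern is matched literally (so the "." is a plain dot), and case=False should now make that literal match case-insensitive.

```python
import pandas as pd

ser = pd.Series(["Apple.pie", "apple.tart"])

# Literal, case-insensitive replacement of the prefix "APPLE." in each string.
replaced = ser.str.replace("APPLE.", "pear-", case=False, regex=False)
replaced_values = replaced.tolist()
```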

Interval#

Bug in IntervalIndex.intersection() and IntervalIndex.symmetric_difference() always returning object-dtype when operating with CategoricalIndex (GH38653, GH38741)
Bug in IntervalIndex.intersection() returning duplicates when at least one of the Index objects has duplicates which are present in the other (GH38743)
IntervalIndex.union(), IntervalIndex.intersection(), IntervalIndex.difference(), and IntervalIndex.symmetric_difference() now cast to the appropriate dtype instead of raising a TypeError when operating with another IntervalIndex with incompatible dtype (GH39267)
PeriodIndex.union(), PeriodIndex.intersection(), PeriodIndex.symmetric_difference(), PeriodIndex.difference() now cast to object dtype instead of raising IncompatibleFrequency when operating with another PeriodIndex with incompatible dtype (GH39306)
Bug in IntervalIndex.is_monotonic(), IntervalIndex.get_loc(), IntervalIndex.get_indexer_for(), and IntervalIndex.__contains__() when NA values are present (GH41831)

Indexing#

Bug in Index.union() and MultiIndex.union() dropping duplicate Index values when Index was not monotonic or sort was set to False (GH36289, GH31326, GH40862)
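Bug in CategoricalIndex.get_indexer() failing to raise InvalidIndexError when non-unique (GH38372)
Bug in IntervalIndex.get_indexer() when target has CategoricalDtype and both the index and the target contain NA values (GH41934)
Bug in Series.loc() raising a ValueError when input was filtered with a Boolean list and values to set were a list with lower dimension (GH20438)
Bug in inserting many new columns into a DataFrame causing incorrect subsequent indexing behavior (GH38380)
Bug in DataFrame.__setitem__() raising a ValueError when setting multiple values to duplicate columns (GH15695)
Bug in DataFrame.loc(), Series.loc(), DataFrame.__getitem__() and Series.__getitem__() returning incorrect elements for non-monotonic DatetimeIndex for string slices (GH33146)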
Bug in DataFrame.reindex() and Series.reindex() with timezone aware indexes raising a TypeError for method="ffill" and method="bfill" and specified tolerance (GH38566)
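Bug in DataFrame.reindex() with datetime64[ns] or timedelta64[ns] incorrectly casting to integers when the fill_value requires casting to object dtype (GH39755)
Bug in DataFrame.__setitem__() raising a ValueError when setting on an empty DataFrame using specified columns and a nonempty DataFrame value (GH38831)
Bug in DataFrame.loc.__setitem__() raising a ValueError when operating on a unique column when the DataFrame has duplicate columns (GH38521)
Bug in DataFrame.iloc.__setitem__() and DataFrame.loc.__setitem__() with mixed dtypes when setting with a dictionary value (GH38335)
Bug in Series.loc.__setitem__() and DataFrame.loc.__setitem__() raising KeyError when provided a Boolean generator (GH39614)
Bug in Series.iloc() and DataFrame.iloc() raising a KeyError when provided a generator (GH39614)
Bug in DataFrame.__setitem__() not raising a ValueError when the right hand side is a DataFrame with the wrong number of columns (GH38604)
Bug in Series.__setitem__() raising a ValueError when setting a Series with a scalar indexer (GH38303)
Bug in DataFrame.loc() dropping levels of a MultiIndex when the DataFrame used as input has only one row (GH10521)
Bug in DataFrame.__getitem__() and Series.__getitem__() always raising KeyError when slicing with existing strings where the Index has milliseconds (GH33589)
Bug in setting timedelta64 or datetime64 values into a numeric Series failing to cast to object dtype (GH39086, GH39619)
Bug in setting Interval values into a Series or DataFrame with mismatched IntervalDtype incorrectly casting the new values to the existing dtype (GH39120)
Bug in setting datetime64 values into a Series with integer dtype incorrectly casting the datetime64 values to integers (GH39266)
Bug in setting np.datetime64("NaT") into a Series with Datetime64TZDtype incorrectly treating the timezone-naive value as timezone-aware (GH39769)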
Bug in Index.get_loc() not raising KeyError when key=NaN and method is specified but NaN is not in the Index (GH39382)
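Bug in DatetimeIndex.insert() when inserting np.datetime64("NaT") into a timezone-aware index incorrectly treating the timezone-naive value as timezone-aware (GH39769)
Bug in incorrectly raising in Index.insert(), when setting a new column that cannot be held in the existing frame.columns, or in Series.reset_index() or DataFrame.reset_index() instead of casting to a compatible dtype (GH39068)
Bug in RangeIndex.append() where a single object of length 1 was concatenated incorrectly (GH39401)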
Bug in RangeIndex.astype() where when converting to CategoricalIndex, the categories became a Int64Index instead of a RangeIndex (GH41263)
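Bug in setting numpy.timedelta64 values into an object-dtype Series using a Boolean indexer (GH39488)
Bug in setting numeric values into a Boolean-dtype Series using at or iat failing to cast to object dtype (GH39582)
Bug in DataFrame.__setitem__() and DataFrame.iloc.__setitem__() raising ValueError when trying to index with a row slice and setting a list as values (GH40440)
Bug in DataFrame.loc() not raising KeyError when the key was not found in MultiIndex and the levels were not fully specified (GH41170)
Bug in DataFrame.loc.__setitem__() when setting-with-expansion incorrectly raising when the index in the expanding axis contained duplicates (GH40096)
Bug in DataFrame.loc.__getitem__() with MultiIndex casting to float when at least one index column has float dtype and a scalar is retrieved (GH41369)
Bug in DataFrame.loc() incorrectly matching non-Boolean index elements (GH20432)
Bug in indexing with np.nan on a Series or DataFrame with a CategoricalIndex incorrectly raising KeyError when np.nan keys are present (GH41933)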
Bug in Series.__delitem__() with ExtensionDtype incorrectly casting to ndarray (GH40386)
Bug in DataFrame.at() with a CategoricalIndex returning incorrect results when passed integer keys (GH41846)
Bug in DataFrame.loc() returning a MultiIndex in the wrong order if an indexer has duplicates (GH40978)
Bug in DataFrame.__setitem__() raising a TypeError when using a str subclass as the column name with a DatetimeIndex (GH37366)
Bug in PeriodIndex.get_loc() failing to raise a KeyError when given a Period with a mismatched freq (GH41670)
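Bug in .loc.__getitem__ with a UInt64Index and negative-integer keys raising OverflowError instead of KeyError in some cases, wrapping around to positive integers in others (GH41777)
Bug in Index.get_indexer() failing to raise ValueError in some cases with invalid method, limit, or tolerance arguments (GH41918)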
Bug when slicing a Series or DataFrame with a TimedeltaIndex when passing an invalid string raising ValueError instead of a TypeError (GH41821)
Bug in Index constructor sometimes silently ignoring a specified dtype (GH38879)
Index.where() behavior now mirrors Index.putmask() behavior, i.e. index.where(mask, other) matches index.putmask(~mask, other) (GH39412)
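
The equivalence stated in the last item (GH39412) can be sketched directly. This assumes pandas 1.3 or later; the index values and the replacement value -1 are arbitrary choices for the example.

```python
import numpy as np
import pandas as pd

index = pd.Index([10, 20, 30, 40])
mask = np.array([True, False, True, False])

# where() keeps entries where the mask is True and replaces the rest.
kept = index.where(mask, -1)
# putmask() replaces entries where its mask is True, so pass the inverse.
replaced = index.putmask(~mask, -1)

same_result = kept.equals(replaced)
kept_values = kept.tolist()
```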

Missing#

Bug in Grouper did not correctly propagate the dropna argument; DataFrameGroupBy.transform() now correctly handles missing values for dropna=True (GH35612)
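Bug in isna(), Series.isna(), Index.isna(), DataFrame.isna(), and the corresponding notna functions not recognizing Decimal("NaN") objects (GH39409)
Bug in DataFrame.fillna() not accepting a dictionary for the downcast keyword (GH40809)
Bug in isna() not returning a copy of the mask for nullable types, causing any subsequent mask modification to change the original array (GH40935)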
Bug in DataFrame construction with float data containing NaN and an integer dtype casting instead of retaining the NaN (GH26919)
Bug in Series.isin() and MultiIndex.isin() not treating all NaNs as equivalent if they were in tuples (GH41836)
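
A short sketch of the Decimal("NaN") item (GH39409), assuming pandas 1.3 or later. The sample values are illustrative only.

```python
from decimal import Decimal

import pandas as pd

# The top-level function recognizes a Decimal NaN scalar as missing.
scalar_is_na = bool(pd.isna(Decimal("NaN")))

# The same holds element-wise for an object-dtype Series of Decimals.
ser = pd.Series([Decimal("1.5"), Decimal("NaN")], dtype=object)
series_mask = ser.isna().tolist()
```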

MultiIndex#

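Bug in DataFrame.drop() raising a TypeError when the MultiIndex is non-unique and level is not provided (GH36293)
Bug in MultiIndex.intersection() duplicating NaN in the result (GH38623)
Bug in MultiIndex.equals() incorrectly returning True when the MultiIndex contained NaN even when they are differently ordered (GH38439)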
Bug in MultiIndex.intersection() always returning an empty result when intersecting with CategoricalIndex (GH38653)
Bug in MultiIndex.difference() incorrectly raising TypeError when indexes contain non-sortable entries (GH41915)
Bug in MultiIndex.reindex() raising a ValueError when used on an empty MultiIndex and indexing only a specific level (GH41170)
Bug in MultiIndex.reindex() raising TypeError when reindexing against a flat Index (GH41707)

I/O#

Bug in Index.__repr__() when display.max_seq_items=1 (GH38415)
Bug in read_csv() not recognizing scientific notation if the argument decimal is set and engine="python" (GH31920)
Bug in read_csv() interpreting an NA value as a comment when the NA value contains the comment string; fixed for engine="python" (GH34002)
Bug in read_csv() raising an IndexError with multiple header columns and index_col specified when the file has no data rows (GH38292)
Bug in read_csv() not accepting usecols with a different length than names for engine="python" (GH16469)
Bug in read_csv() returning object dtype when delimiter="," with usecols and parse_dates specified for engine="python" (GH35873)
Bug in read_csv() raising a TypeError when names and parse_dates is specified for engine="c" (GH33699)
Bug in read_clipboard() and DataFrame.to_clipboard() not working in WSL (GH38527)
Allow custom error values for the parse_dates argument of read_sql(), read_sql_query() and read_sql_table() (GH35185)
Bug in DataFrame.to_hdf() and Series.to_hdf() raising a KeyError when trying to apply for subclasses of DataFrame or Series (GH33748)
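Bug in HDFStore.put() raising a wrong TypeError when saving a DataFrame with non-string dtype (GH34274)
Bug in json_normalize() resulting in the first element of a generator object not being included in the returned DataFrame (GH35923)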
Bug in read_csv() applying the thousands separator to date columns when the column should be parsed for dates and usecols is specified for engine="python" (GH39365)
Bug in read_excel() forward filling MultiIndex names when multiple header and index columns are specified (GH34673)
Bug in read_excel() not respecting set_option() (GH34252)
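Bug in read_csv() not switching true_values and false_values for nullable Boolean dtype (GH34655)
Bug in read_json() when orient="split" not maintaining a numeric string index (GH28556)
read_sql() returned an empty generator if chunksize was non-zero and the query returned no results. It now returns a generator with a single empty DataFrame (GH34411)
Bug in read_hdf() returning unexpected records when filtering on categorical string columns using the where parameter (GH39189)
Bug in read_sas() raising a ValueError when datetimes were null (GH39725)
Bug in read_excel() dropping empty values from single-column spreadsheets (GH39808)
Bug in read_excel() loading trailing empty rows/columns for some filetypes (GH41167)
Bug in read_excel() raising an AttributeError when the Excel file had a MultiIndex header followed by two empty rows and no index (GH40442)
Bug in read_excel(), read_csv(), read_table(), read_fwf(), and read_clipboard() where one blank row after a MultiIndex header with no index would be dropped (GH40442)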
Bug in DataFrame.to_string() misplacing the truncation column when index=False (GH40904)
Bug in DataFrame.to_string() adding an extra dot and misaligning the truncation row when index=False (GH40904)
Bug in read_orc() always raising an AttributeError (GH40918)
Bug in read_csv() and read_table() silently ignoring prefix if names and prefix are defined, now raising a ValueError (GH39123)
Bug in read_csv() and read_excel() not respecting the dtype for a duplicated column name when mangle_dupe_cols is set to True (GH35211)
Bug in read_csv() silently ignoring sep if delimiter and sep are defined, now raising a ValueError (GH39823)
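Bug in read_csv() and read_table() misinterpreting arguments when sys.setprofile had been previously called (GH41069)
Bug in the conversion from PyArrow to pandas (e.g. for reading Parquet) with nullable dtypes and a PyArrow array whose data buffer size is not a multiple of the dtype size (GH40896)
Bug in read_excel() raising an error when pandas could not determine the file type even though the user specified the engine argument (GH41225)
Bug in read_clipboard() copying from an Excel file shifting values into the wrong column if there are null values in the first column (GH41108)
Bug in DataFrame.to_hdf() and Series.to_hdf() raising a TypeError when trying to append a string column to an incompatible column (GH41897)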
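
The sep/delimiter item above (GH39823) can be sketched with an in-memory file. This assumes pandas 1.3 or later; the CSV text is invented for the example. Passing both keywords should raise a ValueError instead of silently ignoring sep.

```python
import io

import pandas as pd

csv_text = "a;b\n1;2\n"

# Supplying both sep and delimiter is ambiguous and is rejected.
try:
    pd.read_csv(io.StringIO(csv_text), sep=";", delimiter=";")
    conflict_rejected = False
except ValueError:
    conflict_rejected = True

# Supplying just one of them parses the file as usual.
frame = pd.read_csv(io.StringIO(csv_text), sep=";")
column_names = frame.columns.tolist()
```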

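Period#

Comparisons of Period objects or Index, Series, or DataFrame with mismatched PeriodDtype now behave like other mismatched-type comparisons, returning False for equals, True for not-equal, and raising TypeError for inequality checks (GH39274)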
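
A minimal sketch of the equality half of this change, assuming pandas 1.3 or later. It compares two scalar Period objects with different frequencies; the dates are arbitrary. Ordering comparisons such as < are left out here because they raise rather than return a value.

```python
import pandas as pd

daily = pd.Period("2021-07-02", freq="D")
monthly = pd.Period("2021-07", freq="M")

# Mismatched frequencies compare as "not equal" instead of raising.
is_equal = bool(daily == monthly)
is_not_equal = bool(daily != monthly)
```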

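Plotting#

Bug in plotting.scatter_matrix() raising when a 2d ax argument is passed (GH16253)
Prevent warnings when Matplotlib's constrained_layout is enabled (GH25261)
Bug in DataFrame.plot() showing the wrong colors in the legend if the function was called repeatedly and some calls used yerr while others didn't (GH39522)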
Bug in DataFrame.plot() was showing the wrong colors in the legend if the function was called repeatedly and some calls used secondary_y and others use legend=False (GH40044)
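Bug in DataFrame.plot.box() when the dark_background theme was selected, where caps or min/max markers for the plot were not visible (GH40769)

Groupby/resample/rolling#

Bug in GroupBy.agg() with PeriodDtype columns incorrectly casting results too aggressively (GH38254)
Bug in SeriesGroupBy.value_counts() where unobserved categories in a grouped categorical Series were not tallied (GH38672)
Bug in SeriesGroupBy.value_counts() where an error was raised on an empty Series (GH39172)
Bug in GroupBy.indices() containing non-existent indices when null values were present in the groupby keys (GH9304)
Fixed bug in GroupBy.sum() causing a loss of precision by now using Kahan summation (GH38778)
Fixed bug in GroupBy.cumsum() and GroupBy.mean() causing a loss of precision by now using Kahan summation (GH38934)
Bug in Resampler.aggregate() and DataFrame.transform() raising a TypeError instead of SpecificationError when missing keys had mixed dtypes (GH39025)
Bug in DataFrameGroupBy.idxmin() and DataFrameGroupBy.idxmax() with ExtensionDtype columns (GH38733)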
Bug in Series.resample() would raise when the index was a PeriodIndex consisting of NaT (GH39227)
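Bug in RollingGroupby.corr() and ExpandingGroupby.corr() where the groupby column would return 0 instead of np.nan when providing other that was longer than each group (GH39591)
Bug in ExpandingGroupby.corr() and ExpandingGroupby.cov() where 1 would be returned instead of np.nan when providing other that was longer than each group (GH39591)
Bug in GroupBy.mean(), GroupBy.median() and DataFrame.pivot_table() not propagating metadata (GH28283)
Bug in Series.rolling() and DataFrame.rolling() not calculating window bounds correctly when window is an offset and dates are in descending order (GH40002)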
Bug in Series.groupby() and DataFrame.groupby() on an empty Series or DataFrame would lose index, columns, and/or data types when directly using the methods idxmax, idxmin, mad, min, max, sum, prod, and skew or using them through apply, aggregate, or resample (GH26411)
Bug in GroupBy.apply() where a MultiIndex would be created instead of an Index when used on a RollingGroupby object (GH39732)
Bug in DataFrameGroupBy.sample() where an error was raised when weights was specified and the index was an Int64Index (GH39927)
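Bug in DataFrameGroupBy.aggregate() and Resampler.aggregate() sometimes raising a SpecificationError when passed a dictionary and columns were missing; will now always raise a KeyError instead (GH40004)
Bug in DataFrameGroupBy.sample() where column selection was not applied before computing the result (GH39928)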
Bug in ExponentialMovingWindow when calling __getitem__ would incorrectly raise a ValueError when providing times (GH40164)
Bug in ExponentialMovingWindow when calling __getitem__ would not retain com, span, alpha or halflife attributes (GH40164)
ExponentialMovingWindow now raises a NotImplementedError when specifying times with adjust=False due to an incorrect calculation (GH40098)
Bug in ExponentialMovingWindowGroupby.mean() where the times argument was ignored when engine='numba' (GH40951)
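Bug in ExponentialMovingWindowGroupby.mean() where the wrong times were used in the case of multiple groups (GH40951)
Bug in ExponentialMovingWindowGroupby where the times vector and values became out of sync for non-trivial groups (GH40951)
Bug in Series.asfreq() and DataFrame.asfreq() dropping rows when the index was not sorted (GH39805)
Bug in aggregation functions for DataFrame not respecting the numeric_only argument when the level keyword was given (GH40660)
Bug in SeriesGroupBy.aggregate() where using a user-defined function to aggregate a Series with an object-typed Index causes an incorrect Index shape (GH40014)
Bug in RollingGroupby where the as_index=False argument in groupby was ignored (GH39433)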
Bug in GroupBy.any() and GroupBy.all() raising a ValueError when using with nullable type columns holding NA even with skipna=True (GH40585)
Bug in GroupBy.cummin() and GroupBy.cummax() incorrectly rounding integer values near the int64 implementation bounds (GH40767)
Bug in GroupBy.rank() with nullable dtypes incorrectly raising a TypeError (GH41010)
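Bug in GroupBy.cummin() and GroupBy.cummax() computing wrong results with nullable data types too large to round-trip when casting to float (GH37493)
Bug in DataFrame.rolling() returning a non-zero mean for an all-NaN window with min_periods=0 if the calculation is not numerically stable (GH41053)
Bug in DataFrame.rolling() returning a non-zero sum for an all-NaN window with min_periods=0 if the calculation is not numerically stable (GH41053)
Bug in SeriesGroupBy.agg() failing to retain ordered CategoricalDtype on order-preserving aggregations (GH41147)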
Bug in GroupBy.min() and GroupBy.max() with multiple object-dtype columns and numeric_only=False incorrectly raising a ValueError (GH41111)
Bug in DataFrameGroupBy.rank() with the GroupBy object's axis=0 and the rank method's keyword axis=1 (GH41320)
Bug in DataFrameGroupBy.__getitem__() with non-unique columns incorrectly returning a malformed SeriesGroupBy instead of DataFrameGroupBy (GH41427)
Bug in DataFrameGroupBy.transform() with non-unique columns incorrectly raising an AttributeError (GH41427)
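Bug in Resampler.apply() with non-unique columns incorrectly dropping duplicated columns (GH41445)
Bug in Series.groupby() aggregations incorrectly returning an empty Series instead of raising TypeError on aggregations that are invalid for its dtype, e.g. .prod with datetime64[ns] dtype (GH41342)
Bug in DataFrameGroupBy aggregations incorrectly failing to drop columns with invalid dtypes for that aggregation when there are no valid columns (GH41291)
Bug in DataFrame.rolling.__iter__() where on was not assigned to the index of the resulting objects (GH40373)
Bug in DataFrameGroupBy.transform() and DataFrameGroupBy.agg() with engine="numba" where *args were being cached with the user passed function (GH41647)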
Bug in DataFrameGroupBy methods agg, transform, sum, bfill, ffill, pad, pct_change, shift, ohlc dropping .columns.names (GH41497)
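
Several items in this list mention Kahan summation (GH38778, GH38934). The sketch below is a plain-Python illustration of the general technique, not the pandas implementation, which lives in compiled code. The function names naive_sum and kahan_sum and the sample data are assumptions made for this example. It shows how a running compensation term recovers small addends that an ordinary floating-point sum drops.

```python
def naive_sum(values):
    """Ordinary left-to-right floating-point summation."""
    total = 0.0
    for value in values:
        total += value
    return total


def kahan_sum(values):
    """Compensated (Kahan) summation: carry the rounding error forward."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for value in values:
        adjusted = value - compensation        # re-inject the lost bits
        new_total = total + adjusted           # low-order bits may be lost here
        compensation = (new_total - total) - adjusted  # measure what was lost
        total = new_total
    return total


# One large value followed by many addends too small to register on their own.
values = [1.0] + [1e-16] * 1_000_000

naive = naive_sum(values)        # every tiny addend is rounded away
compensated = kahan_sum(values)  # close to the exact result 1.0 + 1e-10
```

The trade-off is a few extra floating-point operations per element in exchange for an error bound that does not grow with the number of addends.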

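Reshaping#

Bug in merge() raising an error when performing an inner join with partial index and right_index=True when there was no overlap between indices (GH33814)
Bug in DataFrame.unstack() with missing levels leading to incorrect index names (GH37510)
Bug in merge_asof() propagating the right Index with left_index=True and right_on specification instead of the left Index (GH33463)
Bug in DataFrame.join() on a DataFrame with a MultiIndex returning the wrong result when one of the two indexes had only one level (GH36909)
merge_asof() now raises a ValueError instead of a cryptic TypeError in case of non-numerical merge columns (GH29130)
Bug in DataFrame.join() not assigning values correctly when the DataFrame had a MultiIndex where at least one dimension had dtype Categorical with non-alphabetically sorted categories (GH38502)
Series.value_counts() and Series.mode() now return consistent keys in original order (GH12679, GH11227 and GH39007)
Bug in DataFrame.stack() not handling NaN in MultiIndex columns correctly (GH39481)
Bug in DataFrame.apply() giving incorrect results when the argument func was a string, axis=1, and the axis argument was not supported; now raises a ValueError instead (GH39211)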
Bug in DataFrame.sort_values() not reshaping the index correctly after sorting on columns when ignore_index=True (GH39464)
Bug in DataFrame.append() returning incorrect dtypes with combinations of ExtensionDtype dtypes (GH39454)
Bug in DataFrame.append() returning incorrect dtypes when used with combinations of datetime64 and timedelta64 dtypes (GH39574)
Bug in DataFrame.append() with a DataFrame with a MultiIndex and appending a Series whose Index is not a MultiIndex (GH41707)
Bug in DataFrame.pivot_table() returning a MultiIndex for a single value when operating on an empty DataFrame (GH13483)
Index can now be passed to the numpy.all() function (GH40180)
Bug in DataFrame.stack() not preserving CategoricalDtype in a MultiIndex (GH36991)
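Bug in to_datetime() raising an error when the input sequence contained unhashable items (GH39756)
Bug in Series.explode() preserving the index when ignore_index was True and values were scalars (GH40487)
Bug in to_datetime() raising a ValueError when a Series contains None and NaT and has more than 50 elements (GH39882)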
Bug in Series.unstack() and DataFrame.unstack() with object-dtype values containing timezone-aware datetime objects incorrectly raising TypeError (GH41875)
Bug in DataFrame.melt() raising InvalidIndexError when DataFrame has duplicate columns used as value_vars (GH41951)
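
The Series.explode() item above (GH40487) can be sketched as follows, assuming pandas 1.3 or later. The values and index labels are arbitrary. With scalar values there is nothing to explode, but ignore_index=True should still reset the index to 0, 1, 2.

```python
import pandas as pd

ser = pd.Series([1, 2, 3], index=[10, 20, 30])

# Scalars pass through unchanged, but the original labels are discarded.
exploded = ser.explode(ignore_index=True)

new_index = exploded.index.tolist()
new_values = exploded.tolist()
```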

Sparse#

Bug in DataFrame.sparse.to_coo() raising a KeyError with columns that are a numeric Index without a 0 (GH18414)
Bug in SparseArray.astype() with copy=False producing incorrect results when going from integer dtype to floating dtype (GH34456)
Bug in SparseArray.max() and SparseArray.min() always returning an empty result (GH40921)

ExtensionArray#

Bug in DataFrame.where() when other is a Series with an ExtensionDtype (GH38729)
Fixed bug where Series.idxmax(), Series.idxmin(), Series.argmax(), and Series.argmin() would fail when the underlying data is an ExtensionArray (GH32749, GH33719, GH36566)
Fixed bug where some properties of subclasses of PandasExtensionDtype were improperly cached (GH40329)
Bug in DataFrame.mask() where masking a DataFrame with an ExtensionDtype raises a ValueError (GH40941)
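
As an illustration of the idxmax()/argmin() item in this list, the sketch below uses a nullable Int64 Series, which is backed by an ExtensionArray. It assumes pandas 1.3 or later; the values and labels are made up.

```python
import pandas as pd

# pd.array(..., dtype="Int64") produces an ExtensionArray-backed Series.
ser = pd.Series(pd.array([3, 7, 5], dtype="Int64"), index=["a", "b", "c"])

label_of_max = ser.idxmax()      # index label of the largest value
position_of_min = ser.argmin()   # integer position of the smallest value
```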

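Styler#

Bug in Styler where the subset argument in methods raised an error for some valid MultiIndex slices (GH33562)
Styler rendered HTML output has seen minor alterations to support w3 good code standards (GH39626)
Bug in Styler where rendered HTML was missing a column class identifier for certain header cells (GH39716)
Bug in Styler.background_gradient() where text color was not determined correctly (GH39888)
Bug in Styler.set_table_styles() where multiple elements in CSS selectors of the table_styles argument were not correctly added (GH34061)
Bug in Styler where copying from Jupyter dropped the top left cell and misaligned headers (GH12147)
Bug in Styler.where where kwargs were not passed to the applicable callable (GH40845)
Bug in Styler causing CSS to duplicate on multiple renders (GH39395, GH40334)

Other#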

inspect.getmembers(Series) no longer raises an AbstractMethodError (GH38782)
Bug in Series.where() with numeric dtype and other=None not casting to nan (GH39761)
Bug in assert_series_equal(), assert_frame_equal(), assert_index_equal() and assert_extension_array_equal() incorrectly raising when an attribute has an unrecognized NA type (GH39461)
Bug in assert_index_equal() with exact=True not raising when comparing CategoricalIndex instances with Int64Index and RangeIndex categories (GH41263)
Bug in DataFrame.equals(), Series.equals(), and Index.equals() with object-dtype containing np.datetime64("NaT") or np.timedelta64("NaT") (GH39650)
Bug in show_versions() where console JSON output was not proper JSON (GH39701)
pandas can now compile on z/OS when using xlc (GH35826)
Bug in pandas.util.hash_pandas_object() not recognizing hash_key, encoding and categorize when the input object type is a DataFrame (GH41404)
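
The Series.where() item in this list (GH39761) can be sketched as below, assuming pandas 1.3 or later. Passing other=None on an integer Series should fill the masked entries with NaN, which means the result is upcast to a floating dtype.

```python
import pandas as pd

ser = pd.Series([1, 2, 3])
mask = pd.Series([True, False, True])

# None is treated as a missing value and stored as NaN.
result = ser.where(mask, None)

missing_flags = result.isna().tolist()
dtype_kind = result.dtype.kind  # "f" means a floating-point dtype
```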

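Contributors#

A total of 251 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.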

Abhishek R +
Ada Draginda
Adam J. Stewart
Adam Turner +
Aidan Feldman +
Ajitesh Singh +
Akshat Jain +
Albert Villanova del Moral
Alexandre Prince-Levasseur +
Andrew Hawyrluk +
Andrew Wieteska
AnglinaBhambra +
Ankush Dua +
Anna Daglis
Ashlan Parker +
Ashwani +
Avinash Pancham
Ayushman Kumar +
BeanNan
Benoît Vinot
Bharat Raghunathan
Bijay Regmi +
Bobin Mathew +
Bogdan Pilyavets +
Brian Hulette +
Brian Sun +
Brock +
Bryan Cutler
Caleb +
Calvin Ho +
Chathura Widanage +
Chinmay Rane +
Chris Lynch
Chris Withers
Christos Petropoulos
Corentin Girard +
DaPy15 +
Damodara Puddu +
Daniel Hrisca
Daniel Saxton
DanielFEvans
Dare Adewumi +
Dave Willmer
David Schlachter +
David-dmh +
Deepang Raval +
Doris Lee +
Dr. Jan-Philip Gehrcke +
DriesS +
Dylan Percy
Erfan Nariman
Eric Leung
EricLeer +
Eve
Fangchen Li
Felix Divo
Florian Jetter
Fred Reiss
GFJ138 +
Gaurav Sheni +
Geoffrey B. Eisenbarth +
Gesa Stupperich +
Griffin Ansel +
Gustavo C. Maciel +
Heidi +
Henry +
Hung-Yi Wu +
Ian Ozsvald +
Irv Lustig
Isaac Chung +
Isaac Virshup
JHM Darbyshire (MBP) +
JHM Darbyshire (iMac) +
Jack Liu +
James Lamb +
Jeet Parekh
Jeff Reback
Jiezheng2018 +
Jody Klymak
Johan Kåhrström +
John McGuigan
Joris Van den Bossche
Jose
JoseNavy
Josh Dimarsky
Josh Friedlander
Joshua Klein +
Julia Signell
Julian Schnitzler +
Kaiqi Dong
Kasim Panjri +
Katie Smith +
Kelly +
Kenil +
Keppler, Kyle +
Kevin Sheppard
Khor Chean Wei +
Kiley Hewitt +
Larry Wong +
Lightyears +
Lucas Holtz +
Lucas Rodés-Guirao
Lucky Sivagurunathan +
Luis Pinto
Maciej Kos +
Marc Garcia
Marco Edward Gorelli +
Marco Gorelli
MarcoGorelli +
Mark Graham
Martin Dengler +
Martin Grigorov +
Marty Rudolf +
Matt Roeschke
Matthew Roeschke
Matthew Zeitlin
Max Bolingbroke
Maxim Ivanov
Maxim Kupfer +
Mayur +
MeeseeksMachine
Micael Jarniac
Michael Hsieh +
Michel de Ruiter +
Mike Roberts +
Miroslav Šedivý
Mohammad Jafar Mashhadi
Morisa Manzella +
Mortada Mehyar
Muktan +
Naveen Agrawal +
Noah
Nofar Mishraki +
Oleh Kozynets
Olga Matoula +
Oli +
Omar Afifi
Omer Ozarslan +
Owen Lamont +
Ozan Öğreden +
Pandas Development Team
Paolo Lammens
Parfait Gasana +
Patrick Hoefler
Paul McCarthy +
Paulo S. Costa +
Pav A
Peter
Pradyumna Rahul +
Punitvara +
QP Hou +
Rahul Chauhan
Rahul Sathanapalli
Richard Shadrach
Robert Bradshaw
Robin to Roxel
Rohit Gupta
Sam Purkis +
Samuel GIFFARD +
Sean M. Law +
Shahar Naveh +
ShaharNaveh +
Shiv Gupta +
Shrey Dixit +
Shudong Yang +
Simon Boehm +
Simon Hawkins
Sioned Baker +
Stefan Mejlgaard +
Steven Pitman +
Steven Schaerer +
Stéphane Guillou +
TLouf +
Tegar D Pratama +
Terji Petersen
Theodoros Nikolaou +
Thomas Dickson
Thomas Li
Thomas Smith
Thomas Yu +
ThomasBlauthQC +
Tim Hoffmann
Tom Augspurger
Torsten Wörtwein
Tyler Reddy
UrielMaD
Uwe L. Korn
Venaturum +
VirosaLi
Vladimir Podolskiy
Vyom Pathak +
WANG Aiyong
Waltteri Koskinen +
Wenjun Si +
William Ayd
Yeshwanth N +
Yuanhao Geng
Zito Relova +
aflah02 +
arredond +
attack68
cdknox +
chinggg +
fathomer +
ftrihardjo +
github-actions[bot] +
gunjan-solanki +
guru kiran
hasan-yaman
i-aki-y +
jbrockmendel
jmholzer +
jordi-crespo +
jotasi +
jreback
juliansmidek +
kylekeppler
lrepiton +
lucasrodes
maroth96 +
mikeronayne +
mlondschien
moink +
morrme
mschmookler +
mzeitlin11
na2 +
nofarmishraki +
partev
patrick
ptype
realead
rhshadrach
rlukevie +
rosagold +
saucoide +
sdementen +
shawnbrown
sstiijn +
stphnlyd +
sukriti1 +
taytzehao
theOehrly +
theodorju +
thordisstella +
tonyyyyip +
tsinggggg +
tushushu +
vangorade +
vladu +
wertha +
