What's new in 0.24.0 (January 25, 2019)#

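Warning

The 0.24.x series of releases will be the last to support Python 2. Future feature releases will support Python 3 only. See Dropping Python 2.7 for more details.

This is a major release from 0.23.4 and includes a number of API changes, new features, enhancements, and performance improvements, along with a large number of bug fixes.

Highlights include: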

Optional Integer NA Support
New APIs for accessing the array backing a Series or Index
A new top-level method for creating arrays
Store Interval and Period data in a Series or DataFrame
Support for joining on two MultiIndexes

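Check the API Changes and deprecations before updating.

These are the changes in pandas 0.24.0. See Release notes for a full changelog including other versions of pandas.

Enhancements#

Optional integer NA support#

pandas has gained the ability to hold integer dtypes with missing values. This long-requested feature is enabled through the use of extension types.

Note

IntegerArray is currently experimental. Its API or implementation may change without warning.

We can construct a Series with the specified dtype. The dtype string Int64 is a pandas ExtensionDtype. Specifying a list or array using the traditional missing value marker of np.nan will infer to integer dtype. The display of the Series will also use the NaN to indicate missing values in string outputs. (GH20700, GH20747, GH22441, GH21789, GH22346)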

In [1]: s = pd.Series([1, 2, np.nan], dtype='Int64')

In [2]: s
Out[2]: 
0       1
1       2
2    <NA>
Length: 3, dtype: Int64

Operations on these dtypes will propagate NaN as other pandas operations do.

# arithmetic
In [3]: s + 1
Out[3]: 
0       2
1       3
2    <NA>
Length: 3, dtype: Int64

# comparison
In [4]: s == 1
Out[4]: 
0     True
1    False
2     <NA>
Length: 3, dtype: boolean

# indexing
In [5]: s.iloc[1:3]
Out[5]: 
1       2
2    <NA>
Length: 2, dtype: Int64

# operate with other dtypes
In [6]: s + s.iloc[1:3].astype('Int8')
Out[6]: 
0    <NA>
1       4
2    <NA>
Length: 3, dtype: Int64

# coerce when needed
In [7]: s + 0.01
Out[7]: 
0    1.01
1    2.01
2    <NA>
Length: 3, dtype: Float64

These dtypes can operate as part of a DataFrame.

In [8]: df = pd.DataFrame({'A': s, 'B': [1, 1, 3], 'C': list('aab')})

In [9]: df
Out[9]: 
      A  B  C
0     1  1  a
1     2  1  a
2  <NA>  3  b

[3 rows x 3 columns]

In [10]: df.dtypes
Out[10]: 
A     Int64
B     int64
C    object
Length: 3, dtype: object

These dtypes can be merged, reshaped, and cast.

In [11]: pd.concat([df[['A']], df[['B', 'C']]], axis=1).dtypes
Out[11]: 
A     Int64
B     int64
C    object
Length: 3, dtype: object

In [12]: df['A'].astype(float)
Out[12]: 
0    1.0
1    2.0
2    NaN
Name: A, Length: 3, dtype: float64

Reduction and groupby operations such as sum work.

In [13]: df.sum()
Out[13]: 
A      3
B      5
C    aab
Length: 3, dtype: object

In [14]: df.groupby('B').A.sum()
Out[14]: 
B
1    3
3    0
Name: A, Length: 2, dtype: Int64

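Warning

The Integer NA support currently uses the capitalized dtype version, e.g. Int8 as compared to the traditional int8. This may be changed at a future date.

See Nullable integer data type for more.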
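As a small sketch of why the explicit dtype matters, the snippet below contrasts default inference with the nullable dtype on the same data. The variable names are illustrative only, and the comments describe behavior as documented above rather than a guarantee for every pandas version.

```python
import numpy as np
import pandas as pd

values = [1, 2, np.nan]

# Without a dtype, the NaN forces the whole Series to float64.
as_float = pd.Series(values)

# Requesting the nullable extension dtype keeps the integers as integers.
as_int = pd.Series(values, dtype="Int64")

# Missing values are still detectable and are skipped by reductions.
missing_count = int(as_int.isna().sum())
total = as_int.sum()

print(as_float.dtype, as_int.dtype, missing_count, total)
```

Note the capital "I" in "Int64": the lowercase "int64" string still refers to the NumPy dtype, which cannot hold missing values.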

Accessing the values in a Series or Index#

Series.array and Index.array have been added for extracting the array backing a Series or Index. (GH19954, GH23623)

In [15]: idx = pd.period_range('2000', periods=4)

In [16]: idx.array
Out[16]: 
<PeriodArray>
['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04']
Length: 4, dtype: period[D]

In [17]: pd.Series(idx).array
Out[17]: 
<PeriodArray>
['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04']
Length: 4, dtype: period[D]

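Historically, this would have been done with series.values, but with .values it was unclear whether the returned value would be the actual array, some transformation of it, or one of pandas' custom arrays (like Categorical). For example, with PeriodIndex, .values generates a new ndarray of period objects each time.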

In [18]: idx.values
Out[18]: 
array([Period('2000-01-01', 'D'), Period('2000-01-02', 'D'),
       Period('2000-01-03', 'D'), Period('2000-01-04', 'D')], dtype=object)

In [19]: id(idx.values)
Out[19]: 139697158492240

In [20]: id(idx.values)
Out[20]: 139697160617584

If you need an actual NumPy array, use Series.to_numpy() or Index.to_numpy().

In [21]: idx.to_numpy()
Out[21]: 
array([Period('2000-01-01', 'D'), Period('2000-01-02', 'D'),
       Period('2000-01-03', 'D'), Period('2000-01-04', 'D')], dtype=object)

In [22]: pd.Series(idx).to_numpy()
Out[22]: 
array([Period('2000-01-01', 'D'), Period('2000-01-02', 'D'),
       Period('2000-01-03', 'D'), Period('2000-01-04', 'D')], dtype=object)

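For Series and Indexes backed by normal NumPy arrays, Series.array will return a new arrays.PandasArray, which is a thin (no-copy) wrapper around a numpy.ndarray. PandasArray isn't especially useful on its own, but it does provide the same interface as any extension array defined in pandas or by a third-party library.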

In [23]: ser = pd.Series([1, 2, 3])

In [24]: ser.array
Out[24]: 
<PandasArray>
[1, 2, 3]
Length: 3, dtype: int64

In [25]: ser.to_numpy()
Out[25]: array([1, 2, 3])

We haven't removed or deprecated Series.values or DataFrame.values, but we highly recommend using .array or .to_numpy() instead.

See Dtypes and Attributes and Underlying Data for more.
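The Categorical case mentioned above can be sketched as follows: .array hands back the pandas array that actually backs the Series, while .to_numpy() always produces a NumPy array. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

cat = pd.Series(["a", "b", "a"], dtype="category")

# .array returns the backing extension array (a Categorical here).
backing = cat.array

# .to_numpy() materializes the values as a plain NumPy array.
as_numpy = cat.to_numpy()

print(type(backing).__name__, type(as_numpy).__name__)
```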

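`pandas.array`: a new top-level method for creating arrays#

A new top-level method array() has been added for creating 1-dimensional arrays (GH22860). This can be used to create any extension array, including extension arrays registered by 3rd party libraries. See the dtypes docs for more on extension arrays.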

In [26]: pd.array([1, 2, np.nan], dtype='Int64')
Out[26]: 
<IntegerArray>
[1, 2, <NA>]
Length: 3, dtype: Int64

In [27]: pd.array(['a', 'b', 'c'], dtype='category')
Out[27]: 
['a', 'b', 'c']
Categories (3, object): ['a', 'b', 'c']

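Passing data for which there isn't a dedicated extension type (e.g. float, integer, etc.) will return a new arrays.PandasArray, which is just a thin (no-copy) wrapper around a numpy.ndarray that satisfies the pandas extension array interface.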

In [28]: pd.array([1, 2, 3])
Out[28]: 
<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64

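On its own, a PandasArray isn't a very useful object. But if you need to write low-level code that works generically for any ExtensionArray, PandasArray satisfies that need.

Notice that by default, if no dtype is specified, the dtype of the returned array is inferred from the data. In particular, note that the first example of [1, 2, np.nan] would have returned a floating-point array, since NaN is a float.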

In [29]: pd.array([1, 2, np.nan])
Out[29]: 
<IntegerArray>
[1, 2, <NA>]
Length: 3, dtype: Int64

Storing Interval and Period data in Series and DataFrame#

Interval and Period data may now be stored in a Series or DataFrame, in addition to an IntervalIndex and PeriodIndex as before (GH19453, GH22862).

In [30]: ser = pd.Series(pd.interval_range(0, 5))

In [31]: ser
Out[31]: 
0    [0, 1]
1    [1, 2]
2    [2, 3]
3    [3, 4]
4    [4, 5]
Length: 5, dtype: interval

In [32]: ser.dtype
Out[32]: interval[int64, both]

For periods:

In [33]: pser = pd.Series(pd.period_range("2000", freq="D", periods=5))

In [34]: pser
Out[34]: 
0    2000-01-01
1    2000-01-02
2    2000-01-03
3    2000-01-04
4    2000-01-05
Length: 5, dtype: period[D]

In [35]: pser.dtype
Out[35]: period[D]

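Previously, these would be cast to a NumPy array with object dtype. In general, this should result in better performance when storing an array of intervals or periods in a Series or column of a DataFrame.

Use Series.array to extract the underlying array of intervals or periods from the Series: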

In [36]: ser.array
Out[36]: 
<IntervalArray>
[[0, 1], [1, 2], [2, 3], [3, 4], [4, 5]]
Length: 5, dtype: interval[int64, both]

In [37]: pser.array
Out[37]: 
<PeriodArray>
['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04', '2000-01-05']
Length: 5, dtype: period[D]

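These return an instance of arrays.IntervalArray or arrays.PeriodArray, the new extension arrays that back interval and period data.

Warning

For backwards compatibility, Series.values continues to return a NumPy array of objects for Interval and Period data. We recommend using Series.array when you need the array of data stored in the Series, and Series.to_numpy() when you know you need a NumPy array.

See Dtypes and Attributes and Underlying Data for more.

Joining with two multi-indexes#

DataFrame.merge() and DataFrame.join() can now be used to join multi-indexed DataFrame instances on the overlapping index levels (GH6360)

See the Merge, join, and concatenate documentation section.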

In [38]: index_left = pd.MultiIndex.from_tuples([('K0', 'X0'), ('K0', 'X1'),
   ....:                                        ('K1', 'X2')],
   ....:                                        names=['key', 'X'])
   ....: 

In [39]: left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
   ....:                      'B': ['B0', 'B1', 'B2']}, index=index_left)
   ....: 

In [40]: index_right = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'),
   ....:                                         ('K2', 'Y2'), ('K2', 'Y3')],
   ....:                                         names=['key', 'Y'])
   ....: 

In [41]: right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
   ....:                       'D': ['D0', 'D1', 'D2', 'D3']}, index=index_right)
   ....: 

In [42]: left.join(right)
Out[42]: 
            A   B   C   D
key X  Y                 
K0  X0 Y0  A0  B0  C0  D0
    X1 Y0  A1  B1  C0  D0
K1  X2 Y1  A2  B2  C1  D1

[3 rows x 4 columns]

For earlier versions this can be done using the following.

In [43]: pd.merge(left.reset_index(), right.reset_index(),
   ....:          on=['key'], how='inner').set_index(['key', 'X', 'Y'])
   ....: 
Out[43]: 
            A   B   C   D
key X  Y                 
K0  X0 Y0  A0  B0  C0  D0
    X1 Y0  A1  B1  C0  D0
K1  X2 Y1  A2  B2  C1  D1

[3 rows x 4 columns]

Function `read_html` enhancements#

read_html() previously ignored colspan and rowspan attributes. Now it understands them, treating them as sequences of cells with the same value. (GH17054)

In [44]: result = pd.read_html("""
   ....:   <table>
   ....:     <thead>
   ....:       <tr>
   ....:         <th>A</th><th>B</th><th>C</th>
   ....:       </tr>
   ....:     </thead>
   ....:     <tbody>
   ....:       <tr>
   ....:         <td colspan="2">1</td><td>2</td>
   ....:       </tr>
   ....:     </tbody>
   ....:   </table>""")
   ....: 

Previous behavior:

In [13]: result
Out [13]:
[   A  B   C
 0  1  2 NaN]

New behavior:

In [45]: result
Out[45]: 
[   A  B  C
 0  1  1  2
 
 [1 rows x 3 columns]]

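New `Styler.pipe()` method#

The Styler class has gained a pipe() method. This provides a convenient way to apply users' predefined styling functions, and can help reduce "boilerplate" when using DataFrame styling functionality repeatedly within a notebook. (GH23229)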

In [46]: df = pd.DataFrame({'N': [1250, 1500, 1750], 'X': [0.25, 0.35, 0.50]})

In [47]: def format_and_align(styler):
   ....:     return (styler.format({'N': '{:,}', 'X': '{:.1%}'})
   ....:                   .set_properties(**{'text-align': 'right'}))
   ....: 

In [48]: df.style.pipe(format_and_align).set_caption('Summary of results.')
Out[48]: <pandas.io.formats.style.Styler at 0x7f0dee277730>

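Similar methods already exist for other classes in pandas, including DataFrame.pipe(), GroupBy.pipe(), and Resampler.pipe().

Renaming names in a MultiIndex#

DataFrame.rename_axis() now supports index and columns arguments and Series.rename_axis() supports the index argument (GH19978).

This change allows a dictionary to be passed so that some of the names of a MultiIndex can be changed.

Example: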

In [49]: mi = pd.MultiIndex.from_product([list('AB'), list('CD'), list('EF')],
   ....:                                 names=['AB', 'CD', 'EF'])
   ....: 

In [50]: df = pd.DataFrame(list(range(len(mi))), index=mi, columns=['N'])

In [51]: df
Out[51]: 
          N
AB CD EF   
A  C  E   0
      F   1
   D  E   2
      F   3
B  C  E   4
      F   5
   D  E   6
      F   7

[8 rows x 1 columns]

In [52]: df.rename_axis(index={'CD': 'New'})
Out[52]: 
           N
AB New EF   
A  C   E   0
       F   1
   D   E   2
       F   3
B  C   E   4
       F   5
   D   E   6
       F   7

[8 rows x 1 columns]

See the Advanced documentation on renaming for more details.

Other enhancements#

merge() now directly allows merge between objects of type DataFrame and named Series, without the need to convert the Series object into a DataFrame beforehand (GH21220)
ExcelWriter now accepts mode as a keyword argument, enabling append to existing workbooks when using the openpyxl engine (GH3441)
FrozenList has gained the .union() and .difference() methods. This functionality greatly simplifies groupby's that rely on explicitly excluding certain columns. See Splitting an object into groups for more information (GH15475, GH15506).
DataFrame.to_parquet() now accepts index as an argument, allowing the user to override the engine's default behavior to include or omit the DataFrame's indexes from the resulting Parquet file. (GH20768)
read_feather() now accepts columns as an argument, allowing the user to specify which columns should be read. (GH24025)
DataFrame.corr() and Series.corr() now accept a callable for generic calculation methods of correlation, e.g. histogram intersection (GH22684)
DataFrame.to_string() now accepts decimal as an argument, allowing the user to specify which decimal separator should be used in the output. (GH23614)
DataFrame.to_html() now accepts render_links as an argument, allowing the user to generate HTML with links to any URLs that appear in the DataFrame. See the section on writing HTML in the IO docs for example usage. (GH2679)
pandas.read_csv() now supports pandas extension types as an argument to dtype, allowing the user to use pandas extension types when reading CSVs. (GH23228)
The shift() method now accepts fill_value as an argument, allowing the user to specify a value which will be used instead of NA/NaT in the empty periods. (GH15486)
to_datetime() now supports the %Z and %z directive when passed into format (GH13486)
Series.mode() and DataFrame.mode() now support the dropna parameter which can be used to specify whether NaN/NaT values should be considered (GH17534)
DataFrame.to_csv() and Series.to_csv() now support the compression keyword when a file handle is passed. (GH21227)
Index.droplevel() is now implemented also for flat indexes, for compatibility with MultiIndex (GH21115)
Series.droplevel() and DataFrame.droplevel() are now implemented (GH20342)
Added support for reading from/writing to Google Cloud Storage via the gcsfs library (GH19454, GH23094)
DataFrame.to_gbq() and read_gbq() signature and documentation updated to reflect changes from the pandas-gbq library version 0.8.0. Adds a credentials argument, which enables the use of any kind of google-auth credentials. (GH21627, GH22557, GH23662)
New method HDFStore.walk() will recursively walk the group hierarchy of an HDF5 file (GH10932)
read_html() copies cell data across colspan and rowspan, and it treats all-th table rows as headers if header kwarg is not given and there is no thead (GH17054)
Series.nlargest(), Series.nsmallest(), DataFrame.nlargest(), and DataFrame.nsmallest() now accept the value "all" for the keep argument. This keeps all ties for the nth largest/smallest value (GH16818)
IntervalIndex has gained the set_closed() method to change the existing closed value (GH21670)
DataFrame.to_csv(), Series.to_csv(), DataFrame.to_json(), and Series.to_json() now support compression='infer' to infer compression based on filename extension (GH15008). The default compression for to_csv, to_json, and to_pickle methods has been updated to 'infer' (GH22004).
DataFrame.to_sql() now supports writing TIMESTAMP WITH TIME ZONE types for supported databases. For databases that don't support timezones, datetime data will be stored as timezone unaware local timestamps. See the Datetime data types for implications (GH9086).
to_timedelta() now supports iso-formatted timedelta strings (GH21877)
Series and DataFrame now support Iterable objects in the constructor (GH2193)
DatetimeIndex has gained the DatetimeIndex.timetz attribute. This returns the local time with timezone information. (GH21358)
round(), ceil(), and floor() for DatetimeIndex and Timestamp now support an ambiguous argument for handling datetimes that are rounded to ambiguous times (GH18946) and a nonexistent argument for handling datetimes that are rounded to nonexistent times. See Nonexistent times when localizing (GH22647)
The result of resample() is now iterable similar to groupby() (GH15314).
Series.resample() and DataFrame.resample() have gained the pandas.core.resample.Resampler.quantile() method (GH15023).
DataFrame.resample() and Series.resample() with a PeriodIndex will now respect the base argument in the same manner as with a DatetimeIndex. (GH23882)
pandas.api.types.is_list_like() has gained a keyword allow_sets which is True by default; if False, all instances of set will not be considered "list-like" anymore (GH23061)
Index.to_frame() now supports overriding column name(s) (GH22580).
Categorical.from_codes() now can take a dtype parameter as an alternative to passing categories and ordered (GH24398).
New attribute __git_version__ will return the git commit sha of the current build (GH21295).
Compatibility with Matplotlib 3.0 (GH22790).
Added Interval.overlaps(), arrays.IntervalArray.overlaps(), and IntervalIndex.overlaps() for determining overlaps between interval-like objects (GH21998)
read_fwf() now accepts keyword infer_nrows (GH15138).
to_parquet() now supports writing a DataFrame as a directory of parquet files partitioned by a subset of the columns when engine = 'pyarrow' (GH23283)
Timestamp.tz_localize(), DatetimeIndex.tz_localize(), and Series.tz_localize() have gained the nonexistent argument for alternative handling of nonexistent times. See Nonexistent times when localizing (GH8917, GH24466)
Index.difference(), Index.intersection(), Index.union(), and Index.symmetric_difference() now have an optional sort parameter to control whether the results should be sorted if possible (GH17839, GH24471)
read_excel() now accepts usecols as a list of column names or callable (GH18273)
MultiIndex.to_flat_index() has been added to flatten multiple levels into a single-level Index object.
DataFrame.to_stata() and pandas.io.stata.StataWriter117 can write mixed string columns to Stata strl format (GH23633)
DataFrame.between_time() and DataFrame.at_time() have gained the axis parameter (GH8839)
DataFrame.to_records() now accepts index_dtypes and column_dtypes parameters to allow different data types in stored column and index records (GH18146)
IntervalIndex has gained the is_overlapping attribute to indicate if the IntervalIndex contains any overlapping intervals (GH23309)
pandas.DataFrame.to_sql() has gained the method argument to control the SQL insertion clause. See the insertion method section in the documentation. (GH8953)
DataFrame.corrwith() now supports Spearman's rank correlation, Kendall's tau as well as callable correlation methods. (GH21925)
DataFrame.to_json(), DataFrame.to_csv(), DataFrame.to_pickle(), and other export methods now support tilde (~) in the path argument. (GH23473)
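A few of the smaller enhancements listed above can be tried together in one short, self-contained sketch. The data and variable names are made up for illustration, and the ISO duration is written in its fully spelled-out form on the assumption that this is the most widely accepted variant.

```python
import pandas as pd
from pandas.api.types import is_list_like

s = pd.Series([3, 1, 3, 2])

# shift(fill_value=...): the vacated slot gets 0 instead of NaN,
# so the integer values are not upcast to float.
shifted = s.shift(1, fill_value=0)

# nlargest(keep="all"): every row tied for the last kept place is returned.
top = s.nlargest(1, keep="all")

# is_list_like(allow_sets=False): sets are no longer treated as list-like.
set_is_list_like = is_list_like({1, 2}, allow_sets=False)

# MultiIndex.to_flat_index(): a single-level Index of tuples.
mi = pd.MultiIndex.from_product([["a", "b"], [1, 2]])
flat = mi.to_flat_index()

# Interval.overlaps(): do two intervals share any point?
overlap = pd.Interval(0, 2).overlaps(pd.Interval(1, 3))

# to_timedelta() with an ISO 8601 duration string.
td = pd.to_timedelta("P1DT2H0M0S")

# merge() with a named Series on the right-hand side.
left = pd.DataFrame({"key": ["a", "b"], "x": [1, 2]})
right = pd.Series([10, 20], index=["a", "b"], name="y")
merged = left.merge(right, left_on="key", right_index=True)

print(shifted.tolist(), top.tolist(), set_is_list_like, flat.nlevels, overlap, td)
print(merged)
```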

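Backwards incompatible API changes#

pandas 0.24.0 includes a number of API breaking changes.

Increased minimum versions for dependencies#

We have updated our minimum supported versions of dependencies (GH21242, GH18742, GH23774, GH24767). If installed, we now require: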

Package	Minimum Version	Required
numpy	1.12.0	X
bottleneck	1.2.0
fastparquet	0.2.1
matplotlib	2.0.0
numexpr	2.6.1
pandas-gbq	0.8.0
pyarrow	0.9.0
pytables	3.4.2
scipy	0.18.1
xlrd	1.0.0
pytest (dev)	3.6

Additionally we no longer depend on feather-format for feather based storage and replaced it with references to pyarrow (GH21639 and GH23053).

`os.linesep` is used for `line_terminator` of `DataFrame.to_csv`#

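DataFrame.to_csv() now uses os.linesep rather than '\n' for the default line terminator (GH20353). This change only affects running on Windows, where '\r\n' was used as the line terminator even when '\n' was passed in line_terminator.

Previous behavior on Windows: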

In [1]: data = pd.DataFrame({"string_with_lf": ["a\nbc"],
   ...:                      "string_with_crlf": ["a\r\nbc"]})

In [2]: # When passing file PATH to to_csv,
   ...: # line_terminator does not work, and csv is saved with '\r\n'.
   ...: # Also, this converts all '\n's in the data to '\r\n'.
   ...: data.to_csv("test.csv", index=False, line_terminator='\n')

In [3]: with open("test.csv", mode='rb') as f:
   ...:     print(f.read())
Out[3]: b'string_with_lf,string_with_crlf\r\n"a\r\nbc","a\r\r\nbc"\r\n'

In [4]: # When passing file OBJECT with newline option to
   ...: # to_csv, line_terminator works.
   ...: with open("test2.csv", mode='w', newline='\n') as f:
   ...:     data.to_csv(f, index=False, line_terminator='\n')

In [5]: with open("test2.csv", mode='rb') as f:
   ...:     print(f.read())
Out[5]: b'string_with_lf,string_with_crlf\n"a\nbc","a\r\nbc"\n'

New behavior on Windows:

Passing line_terminator explicitly sets the line terminator to that character.

In [1]: data = pd.DataFrame({"string_with_lf": ["a\nbc"],
   ...:                      "string_with_crlf": ["a\r\nbc"]})

In [2]: data.to_csv("test.csv", index=False, line_terminator='\n')

In [3]: with open("test.csv", mode='rb') as f:
   ...:     print(f.read())
Out[3]: b'string_with_lf,string_with_crlf\n"a\nbc","a\r\nbc"\n'

On Windows, the value of os.linesep is '\r\n', so if line_terminator is not set, '\r\n' is used as the line terminator.

In [1]: data = pd.DataFrame({"string_with_lf": ["a\nbc"],
   ...:                      "string_with_crlf": ["a\r\nbc"]})

In [2]: data.to_csv("test.csv", index=False)

In [3]: with open("test.csv", mode='rb') as f:
   ...:     print(f.read())
Out[3]: b'string_with_lf,string_with_crlf\r\n"a\nbc","a\r\nbc"\r\n'

For file objects, specifying newline is not sufficient to set the line terminator. You must pass in line_terminator explicitly, even in this case.

In [1]: data = pd.DataFrame({"string_with_lf": ["a\nbc"],
   ...:                      "string_with_crlf": ["a\r\nbc"]})

In [2]: with open("test2.csv", mode='w', newline='\n') as f:
   ...:     data.to_csv(f, index=False)

In [3]: with open("test2.csv", mode='rb') as f:
   ...:     print(f.read())
Out[3]: b'string_with_lf,string_with_crlf\r\n"a\nbc","a\r\nbc"\r\n'

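Proper handling of `np.NaN` in a string data-typed column with the Python engine#

There was a bug in read_excel() and read_csv() with the Python engine, where missing values turned into 'nan' with dtype=str and na_filter=True. Now, these missing values are converted to the string missing indicator, np.nan. (GH20377)

Previous behavior: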

In [5]: data = 'a,b,c\n1,,3\n4,5,6'
In [6]: df = pd.read_csv(StringIO(data), engine='python', dtype=str, na_filter=True)
In [7]: df.loc[0, 'b']
Out[7]:
'nan'

New behavior:

In [53]: data = 'a,b,c\n1,,3\n4,5,6'

In [54]: df = pd.read_csv(StringIO(data), engine='python', dtype=str, na_filter=True)

In [55]: df.loc[0, 'b']
Out[55]: nan

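Notice how we now output np.nan itself instead of a stringified form of it.

Parsing datetime strings with timezone offsets#

Previously, parsing datetime strings with UTC offsets with to_datetime() or DatetimeIndex would automatically convert the datetime to UTC without timezone localization. This is inconsistent with parsing the same datetime string with Timestamp, which would preserve the UTC offset in the tz attribute. Now, to_datetime() preserves the UTC offset in the tz attribute when all the datetime strings have the same UTC offset (GH17697, GH11736, GH22457)

Previous behavior: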

In [2]: pd.to_datetime("2015-11-18 15:30:00+05:30")
Out[2]: Timestamp('2015-11-18 10:00:00')

In [3]: pd.Timestamp("2015-11-18 15:30:00+05:30")
Out[3]: Timestamp('2015-11-18 15:30:00+0530', tz='pytz.FixedOffset(330)')

# Different UTC offsets would automatically convert the datetimes to UTC (without a UTC timezone)
In [4]: pd.to_datetime(["2015-11-18 15:30:00+05:30", "2015-11-18 16:30:00+06:30"])
Out[4]: DatetimeIndex(['2015-11-18 10:00:00', '2015-11-18 10:00:00'], dtype='datetime64[ns]', freq=None)

New behavior:

In [56]: pd.to_datetime("2015-11-18 15:30:00+05:30")
Out[56]: Timestamp('2015-11-18 15:30:00+0530', tz='pytz.FixedOffset(330)')

In [57]: pd.Timestamp("2015-11-18 15:30:00+05:30")
Out[57]: Timestamp('2015-11-18 15:30:00+0530', tz='pytz.FixedOffset(330)')

Parsing datetime strings with the same UTC offset will preserve the UTC offset in the tz

In [58]: pd.to_datetime(["2015-11-18 15:30:00+05:30"] * 2)
Out[58]: DatetimeIndex(['2015-11-18 15:30:00+05:30', '2015-11-18 15:30:00+05:30'], dtype='datetime64[ns, pytz.FixedOffset(330)]', freq=None)

Parsing datetime strings with different UTC offsets will now create an Index of datetime.datetime objects with different UTC offsets

In [59]: idx = pd.to_datetime(["2015-11-18 15:30:00+05:30",
   ....:                       "2015-11-18 16:30:00+06:30"])
   ....: 

In [60]: idx
Out[60]: Index([2015-11-18 15:30:00+05:30, 2015-11-18 16:30:00+06:30], dtype='object')

In [61]: idx[0]
Out[61]: datetime.datetime(2015, 11, 18, 15, 30, tzinfo=tzoffset(None, 19800))

In [62]: idx[1]
Out[62]: datetime.datetime(2015, 11, 18, 16, 30, tzinfo=tzoffset(None, 23400))

Passing utc=True will mimic the previous behavior but will correctly indicate that the dates have been converted to UTC

In [63]: pd.to_datetime(["2015-11-18 15:30:00+05:30",
   ....:                 "2015-11-18 16:30:00+06:30"], utc=True)
   ....: 
Out[63]: DatetimeIndex(['2015-11-18 10:00:00+00:00', '2015-11-18 10:00:00+00:00'], dtype='datetime64[ns, UTC]', freq=None)
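If a single local timezone is wanted after normalizing mixed offsets, one possible follow-up is to convert the UTC result with tz_convert(). This is only a sketch; the target zone "Asia/Kolkata" is an arbitrary choice for illustration.

```python
import pandas as pd

# Two strings with different UTC offsets that denote the same instant.
idx = pd.to_datetime(
    ["2015-11-18 15:30:00+05:30", "2015-11-18 16:30:00+06:30"],
    utc=True,
)

# Convert the timezone-aware UTC index to one chosen local zone.
local = idx.tz_convert("Asia/Kolkata")

print(idx)
print(local)
```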

Parsing mixed-timezones with `read_csv()`#

read_csv() no longer silently converts mixed-timezone columns to UTC (GH24987).

Previous behavior

>>> import io
>>> content = """\
... a
... 2000-01-01T00:00:00+05:00
... 2000-01-01T00:00:00+06:00"""
>>> df = pd.read_csv(io.StringIO(content), parse_dates=['a'])
>>> df.a
0   1999-12-31 19:00:00
1   1999-12-31 18:00:00
Name: a, dtype: datetime64[ns]

New behavior

In [64]: import io

In [65]: content = """\
   ....: a
   ....: 2000-01-01T00:00:00+05:00
   ....: 2000-01-01T00:00:00+06:00"""
   ....: 

In [66]: df = pd.read_csv(io.StringIO(content), parse_dates=['a'])

In [67]: df.a
Out[67]: 
0    2000-01-01 00:00:00+05:00
1    2000-01-01 00:00:00+06:00
Name: a, Length: 2, dtype: object

As can be seen, the dtype is object; each value in the column is a string. To convert the strings to an array of datetimes, use the date_parser argument

In [68]: df = pd.read_csv(io.StringIO(content), parse_dates=['a'],
   ....:                  date_parser=lambda col: pd.to_datetime(col, utc=True))
   ....: 

In [69]: df.a
Out[69]: 
0   1999-12-31 19:00:00+00:00
1   1999-12-31 18:00:00+00:00
Name: a, Length: 2, dtype: datetime64[ns, UTC]

See Parsing datetime strings with timezone offsets for more.

Time values in `dt.end_time` and `to_timestamp(how='end')`#
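An alternative to date_parser, sketched here under the assumption that converting after the read is acceptable, is to load the column as plain strings and then call to_datetime() with utc=True:

```python
import io

import pandas as pd

content = """\
a
2000-01-01T00:00:00+05:00
2000-01-01T00:00:00+06:00"""

# Read the column as plain strings first...
df = pd.read_csv(io.StringIO(content))

# ...then convert explicitly, normalizing the mixed offsets to UTC.
df["a"] = pd.to_datetime(df["a"], utc=True)

print(df["a"])
```

This keeps the parsing step explicit, which can make it easier to see where the timezone normalization happens.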

The time values in Period and PeriodIndex objects are now set to '23:59:59.999999999' when calling Series.dt.end_time, Period.end_time, PeriodIndex.end_time, Period.to_timestamp() with how='end', or PeriodIndex.to_timestamp() with how='end' (GH17157)

Previous behavior:

In [2]: p = pd.Period('2017-01-01', 'D')
In [3]: pi = pd.PeriodIndex([p])

In [4]: pd.Series(pi).dt.end_time[0]
Out[4]: Timestamp(2017-01-01 00:00:00)

In [5]: p.end_time
Out[5]: Timestamp(2017-01-01 23:59:59.999999999)

New behavior:

Calling Series.dt.end_time will now result in a time of '23:59:59.999999999', as is the case with Period.end_time, for example

In [70]: p = pd.Period('2017-01-01', 'D')

In [71]: pi = pd.PeriodIndex([p])

In [72]: pd.Series(pi).dt.end_time[0]
Out[72]: Timestamp('2017-01-01 23:59:59.999999999')

In [73]: p.end_time
Out[73]: Timestamp('2017-01-01 23:59:59.999999999')

Series.unique for timezone-aware data#

The return type of Series.unique() for datetime with timezone values has changed from a numpy.ndarray of Timestamp objects to an arrays.DatetimeArray (GH24024).

In [74]: ser = pd.Series([pd.Timestamp('2000', tz='UTC'),
   ....:                  pd.Timestamp('2000', tz='UTC')])
   ....: 

Previous behavior:

In [3]: ser.unique()
Out[3]: array([Timestamp('2000-01-01 00:00:00+0000', tz='UTC')], dtype=object)

New behavior:

In [75]: ser.unique()
Out[75]: 
<DatetimeArray>
['2000-01-01 00:00:00+00:00']
Length: 1, dtype: datetime64[ns, UTC]

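Sparse data structure refactor#

SparseArray, the array backing SparseSeries and the columns in a SparseDataFrame, is now an extension array (GH21978, GH19056, GH22835). To conform to this interface and for consistency with the rest of pandas, some API breaking changes were made:

SparseArray is no longer a subclass of numpy.ndarray. To convert a SparseArray to a NumPy array, use numpy.asarray().
SparseArray.dtype and SparseSeries.dtype are now instances of SparseDtype, rather than np.dtype. Access the underlying dtype with SparseDtype.subtype.
numpy.asarray(sparse_array) now returns a dense array with all the values, not just the non-fill-value values (GH14167)
SparseArray.take now matches the API of pandas.api.extensions.ExtensionArray.take() (GH19506):
- The default value of allow_fill has changed from False to True.
- The out and mode parameters are no longer accepted (previously, this raised if they were specified).
- Passing a scalar for indices is no longer allowed.
The result of concat() with a mix of sparse and dense Series is a Series with sparse values, rather than a SparseSeries.
SparseDataFrame.combine and DataFrame.combine_first no longer support combining a sparse column with a dense column while preserving the sparse subtype. The result will be an object-dtype SparseArray.
Setting SparseArray.fill_value to a fill value with a different dtype is now allowed.
DataFrame[column] is now a Series with sparse values, rather than a SparseSeries, when slicing a single column with sparse values (GH23559).
The result of Series.where() is now a Series with sparse values, like with other extension arrays (GH24077)

Some new warnings are issued for operations that require or are likely to materialize a large dense array:

An errors.PerformanceWarning is issued when using fillna with a method, as a dense array is constructed to create the filled array. Filling with a value is the efficient way to fill a sparse array.
An errors.PerformanceWarning is now issued when concatenating sparse Series with differing fill values. The fill value from the first sparse array continues to be used.

In addition to these API breaking changes, many Performance Improvements and Bug Fixes have been made.

Finally, a Series.sparse accessor was added to provide sparse-specific methods like Series.sparse.from_coo().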

In [76]: s = pd.Series([0, 0, 1, 1, 1], dtype='Sparse[int]')

In [77]: s.sparse.density
Out[77]: 0.6
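The first three points of the list above can be checked with a short sketch. It assumes the array is built through pd.arrays.SparseArray; the values are arbitrary.

```python
import numpy as np
import pandas as pd

sp = pd.arrays.SparseArray([0, 0, 1, 2], fill_value=0)

# The array is an extension array, not an ndarray subclass.
is_ndarray = isinstance(sp, np.ndarray)

# The dtype is a SparseDtype; the NumPy dtype of the stored values
# is available through .subtype.
subtype = sp.dtype.subtype

# np.asarray() densifies: fill values are included in the result.
dense = np.asarray(sp)

print(is_ndarray, sp.dtype, subtype, dense)
```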

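`get_dummies()` always returns a DataFrame#

Previously, when sparse=True was passed to get_dummies(), the return value could be either a DataFrame or a SparseDataFrame, depending on whether all or just a subset of the columns were dummy-encoded. Now, a DataFrame is always returned (GH24284).

Previous behavior

The first get_dummies() returns a DataFrame because the column A is not dummy encoded. When just ["B", "C"] are passed to get_dummies, then all the columns are dummy-encoded, and a SparseDataFrame was returned.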

In [2]: df = pd.DataFrame({"A": [1, 2], "B": ['a', 'b'], "C": ['a', 'a']})

In [3]: type(pd.get_dummies(df, sparse=True))
Out[3]: pandas.core.frame.DataFrame

In [4]: type(pd.get_dummies(df[['B', 'C']], sparse=True))
Out[4]: pandas.core.sparse.frame.SparseDataFrame

New behavior

Now, the return type is consistently a DataFrame.

In [78]: type(pd.get_dummies(df, sparse=True))
Out[78]: pandas.core.frame.DataFrame

In [79]: type(pd.get_dummies(df[['B', 'C']], sparse=True))
Out[79]: pandas.core.frame.DataFrame

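Note

There's no difference in memory usage between a SparseDataFrame and a DataFrame with sparse values. The memory usage will be the same as in the previous version of pandas.

Raise ValueError in `DataFrame.to_dict(orient='index')`#

Bug in DataFrame.to_dict() raises ValueError when used with orient='index' and a non-unique index instead of losing data (GH22801)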
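One way to confirm that the returned DataFrame really holds sparse values is to inspect the column dtypes, as in this sketch that reuses the frame from the example above:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": ["a", "b"], "C": ["a", "a"]})

dummies = pd.get_dummies(df[["B", "C"]], sparse=True)

# Every dummy column is backed by a sparse array, even though the
# container is an ordinary DataFrame.
all_sparse = all(isinstance(dt, pd.SparseDtype) for dt in dummies.dtypes)

print(type(dummies).__name__, list(dummies.columns), all_sparse)
```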

In [80]: df = pd.DataFrame({'a': [1, 2], 'b': [0.5, 0.75]}, index=['A', 'A'])

In [81]: df
Out[81]: 
   a     b
A  1  0.50
A  2  0.75

[2 rows x 2 columns]

In [82]: df.to_dict(orient='index')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [82], in <cell line: 1>()
----> 1 df.to_dict(orient='index')

File /usr/local/lib/python3.10/dist-packages/pandas-1.5.0.dev0+697.gf9762d8f52-py3.10-linux-x86_64.egg/pandas/core/frame.py:1967, in DataFrame.to_dict(self, orient, into)
   1965 elif orient == "index":
   1966     if not self.index.is_unique:
-> 1967         raise ValueError("DataFrame index must be unique for orient='index'.")
   1968     return into_c(
   1969         (t[0], dict(zip(self.columns, map(maybe_box_native, t[1:]))))
   1970         for t in self.itertuples(name=None)
   1971     )
   1973 else:

ValueError: DataFrame index must be unique for orient='index'.

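Tick DateOffset normalize restrictions#

Creating a Tick object (Day, Hour, Minute, Second, Milli, Micro, Nano) with normalize=True is no longer supported. This prevents unexpected behavior where addition could fail to be monotone or associative. (GH21427)

Previous behavior: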

In [2]: ts = pd.Timestamp('2018-06-11 18:01:14')

In [3]: ts
Out[3]: Timestamp('2018-06-11 18:01:14')

In [4]: tic = pd.offsets.Hour(n=2, normalize=True)
   ...:

In [5]: tic
Out[5]: <2 * Hours>

In [6]: ts + tic
Out[6]: Timestamp('2018-06-11 00:00:00')

In [7]: ts + tic + tic + tic == ts + (tic + tic + tic)
Out[7]: False

New behavior:

In [83]: ts = pd.Timestamp('2018-06-11 18:01:14')

In [84]: tic = pd.offsets.Hour(n=2)

In [85]: ts + tic + tic + tic == ts + (tic + tic + tic)
Out[85]: True
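Code that relied on normalize=True could be migrated along these lines: add the plain offset and then normalize the resulting Timestamp explicitly. This is a sketch; the ValueError type is what current pandas raises, and other versions may word the error differently.

```python
import pandas as pd

# Constructing a Tick with normalize=True is rejected.
try:
    pd.offsets.Hour(n=2, normalize=True)
except ValueError as exc:
    error = exc
else:
    error = None

# Explicit replacement: add the offset, then floor to midnight.
ts = pd.Timestamp("2018-06-11 18:01:14")
floored = (ts + pd.offsets.Hour(2)).normalize()

print(type(error).__name__, floored)
```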

Period subtraction#

Subtraction of a Period from another Period will give a DateOffset instead of an integer (GH21314)

Previous behavior:

In [2]: june = pd.Period('June 2018')

In [3]: april = pd.Period('April 2018')

In [4]: june - april
Out [4]: 2

New behavior:

In [86]: june = pd.Period('June 2018')

In [87]: april = pd.Period('April 2018')

In [88]: june - april
Out[88]: <2 * MonthEnds>

Similarly, subtraction of a Period from a PeriodIndex will now return an Index of DateOffset objects instead of an Int64Index

Previous behavior:

In [2]: pi = pd.period_range('June 2018', freq='M', periods=3)

In [3]: pi - pi[0]
Out[3]: Int64Index([0, 1, 2], dtype='int64')

New behavior:

In [89]: pi = pd.period_range('June 2018', freq='M', periods=3)

In [90]: pi - pi[0]
Out[90]: Index([<0 * MonthEnds>, <MonthEnd>, <2 * MonthEnds>], dtype='object')
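Code that still needs the old integer result can read it from the offset's n attribute. A minimal sketch, with the period frequency written out explicitly:

```python
import pandas as pd

june = pd.Period("2018-06", freq="M")
april = pd.Period("2018-04", freq="M")

# The difference is a DateOffset; .n holds the number of periods.
diff = june - april
months_apart = diff.n

# The same idea applied element-wise to a PeriodIndex difference.
pi = pd.period_range("2018-06", freq="M", periods=3)
steps = [offset.n for offset in pi - pi[0]]

print(months_apart, steps)
```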

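Addition/subtraction of `NaN` from `DataFrame`#

Adding or subtracting NaN from a DataFrame column with timedelta64[ns] dtype will now raise a TypeError instead of returning all-NaT. This is for compatibility with TimedeltaIndex and Series behavior (GH22163)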

In [91]: df = pd.DataFrame([pd.Timedelta(days=1)])

In [92]: df
Out[92]: 
       0
0 1 days

[1 rows x 1 columns]

Previous behavior:

In [4]: df = pd.DataFrame([pd.Timedelta(days=1)])

In [5]: df - np.nan
Out[5]:
    0
0 NaT

New behavior:

In [2]: df - np.nan
...
TypeError: unsupported operand type(s) for -: 'TimedeltaIndex' and 'float'

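DataFrame comparison operations broadcasting changes#

Previously, the broadcasting behavior of DataFrame comparison operations (==, !=, ...) was inconsistent with the behavior of arithmetic operations (+, -, ...). The behavior of the comparison operations has been changed to match the arithmetic operations in these cases. (GH22880)

The affected cases are:

operating against a 2-dimensional np.ndarray with either 1 row or 1 column will now broadcast the same way a np.ndarray would (GH23000).
a list or tuple with length matching the number of rows in the DataFrame will now raise ValueError instead of operating column-by-column (GH22880).
a list or tuple with length matching the number of columns in the DataFrame will now operate row-by-row instead of raising ValueError (GH22880).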

In [93]: arr = np.arange(6).reshape(3, 2)

In [94]: df = pd.DataFrame(arr)

In [95]: df
Out[95]: 
   0  1
0  0  1
1  2  3
2  4  5

[3 rows x 2 columns]

Previous behavior:

In [5]: df == arr[[0], :]
    ...: # comparison previously broadcast where arithmetic would raise
Out[5]:
       0      1
0   True   True
1  False  False
2  False  False
In [6]: df + arr[[0], :]
...
ValueError: Unable to coerce to DataFrame, shape must be (3, 2): given (1, 2)

In [7]: df == (1, 2)
    ...: # length matches number of columns;
    ...: # comparison previously raised where arithmetic would broadcast
...
ValueError: Invalid broadcasting comparison [(1, 2)] with block values
In [8]: df + (1, 2)
Out[8]:
   0  1
0  1  3
1  3  5
2  5  7

In [9]: df == (1, 2, 3)
    ...:  # length matches number of rows
    ...:  # comparison previously broadcast where arithmetic would raise
Out[9]:
       0      1
0  False   True
1   True  False
2  False  False
In [10]: df + (1, 2, 3)
...
ValueError: Unable to coerce to Series, length must be 2: given 3

New behavior:

# Comparison operations and arithmetic operations both broadcast.
In [96]: df == arr[[0], :]
Out[96]: 
       0      1
0   True   True
1  False  False
2  False  False

[3 rows x 2 columns]

In [97]: df + arr[[0], :]
Out[97]: 
   0  1
0  0  2
1  2  4
2  4  6

[3 rows x 2 columns]

# Comparison operations and arithmetic operations both broadcast.
In [98]: df == (1, 2)
Out[98]: 
       0      1
0  False  False
1  False  False
2  False  False

[3 rows x 2 columns]

In [99]: df + (1, 2)
Out[99]: 
   0  1
0  1  3
1  3  5
2  5  7

[3 rows x 2 columns]

# Comparison operations and arithmetic operations both raise ValueError.
In [6]: df == (1, 2, 3)
...
ValueError: Unable to coerce to Series, length must be 2: given 3

In [7]: df + (1, 2, 3)
...
ValueError: Unable to coerce to Series, length must be 2: given 3

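DataFrame arithmetic operations broadcasting changes#

DataFrame arithmetic operations when operating with 2-dimensional np.ndarray objects now broadcast in the same way as np.ndarray broadcast. (GH23000)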

In [100]: arr = np.arange(6).reshape(3, 2)

In [101]: df = pd.DataFrame(arr)

In [102]: df
Out[102]: 
   0  1
0  0  1
1  2  3
2  4  5

[3 rows x 2 columns]

Previous behavior:

In [5]: df + arr[[0], :]   # 1 row, 2 columns
...
ValueError: Unable to coerce to DataFrame, shape must be (3, 2): given (1, 2)
In [6]: df + arr[:, [1]]   # 1 column, 3 rows
...
ValueError: Unable to coerce to DataFrame, shape must be (3, 2): given (3, 1)

New behavior:

In [103]: df + arr[[0], :]   # 1 row, 2 columns
Out[103]: 
   0  1
0  0  2
1  2  4
2  4  6

[3 rows x 2 columns]

In [104]: df + arr[:, [1]]   # 1 column, 3 rows
Out[104]: 
   0   1
0  1   2
1  5   6
2  9  10

[3 rows x 2 columns]

Series and Index data-dtype incompatibilities#

Series and Index constructors now raise when the data is incompatible with a passed dtype= (GH15832)

Previous behavior:

In [4]: pd.Series([-1], dtype="uint64")
Out [4]:
0    18446744073709551615
dtype: uint64

New behavior:

In [4]: pd.Series([-1], dtype="uint64")
Out [4]:
...
OverflowError: Trying to coerce negative values to unsigned integers

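Concatenation changes#

Calling pandas.concat() on a Categorical of ints with NA values now causes them to be processed as objects when concatenating with anything other than another Categorical of ints (GH19214)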

In [105]: s = pd.Series([0, 1, np.nan])

In [106]: c = pd.Series([0, 1, np.nan], dtype="category")

Previous behavior

In [3]: pd.concat([s, c])
Out[3]:
0    0.0
1    1.0
2    NaN
0    0.0
1    1.0
2    NaN
dtype: float64

New behavior

In [107]: pd.concat([s, c])
Out[107]: 
0    0.0
1    1.0
2    NaN
0    0.0
1    1.0
2    NaN
Length: 6, dtype: float64

Datetimelike API changes#

For DatetimeIndex and TimedeltaIndex with non-None freq attribute, addition or subtraction of integer-dtyped array or Index will return an object of the same class (GH19959)
DateOffset objects are now immutable. Attempting to alter one of these will now raise AttributeError (GH21341)
PeriodIndex subtraction of another PeriodIndex will now return an object-dtype Index of DateOffset objects instead of raising a TypeError (GH20049)
cut() and qcut() now return DatetimeIndex or TimedeltaIndex bins when the input is datetime or timedelta dtype respectively and retbins=True (GH19891)
DatetimeIndex.to_period() and Timestamp.to_period() will issue a warning when timezone information will be lost (GH21333)
PeriodIndex.tz_convert() and PeriodIndex.tz_localize() have been removed (GH21781)
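Two of the changes above, offset immutability and PeriodIndex subtraction, can be illustrated with a short sketch. The variable names are illustrative; deriving a new offset by multiplication is just one way to avoid mutating an existing one.

```python
import pandas as pd

offset = pd.offsets.Day(2)

# Offsets are immutable: assigning to an attribute raises AttributeError.
try:
    offset.n = 3
except AttributeError:
    mutation_blocked = True
else:
    mutation_blocked = False

# Build a new offset instead of modifying the existing one.
bigger = offset * 2

# PeriodIndex - PeriodIndex yields an object-dtype Index of offsets.
pi = pd.period_range("2018-01", freq="M", periods=3)
diff = pi - pi

print(mutation_blocked, bigger, diff.dtype)
```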

其他API更改#

一座新建的空地 DataFrame 以整数作为 dtype 现在将仅强制转换为 float64 如果 index 是指定的 (GH22858 )
Series.str.cat() will now raise if others is a set (GH23009)
Passing scalar values to DatetimeIndex or TimedeltaIndex will now raise TypeError instead of ValueError (GH23539)
max_rows and max_cols parameters removed from HTMLFormatter since truncation is handled by DataFrameFormatter (GH23818)
read_csv() will now raise a ValueError if a column with missing values is declared as having dtype bool (GH20591)
The column order of the resultant DataFrame from MultiIndex.to_frame() is now guaranteed to match the MultiIndex.names order. (GH22420)
Incorrectly passing a DatetimeIndex to MultiIndex.from_tuples(), rather than a sequence of tuples, now raises a TypeError rather than a ValueError (GH24024)
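pd.offsets.generate_range() argument time_rule has been removed; use offset instead (GH24157)
In 0.23.x, pandas would raise a ValueError on a merge of a numeric column (e.g. an int dtyped column) and an object dtyped column (GH9780). We have re-enabled the ability to merge object and other dtypes; pandas will still raise on a merge between a numeric column and an object dtyped column that is composed only of strings (GH21681)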
Accessing a level of a MultiIndex with a duplicate name (e.g. in get_level_values()) now raises a ValueError instead of a KeyError (GH21678).
Invalid construction of IntervalDtype will now always raise a TypeError rather than a ValueError if the subtype is invalid (GH21185)
Trying to reindex a DataFrame with a non unique MultiIndex now raises a ValueError instead of an Exception (GH21770)
Index subtraction will attempt to operate element-wise instead of raising TypeError (GH19369)
pandas.io.formats.style.Styler supports a number-format property when using to_excel() (GH22015)
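DataFrame.corr() and Series.corr() now raise a ValueError along with a helpful error message instead of a KeyError when supplied with an invalid method (GH22298)
shift() will now always return a copy, instead of the previous behaviour of returning self when shifting by 0 (GH22397)
DataFrame.set_index() now gives a better (and less frequent) KeyError, raises a ValueError for incorrect types, and will not fail on duplicate column names with drop=True. (GH22484)
Slicing a single row of a DataFrame with multiple ExtensionArrays of the same type now preserves the dtype, rather than coercing to object (GH22784)
DateOffset attribute _cacheable and method _should_cache have been removed (GH23118)
Series.searchsorted(), when supplied a scalar value to search for, now returns a scalar instead of an array (GH23801).
Categorical.searchsorted(), when supplied a scalar value to search for, now returns a scalar instead of an array (GH23466).
Categorical.searchsorted() now raises a KeyError rather than a ValueError, if a searched for key is not found in its categories (GH23466).
Index.hasnans() and Series.hasnans() now always return a python boolean. Previously, a python or a numpy boolean could be returned, depending on circumstances (GH23294).
The order of the arguments of DataFrame.to_html() and DataFrame.to_string() is rearranged to be consistent with each other. (GH23614)
CategoricalIndex.reindex() now raises a ValueError if the target index is non-unique and not equal to the current index. It previously only raised if the target index was not of a categorical dtype (GH23963).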
Series.to_list() and Index.to_list() are now aliases of Series.tolist respectively Index.tolist (GH8826)
The result of SparseSeries.unstack is now a DataFrame with sparse values, rather than a SparseDataFrame (GH24372).
DatetimeIndex and TimedeltaIndex no longer ignore the dtype precision. Passing a non-nanosecond resolution dtype will raise a ValueError (GH24753)
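
Two of the changes in this list, shift() always returning a copy and searchsorted() returning a scalar for a scalar key, can be checked with a short sketch. The variable names are illustrative assumptions and not part of the release notes.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])

# shift(0) hands back a new object rather than the Series itself.
shifted = s.shift(0)
is_copy = shifted is not s

# A scalar search key yields a scalar position, not a length-1 array.
pos = s.searchsorted(2)
pos_is_scalar = np.ndim(pos) == 0
```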

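Extension type changes#

Equality and hashability

pandas now requires that extension dtypes be hashable (i.e. the respective ExtensionDtype objects; hashability is not a requirement for the values of the corresponding ExtensionArray). The base class implements a default __eq__ and __hash__. If you have a parametrized dtype, you should update the ExtensionDtype._metadata tuple to match the signature of your __init__ method. See pandas.api.extensions.ExtensionDtype for more (GH22476).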
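
As a rough sketch of what a parametrized dtype with a matching _metadata tuple might look like, consider the hypothetical UnitDtype below. The class, its unit parameter, and its name format are invented for illustration; a real extension dtype would also need an accompanying ExtensionArray.

```python
from pandas.api.extensions import ExtensionDtype


class UnitDtype(ExtensionDtype):
    """Hypothetical dtype parametrized by a unit string."""

    # _metadata lists the attributes that define identity; it mirrors
    # the parameters accepted by __init__.
    _metadata = ("unit",)
    type = float  # scalar type of the elements

    def __init__(self, unit="m"):
        self.unit = unit

    @property
    def name(self):
        return "unit[{}]".format(self.unit)


metres_a = UnitDtype("m")
metres_b = UnitDtype("m")
seconds = UnitDtype("s")

# The base class derives __eq__ and __hash__ from _metadata.
same = metres_a == metres_b
different = metres_a != seconds
lookup = {metres_a: "length"}
```

Because equality and hashing are both driven by _metadata, two instances built with the same parameters can be used interchangeably as dictionary keys.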

New and changed methods

dropna() has been added (GH21185)
repeat() has been added (GH24349)
The ExtensionArray constructor, _from_sequence, now takes the keyword arg copy=False (GH21185)
pandas.api.extensions.ExtensionArray.shift() added as part of the basic ExtensionArray interface (GH22387).
searchsorted() has been added (GH24350)
Support for reduction operations such as sum, mean via opt-in base class method override (GH22762)
ExtensionArray.isna() is allowed to return an ExtensionArray (GH22325).

Dtype changes

ExtensionDtype has gained the ability to instantiate from string dtypes, e.g. decimal would instantiate a registered DecimalDtype; furthermore the ExtensionDtype has gained the method construct_array_type (GH21185)
Added ExtensionDtype._is_numeric for controlling whether an extension dtype is considered numeric (GH22290).
Added pandas.api.types.register_extension_dtype() to register an extension type with pandas (GH22664)
Updated the .type attribute for PeriodDtype, DatetimeTZDtype, and IntervalDtype to be instances of the dtype (Period, Timestamp, and Interval respectively) (GH22938)

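Operator support

A Series based on an ExtensionArray now supports arithmetic and comparison operators (GH19577). There are two approaches for providing operator support for an ExtensionArray:

Define each of the operators on your ExtensionArray subclass.
Use an operator implementation from pandas that depends on operators that are already defined on the underlying elements (scalars) of the ExtensionArray.

See the ExtensionArray Operator Support documentation section for details on both ways of adding operator support.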

Other changes

A default repr for pandas.api.extensions.ExtensionArray is now provided (GH23601).
ExtensionArray._formatting_values() is deprecated. Use ExtensionArray._formatter instead. (GH23601)
An ExtensionArray with a boolean dtype now works correctly as a boolean indexer. pandas.api.types.is_bool_dtype() now properly considers them boolean (GH22326)

Bug fixes

Bug in Series.get() for Series using ExtensionArray and integer index (GH21257)
shift() now dispatches to ExtensionArray.shift() (GH22386)
Series.combine() works correctly with ExtensionArray inside of Series (GH20825)
Series.combine() with scalar argument now works for any function type (GH21248)
Series.astype() and DataFrame.astype() now dispatch to ExtensionArray.astype() (GH21185).
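Slicing a single row of a DataFrame with multiple ExtensionArrays of the same type now preserves the dtype, rather than coercing to object (GH22784)
Bug when concatenating multiple Series with different extension dtypes not casting to object dtype (GH22994)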
Series backed by an ExtensionArray now work with util.hash_pandas_object() (GH23066)
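DataFrame.stack() no longer converts to object dtype for DataFrames where each column has the same extension dtype. The output Series will have the same dtype as the columns (GH23077).
Series.unstack() and DataFrame.unstack() no longer convert extension arrays to object-dtype ndarrays. Each column in the output DataFrame will now have the same dtype as the input (GH23077).
Bug when grouping with Dataframe.groupby() and aggregating on an ExtensionArray, where the actual ExtensionArray dtype was not returned (GH23227).
Bug in pandas.merge() when merging on an extension array-backed column (GH23020).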

Deprecations#

MultiIndex.labels has been deprecated and replaced by MultiIndex.codes. The functionality is unchanged. The new name better reflects the natures of these codes and makes the MultiIndex API more similar to the API for CategoricalIndex (GH13443). As a consequence, other uses of the name labels in MultiIndex have also been deprecated and replaced with codes:
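- You should initialize a MultiIndex instance using a parameter named codes rather than labels.
- MultiIndex.set_labels has been deprecated in favor of MultiIndex.set_codes().
- For method MultiIndex.copy(), the labels parameter has been deprecated and replaced by a codes parameter.
DataFrame.to_stata(), read_stata(), StataReader and StataWriter have deprecated the encoding argument. The encoding of a Stata dta file is determined by the file type and cannot be changed (GH21244)
MultiIndex.to_hierarchical() is deprecated and will be removed in a future version (GH21613)
Series.ptp() is deprecated. Use numpy.ptp instead (GH21614)
Series.compress() is deprecated. Use Series[condition] instead (GH18262)
The signature of Series.to_csv() has been made uniform with that of DataFrame.to_csv(): the name of the first argument is now path_or_buf, the order of subsequent arguments has changed, and the header argument now defaults to True. (GH19715)
Categorical.from_codes() has deprecated providing float values for the codes argument. (GH21767)
pandas.read_table() is deprecated. Instead, use read_csv() passing sep='\t' if necessary. This deprecation has been removed in 0.25.0. (GH21948)
Series.str.cat() has deprecated using arbitrary list-likes within list-likes. A list-like container may still contain many Series, Index or 1-dimensional np.ndarray, or alternatively, only scalar values. (GH21950)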
FrozenNDArray.searchsorted() has deprecated the v parameter in favor of value (GH14645)
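DatetimeIndex.shift() and PeriodIndex.shift() now accept a periods argument instead of n for consistency with Index.shift() and Series.shift(). Using n throws a deprecation warning (GH22458, GH22912)
The fastpath keyword of the different Index constructors is deprecated (GH23110).
Timestamp.tz_localize(), DatetimeIndex.tz_localize(), and Series.tz_localize() have deprecated the errors argument in favor of the nonexistent argument (GH8917)
The class FrozenNDArray has been deprecated. When unpickling, FrozenNDArray will be unpickled to np.ndarray once this class is removed (GH9031)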
The methods DataFrame.update() and Panel.update() have deprecated the raise_conflict=False|True keyword in favor of errors='ignore'|'raise' (GH23585)
The methods Series.str.partition() and Series.str.rpartition() have deprecated the pat keyword in favor of sep (GH22676)
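Deprecated the nthreads keyword of pandas.read_feather() in favor of use_threads to reflect the changes in pyarrow>=0.11.0. (GH23053)
pandas.read_excel() has deprecated accepting usecols as an integer. Please pass in a list of ints from 0 to usecols inclusive instead (GH23527)
Constructing a TimedeltaIndex from data with datetime64-dtyped data is deprecated, and will raise TypeError in a future version (GH23539)
Constructing a DatetimeIndex from data with timedelta64-dtyped data is deprecated, and will raise TypeError in a future version (GH23675)
The keep_tz=False option (the default) of the keep_tz keyword of DatetimeIndex.to_series() is deprecated (GH17832).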
Timezone converting a tz-aware datetime.datetime or Timestamp with Timestamp and the tz argument is now deprecated. Instead, use Timestamp.tz_convert() (GH23579)
pandas.api.types.is_period() is deprecated in favor of pandas.api.types.is_period_dtype (GH23917)
pandas.api.types.is_datetimetz() is deprecated in favor of pandas.api.types.is_datetime64tz (GH23917)
Creating a TimedeltaIndex, DatetimeIndex, or PeriodIndex by passing range arguments start, end, and periods is deprecated in favor of timedelta_range(), date_range(), or period_range() (GH23919)
Passing a string alias like 'datetime64[ns, UTC]' as the unit parameter to DatetimeTZDtype is deprecated. Use DatetimeTZDtype.construct_from_string instead (GH23990).
The skipna parameter of infer_dtype() will switch to True by default in a future version of pandas (GH17066, GH24050)
In Series.where() with Categorical data, providing an other that is not present in the categories is deprecated. Convert the categorical to a different dtype or add the other to the categories first (GH24077).
Series.clip_lower(), Series.clip_upper(), DataFrame.clip_lower() and DataFrame.clip_upper() are deprecated and will be removed in a future version. Use Series.clip(lower=threshold), Series.clip(upper=threshold) and the equivalent DataFrame methods (GH24203)
Series.nonzero() is deprecated and will be removed in a future version (GH18262)
Passing an integer to Series.fillna() and DataFrame.fillna() with timedelta64[ns] dtypes is deprecated, and will raise TypeError in a future version. Use obj.fillna(pd.Timedelta(...)) instead (GH24694)
Series.cat.categorical, Series.cat.name and Series.cat.index have been deprecated. Use the attributes on Series.cat or Series directly. (GH24751).
Passing a dtype without a precision like np.dtype('datetime64') or timedelta64 to Index, DatetimeIndex and TimedeltaIndex is now deprecated. Use the nanosecond-precision dtype instead (GH24753).
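
Several of the deprecations above name a direct replacement. The sketch below collects a few of them side by side; the sample data and variable names are assumptions made for illustration.

```python
import io

import numpy as np
import pandas as pd

s = pd.Series([1, 5, 3, 9])

# Series.ptp() -> numpy.ptp
value_range = np.ptp(s.to_numpy())

# Series.compress(condition) -> boolean indexing
large = s[s > 2]

# clip_lower / clip_upper -> clip(lower=...) / clip(upper=...)
floored = s.clip(lower=3)
capped = s.clip(upper=5)

# read_table(...) -> read_csv(..., sep="\t")
tsv = io.StringIO("a\tb\n1\t2\n")
table = pd.read_csv(tsv, sep="\t")
```

These spellings are not tied to 0.24, so code written this way should not need revisiting when the deprecated names are removed.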

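Integer addition/subtraction with datetimes and timedeltas is deprecated#

In the past, users could (in some cases) add or subtract integers or integer-dtype arrays from Timestamp, DatetimeIndex and TimedeltaIndex.

This usage is now deprecated. Instead add or subtract integer multiples of the object's freq attribute (GH21939, GH23878).

Previous behavior: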

In [5]: ts = pd.Timestamp('1994-05-06 12:15:16', freq=pd.offsets.Hour())
In [6]: ts + 2
Out[6]: Timestamp('1994-05-06 14:15:16', freq='H')

In [7]: tdi = pd.timedelta_range('1D', periods=2)
In [8]: tdi - np.array([2, 1])
Out[8]: TimedeltaIndex(['-1 days', '1 days'], dtype='timedelta64[ns]', freq=None)

In [9]: dti = pd.date_range('2001-01-01', periods=2, freq='7D')
In [10]: dti + pd.Index([1, 2])
Out[10]: DatetimeIndex(['2001-01-08', '2001-01-22'], dtype='datetime64[ns]', freq=None)

New behavior:

In [108]: ts = pd.Timestamp('1994-05-06 12:15:16', freq=pd.offsets.Hour())

In [109]: ts + 2 * ts.freq
Out[109]: Timestamp('1994-05-06 14:15:16', freq='H')

In [110]: tdi = pd.timedelta_range('1D', periods=2)

In [111]: tdi - np.array([2 * tdi.freq, 1 * tdi.freq])
Out[111]: TimedeltaIndex(['-1 days', '1 days'], dtype='timedelta64[ns]', freq=None)

In [112]: dti = pd.date_range('2001-01-01', periods=2, freq='7D')

In [113]: dti + pd.Index([1 * dti.freq, 2 * dti.freq])
Out[113]: DatetimeIndex(['2001-01-08', '2001-01-22'], dtype='datetime64[ns]', freq=None)
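
An alternative that does not depend on the object carrying a freq attribute is to spell out the offset or timedelta explicitly. This is a sketch of that variation, not the form shown in the release notes; the values are chosen to reproduce the same results as the examples above.

```python
import pandas as pd

# Timestamp: multiply an explicit offset instead of adding a bare integer.
ts = pd.Timestamp("1994-05-06 12:15:16")
ts_plus_two = ts + 2 * pd.offsets.Hour()

# TimedeltaIndex: subtract explicit timedeltas instead of np.array([2, 1]).
tdi = pd.timedelta_range("1D", periods=2)
tdi_shifted = tdi - pd.to_timedelta([2, 1], unit="D")

# DatetimeIndex: add explicit timedeltas (1 and 2 steps of 7 days).
dti = pd.date_range("2001-01-01", periods=2, freq="7D")
dti_shifted = dti + pd.to_timedelta([7, 14], unit="D")
```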

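Passing integer data and a timezone to DatetimeIndex#

The behavior of DatetimeIndex when passed integer data and a timezone is changing in a future version of pandas. Previously, these were interpreted as wall times in the desired timezone. In the future, these will be interpreted as wall times in UTC, which are then converted to the desired timezone (GH24559).

The default behavior remains the same, but issues a warning: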

In [3]: pd.DatetimeIndex([946684800000000000], tz="US/Central")
/bin/ipython:1: FutureWarning:
    Passing integer-dtype data and a timezone to DatetimeIndex. Integer values
    will be interpreted differently in a future version of pandas. Previously,
    these were viewed as datetime64[ns] values representing the wall time
    *in the specified timezone*. In the future, these will be viewed as
    datetime64[ns] values representing the wall time *in UTC*. This is similar
    to a nanosecond-precision UNIX epoch. To accept the future behavior, use

        pd.to_datetime(integer_data, utc=True).tz_convert(tz)

    To keep the previous behavior, use

        pd.to_datetime(integer_data).tz_localize(tz)

 #!/bin/python3
 Out[3]: DatetimeIndex(['2000-01-01 00:00:00-06:00'], dtype='datetime64[ns, US/Central]', freq=None)

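As the warning message explains, opt in to the future behavior by specifying that the integer values are UTC, and then converting to the final timezone: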

In [114]: pd.to_datetime([946684800000000000], utc=True).tz_convert('US/Central')
Out[114]: DatetimeIndex(['1999-12-31 18:00:00-06:00'], dtype='datetime64[ns, US/Central]', freq=None)

The old behavior can be retained by localizing directly to the final timezone:

In [115]: pd.to_datetime([946684800000000000]).tz_localize('US/Central')
Out[115]: DatetimeIndex(['2000-01-01 00:00:00-06:00'], dtype='datetime64[ns, US/Central]', freq=None)

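Converting timezone-aware Series and Index to NumPy arrays#

The conversion from a Series or Index with timezone-aware datetime data will change to preserve timezones by default (GH23569).

NumPy doesn't have a dedicated dtype for timezone-aware datetimes. In the past, converting a Series or DatetimeIndex with timezone-aware datetimes would convert to a NumPy array by

converting the tz-aware data to UTC
dropping the timezone info
returning a numpy.ndarray with datetime64[ns] dtype

Future versions of pandas will preserve the timezone information by returning an object-dtype NumPy array where each value is a Timestamp with the correct timezone attached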

In [116]: ser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))

In [117]: ser
Out[117]: 
0   2000-01-01 00:00:00+01:00
1   2000-01-02 00:00:00+01:00
Length: 2, dtype: datetime64[ns, CET]

The default behavior remains the same, but issues a warning

In [8]: np.asarray(ser)
/bin/ipython:1: FutureWarning: Converting timezone-aware DatetimeArray to timezone-naive
      ndarray with 'datetime64[ns]' dtype. In the future, this will return an ndarray
      with 'object' dtype where each element is a 'pandas.Timestamp' with the correct 'tz'.

        To accept the future behavior, pass 'dtype=object'.
        To keep the old behavior, pass 'dtype="datetime64[ns]"'.
  #!/bin/python3
Out[8]:
array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00.000000000'],
      dtype='datetime64[ns]')

The previous or future behavior can be obtained, without any warnings, by specifying the dtype

Previous behavior

In [118]: np.asarray(ser, dtype='datetime64[ns]')
Out[118]: 
array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00.000000000'],
      dtype='datetime64[ns]')

Future behavior

# New behavior
In [119]: np.asarray(ser, dtype=object)
Out[119]: 
array([Timestamp('2000-01-01 00:00:00+0100', tz='CET'),
       Timestamp('2000-01-02 00:00:00+0100', tz='CET')], dtype=object)

Or by using Series.to_numpy()

In [120]: ser.to_numpy()
Out[120]: 
array([Timestamp('2000-01-01 00:00:00+0100', tz='CET'),
       Timestamp('2000-01-02 00:00:00+0100', tz='CET')], dtype=object)

In [121]: ser.to_numpy(dtype="datetime64[ns]")
Out[121]: 
array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00.000000000'],
      dtype='datetime64[ns]')

All of the above applies to a DatetimeIndex with tz-aware values as well.

Removal of prior version deprecations/changes#

The LongPanel and WidePanel classes have been removed (GH10892)
Series.repeat() has renamed the reps argument to repeats (GH14645)
Several private functions were removed from the (non-public) module pandas.core.common (GH22001)
Removal of the previously deprecated module pandas.core.datetools (GH14105, GH14094)
Strings passed into DataFrame.groupby() that refer to both column and index levels will raise a ValueError (GH14432)
Index.repeat() and MultiIndex.repeat() have renamed the n argument to repeats (GH14645)
The Series constructor and .astype method will now raise a ValueError if timestamp dtypes are passed in without a unit (e.g. np.datetime64) for the dtype parameter (GH15987)
Removal of the previously deprecated as_indexer keyword completely from str.match() (GH22356, GH6581)
The modules pandas.types, pandas.computation, and pandas.util.decorators have been removed (GH16157, GH16250)
Removed the pandas.formats.style shim for pandas.io.formats.style.Styler (GH16059)
pandas.pnow, pandas.match, pandas.groupby, pd.get_store, pd.Expr, and pd.Term have been removed (GH15538, GH15940)
Categorical.searchsorted() and Series.searchsorted() have renamed the v argument to value (GH14645)
pandas.parser, pandas.lib, and pandas.tslib have been removed (GH15537)
Index.searchsorted() has renamed the key argument to value (GH14645)
DataFrame.consolidate and Series.consolidate have been removed (GH15501)
Removal of the previously deprecated module pandas.json (GH19944)
The module pandas.tools has been removed (GH15358, GH16005)
SparseArray.get_values() and SparseArray.to_dense() have dropped the fill parameter (GH14686)
DataFrame.sortlevel and Series.sortlevel have been removed (GH15099)
SparseSeries.to_dense() has dropped the sparse_only parameter (GH14686)
DataFrame.astype() and Series.astype() have renamed the raise_on_error argument to errors (GH14967)
is_sequence, is_any_int_dtype, and is_floating_dtype have been removed from pandas.api.types (GH16163, GH16189)

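Performance improvements#

Slicing Series and DataFrames with a monotonically increasing CategoricalIndex is now very fast and has speed comparable to slicing with an Int64Index. The speed increase applies both when indexing by label (using .loc) and by position (.iloc) (GH20395). Slicing a monotonically increasing CategoricalIndex itself (i.e. ci[1000:2000]) shows similar speed improvements as above (GH21659)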
Improved performance of CategoricalIndex.equals() when comparing to another CategoricalIndex (GH24023)
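Improved performance of Series.describe() in case of numeric dtypes (GH21274)
Improved performance of pandas.core.groupby.GroupBy.rank() when dealing with tied rankings (GH21237)
Improved performance of DataFrame.set_index() with columns consisting of Period objects (GH21582, GH21606)
Improved performance of Series.at() and Index.get_value() for Extension Arrays values (e.g. Categorical) (GH24204)
Improved performance of membership checks in Categorical and CategoricalIndex (i.e. x in cat-style checks are much faster). CategoricalIndex.contains() is likewise much faster (GH21369, GH21508)
Improved performance of HDFStore.groups() (and dependent functions like HDFStore.keys(), i.e. x in store checks are much faster) (GH21372)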
Improved the performance of pandas.get_dummies() with sparse=True (GH21997)
Improved performance of IndexEngine.get_indexer_non_unique() for sorted, non-unique indexes (GH9466)
Improved performance of PeriodIndex.unique() (GH23083)
Improved performance of concat() for Series objects (GH23404)
Improved performance of DatetimeIndex.normalize() and Timestamp.normalize() for timezone naive or UTC datetimes (GH23634)
Improved performance of DatetimeIndex.tz_localize() and various DatetimeIndex attributes with dateutil UTC timezone (GH23772)
Fixed a performance regression on Windows with Python 3.7 of read_csv() (GH23516)
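Improved performance of the Categorical constructor for Series objects (GH23814)
Improved performance of where() for Categorical data (GH24077)
Improved performance of iterating over a Series. Using DataFrame.itertuples() now creates iterators without internally allocating lists of all elements (GH20783)
Improved performance of the Period constructor, additionally benefitting PeriodArray and PeriodIndex creation (GH24084, GH24118)
Improved performance of tz-aware DatetimeArray binary operations (GH24491)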

Bug fixes#

Categorical#

Bug in Categorical.from_codes() where NaN values in codes were silently converted to 0 (GH21767). In the future this will raise a ValueError. Also changes the behavior of .from_codes([1.1, 2.0]).
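Bug in Categorical.sort_values() where NaN values were always positioned in front regardless of na_position value. (GH22556).
Bug when indexing with a boolean-valued Categorical. Now a boolean-valued Categorical is treated as a boolean mask (GH22665)
Constructing a CategoricalIndex with empty values and boolean categories was raising a ValueError after a change to dtype coercion (GH22702).
Bug in Categorical.take() with a user-provided fill_value not encoding the fill_value, which could result in a ValueError, incorrect results, or a segmentation fault (GH23296).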
In Series.unstack(), specifying a fill_value not present in the categories now raises a TypeError rather than ignoring the fill_value (GH23284)
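Bug when resampling with DataFrame.resample() and aggregating on categorical data, where the categorical dtype was getting lost. (GH23227)
Bug in many methods of the .str-accessor, which always failed on calling the CategoricalIndex.str constructor (GH23555, GH23556)
Bug in Series.where() losing the categorical dtype for categorical data (GH24077)
Bug in Categorical.apply() where NaN values could be handled unpredictably. They now remain unchanged (GH24241)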
Bug in Categorical comparison methods incorrectly raising ValueError when operating against a DataFrame (GH24630)
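Bug in Categorical.set_categories() where setting fewer new categories with rename=True caused a segmentation fault (GH24675)

Datetimelike#

Fixed bug where two DateOffset objects with different normalize attributes could evaluate as equal (GH21404)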
Fixed bug where Timestamp.resolution() incorrectly returned 1-microsecond timedelta instead of 1-nanosecond Timedelta (GH21336, GH21365)
Bug in to_datetime() that did not consistently return an Index when box=True was specified (GH21864)
Bug in DatetimeIndex comparisons where string comparisons incorrectly raises TypeError (GH22074)
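Bug in DatetimeIndex comparisons when comparing against timedelta64[ns] dtyped arrays; in some cases TypeError was incorrectly raised, in others it incorrectly failed to raise (GH22074)
Bug in DatetimeIndex comparisons when comparing against object-dtyped arrays (GH22074)
Bug in DataFrame with datetime64[ns] dtype addition and subtraction with Timedelta-like objects (GH22005, GH22163)
Bug in DataFrame with datetime64[ns] dtype addition and subtraction with DateOffset objects returning an object dtype instead of datetime64[ns] dtype (GH21610, GH22163)
Bug in DataFrame with datetime64[ns] dtype comparing against NaT incorrectly (GH22242, GH22163)
Bug in DataFrame with datetime64[ns] dtype subtracting Timestamp-like object incorrectly returned datetime64[ns] dtype instead of timedelta64[ns] dtype (GH8554, GH22163)
Bug in DataFrame with datetime64[ns] dtype subtracting np.datetime64 object with non-nanosecond unit failing to convert to nanoseconds (GH18874, GH22163)
Bug in DataFrame comparisons against Timestamp-like objects failing to raise TypeError for inequality checks with mismatched types (GH8932, GH22163)
Bug in DataFrame with mixed dtypes including datetime64[ns] incorrectly raising TypeError on equality comparisons (GH13128, GH22163)
Bug in DataFrame.values returning a DatetimeIndex for a single-column DataFrame with tz-aware datetime values. Now a 2-D numpy.ndarray of Timestamp objects is returned (GH24024)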
Bug in DataFrame.eq() comparison against NaT incorrectly returning True or NaN (GH15697, GH22163)
Bug in DatetimeIndex subtraction that incorrectly failed to raise OverflowError (GH22492, GH22508)
Bug in DatetimeIndex incorrectly allowing indexing with Timedelta object (GH20464)
Bug in DatetimeIndex where frequency was being set if original frequency was None (GH22150)
Bug in rounding methods of DatetimeIndex (round(), ceil(), floor()) and Timestamp (round(), ceil(), floor()) could give rise to loss of precision (GH22591)
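Bug in to_datetime() with an Index argument that would drop the name from the result (GH21697)
Bug in PeriodIndex where adding or subtracting a timedelta or Tick object produced incorrect results (GH22988)
Bug in the Series repr with period-dtype data missing a space before the data (GH23601)
Bug in date_range() when decrementing a start date to a past end date by a negative frequency (GH23270)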
Bug in Series.min() which would return NaN instead of NaT when called on a series of NaT (GH23282)
Bug in Series.combine_first() not properly aligning categoricals, so that missing values in self were not filled by valid values from other (GH24147)
Bug in DataFrame.combine() with datetimelike values raising a TypeError (GH23079)
Bug in date_range() with frequency of Day or higher where dates sufficiently far in the future could wrap around to the past instead of raising OutOfBoundsDatetime (GH14187)
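Bug in period_range() ignoring the frequency of start and end when those are provided as Period objects (GH20535).
Bug in PeriodIndex with attribute freq.n greater than 1 where adding a DateOffset object would return incorrect results (GH23215)
Bug in Series that interpreted string indices as lists of characters when setting datetimelike values (GH23451)
Bug in DataFrame when creating a new column from an ndarray of Timestamp objects with timezones creating an object-dtype column, rather than datetime with timezone (GH23932)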
Bug in Timestamp constructor which would drop the frequency of an input Timestamp (GH22311)
Bug in DatetimeIndex where calling np.array(dtindex, dtype=object) would incorrectly return an array of long objects (GH23524)
Bug in Index where passing a timezone-aware DatetimeIndex and dtype=object would incorrectly raise a ValueError (GH23524)
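Bug in Index where calling np.array(dtindex, dtype=object) on a timezone-naive DatetimeIndex would return an array of datetime objects instead of Timestamp objects, potentially losing nanosecond portions of the timestamps (GH23524)
Bug in Categorical.__setitem__ not allowing setting with another Categorical when both are unordered and have the same categories, but in a different order (GH24142)
Bug in date_range() where using dates with millisecond resolution or higher could return incorrect values or the wrong number of values in the index (GH24110)
Bug in DatetimeIndex where constructing a DatetimeIndex from a Categorical or CategoricalIndex would incorrectly drop timezone information (GH18664)
Bug in DatetimeIndex and TimedeltaIndex where indexing with Ellipsis would incorrectly lose the index's freq attribute (GH21282)
Clarified error message produced when passing an incorrect freq argument to DatetimeIndex with NaT as the first entry in the passed data (GH11587)
Bug in to_datetime() where box and utc arguments were ignored when passing a DataFrame or dict of unit mappings (GH23760)
Bug in Series.dt where the cache would not update properly after an in-place operation (GH24408)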
Bug in PeriodIndex where comparisons against an array-like object with length 1 failed to raise ValueError (GH23078)
Bug in DatetimeIndex.astype(), PeriodIndex.astype() and TimedeltaIndex.astype() ignoring the sign of the dtype for unsigned integer dtypes (GH24405).
Fixed bug in Series.max() with datetime64[ns]-dtype failing to return NaT when nulls are present and skipna=False is passed (GH24265)
Bug in to_datetime() where arrays of datetime objects containing both timezone-aware and timezone-naive datetimes would fail to raise ValueError (GH24569)
Bug in to_datetime() with invalid datetime format doesn't coerce input to NaT even if errors='coerce' (GH24763)

Timedelta#

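Bug in DataFrame with timedelta64[ns] dtype division by Timedelta-like scalar incorrectly returning timedelta64[ns] dtype instead of float64 dtype (GH20088, GH22163)
Bug in adding an Index with object dtype to a Series with timedelta64[ns] dtype incorrectly raising (GH22390)
Bug in multiplying a Series with numeric dtype against a timedelta object (GH22390)
Bug in Series with numeric dtype when adding or subtracting an array or Series with timedelta64 dtype (GH22390)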
Bug in Index with numeric dtype when multiplying or dividing an array with dtype timedelta64 (GH22390)
Bug in TimedeltaIndex incorrectly allowing indexing with Timestamp object (GH20464)
Fixed bug where subtracting Timedelta from an object-dtyped array would raise TypeError (GH21980)
Fixed bug in adding a DataFrame with all-timedelta64[ns] dtypes to a DataFrame with all-integer dtypes returning incorrect results instead of raising TypeError (GH22696)
Bug in TimedeltaIndex where adding a timezone-aware datetime scalar incorrectly returned a timezone-naive DatetimeIndex (GH23215)
Bug in TimedeltaIndex where adding np.timedelta64('NaT') incorrectly returned an all-NaT DatetimeIndex instead of an all-NaT TimedeltaIndex (GH23215)
Bug where Timedelta and to_timedelta() were inconsistent in the unit strings they supported (GH21762)
Bug in TimedeltaIndex division where dividing by another TimedeltaIndex raised TypeError instead of returning a Float64Index (GH23829, GH22631)
Bug in TimedeltaIndex comparison operations where comparing against non-Timedelta-like objects would raise TypeError instead of returning all-False for __eq__ and all-True for __ne__ (GH24056)
Bug in Timedelta comparisons when comparing with a Tick object incorrectly raising TypeError (GH24710)

Timezones#

Bug in Index.shift() where an AssertionError would raise when shifting across DST (GH8616)
Bug in Timestamp constructor where passing an invalid timezone offset designator (Z) would not raise a ValueError (GH8910)
Bug in Timestamp.replace() where replacing at a DST boundary would retain an incorrect offset (GH7825)
Bug in Series.replace() with datetime64[ns, tz] data when replacing NaT (GH11792)
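Bug in Timestamp when passing different string date formats with a timezone offset would produce different timezone offsets (GH12064)
Bug when comparing a tz-naive Timestamp to a tz-aware DatetimeIndex which would coerce the DatetimeIndex to tz-naive (GH12601)
Bug in Series.truncate() with a tz-aware DatetimeIndex which would cause a core dump (GH9243)
Bug in Series constructor which would coerce tz-aware and tz-naive Timestamp to tz-aware (GH13051)
Bug in Index with datetime64[ns, tz] dtype that did not localize integer data correctly (GH20964)
Bug in DatetimeIndex where constructing with an integer and tz would not localize correctly (GH12619)
Fixed bug where DataFrame.describe() and Series.describe() on tz-aware datetimes did not show first and last result (GH21328)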
Bug in DatetimeIndex comparisons failing to raise TypeError when comparing timezone-aware DatetimeIndex against np.datetime64 (GH22074)
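Bug in DataFrame assignment with a timezone-aware scalar (GH19843)
Bug in DataFrame.asof() that raised a TypeError when attempting to compare tz-naive and tz-aware timestamps (GH21194)
Bug when constructing a DatetimeIndex with Timestamp constructed with the replace method across DST (GH18785)
Bug when setting a new value with DataFrame.loc() with a DatetimeIndex with a DST transition (GH18308, GH20724)
Bug in Index.unique() that did not re-localize tz-aware dates correctly (GH21737)
Bug when indexing a Series with a DST transition (GH21846)
Bug in DataFrame.resample() and Series.resample() where an AmbiguousTimeError or NonExistentTimeError would raise if a timezone aware timeseries ended on a DST transition (GH19375, GH10117)
Bug in DataFrame.drop() and Series.drop() when specifying a tz-aware Timestamp key to drop from a DatetimeIndex with a DST transition (GH21761)
Bug in DatetimeIndex constructor where NaT and dateutil.tz.tzlocal would raise an OutOfBoundsDatetime error (GH23807)
Bug in DatetimeIndex.tz_localize() and Timestamp.tz_localize() with dateutil.tz.tzlocal near a DST transition that would return an incorrectly localized datetime (GH23807)
Bug in Timestamp constructor where a dateutil.tz.tzutc timezone passed with a datetime.datetime argument would be converted to a pytz.UTC timezone (GH23807)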
Bug in to_datetime() where utc=True was not respected when specifying a unit and errors='ignore' (GH23758)
Bug in to_datetime() where utc=True was not respected when passing a Timestamp (GH24415)
Bug in DataFrame.any() returning the wrong value when axis=1 and the data is of datetimelike type (GH23070)
Bug in DatetimeIndex.to_period() where a timezone aware index was converted to UTC first before creating PeriodIndex (GH22905)
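Bug in DataFrame.tz_localize(), DataFrame.tz_convert(), Series.tz_localize(), and Series.tz_convert() where copy=False would mutate the original argument inplace (GH6326)
Bug in DataFrame.max() and DataFrame.min() with axis=1 where a Series with NaN would be returned when all columns contained the same timezone (GH10390)

Offsets#

Bug in FY5253 where date offsets could incorrectly raise an AssertionError in arithmetic operations (GH14774)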
Bug in DateOffset where keyword arguments week and milliseconds were accepted and ignored. Passing these will now raise ValueError (GH19398)
Bug in adding DateOffset with DataFrame or PeriodIndex incorrectly raising TypeError (GH23215)
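Bug in comparing DateOffset objects with non-DateOffset objects, particularly strings, raising ValueError instead of returning False for equality checks and True for not-equal checks (GH23524)

Numeric#

Bug in Series __rmatmul__ not supporting matrix vector multiplication (GH21530)
Bug in factorize() failing with a read-only array (GH12813)
Fixed bug in unique() handling signed zeros inconsistently: for some inputs 0.0 and -0.0 were treated as equal and for some inputs as different. Now they are treated as equal for all inputs (GH21866)
Bug in DataFrame.agg(), DataFrame.transform() and DataFrame.apply() where, when supplied with a list of functions and axis=1 (e.g. df.apply(['sum', 'mean'], axis=1)), a TypeError was wrongly raised. For all three methods such calculations are now done correctly. (GH16679).
Bug in Series comparison against datetime-like scalars and arrays (GH22074)
Bug in DataFrame multiplication between boolean dtype and integer returning object dtype instead of integer dtype (GH22047, GH22163)
Bug in DataFrame.apply() where, when supplied with a string argument and additional positional or keyword arguments (e.g. df.apply('sum', min_count=1)), a TypeError was wrongly raised (GH22376)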
Bug in DataFrame.astype() to extension dtype may raise AttributeError (GH22578)
Bug in DataFrame with timedelta64[ns] dtype arithmetic operations with ndarray with integer dtype incorrectly treating the ndarray as timedelta64[ns] dtype (GH23114)
Bug in Series.rpow() with object dtype returning NaN for 1 ** NA instead of 1 (GH22922).
Series.agg() can now handle numpy NaN-aware methods like numpy.nansum() (GH19629)
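Bug in Series.rank() and DataFrame.rank() when pct=True and more than 2^24 rows are present resulted in percentages greater than 1.0 (GH18271)
Calls such as DataFrame.round() with a non-unique CategoricalIndex() now return expected data. Previously, data would be improperly duplicated (GH21809).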
Added log10, floor and ceil to the list of supported functions in DataFrame.eval() (GH24139, GH24353)
Logical operations &, |, ^ between Series and Index will no longer raise ValueError (GH22092)
Checking PEP 3141 numbers in is_scalar() function returns True (GH22903)
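Reduction methods like Series.sum() now accept the default value of keepdims=False when called from a NumPy ufunc, rather than raising a TypeError. Full support for keepdims has not been implemented (GH24356).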
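
Two of the numeric fixes above are easy to observe directly: unique() treating 0.0 and -0.0 as equal, and apply() accepting a list of functions with axis=1. The sketch below uses made-up data for illustration.

```python
import numpy as np
import pandas as pd

# Signed zeros collapse to a single unique value.
zeros = pd.unique(np.array([0.0, -0.0, 0.0]))

# A list of function names with axis=1 yields one output column per function.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
row_stats = df.apply(["sum", "mean"], axis=1)
```

Here row_stats keeps the original row index and gains one column per requested function.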

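Conversion#

Bug in DataFrame.combine_first() in which column types were unexpectedly converted to float (GH20699)
Bug in DataFrame.clip() in which column types were not preserved and were cast to float (GH24162)
Bug in DataFrame.clip() where the numeric results were wrong when the column order of the DataFrames did not match (GH20911)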
Bug in DataFrame.astype() where converting to an extension dtype when duplicate column names are present causes a RecursionError (GH24704)

Strings#

Bug in Index.str.partition() was not nan-safe (GH23558).
Bug in Index.str.split() was not nan-safe (GH23677).
Bug in Series.str.contains() not respecting the na argument for a Categorical dtype Series (GH22158)
Bug in Index.str.cat() when the result contained only NaN (GH24044)

Interval#

Bug in the IntervalIndex constructor where the closed parameter did not always override the inferred closed (GH19370)
Bug in the IntervalIndex repr where a trailing comma was missing after the list of intervals (GH20611)
Bug in Interval where scalar arithmetic operations did not retain the closed value (GH22313)
Bug in IntervalIndex where indexing with datetime-like values raised a KeyError (GH20636)
Bug in IntervalTree where data containing NaN triggered a warning and resulted in incorrect indexing queries with IntervalIndex (GH23352)

Indexing#

Bug in DataFrame.ne() failing if columns contain the column name "dtype" (GH22383)
The traceback from a KeyError when asking .loc for a single missing label is now shorter and clearer (GH21557)
PeriodIndex now emits a KeyError when a malformed string is looked up, which is consistent with the behavior of DatetimeIndex (GH22803)
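When .ix is asked for a missing integer label in a MultiIndex with a first level of integer type, it now raises a KeyError, consistently with the case of a flat Int64Index, rather than falling back to positional indexing (GH21593)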
Bug in Index.reindex() when reindexing a tz-naive and tz-aware DatetimeIndex (GH8306)
Bug in Series.reindex() when reindexing an empty series with a datetime64[ns, tz] dtype (GH20869)
Bug in DataFrame when setting values with .loc and a timezone aware DatetimeIndex (GH11365)
DataFrame.__getitem__ now accepts dictionaries and dictionary keys as list-likes of labels, consistently with Series.__getitem__ (GH21294)
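Fixed DataFrame[np.nan] when columns are non-unique (GH21428)
Bug when indexing DatetimeIndex with nanosecond resolution dates and timezones (GH11679)
Bug where indexing with a NumPy array containing negative values would mutate the indexer (GH21867)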
Bug where mixed indexes wouldn't allow integers for .at (GH19860)
Float64Index.get_loc now raises KeyError when a boolean key is passed. (GH19087)
Bug in DataFrame.loc() when indexing with an IntervalIndex (GH19977)
Index no longer mangles None, NaN and NaT, i.e. they are treated as three different keys. However, for numeric Index all three are still coerced to a NaN (GH22332)
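Bug in scalar in Index if scalar is a float while the Index is of integer dtype (GH22085)
Bug in MultiIndex.set_levels() when levels value is not subscriptable (GH23273)
Bug where setting a timedelta column by Index caused it to be cast to double, and therefore lose precision (GH23511)
Bug in Index.union() and Index.intersection() where the name of the Index of the result was not computed correctly for certain cases (GH9943, GH9862)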
Bug in Index slicing with boolean Index may raise TypeError (GH22533)
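Bug in PeriodArray.__setitem__ when accepting slice and list-like value (GH23978)
Bug in DatetimeIndex, TimedeltaIndex where indexing with Ellipsis would lose their freq attribute (GH21282)
Bug in iat where using it to assign an incompatible value would create a new column (GH23236)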

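Missing#

Bug in DataFrame.fillna() where a ValueError would raise when one column contained a datetime64[ns, tz] dtype (GH15522)
Bug in Series.hasnans() that could be incorrectly cached and return incorrect answers if null elements are introduced after an initial call (GH19700)
Series.isin() now treats all NaN-floats as equal also for np.object_-dtype. This behavior is consistent with the behavior for float64 (GH22119)
unique() no longer mangles NaN-floats and the NaT-object for np.object_-dtype, i.e. NaT is no longer coerced to a NaN-value and is treated as a different entity. (GH22295)
DataFrame and Series now properly handle numpy masked arrays with hardened masks. Previously, constructing a DataFrame or Series from a masked array with a hard mask would create a pandas object containing the underlying value, rather than the expected NaN. (GH24574)
Bug in DataFrame constructor where dtype argument was not honored when handling numpy masked record arrays. (GH24874)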
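
The hard-mask fix can be illustrated with a small sketch; the sample values are assumptions chosen for illustration. The masked position is expected to come through as a missing value rather than as the underlying 2.

```python
import numpy as np
import pandas as pd

# A masked array whose mask is "hard", i.e. cannot be unset by assignment.
masked = np.ma.masked_array([1, 2, 3], mask=[False, True, False], hard_mask=True)

# The masked element becomes NaN in the resulting Series.
s = pd.Series(masked)
```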

MultiIndex#

Bug in io.formats.style.Styler.applymap() where subset= with MultiIndex slice would reduce to Series (GH19861)
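Removed compatibility for MultiIndex pickles prior to version 0.8.0; compatibility with MultiIndex pickles from version 0.13 forward is maintained (GH21654)
MultiIndex.get_loc_level() (and as a consequence, .loc on a Series or DataFrame with a MultiIndex index) will now raise a KeyError, rather than returning an empty slice, if asked for a label which is present in the levels but is unused (GH22221)
MultiIndex has gained MultiIndex.from_frame(), which allows constructing a MultiIndex object from a DataFrame (GH22420)
Fix TypeError in Python 3 when creating a MultiIndex in which some levels have mixed types, e.g. when some labels are tuples (GH15457)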
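
A brief sketch of MultiIndex.from_frame() together with the column-order guarantee of MultiIndex.to_frame() mentioned earlier in these notes. The column names and data are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({"letter": ["a", "a", "b"], "number": [1, 2, 1]})

# Column names become the level names of the MultiIndex.
mi = pd.MultiIndex.from_frame(df)

# Converting back yields columns in the same order as mi.names.
round_trip = mi.to_frame(index=False)
```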

IO#

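Bug in read_csv() in which a column specified with CategoricalDtype of boolean categories was not being correctly coerced from string values to booleans (GH20498)
Bug in read_csv() in which unicode column names were not being properly recognized with Python 2.x (GH13253)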
Bug in DataFrame.to_sql() when writing timezone aware data (datetime64[ns, tz] dtype) would raise a TypeError (GH9086)
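Bug in DataFrame.to_sql() where a naive DatetimeIndex would be written as TIMESTAMP WITH TIMEZONE type in supported databases, e.g. PostgreSQL (GH23510)
Bug in read_excel() when parse_cols is specified with an empty dataset (GH9208)
read_html() no longer ignores all-whitespace <tr> within <thead> when considering the skiprows and header arguments. Previously, users had to decrease their header and skiprows values on such tables to work around the issue. (GH21641)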
read_excel() will correctly show the deprecation warning for previously deprecated sheetname (GH17994)
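read_csv() and read_table() will throw UnicodeError and not coredump on badly encoded strings (GH22748)
read_csv() will correctly parse timezone-aware datetimes (GH22256)
Bug in read_csv() in which memory management was prematurely optimized for the C engine when the data was being read in chunks (GH23509)
Bug in read_csv() in which unnamed columns were being improperly identified when extracting a multi-index (GH23687)
read_sas() will correctly parse numbers in sas7bdat files that have width less than 8 bytes. (GH21616)
read_sas() will correctly parse sas7bdat files with many columns (GH22628)
read_sas() will correctly parse sas7bdat files with data page types having also bit 7 set (so page type is 128 + 256 = 384) (GH16615)
Bug in read_sas() in which an incorrect error was raised on an invalid file format. (GH24548)
Bug in detect_client_encoding() where potential IOError goes unhandled when importing in a mod_wsgi process due to restricted access to stdout. (GH21552)
Bug in DataFrame.to_html() with index=False missing truncation indicators (...) on a truncated DataFrame (GH15019, GH22783)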
Bug in DataFrame.to_html() with index=False when both columns and row index are MultiIndex (GH22579)
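Bug in DataFrame.to_html() with index_names=False displaying the index name (GH22747)
Bug in DataFrame.to_html() with header=False not displaying row index names (GH23788)
Bug in DataFrame.to_html() with sparsify=False that caused it to raise TypeError (GH22887)
Bug in DataFrame.to_string() that broke column alignment when index=False and the width of the first column's values is greater than the width of the first column's header (GH16839, GH13032)
Bug in DataFrame.to_string() that caused representations of DataFrame to not take up the whole window (GH22984)
Bug in DataFrame.to_csv() where a single-level MultiIndex incorrectly wrote a tuple. Now just the value of the index is written (GH19589).
HDFStore will raise ValueError when the format kwarg is passed to the constructor (GH13291)
Bug in HDFStore.append() when appending a DataFrame with an empty string column and min_itemsize < 8 (GH12242)
Bug in read_csv() in which memory leaks occurred in the C engine when parsing NaN values due to insufficient cleanup on completion or error (GH21353)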
Bug in read_csv() in which incorrect error messages were being raised when skipfooter was passed in along with nrows, iterator, or chunksize (GH23711)
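Bug in read_csv() in which MultiIndex index names were being improperly handled in the cases when they were not provided (GH23484)
Bug in read_csv() in which unnecessary warnings were being raised when the dialect's values conflicted with the default arguments (GH23761)
Bug in read_html() in which the error message was not displaying the valid flavors when an invalid one was provided (GH23549)
Bug in read_excel() in which extraneous header names were extracted, even though none were specified (GH11733)
Bug in read_excel() in which column names were sometimes not being properly converted to string in Python 2.x (GH23874)
Bug in read_excel() in which index_col=None was not being respected and index columns were parsed anyway (GH18792, GH20480)
Bug in read_excel() in which usecols was not being validated for proper column names when passed in as a string (GH20480)
Bug in DataFrame.to_dict() when the resulting dict contains non-Python scalars in the case of numeric data (GH23753)
DataFrame.to_string(), DataFrame.to_html(), DataFrame.to_latex() will correctly format output when a string is passed as the float_format argument (GH21625, GH22270)
Bug in read_csv() that caused it to raise OverflowError when trying to use 'inf' as na_value with an integer index column (GH17128)
Bug in read_csv() that caused the C engine on Python 3.6+ on Windows to improperly read CSV filenames with accented or special characters (GH15086)
Bug in read_fwf() in which the compression type of a file was not being properly inferred (GH22199)
Bug in pandas.io.json.json_normalize() that caused it to raise TypeError when two consecutive elements of record_path are dicts (GH22706)
Bug in DataFrame.to_stata(), pandas.io.stata.StataWriter and pandas.io.stata.StataWriter117 where an exception would leave a partially written and invalid dta file (GH23573)
Bug in DataFrame.to_stata() and pandas.io.stata.StataWriter117 that produced invalid files when using strLs with non-ASCII characters (GH23573)
Bug in HDFStore that caused it to raise ValueError when reading a DataFrame in Python 3 from a fixed format written in Python 2 (GH24510)
Bug in DataFrame.to_string() and more generally in the floating repr formatter. Zeros were not trimmed if inf was present in a column, while they were trimmed with NA values. Zeros are now trimmed as in the presence of NA (GH24861).
Bug in the repr when truncating the number of columns and having a wide last column (GH24849).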
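
To make the timezone-aware parsing item above concrete, here is a minimal sketch that reads a small in-memory CSV. The column names and timestamps are invented, and the example assumes a pandas version in which read_csv() keeps the UTC offset of the parsed values.

```python
import io
from datetime import timedelta

import pandas as pd

# Two timestamps that both carry a +01:00 UTC offset.
csv_text = (
    "ts,value\n"
    "2019-01-25 09:00:00+01:00,1\n"
    "2019-01-25 10:00:00+01:00,2\n"
)

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["ts"])

# Each parsed value should still know its offset from UTC.
offset = df["ts"].iloc[0].utcoffset()
print(df.dtypes)
print(offset)
```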

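Plotting#

Bug in DataFrame.plot.scatter() and DataFrame.plot.hexbin() caused the x-axis label and ticklabels to disappear when the colorbar was on in the IPython inline backend (GH10611, GH10678, and GH20455)
Bug in plotting a Series with datetimes using matplotlib.axes.Axes.scatter() (GH22039)
Bug in DataFrame.plot.bar() caused bars to use multiple colors instead of a single one (GH20585)
Bug in validating the color parameter caused an extra color to be appended to the given color array. This happened in multiple plotting functions using matplotlib. (GH20726)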

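GroupBy/resample/rolling#

Bug in pandas.core.window.Rolling.min() and pandas.core.window.Rolling.max() with closed='left', a datetime-like index and only one entry in the series leading to a segfault (GH24718)
Bug in pandas.core.groupby.GroupBy.first() and pandas.core.groupby.GroupBy.last() with as_index=False leading to the loss of timezone information (GH15884)
Bug in DataFrame.resample() when downsampling across a DST boundary (GH8531)
Bug in date anchoring for DataFrame.resample() with offset Day when n > 1 (GH24127)
Bug where ValueError is wrongly raised when calling the count() method of a SeriesGroupBy when the grouping variable only contains NaNs and the NumPy version is below 1.13 (GH21956).
Multiple bugs in pandas.core.window.Rolling.min() with closed='left' and a datetime-like index leading to incorrect results and also a segfault. (GH21704)
Bug in pandas.core.resample.Resampler.apply() when passing positional arguments to the applied function (GH14615).
Bug in Series.resample() when passing numpy.timedelta64 to the loffset kwarg (GH7687).
Bug in pandas.core.resample.Resampler.asfreq() when the frequency of a TimedeltaIndex is a subperiod of a new frequency (GH13022).
Bug in pandas.core.groupby.SeriesGroupBy.mean() when values were integral but could not fit inside of int64, overflowing instead. (GH22487)
pandas.core.groupby.RollingGroupby.agg() and pandas.core.groupby.ExpandingGroupby.agg() now support multiple aggregation functions as parameters (GH15072)
Bug in DataFrame.resample() and Series.resample() when resampling by a weekly offset ('W') across a DST transition (GH9119, GH21459)
Bug in DataFrame.expanding() in which the axis argument was not being respected during aggregations (GH23372)
Bug in pandas.core.groupby.GroupBy.transform() which caused missing values when the input function can accept a DataFrame but renames it (GH23455).
Bug in pandas.core.groupby.GroupBy.nth() where column order was not always preserved (GH20760)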
Bug in pandas.core.groupby.GroupBy.rank() with method='dense' and pct=True when a group has only one member would raise a ZeroDivisionError (GH23666).
Calling pandas.core.groupby.GroupBy.rank() with empty groups and pct=True was raising a ZeroDivisionError (GH22519)
Bug in DataFrame.resample() when resampling NaT in TimedeltaIndex (GH13223).
Bug in DataFrame.groupby() where the observed argument was not respected when selecting a column; observed=False was always used instead (GH23970)
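Bug in pandas.core.groupby.SeriesGroupBy.pct_change() and pandas.core.groupby.DataFrameGroupBy.pct_change(), which previously worked across groups when calculating the percent change and now correctly work per group (GH21200, GH21235).
Bug preventing hash table creation with a very large number (2^32) of rows (GH22805)
Bug in groupby when grouping on a categorical causing a ValueError and incorrect grouping if observed=True and nan is present in the categorical column (GH24740, GH21151).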
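
Two of the entries above lend themselves to a short sketch: pct_change() computed per group, and passing several aggregation functions to a rolling window applied per group. The frame below is hypothetical, and the comments describe the behavior expected once these changes are in place.

```python
import pandas as pd

df = pd.DataFrame(
    {"key": ["a", "a", "b", "b"], "val": [1.0, 2.0, 10.0, 30.0]}
)
grouped = df.groupby("key")["val"]

# Percent change is computed within each group, so the first row of every
# group has no predecessor and comes out as NaN.
pct = grouped.pct_change()

# Several aggregations can be passed to a rolling window applied per group.
rolled = grouped.rolling(2).agg(["sum", "mean"])

print(pct)
print(rolled)
```

Had the percent change been computed across groups, the first row of group "b" would have been compared with the last row of group "a" instead of being NaN.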

Reshaping#

Bug in pandas.concat() when joining resampled DataFrames with a timezone aware index (GH13783)
Bug in pandas.concat() when joining only Series: the names argument of concat is no longer ignored (GH23490)
Bug in Series.combine_first() with datetime64[ns, tz] dtype which would return a tz-naive result (GH21469)
Bug in Series.where() and DataFrame.where() with datetime64[ns, tz] dtype (GH21546)
Bug in DataFrame.where() with an empty DataFrame and an empty cond having non-bool dtype (GH21947)
Bug in Series.mask() and DataFrame.mask() with list conditionals (GH21891)
Bug in DataFrame.replace() raises RecursionError when converting OutOfBounds datetime64[ns, tz] (GH20380)
pandas.core.groupby.GroupBy.rank() now raises a ValueError when an invalid value is passed for argument na_option (GH22124)
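Bug in get_dummies() with Unicode attributes in Python 2 (GH22084)
Bug in DataFrame.replace() that raised RecursionError when replacing empty lists (GH22083)
Bug in Series.replace() and DataFrame.replace() when a dict is used as the to_replace value and one key in the dict is another key's value: the results were inconsistent between using integer keys and using string keys (GH20656)
Bug in DataFrame.drop_duplicates() for an empty DataFrame which incorrectly raised an error (GH20516)
Bug in pandas.wide_to_long() when a string is passed to the stubnames argument and a column name is a substring of that stubname (GH22468)
Bug in merge() when merging datetime64[ns, tz] data that contained a DST transition (GH18885)
Bug in merge_asof() when merging on float values within a defined tolerance (GH22981)
Bug in pandas.concat() when concatenating a multicolumn DataFrame with tz-aware data against a DataFrame with a different number of columns (GH22796)
Bug in merge_asof() where a confusing error message was raised when attempting to merge with missing values (GH23189)
Bug in DataFrame.nsmallest() and DataFrame.nlargest() for DataFrames that have a MultiIndex for columns (GH23033).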
Bug in pandas.melt() when passing column names that are not present in DataFrame (GH23575)
Bug in DataFrame.append() with a Series with a dateutil timezone would raise a TypeError (GH23682)
Bug in Series construction when passing no data and dtype=str (GH22477)
Bug in cut() with bins as an overlapping IntervalIndex where multiple bins were returned per item instead of raising a ValueError (GH23980)
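Bug in pandas.concat() when joining a Series of datetimetz with a Series of category would lose the timezone (GH23816)
Bug in DataFrame.join() when joining on a partial MultiIndex would drop names (GH20452).
DataFrame.nlargest() and DataFrame.nsmallest() now return the correct n values when keep != 'all' (GH22752)
Constructing a DataFrame with an index argument that wasn't already an instance of Index was broken (GH22227).
Bug in DataFrame that prevented list subclasses from being used for construction (GH21226)
Bug in DataFrame.unstack() and DataFrame.pivot_table() returning a misleading error message when the resulting DataFrame has more elements than int32 can handle. Now, the error message is improved, pointing towards the actual problem (GH20601)
Bug in DataFrame.unstack() where a ValueError was raised when unstacking timezone aware values (GH18338)
Bug in DataFrame.stack() where timezone aware values were converted to timezone naive values (GH19420)
Bug in merge_asof() where a TypeError was raised when by_col were timezone aware values (GH21184)
Bug showing an incorrect shape when throwing an error during DataFrame construction. (GH20742)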
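
As a hedged illustration of two of the fixes listed above, the sketch below checks that cut() rejects overlapping IntervalIndex bins with a ValueError and that drop_duplicates() on an empty DataFrame returns an empty result instead of raising. The bin edges are arbitrary values chosen for the example.

```python
import pandas as pd

# The two intervals overlap on (5, 10], so the value 7 would fall in both.
overlapping = pd.IntervalIndex.from_tuples([(0, 10), (5, 15)])

try:
    pd.cut([7], bins=overlapping)
    cut_raised = False
except ValueError:
    cut_raised = True

# Dropping duplicates from an empty frame should simply give an empty frame.
empty_result = pd.DataFrame().drop_duplicates()

print(cut_raised, empty_result.empty)
```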

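Sparse#

Updating a boolean, datetime, or timedelta column to be Sparse now works (GH22367)
Bug in Series.to_sparse() with a Series already holding sparse data not constructing properly (GH22389)
Providing a sparse_index no longer defaults the NA value to np.nan for all dtypes. The correct na_value for data.dtype is now used.
Bug in SparseArray.nbytes under-reporting its memory usage by not including the size of its sparse index.
Improved performance of Series.shift() for non-NA fill_value, as values are no longer converted to a dense array.
Bug in DataFrame.groupby not including fill_value in the groups for non-NA fill_value when grouping by a sparse column (GH5078)
Bug in the unary inversion operator (~) on a SparseSeries with boolean values. The performance of this has also been improved (GH22835)
Bug in SparseArray.unique() not returning the unique values (GH19595)
Bug in SparseArray.nonzero() and SparseDataFrame.dropna() returning shifted/incorrect results (GH21172)
Bug in DataFrame.apply() where dtypes would lose sparseness (GH23744)
Bug in concat() when concatenating a list of Series with all-sparse values changing the fill_value and converting to a dense Series (GH24371)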
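
The SparseArray.nbytes fix can be checked with a short sketch. It assumes the pd.arrays.SparseArray spelling of the class, which may differ depending on the installed pandas version, and the array contents are made up.

```python
import pandas as pd

# Only the two non-fill values (1 and 2) are stored, plus an index
# recording their positions.
arr = pd.arrays.SparseArray([0, 0, 1, 0, 2], fill_value=0)

values_bytes = arr.sp_values.nbytes  # memory used by the stored values
index_bytes = arr.sp_index.nbytes    # memory used by the sparse index
total_bytes = arr.nbytes             # should account for both parts

print(values_bytes, index_bytes, total_bytes)
```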

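Style#

background_gradient() now takes a text_color_threshold parameter to automatically lighten the text color based on the luminance of the background color. This improves readability with dark background colors without the need to limit the background colormap range. (GH21258)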
background_gradient() now also supports tablewise application (in addition to rowwise and columnwise) with axis=None (GH15204)
bar() now also supports tablewise application (in addition to rowwise and columnwise) with axis=None and setting clipping range with vmin and vmax (GH21548 and GH21526). NaN values are also handled properly.

Build changes#

Building pandas for development now requires cython >= 0.28.2 (GH21688)
Testing pandas now requires hypothesis>=3.58. You can find the Hypothesis docs here, and a pandas-specific introduction in the contributing guide. (GH22280)
Building pandas on macOS now targets a minimum of macOS 10.9 if run on macOS 10.9 or above (GH23424)

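Other#

Bug where C variables were declared with external linkage, causing import errors if certain other C libraries were imported before pandas. (GH24113)

Contributors#

A total of 337 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.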

AJ Dyka +
AJ Pryor, Ph.D +
Aaron Critchley
Adam Hooper
Adam J. Stewart
Adam Kim
Adam Klimont +
Addison Lynch +
Alan Hogue +
Alex Radu +
Alex Rychyk
Alex Strick van Linschoten +
Alex Volkov +
Alexander Buchkovsky
Alexander Hess +
Alexander Ponomaroff +
Allison Browne +
Aly Sivji
Andrew
Andrew Gross +
Andrew Spott +
Andy +
Aniket uttam +
Anjali2019 +
Anjana S +
Antti Kaihola +
Anudeep Tubati +
Arjun Sharma +
Armin Varshokar
Artem Bogachev
ArtinSarraf +
Barry Fitzgerald +
Bart Aelterman +
Ben James +
Ben Nelson +
Benjamin Grove +
Benjamin Rowell +
Benoit Paquet +
Boris Lau +
Brett Naul
Brian Choi +
C.A.M. Gerlach +
Carl Johan +
Chalmer Lowe
Chang She
Charles David +
Cheuk Ting Ho
Chris
Chris Roberts +
Christopher Whelan
Chu Qing Hao +
Da Cheezy Mobsta +
Damini Satya
Daniel Himmelstein
Daniel Saxton +
Darcy Meyer +
DataOmbudsman
David Arcos
David Krych
Dean Langsam +
Diego Argueta +
Diego Torres +
Dobatymo +
Doug Latornell +
Dr. Irv
Dylan Dmitri Gray +
Eric Boxer +
Eric Chea
Erik +
Erik Nilsson +
Fabian Haase +
Fabian Retkowski
Fabien Aulaire +
Fakabbir Amin +
Fei Phoon +
Fernando Margueirat +
Florian Müller +
Fábio Rosado +
Gabe Fernando
Gabriel Reid +
Giftlin Rajaiah
Gioia Ballin +
Gjelt
Gosuke Shibahara +
Graham Inggs
Guillaume Gay
Guillaume Lemaitre +
Hannah Ferchland
Haochen Wu
Hubert +
HubertKl +
HyunTruth +
Iain Barr
Ignacio Vergara Kausel +
Irv Lustig +
IsvenC +
Jacopo Rota
Jakob Jarmar +
James Bourbeau +
James Myatt +
James Winegar +
Jan Rudolph
Jared Groves +
Jason Kiley +
Javad Noorbakhsh +
Jay Offerdahl +
Jeff Reback
Jeongmin Yu +
Jeremy Schendel
Jerod Estapa +
Jesper Dramsch +
Jim Jeon +
Joe Jevnik
Joel Nothman
Joel Ostblom +
Jordi Contestí
Jorge López Fueyo +
Joris Van den Bossche
Jose Quinones +
Jose Rivera-Rubio +
Josh
Jun +
Justin Zheng +
Kaiqi Dong +
Kalyan Gokhale
Kang Yoosam +
Karl Dunkle Werner +
Karmanya Aggarwal +
Kevin Markham +
Kevin Sheppard
Kimi Li +
Koustav Samaddar +
Krishna +
Kristian Holsheimer +
Ksenia Gueletina +
Kyle Prestel +
LJ +
LeakedMemory +
Li Jin +
Licht Takeuchi
Luca Donini +
Luciano Viola +
Mak Sze Chun +
Marc Garcia
Marius Potgieter +
Mark Sikora +
Markus Meier +
Marlene Silva Marchena +
Martin Babka +
MatanCohe +
Mateusz Woś +
Mathew Topper +
Matt Boggess +
Matt Cooper +
Matt Williams +
Matthew Gilbert
Matthew Roeschke
Max Kanter
Michael Odintsov
Michael Silverstein +
Michael-J-Ward +
Mickaël Schoentgen +
Miguel Sánchez de León Peque +
Ming Li
Mitar
Mitch Negus
Monson Shao +
Moonsoo Kim +
Mortada Mehyar
Myles Braithwaite
Nehil Jain +
Nicholas Musolino +
Nicolas Dickreuter +
Nikhil Kumar Mengani +
Nikoleta Glynatsi +
Ondrej Kokes
Pablo Ambrosio +
Pamela Wu +
Parfait G +
Patrick Park +
Paul
Paul Ganssle
Paul Reidy
Paul van Mulbregt +
Phillip Cloud
Pietro Battiston
Piyush Aggarwal +
Prabakaran Kumaresshan +
Pulkit Maloo
Pyry Kovanen
Rajib Mitra +
Redonnet Louis +
Rhys Parry +
Rick +
Robin
Roei.r +
RomainSa +
Roman Imankulov +
Roman Yurchak +
Ruijing Li +
Ryan +
Ryan Nazareth +
Rüdiger Busche +
SEUNG HOON, SHIN +
Sandrine Pataut +
Sangwoong Yoon
Santosh Kumar +
Saurav Chakravorty +
Scott McAllister +
Sean Chan +
Shadi Akiki +
Shengpu Tang +
Shirish Kadam +
Simon Hawkins +
Simon Riddell +
Simone Basso
Sinhrks
Soyoun(Rose) Kim +
Srinivas Reddy Thatiparthy (శ్రీనివాస్ రెడ్డి తాటిపర్తి) +
Stefaan Lippens +
Stefano Cianciulli
Stefano Miccoli +
Stephen Childs
Stephen Pascoe
Steve Baker +
Steve Cook +
Steve Dower +
Stéphan Taljaard +
Sumin Byeon +
Sören +
Tamas Nagy +
Tanya Jain +
Tarbo Fukazawa
Thein Oo +
Thiago Cordeiro da Fonseca +
Thierry Moisan
Thiviyan Thanapalasingam +
Thomas Lentali +
Tim D. Smith +
Tim Swast
Tom Augspurger
Tomasz Kluczkowski +
Tony Tao +
Triple0 +
Troels Nielsen +
Tuhin Mahmud +
Tyler Reddy +
Uddeshya Singh
Uwe L. Korn +
Vadym Barda +
Varad Gunjal +
Victor Maryama +
Victor Villas
Vincent La
Vitória Helena +
Vu Le
Vyom Jain +
Weiwen Gu +
Wenhuan
Wes Turner
Wil Tan +
William Ayd
Yeojin Kim +
Yitzhak Andrade +
Yuecheng Wu +
Yuliya Dovzhenko +
Yury Bayda +
Zac Hatfield-Dodds +
aberres +
aeltanawy +
ailchau +
alimcmaster1
alphaCTzo7G +
amphy +
araraonline +
azure-pipelines[bot] +
benarthur91 +
bk521234 +
cgangwar11 +
chris-b1
cxl923cc +
dahlbaek +
dannyhyunkim +
darke-spirits +
david-liu-brattle-1
davidmvalente +
deflatSOCO
doosik_bae +
dylanchase +
eduardo naufel schettino +
euri10 +
evangelineliu +
fengyqf +
fjdiod
fl4p +
fleimgruber +
gfyoung
h-vetinari
harisbal +
henriqueribeiro +
himanshu awasthi
hongshaoyang +
igorfassen +
jalazbe +
jbrockmendel
jh-wu +
justinchan23 +
louispotok
marcosrullan +
miker985
nicolab100 +
nprad
nsuresh +
ottiP
pajachiet +
raguiar2 +
ratijas +
realead +
robbuckley +
saurav2608 +
sideeye +
ssikdar1
svenharris +
syutbai +
testvinder +
thatneat
tmnhat2001
tomascassidy +
tomneep
topper-123
vkk800 +
winlu +
ym-pett +
yrhooke +
ywpark1 +
zertrin
zhezherun +
