Version 0.13.0 (January 3, 2014)#
This is a major release from 0.12.0 and includes a number of API changes, several new features and enhancements along with a large number of bug fixes.

Highlights include:
- support for a new index type Float64Index, and other indexing enhancements
- HDFStore has a new string-based syntax for query specification
- support for new methods of interpolation
- updated timedelta operations
- a new string manipulation method extract
- nanosecond support for offsets
- isin for DataFrames

Several experimental features are added, including:
- new eval/query methods for expression evaluation
- support for msgpack serialization
- an i/o interface to Google's BigQuery

There are several new or updated docs sections including:
- Comparison with SQL, which should be useful for those familiar with SQL but still learning pandas.
- Comparison with R, idiom translations from R to pandas.
- Enhancing Performance, ways to enhance pandas performance with eval/query.
Warning
In 0.13.0 Series has internally been refactored to no longer subclass ndarray but instead subclass NDFrame, similar to the rest of the pandas containers. This should be a transparent change with only very limited API implications. See Internal Refactoring.
API changes#
- read_excel now supports an integer in its sheetname argument giving the index of the sheet to read in (GH4301).
- Text parser now treats anything that looks like inf ("inf", "Inf", "-inf", "iNf", etc.) as infinity (GH4220, GH4219), affecting read_table, read_csv, etc.
- pandas is now Python 2/3 compatible without the need for 2to3 thanks to @jtratner. As a result, pandas now uses iterators more extensively. This also led to the introduction of substantive parts of Benjamin Peterson's six library into compat (GH4384, GH4375, GH4372).
- pandas.util.compat and pandas.util.py3compat have been merged into pandas.compat. pandas.compat now includes many functions allowing 2/3 compatibility. It contains both list and iterator versions of range, filter, map and zip, plus other necessary elements for Python 3 compatibility. lmap, lzip, lrange and lfilter all produce lists instead of iterators, for compatibility with numpy, subscripting and pandas constructors (GH4384, GH4375, GH4372).
- Series.get with negative indexers now returns the same as [] (GH4390)
- Changes to how Index and MultiIndex handle metadata (levels, labels, and names) (GH4039):

  # previously, you would have set levels or labels directly
  >>> pd.index.levels = [[1, 2, 3, 4], [1, 2, 4, 4]]

  # now, you use the set_levels or set_labels methods
  >>> index = pd.index.set_levels([[1, 2, 3, 4], [1, 2, 4, 4]])

  # similarly, for names, you can rename the object
  # but setting names is not deprecated
  >>> index = pd.index.set_names(["bob", "cranberry"])

  # and all methods take an inplace kwarg - but return None
  >>> pd.index.set_names(["bob", "cranberry"], inplace=True)
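The immutable-metadata workflow above can be sketched with a small throwaway MultiIndex (the data here are illustrative only; note that in much later pandas versions set_labels was renamed set_codes):

```python
import pandas as pd

# Build a small MultiIndex; its metadata (levels/names) is treated as immutable.
mi = pd.MultiIndex.from_arrays([[1, 1, 2], ['a', 'b', 'a']])

# Instead of assigning to mi.levels, derive a new index with set_levels.
mi2 = mi.set_levels([[10, 20], ['x', 'y']])

# set_names likewise returns a renamed index rather than mutating in place.
mi3 = mi.set_names(['num', 'letter'])

print(mi2.get_level_values(0).tolist())  # level-0 values remapped through the new levels
print(list(mi3.names))
```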
- All division of NDFrame objects is now true division, regardless of future imports. This means that operating on pandas objects will by default use floating point division, and return a floating point dtype. You can use // and floordiv to do integer division.

  Integer division

  In [3]: arr = np.array([1, 2, 3, 4])
  In [4]: arr2 = np.array([5, 3, 2, 1])
  In [5]: arr / arr2
  Out[5]: array([0, 0, 1, 4])

  In [6]: pd.Series(arr) // pd.Series(arr2)
  Out[6]:
  0    0
  1    0
  2    1
  3    4
  dtype: int64
  True division

  In [7]: pd.Series(arr) / pd.Series(arr2)  # no future import required
  Out[7]:
  0    0.200000
  1    0.666667
  2    1.500000
  3    4.000000
  dtype: float64
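The contrast between the two operators can be sketched with the same values as above:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4])
s2 = pd.Series([5, 3, 2, 1])

# `/` is always true division and yields a floating point dtype ...
true_div = s1 / s2

# ... while `//` (or the flex method floordiv) performs integer division
floor_div = s1 // s2

print(true_div.dtype)      # float64
print(floor_div.tolist())  # [0, 0, 1, 4]
```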
- Infer and downcast dtype if downcast='infer' is passed to fillna/ffill/bfill (GH4604)
- __nonzero__ for all NDFrame objects will now raise a ValueError; this reverts back to the (GH1073, GH4633) behavior. See gotchas for a more detailed discussion. This prevents evaluating the truth value of an entire pandas object, which is inherently ambiguous. These all will raise a ValueError.

  >>> df = pd.DataFrame({'A': np.random.randn(10),
  ...                    'B': np.random.randn(10),
  ...                    'C': pd.date_range('20130101', periods=10)
  ...                    })
  >>> if df:
  ...     pass
  Traceback (most recent call last):
      ...
  ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

  >>> df1 = df
  >>> df2 = df
  >>> df1 and df2
  Traceback (most recent call last):
      ...
  ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

  >>> d = [1, 2, 3]
  >>> s1 = pd.Series(d)
  >>> s2 = pd.Series(d)
  >>> s1 and s2
  Traceback (most recent call last):
      ...
  ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
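A small sketch of the recommended replacements for an ambiguous truth-value test (the data are arbitrary):

```python
import pandas as pd

s = pd.Series([True, True, False])

# The truth value of a whole pandas object is ambiguous and raises ...
try:
    if s:
        pass
except ValueError as exc:
    print(type(exc).__name__)  # ValueError

# ... so reduce explicitly instead
print(s.any())    # True
print(s.all())    # False
print(s.empty)    # False
```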
- Added the .bool() method to NDFrame objects to facilitate evaluating of single-element boolean Series:

  In [1]: pd.Series([True]).bool()
  Out[1]: True
  In [2]: pd.Series([False]).bool()
  Out[2]: False
  In [3]: pd.DataFrame([[True]]).bool()
  Out[3]: True
  In [4]: pd.DataFrame([[False]]).bool()
  Out[4]: False
- All non-Index NDFrames (Series, DataFrame, Panel, Panel4D, SparsePanel, etc.) now support the entire set of arithmetic operators and arithmetic flex methods (add, sub, mul, etc.). SparsePanel does not support pow or mod with non-scalars (GH3765).
- Series and DataFrame now have a mode() method to calculate the statistical mode(s) by axis/Series (GH5367).
- Chained assignment will now by default warn if the user is assigning to a copy. This can be changed with the option mode.chained_assignment; allowed options are raise/warn/None. See the docs.

  In [5]: dfc = pd.DataFrame({'A': ['aaa', 'bbb', 'ccc'], 'B': [1, 2, 3]})
  In [6]: pd.set_option('chained_assignment', 'warn')
  The following warning / exception will show if this is attempted.

  In [7]: dfc.loc[0]['A'] = 1111
  Traceback (most recent call last)
      ...
  SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_index,col_indexer] = value instead
  Here is the correct method of assignment.

  In [8]: dfc.loc[0, 'A'] = 11
  In [9]: dfc
  Out[9]:
       A  B
  0   11  1
  1  bbb  2
  2  ccc  3
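The difference between the chained and the single-call form can be sketched as follows (values as in the example above):

```python
import pandas as pd

dfc = pd.DataFrame({'A': ['aaa', 'bbb', 'ccc'], 'B': [1, 2, 3]})

# One .loc call with a (row, column) pair writes into dfc itself;
# the chained form dfc.loc[0]['A'] = ... may write into a temporary copy.
dfc.loc[0, 'A'] = 11

print(dfc['A'].tolist())  # [11, 'bbb', 'ccc']
```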
- Panel.reindex has the following call signature Panel.reindex(items=None, major_axis=None, minor_axis=None, **kwargs) to conform with other NDFrame objects. See Internal Refactoring for more information.
- Series.argmin and Series.argmax are now aliased to Series.idxmin and Series.idxmax. These return the index of the min or max element respectively. Prior to 0.13.0 these would return the position of the min/max element (GH6214).
Prior version deprecations/changes#
These were announced changes in 0.12 or prior that are taking effect as of 0.13.0:
- Remove deprecated Factor (GH3650)
- Remove deprecated set_printoptions/reset_printoptions (GH3046)
- Remove deprecated _verbose_info (GH3215)
- Remove deprecated read_clipboard/to_clipboard/ExcelFile/ExcelWriter from pandas.io.parsers (GH3717). These are available as functions in the main pandas namespace (e.g. pd.read_clipboard)
- default for tupleize_cols is now False for both to_csv and read_csv. Fair warning in 0.12 (GH3604)
- default for display.max_seq_len is now 100 rather than None. This activates truncated display ("...") of long sequences in various places (GH3391).
Deprecations#
Deprecated in 0.13.0:
Indexing API changes#
Prior to 0.13, it was impossible to use a label indexer (.loc/.ix) to set a value that was not contained in the index of a particular axis (GH2578). See the docs.
In the Series case this is effectively an appending operation.
In [10]: s = pd.Series([1, 2, 3])
In [11]: s
Out[11]:
0 1
1 2
2 3
dtype: int64
In [12]: s[5] = 5.
In [13]: s
Out[13]:
0 1.0
1 2.0
2 3.0
5 5.0
dtype: float64
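The append-like setitem above can be exercised directly; note the dtype upcast caused by assigning a float:

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# Label 5 is not in the index: assignment enlarges the Series
# rather than raising, and the float value upcasts the dtype.
s[5] = 5.0

print(s.index.tolist())  # [0, 1, 2, 5]
print(str(s.dtype))      # float64
```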
In [14]: dfi = pd.DataFrame(np.arange(6).reshape(3, 2),
....: columns=['A', 'B'])
....:
In [15]: dfi
Out[15]:
A B
0 0 1
1 2 3
2 4 5
This would previously raise a KeyError.
In [16]: dfi.loc[:, 'C'] = dfi.loc[:, 'A']
In [17]: dfi
Out[17]:
A B C
0 0 1 0
1 2 3 2
2 4 5 4
This is like an append operation.
In [18]: dfi.loc[3] = 5
In [19]: dfi
Out[19]:
A B C
0 0 1 0
1 2 3 2
2 4 5 4
3 5 5 5
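Both DataFrame enlargement operations above can be sketched together:

```python
import numpy as np
import pandas as pd

dfi = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['A', 'B'])

# Setting a new column via .loc (previously a KeyError)
dfi.loc[:, 'C'] = dfi.loc[:, 'A']

# Setting a new row label is an append-like enlargement
dfi.loc[3] = 5

print(dfi.shape)  # (4, 3)
```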
Panel setting operations on an arbitrary axis align the input to the Panel.
In [20]: p = pd.Panel(np.arange(16).reshape(2, 4, 2),
....: items=['Item1', 'Item2'],
....: major_axis=pd.date_range('2001/1/12', periods=4),
....: minor_axis=['A', 'B'], dtype='float64')
....:
In [21]: p
Out[21]:
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 4 (major_axis) x 2 (minor_axis)
Items axis: Item1 to Item2
Major_axis axis: 2001-01-12 00:00:00 to 2001-01-15 00:00:00
Minor_axis axis: A to B
In [22]: p.loc[:, :, 'C'] = pd.Series([30, 32], index=p.items)
In [23]: p
Out[23]:
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 4 (major_axis) x 3 (minor_axis)
Items axis: Item1 to Item2
Major_axis axis: 2001-01-12 00:00:00 to 2001-01-15 00:00:00
Minor_axis axis: A to C
In [24]: p.loc[:, :, 'C']
Out[24]:
Item1 Item2
2001-01-12 30.0 32.0
2001-01-13 30.0 32.0
2001-01-14 30.0 32.0
2001-01-15 30.0 32.0
Float64Index API change#
- Added a new index type, Float64Index. This will be automatically created when passing floating values in index creation. This enables a pure label-based slicing paradigm that makes [], ix, loc for scalar indexing and slicing work exactly the same. See the docs (GH263). Construction is by default for floating type values.

  In [20]: index = pd.Index([1.5, 2, 3, 4.5, 5])
  In [21]: index
  Out[21]: Float64Index([1.5, 2.0, 3.0, 4.5, 5.0], dtype='float64')
  In [22]: s = pd.Series(range(5), index=index)
  In [23]: s
  Out[23]:
  1.5    0
  2.0    1
  3.0    2
  4.5    3
  5.0    4
  dtype: int64
  Scalar selection for [], .ix, .loc will always be label based. An integer will match an equal float index (e.g. 3 is equivalent to 3.0).

  In [24]: s[3]
  Out[24]: 2
  In [25]: s.loc[3]
  Out[25]: 2
  The only positional indexing is via iloc.

  In [26]: s.iloc[3]
  Out[26]: 3
  A scalar index that is not found will raise a KeyError.
  Slicing is always on the values of the index, for [], ix, loc, and always positional with iloc.

  In [27]: s[2:4]
  Out[27]:
  2.0    1
  3.0    2
  dtype: int64
  In [28]: s.loc[2:4]
  Out[28]:
  2.0    1
  3.0    2
  dtype: int64
  In [29]: s.iloc[2:4]
  Out[29]:
  3.0    2
  4.5    3
  dtype: int64
  In a float index, slicing using floats is allowed.

  In [30]: s[2.1:4.6]
  Out[30]:
  3.0    2
  4.5    3
  dtype: int64
  In [31]: s.loc[2.1:4.6]
  Out[31]:
  3.0    2
  4.5    3
  dtype: int64
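The label-based behavior of a float index can be sketched as follows (same index as above; `.loc` is used throughout, since the positional fallback of plain `[]` on numeric labels changed in later pandas versions):

```python
import pandas as pd

s = pd.Series(range(5), index=[1.5, 2.0, 3.0, 4.5, 5.0])

# An integer scalar matches the equal float label
print(s.loc[3])                 # 2

# Float slicing selects by label value
print(s.loc[2.1:4.6].tolist())  # [2, 3]

# Positional access goes through iloc
print(s.iloc[3])                # 3
```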
- Indexing on other index types is preserved (and positional fallback for [], ix), with the exception that floating point slicing on indexes of non-Float64Index will now raise a TypeError.

  In [1]: pd.Series(range(5))[3.5]
  TypeError: the label [3.5] is not a proper indexer for this index type (Int64Index)

  In [1]: pd.Series(range(5))[3.5:4.5]
  TypeError: the slice start [3.5] is not a proper indexer for this index type (Int64Index)
  Using a scalar float indexer will be deprecated in a future version, but is allowed for now.

  In [3]: pd.Series(range(5))[3.0]
  Out[3]: 3
HDFStore API changes#
- Query format changes. A much more string-like query format is now supported. See the docs.
  In [32]: path = 'test.h5'
  In [33]: dfq = pd.DataFrame(np.random.randn(10, 4),
     ....:                    columns=list('ABCD'),
     ....:                    index=pd.date_range('20130101', periods=10))
  In [34]: dfq.to_hdf(path, 'dfq', format='table', data_columns=True)
  Use boolean expressions, with in-line function evaluation.

  In [35]: pd.read_hdf(path, 'dfq',
     ....:             where="index>Timestamp('20130104') & columns=['A', 'B']")
  Use an inline column reference.

  In [36]: pd.read_hdf(path, 'dfq',
     ....:             where="A>0 or C>0")
- The format keyword now replaces the table keyword; allowed values are fixed(f) or table(t), with the same defaults as prior to 0.13.0, e.g. put implies fixed format and append implies table format. This default format can be set as an option by setting io.hdf.default_format.

  In [37]: path = 'test.h5'
  In [38]: df = pd.DataFrame(np.random.randn(10, 2))
  In [39]: df.to_hdf(path, 'df_table', format='table')
  In [40]: df.to_hdf(path, 'df_table2', append=True)
  In [41]: df.to_hdf(path, 'df_fixed')
  In [42]: with pd.HDFStore(path) as store:
     ....:     print(store)
- Significant table writing performance improvements
- Handle a passed Series in table format (GH4330)
- Added an is_open property to indicate if the underlying file handle is open; a closed store will now report 'CLOSED' when viewing the store (rather than raising an error) (GH4409)
- A close of a HDFStore now will close that instance of the HDFStore but will only close the actual file if the ref count (by PyTables) w.r.t. all of the open handles is 0. Essentially you have a local instance of HDFStore referenced by a variable. Once it is closed, it will report closed. Other references (to the same file) will continue to operate until they themselves are closed. Performing an action on a closed file will raise ClosedFileError.

  In [43]: path = 'test.h5'
  In [44]: df = pd.DataFrame(np.random.randn(10, 2))
  In [45]: store1 = pd.HDFStore(path)
  In [46]: store2 = pd.HDFStore(path)
  In [47]: store1.append('df', df)
  In [48]: store2.append('df2', df)
  In [49]: store1.close()   # store1 reports closed; the file stays open for store2
  In [50]: store2.close()   # now the actual file is closed
- Removed the _quiet attribute, replaced by a DuplicateWarning if retrieving duplicate rows from a table (GH4367)
- Removed the warn argument from open. Instead a PossibleDataLossError exception will be raised if you try to use mode='w' with an OPEN file handle (GH4367)
- Add the keyword dropna=True to append to change whether ALL nan rows are not written to the store (default is True, ALL nan rows are NOT written), also settable via the option io.hdf.dropna_table (GH4625)
- Pass through store creation arguments; can be used to support in-memory stores
DataFrame恢复更改#
的HTML和纯文本表示形式 DataFrame
现在,在表超过特定大小时显示截断的表视图,而不是切换到Short INFO视图 (GH4886 , GH5550 )。随着小DataFrame变得更大,这使得表示更一致。

To get the info view, call DataFrame.info(). If you prefer the info view as the repr for large DataFrames, you can set this by running set_option('display.large_repr', 'info').
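A quick sketch of the option described above (an assumed example, not from the release notes: any frame longer than display.max_rows, 60 by default, triggers the large-repr behavior):

```python
import pandas as pd

# assumed example: 200 rows is comfortably over the default max_rows of 60
df = pd.DataFrame({"A": range(200)})

pd.set_option("display.large_repr", "info")
info_repr = repr(df)            # repr now returns the info view
pd.reset_option("display.large_repr")

print("RangeIndex" in info_repr)  # the info view describes the index
```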
Enhancements#
df.to_clipboard() learned a new excel keyword that lets you paste df data directly into excel (enabled by default). (GH5070).
read_html now raises a URLError instead of catching and raising a ValueError (GH4303, GH4305)
Added a test for read_clipboard() and to_clipboard() (GH4282)
Clipboard functionality now works with PySide (GH4282)
Added a more informative error message when plot arguments contain overlapping color and style arguments (GH4402)
to_dict now takes records as a possible outtype, returning an array of column-keyed dictionaries. (GH4936)
NaN handling in get_dummies (GH4446) with dummy_na
# previously, nan was erroneously counted as 2 here # now it is not counted at all In [55]: pd.get_dummies([1, 2, np.nan]) Out[55]: 1.0 2.0 0 1 0 1 0 1 2 0 0 # unless requested In [56]: pd.get_dummies([1, 2, np.nan], dummy_na=True) Out[56]: 1.0 2.0 NaN 0 1 0 0 1 0 1 0 2 0 0 1
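A minimal sketch of the records output mentioned above (note: 0.13 spelled the argument outtype; modern pandas spells it orient):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": ["x", "y"]})
# each row becomes one column-keyed dictionary
recs = df.to_dict(orient="records")
print(recs)  # [{'A': 1, 'B': 'x'}, {'A': 2, 'B': 'y'}]
```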
Enable timedelta64[ns] operations. See the docs.
警告
Most of these operations require numpy >= 1.7
Using the new top-level to_timedelta, you can convert a scalar or array from the standard timedelta format (produced by to_csv) into a timedelta type (np.timedelta64 in nanoseconds).
In [57]: pd.to_timedelta('1 days 06:05:01.00003') Out[57]: Timedelta('1 days 06:05:01.000030') In [58]: pd.to_timedelta('15.5us') Out[58]: Timedelta('0 days 00:00:00.000015500') In [59]: pd.to_timedelta(['1 days 06:05:01.00003', '15.5us', 'nan']) Out[59]: TimedeltaIndex(['1 days 06:05:01.000030', '0 days 00:00:00.000015500', NaT], dtype='timedelta64[ns]', freq=None) In [60]: pd.to_timedelta(np.arange(5), unit='s') Out[60]: TimedeltaIndex(['0 days 00:00:00', '0 days 00:00:01', '0 days 00:00:02', '0 days 00:00:03', '0 days 00:00:04'], dtype='timedelta64[ns]', freq=None) In [61]: pd.to_timedelta(np.arange(5), unit='d') Out[61]: TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq=None)
A Series of dtype timedelta64[ns] can now be divided by another timedelta64[ns] object, or astyped to yield a float64 dtyped Series. This is frequency conversion. See the docs.
In [62]: import datetime In [63]: td = pd.Series(pd.date_range('20130101', periods=4)) - pd.Series( ....: pd.date_range('20121201', periods=4)) ....: In [64]: td[2] += np.timedelta64(datetime.timedelta(minutes=5, seconds=3)) In [65]: td[3] = np.nan In [66]: td Out[66]: 0 31 days 00:00:00 1 31 days 00:00:00 2 31 days 00:05:03 3 NaT dtype: timedelta64[ns] # to days In [67]: td / np.timedelta64(1, 'D') Out[67]: 0 31.000000 1 31.000000 2 31.003507 3 NaN dtype: float64 In [68]: td.astype('timedelta64[D]') Out[68]: 0 31.0 1 31.0 2 31.0 3 NaN dtype: float64 # to seconds In [69]: td / np.timedelta64(1, 's') Out[69]: 0 2678400.0 1 2678400.0 2 2678703.0 3 NaN dtype: float64 In [70]: td.astype('timedelta64[s]') Out[70]: 0 2678400.0 1 2678400.0 2 2678703.0 3 NaN dtype: float64
Dividing or multiplying a timedelta64[ns] Series by an integer or integer Series
In [71]: td * -1 Out[71]: 0 -31 days +00:00:00 1 -31 days +00:00:00 2 -32 days +23:54:57 3 NaT dtype: timedelta64[ns] In [72]: td * pd.Series([1, 2, 3, 4]) Out[72]: 0 31 days 00:00:00 1 62 days 00:00:00 2 93 days 00:15:09 3 NaT dtype: timedelta64[ns]
Absolute DateOffset objects can act equivalently to timedeltas
In [73]: from pandas import offsets In [74]: td + offsets.Minute(5) + offsets.Milli(5) Out[74]: 0 31 days 00:05:00.005000 1 31 days 00:05:00.005000 2 31 days 00:10:03.005000 3 NaT dtype: timedelta64[ns]
fillna is now supported for timedeltas
In [75]: td.fillna(pd.Timedelta(0)) Out[75]: 0 31 days 00:00:00 1 31 days 00:00:00 2 31 days 00:05:03 3 0 days 00:00:00 dtype: timedelta64[ns] In [76]: td.fillna(datetime.timedelta(days=1, seconds=5)) Out[76]: 0 31 days 00:00:00 1 31 days 00:00:00 2 31 days 00:05:03 3 1 days 00:00:05 dtype: timedelta64[ns]
You can do numeric reduction operations on timedeltas.
In [77]: td.mean() Out[77]: Timedelta('31 days 00:01:41') In [78]: td.quantile(.1) Out[78]: Timedelta('31 days 00:00:00')
plot(kind='kde') now accepts the optional parameters bw_method and ind, passed to scipy.stats.gaussian_kde() (for scipy >= 0.11.0) to set the bandwidth, and to gkde.evaluate() to specify the indices at which it is evaluated, respectively. See the scipy docs. (GH4298)
The DataFrame constructor now accepts a numpy masked record array (GH3478)
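A small illustrative sketch of masked input to the constructor. This uses a plain masked array for brevity (the release note itself concerns masked record arrays); that masked cells arrive as NaN is the assumption being shown:

```python
import numpy as np
import pandas as pd

arr = np.ma.masked_array([[1.0, 2.0], [3.0, 4.0]],
                         mask=[[False, True], [False, False]])
df = pd.DataFrame(arr, columns=["A", "B"])
# the masked cell (row 0, column B) arrives as NaN
print(df["B"].isna().tolist())  # [True, False]
```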
A new vectorized string method extract returns regular expression matches more conveniently.
In [79]: pd.Series(['a1', 'b2', 'c3']).str.extract('[ab](\\d)') Out[79]: 0 0 1 1 2 2 NaN
Elements that do not match return NaN. Extracting a regular expression with more than one group returns a DataFrame with one column per group.
In [80]: pd.Series(['a1', 'b2', 'c3']).str.extract('([ab])(\\d)') Out[80]: 0 1 0 a 1 1 b 2 2 NaN NaN
Elements that do not match return a row of NaN. Thus, a Series of messy strings can be converted into a like-indexed Series or DataFrame of cleaned-up or more useful strings, without necessitating get() to access tuples or re.match objects.
Named groups like
In [81]: pd.Series(['a1', 'b2', 'c3']).str.extract( ....: '(?P<letter>[ab])(?P<digit>\\d)') ....: Out[81]: letter digit 0 a 1 1 b 2 2 NaN NaN
Optional groups can also be used.
In [82]: pd.Series(['a1', 'b2', '3']).str.extract( ....: '(?P<letter>[ab])?(?P<digit>\\d)') ....: Out[82]: letter digit 0 a 1 1 b 2 2 NaN 3
read_stata now accepts Stata 13 format (GH4291)
read_fwf now infers the column specifications from the first 100 rows of the file if the data has correctly separated and properly aligned columns using the delimiter provided to the function (GH4488).
Support for nanosecond times as an offset
警告
These operations require numpy >= 1.7
Period conversions in the range of seconds and below were reworked and extended up to nanoseconds. Periods in the nanosecond range are now available.
In [83]: pd.date_range('2013-01-01', periods=5, freq='5N') Out[83]: DatetimeIndex([ '2013-01-01 00:00:00', '2013-01-01 00:00:00.000000005', '2013-01-01 00:00:00.000000010', '2013-01-01 00:00:00.000000015', '2013-01-01 00:00:00.000000020'], dtype='datetime64[ns]', freq='5N')
or with frequency as an offset
In [84]: pd.date_range('2013-01-01', periods=5, freq=pd.offsets.Nano(5)) Out[84]: DatetimeIndex([ '2013-01-01 00:00:00', '2013-01-01 00:00:00.000000005', '2013-01-01 00:00:00.000000010', '2013-01-01 00:00:00.000000015', '2013-01-01 00:00:00.000000020'], dtype='datetime64[ns]', freq='5N')
Timestamps can be modified in the nanosecond range
In [85]: t = pd.Timestamp('20130101 09:01:02') In [86]: t + pd.tseries.offsets.Nano(123) Out[86]: Timestamp('2013-01-01 09:01:02.000000123')
A new method, isin for DataFrames, which plays nicely with boolean indexing. The argument to isin, what we're comparing the DataFrame to, can be a DataFrame, Series, dict, or array of values. See the docs for more.
To get the rows where any of the conditions are met:
In [87]: dfi = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['a', 'b', 'f', 'n']}) In [88]: dfi Out[88]: A B 0 1 a 1 2 b 2 3 f 3 4 n In [89]: other = pd.DataFrame({'A': [1, 3, 3, 7], 'B': ['e', 'f', 'f', 'e']}) In [90]: mask = dfi.isin(other) In [91]: mask Out[91]: A B 0 True False 1 False False 2 True True 3 False False In [92]: dfi[mask.any(axis=1)] Out[92]: A B 0 1 a 2 3 f
Series now supports a to_frame method to convert it to a single-column DataFrame (GH5164)
All R datasets listed here http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html can now be loaded into pandas objects
# note that pandas.rpy was deprecated in v0.16.0 import pandas.rpy.common as com com.load_data('Titanic')
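The to_frame conversion mentioned above is a one-liner; a minimal sketch:

```python
import pandas as pd

s = pd.Series([1, 2, 3], name="vals")
df = s.to_frame()            # single-column DataFrame
print(df.columns.tolist())   # ['vals'] -- the Series name becomes the column
```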
tz_localize
can infer a fall daylight savings transition based on the structure of the unlocalized data (GH4230), see the docsDatetimeIndex
is now in the API documentation, see the docsjson_normalize()
is a new method to allow you to create a flat table from semi-structured JSON data. See the docs (GH1067)
Added PySide support for the qtpandas DataFrameModel and DataFrameWidget.
The Python csv parser now supports usecols (GH4335)
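A small sketch of usecols with the Python engine (hypothetical inline data):

```python
from io import StringIO

import pandas as pd

csv = StringIO("a,b,c\n1,2,3\n4,5,6\n")
# engine='python' exercises the Python parser; usecols limits what is read
df = pd.read_csv(csv, engine="python", usecols=["a", "c"])
print(df.columns.tolist())  # ['a', 'c']
```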
Frequencies got several new offsets:
DataFrame has a new interpolate method, similar to Series (GH4434, GH1892)
In [93]: df = pd.DataFrame({'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8], ....: 'B': [.25, np.nan, np.nan, 4, 12.2, 14.4]}) ....: In [94]: df.interpolate() Out[94]: A B 0 1.0 0.25 1 2.1 1.50 2 3.4 2.75 3 4.7 4.00 4 5.6 12.20 5 6.8 14.40
Additionally, the method argument to interpolate has been expanded to include 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'barycentric', 'krogh', 'piecewise_polynomial', 'pchip', 'polynomial', 'spline'. The new methods require scipy. Consult the scipy reference guide and documentation for more information about when the various methods are appropriate. See the docs.
interpolate now also accepts a limit keyword argument. This works similar to fillna's limit:
In [95]: ser = pd.Series([1, 3, np.nan, np.nan, np.nan, 11]) In [96]: ser.interpolate(limit=2) Out[96]: 0 1.0 1 3.0 2 5.0 3 7.0 4 NaN 5 11.0 dtype: float64
Added wide_to_long panel data convenience function. See the docs.
In [97]: np.random.seed(123) In [98]: df = pd.DataFrame({"A1970" : {0 : "a", 1 : "b", 2 : "c"}, ....: "A1980" : {0 : "d", 1 : "e", 2 : "f"}, ....: "B1970" : {0 : 2.5, 1 : 1.2, 2 : .7}, ....: "B1980" : {0 : 3.2, 1 : 1.3, 2 : .1}, ....: "X" : dict(zip(range(3), np.random.randn(3))) ....: }) ....: In [99]: df["id"] = df.index In [100]: df Out[100]: A1970 A1980 B1970 B1980 X id 0 a d 2.5 3.2 -1.085631 0 1 b e 1.2 1.3 0.997345 1 2 c f 0.7 0.1 0.282978 2 In [101]: pd.wide_to_long(df, ["A", "B"], i="id", j="year") Out[101]: X A B id year 0 1970 -1.085631 a 2.5 1 1970 0.997345 b 1.2 2 1970 0.282978 c 0.7 0 1980 -1.085631 d 3.2 1 1980 0.997345 e 1.3 2 1980 0.282978 f 0.1
Experimental#
The new eval() function implements expression evaluation using numexpr behind the scenes. This results in large speedups for complicated expressions involving large DataFrames/Series. For example,
In [102]: nrows, ncols = 20000, 100 In [103]: df1, df2, df3, df4 = [pd.DataFrame(np.random.randn(nrows, ncols)) .....: for _ in range(4)] .....:
# eval with NumExpr backend In [104]: %timeit pd.eval('df1 + df2 + df3 + df4') 7.63 ms +- 41.2 us per loop (mean +- std. dev. of 7 runs, 100 loops each)
# pure Python evaluation In [105]: %timeit df1 + df2 + df3 + df4 6.99 ms +- 49.7 us per loop (mean +- std. dev. of 7 runs, 100 loops each)
For more details, see the docs
Similar to pandas.eval, DataFrame has a new DataFrame.eval method that evaluates an expression in the context of the DataFrame. For example,
In [106]: df = pd.DataFrame(np.random.randn(10, 2), columns=['a', 'b']) In [107]: df.eval('a + b') Out[107]: 0 -0.685204 1 1.589745 2 0.325441 3 -1.784153 4 -0.432893 5 0.171850 6 1.895919 7 3.065587 8 -0.092759 9 1.391365 dtype: float64
A query() method has been added that allows you to select elements of a DataFrame using a natural query syntax nearly identical to Python syntax. For example,
In [108]: n = 20 In [109]: df = pd.DataFrame(np.random.randint(n, size=(n, 3)), columns=['a', 'b', 'c']) In [110]: df.query('a < b < c') Out[110]: a b c 11 1 5 8 15 8 16 19
This selects all the rows of df where a < b < c evaluates to True. For more details see the docs.
pd.read_msgpack()
and pd.to_msgpack()
are now a supported method of serialization of arbitrary pandas (and python objects) in a lightweight portable binary format. See the docs
警告
Since this is an experimental library, the storage format may not be stable until a future release.
df = pd.DataFrame(np.random.rand(5, 2), columns=list('AB')) df.to_msgpack('foo.msg') pd.read_msgpack('foo.msg') s = pd.Series(np.random.rand(5), index=pd.date_range('20130101', periods=5)) pd.to_msgpack('foo.msg', df, s) pd.read_msgpack('foo.msg')
You can pass iterator=True to iterate over the unpacked results
for o in pd.read_msgpack('foo.msg', iterator=True): print(o)
pandas.io.gbq
provides a simple way to extract from, and load data into, Google's BigQuery Data Sets by way of pandas DataFrames. BigQuery is a high performance SQL-like database service, useful for performing ad-hoc queries against extremely large datasets. See the docsfrom pandas.io import gbq # A query to select the average monthly temperatures in the # in the year 2000 across the USA. The dataset, # publicata:samples.gsod, is available on all BigQuery accounts, # and is based on NOAA gsod data. query = """SELECT station_number as STATION, month as MONTH, AVG(mean_temp) as MEAN_TEMP FROM publicdata:samples.gsod WHERE YEAR = 2000 GROUP BY STATION, MONTH ORDER BY STATION, MONTH ASC""" # Fetch the result set for this query # Your Google BigQuery Project ID # To find this, see your dashboard: # https://console.developers.google.com/iam-admin/projects?authuser=0 projectid = 'xxxxxxxxx' df = gbq.read_gbq(query, project_id=projectid) # Use pandas to process and reshape the dataset df2 = df.pivot(index='STATION', columns='MONTH', values='MEAN_TEMP') df3 = pd.concat([df2.min(), df2.mean(), df2.max()], axis=1, keys=["Min Tem", "Mean Temp", "Max Temp"])
The resulting DataFrame is::
> df3 Min Tem Mean Temp Max Temp MONTH 1 -53.336667 39.827892 89.770968 2 -49.837500 43.685219 93.437932 3 -77.926087 48.708355 96.099998 4 -82.892858 55.070087 97.317240 5 -92.378261 61.428117 102.042856 6 -77.703334 65.858888 102.900000 7 -87.821428 68.169663 106.510714 8 -89.431999 68.614215 105.500000 9 -86.611112 63.436935 107.142856 10 -78.209677 56.880838 92.103333 11 -50.125000 48.861228 94.996428 12 -50.332258 42.286879 94.396774
警告
To use this module, you will need a BigQuery account. See <https://cloud.google.com/products/big-query> for details.
As of 10/10/13, there is a bug in Google's API preventing result sets from being larger than 100,000 rows. A patch is scheduled for the week of 10/14/13.
Internal refactoring#
In 0.13.0 there is a major refactor primarily to subclass Series from NDFrame, which is the base class currently for DataFrame and Panel, to unify methods and behaviors. Series formerly subclassed directly from ndarray. (GH4080, GH3862, GH816)
警告
There are two potential incompatibilities from < 0.13.0
Using certain numpy functions would previously return a Series if passed a Series as an argument. This seems only to affect np.ones_like, np.empty_like, np.diff and np.where. These now return ndarrays.
In [111]: s = pd.Series([1, 2, 3, 4])
Numpy usage
In [112]: np.ones_like(s) Out[112]: array([1, 1, 1, 1]) In [113]: np.diff(s) Out[113]: array([1, 1, 1]) In [114]: np.where(s > 1, s, np.nan) Out[114]: array([nan, 2., 3., 4.])
Pandonic usage
In [115]: pd.Series(1, index=s.index) Out[115]: 0 1 1 1 2 1 3 1 dtype: int64 In [116]: s.diff() Out[116]: 0 NaN 1 1.0 2 1.0 3 1.0 dtype: float64 In [117]: s.where(s > 1) Out[117]: 0 NaN 1 2.0 2 3.0 3 4.0 dtype: float64
Passing a Series directly to a cython function expecting an ndarray type will no longer work directly; you must pass Series.values, see Enhancing Performance
Series(0.5) would previously return the scalar 0.5; instead this will return a 1-element Series
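Handing the underlying array to ndarray-expecting code, as described above, can be sketched as:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0])
arr = s.values               # the plain ndarray behind the Series
print(isinstance(arr, np.ndarray), arr.sum())  # True 6.0
```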
This change breaks rpy2 <= 2.3.8. An issue has been opened against rpy2, and a workaround is detailed in GH5698. Thanks @JanSchulz.
Pickle compatibility is preserved for pickles created prior to 0.13. These must be unpickled with pd.read_pickle, see Pickling.
Refactor of series.py/frame.py/panel.py to move common code to generic.py
added _setup_axes to create generic NDFrame structures
moved methods
from_axes,_wrap_array,axes,ix,loc,iloc,shape,empty,swapaxes,transpose,pop
__iter__,keys,__contains__,__len__,__neg__,__invert__
convert_objects,as_blocks,as_matrix,values
__getstate__, __setstate__ (compat remains in frame/panel)
__getattr__, __setattr__
_indexed_same, reindex_like, align, where, mask
fillna, replace (Series replace is now consistent with DataFrame)
filter (also added an axis argument to selectively filter on a different axis)
reindex, reindex_axis, take
truncate (moved to become part of NDFrame)
These are API changes which make Panel more consistent with DataFrame
swapaxes on a Panel with the same axes specified now returns a copy
support attribute access for setting
filter supports the same API as the original DataFrame filter
reindex called with no arguments will now return a copy of the input object
TimeSeries is now an alias for Series. The property is_time_series can be used to distinguish (if desired)
Refactor of Sparse objects to use BlockManager
Created a new block type in internals, SparseBlock, which can hold multi-dtypes and is non-consolidatable. SparseSeries and SparseDataFrame now inherit more methods from that hierarchy (Series/DataFrame), and no longer inherit from SparseArray (which instead is the object of the SparseBlock)
The sparse suite now supports integration with non-sparse data. Non-float sparse data is supportable (partially implemented)
Operations on sparse structures within DataFrames should preserve sparseness; merging type operations will convert to dense (and back to sparse), so may be somewhat inefficient
enable setitem on SparseSeries for boolean/integer/slices
SparsePanels implementation is unchanged (e.g. not using BlockManager, needs work)
added ftypes method to Series/DataFrame, similar to dtypes, but indicates if the underlying is sparse/dense (as well as the dtype)
All NDFrame objects can now use __finalize__() to specify various values to propagate to new objects from an existing one (e.g. name in Series will follow more automatically now)
Internal type checking is now done via a suite of generated classes, allowing isinstance(value, klass) without having to directly import the klass, courtesy of @jtratner
Bug in Series update where the parent frame was not updating its cache based on changes (GH4080) or types (GH3217), fillna (GH3386)
Refactor Series.reindex to core/generic.py (GH4604, GH4618), allowing method= in reindexing on a Series to work
Series.copy no longer accepts the order parameter and is now consistent with NDFrame copy
Refactor rename methods to core/generic.py; fixes Series.rename for (GH4605), and adds rename with the same signature for Panel
Refactor clip methods to core/generic.py (GH4798)
Refactor of _get_numeric_data/_get_bool_data to core/generic.py, allowing Series/Panel functionality
Series (for index) / Panel (for items) now allow attribute access to its elements (GH1903)
In [118]: s = pd.Series([1, 2, 3], index=list('abc')) In [119]: s.b Out[119]: 2 In [120]: s.a = 5 In [121]: s Out[121]: a 5 b 2 c 3 dtype: int64
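A minimal sketch of the kind of metadata propagation __finalize__ enables, described above: the name follows through an arithmetic operation.

```python
import pandas as pd

s = pd.Series([1, 2, 3], name="price")
doubled = s * 2              # metadata such as ``name`` is propagated
print(doubled.name)          # price
```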
Bug fixes#
HDFStore raising an invalid TypeError rather than ValueError when appending with a different block ordering (GH4096)
read_hdf was not respecting a passed mode (GH4504)
appending a 0-len table will work correctly (GH4273)
to_hdf was raising when passing both arguments append and table (GH4584)
reading from a store with duplicate columns across dtypes would raise (GH4767)
Fixed a bug where ValueError wasn't correctly raised when column names weren't strings (GH4956)
A zero length series written in Fixed format not deserializing properly. (GH4708)
Fixed decoding perf issue on pyt3 (GH5441)
Validate levels in a MultiIndex prior to storing (GH5527)
Correctly handle data_columns with a Panel (GH5717)
Fixed bug in tslib.tz_convert(vals, tz1, tz2): it could raise IndexError exception while trying to access trans[pos + 1] (GH4496)
Fixed bug in PeriodIndex.map where using str would return the str representation of the index (GH4136)
Fixed test failure test_time_series_plot_color_with_empty_kwargs when using custom matplotlib default colors (GH4345)
Fix running of stata IO tests. Now uses temporary files to write (GH4353)
Fixed an issue where DataFrame.sum was slower than DataFrame.mean for integer valued frames (GH4365)
read_html tests now work with Python 2.6 (GH4351)
Fixed bug where network testing was throwing NameError because a local variable was undefined (GH4381)
In to_json, raise if a passed orient would cause loss of data because of a duplicate index (GH4359)
In to_json, fix date handling so milliseconds are the default timestamp as the docstring says (GH4362).
JSON NaT handling fixed, NaTs are now serialized to null (GH4498)
JSON handling of escapable characters in JSON object keys fixed (GH4593)
Fixed passing keep_default_na=False when na_values=None (GH4318)
Fixed bug with values raising an error on a DataFrame with duplicate columns and mixed dtypes, surfaced in (GH4377)
Fixed bug with duplicate columns and type conversion in read_json when orient='split' (GH4377)
Fixed JSON bug where locales with a decimal separator other than '.' threw exceptions when encoding / decoding certain values. (GH4918)
Fix .iat indexing with a PeriodIndex (GH4390)
Fixed an issue where PeriodIndex joining with self was returning a new instance rather than the same instance (GH4379); also adds a test for this for the other index types
Fixed a bug where all the dtypes were converted to object when using the cparser with the usecols parameter in read_csv (GH3192)
Fix an issue in merging blocks where the resulting DataFrame had partially set _ref_locs (GH4403)
Fixed an issue where hist subplots were being overwritten when they were called using the top level matplotlib API (GH4408)
Fixed a py3 compat issue where bytes were being repr'd as tuples (GH4455)
Fixed Panel attribute naming conflict if an item is named 'a' (GH3440)
Fixed an issue where duplicate indexes were raising when plotting (GH4486)
Fixed an issue where Panel slicing in xs was returning an incorrectly dimmed object (GH4016)
Fix Panel assignment with a transposed frame (GH3830)
Raise on set indexing with a Panel and a Panel as a value which needs alignment (GH3777)
Fixed an issue when sorting a duplicate MultiIndex that has multiple dtypes (GH4516)
Fixed bug in DataFrame.set_values which was causing name attributes to be lost when expanding the index. (GH3742, GH4039)
Fixed an issue where individual names, levels and labels could be set on a MultiIndex without validation (GH3714, GH4039)
Fixed (GH3334). Margins did not compute if values is the index.
Fix a bug with a rhs of np.timedelta64 or np.offsets.DateOffset when operating with datetimes (GH4532)
Fix arithmetic with series/datetimeindex and np.timedelta64 not working the same (GH4134) and buggy timedelta in numpy 1.6 (GH4135)
Fix bug in pd.read_clipboard on windows with PY3 (GH4561); not decoding properly
tslib.get_period_field() and tslib.get_period_field_arr() now raise if the code argument is out of range (GH4519, GH4520)
Fix boolean indexing on an empty series losing index names (GH4235); infer_dtype works with empty arrays.
Fixed reindexing with multiple axes; if an axis match was not replacing the current axes, leading to a possible lazy frequency inference issue (GH3317)
Fixed an issue where DataFrame.apply was reraising exceptions incorrectly (causing the original stack trace to be truncated).
Fix selection with ix/loc and non-unique selectors (GH4619)
Fix assignment with iloc/loc involving a dtype change in an existing column (GH4312, GH5702); has internal setitem_with_indexer in core/indexing to use Block.setitem
Fixed bug where the thousands operator was not handled correctly for floating point numbers in csv_import (GH4322)
Fix an issue with CacheableOffset not being used correctly by many DateOffsets; this prevented the DateOffset from being cached (GH4609)
Fix boolean comparison with a DataFrame on the lhs, and a list/tuple on the rhs (GH4576)
Fix error/dtype conversion with setitem of None on Series/DataFrame (GH4667)
Fix decoding based on a passed in non-default encoding in pd.read_stata (GH4626)
Fix DataFrame.from_records with a plain-vanilla ndarray. (GH4727)
Fix some inconsistencies in Index.rename and MultiIndex.rename, etc. (GH4718, GH4628)
Bug in using iloc/loc with a cross-sectional and duplicate indices (GH4726)
Bug with using QUOTE_NONE with to_csv causing an Exception. (GH4328)
Bug with Series indexing not raising an error when the right-hand side has an incorrect length (GH2702)
Bug in multi-indexing with a partial string selection as one part of a MultiIndex (GH4758)
Bug with reindexing on the index with a non-unique index will now raise ValueError (GH4746)
Bug in setitem with loc/ix with a MultiIndex axis and a numpy array as a single indexer, related to (GH3777)
Bug in iloc when slice indexing was failing (GH4771)
Incorrect error message with no colspecs or width in read_fwf. (GH4774)
Fix bugs in reading compressed files with read_fwf in Python 3. (GH3963)
Fixed an issue with a duplicate index and assignment with a dtype change (GH4686)
Fixed bug in reading compressed files as bytes rather than str in Python 3. Simplifies bytes-producing file handling in Python 3 (GH3963, GH4785).
Fixed an issue related to ticklocs/ticklabels with log-scale bar plots across different versions of matplotlib (GH4789)
Suppressed DeprecationWarning associated with internal calls issued by repr() (GH4391)
Fixed an issue with a duplicate index and duplicate selector with .loc (GH4825)
Fixed an issue with DataFrame.sort_index where, when sorting by a single column and passing a list for ascending, the argument for ascending was being interpreted as True (GH4839, GH4846)
Fixed Panel.tshift not working. Added freq support to Panel.shift (GH4853)
Fix an issue in TextFileReader with thousands != "," (GH4596)
Bug in getitem with a duplicate index when using where (GH4879)
Fix type inference code that coerces a float column to datetime (GH4601)
Fixed _ensure_numeric not checking for complex numbers (GH4902)
Fixed a bug in Series.hist where two figures were being created when the by argument was passed (GH4112, GH4113).
Fixed a bug in convert_objects for > 2 ndims (GH4937)
Fixed string methods for
FrozenNDArray and FrozenList (GH4929)
Fixed a bug with setting invalid or out-of-range values in indexing enlargement scenarios (GH4940)
Tests for fillna on an empty Series (GH4346), thanks @immerrr
Fixed the skiprows option in the Python parser for read_csv (GH4382)
Fixed bug preventing cut from working with np.inf levels without explicitly passing labels (GH3415)
Fixed wrong check for overlapping in
DatetimeIndex.union
(GH4564)
Fixed conflict between thousands separator and date parser in csv_parser (GH4678)
Fix appending when dtypes are not the same (error showing mixing float/np.datetime64) (GH4993)
Fix repr for DateOffset. No longer show duplicate entries in kwds. Removed unused offset fields. (GH4638)
Fixed wrong index name during read_csv if using usecols. Applies to the c parser only. (GH4201)
Timestamp objects can now appear in the left hand side of a comparison operation with a Series or DataFrame object (GH4982).
Fix a bug when indexing with np.nan via iloc/loc (GH5016)
Fixed a bug where the low-memory c parser could create different types in different chunks of the same file. Now coerces to a numerical type or raises a warning. (GH3866)
Fix a bug where reshaping a Series to its own shape raised TypeError (GH4554) and other reshaping issues.
Bug in setting with ix/loc and a mixed integer/string index (GH4544)
Make sure series-series boolean comparisons are label based (GH4947)
Bug in multi-level indexing with a Timestamp partial indexer (GH4294)
Tests/fix for multi-index construction of an all-nan frame (GH4078)
Fixed a bug where read_html() wasn't correctly inferring values of tables with commas (GH5029)
Fixed a bug where read_html() wasn't providing a stable ordering of returned tables (GH4770, GH5029).
Fixed a bug where read_html() was incorrectly parsing when passed index_col=0 (GH5066).
Fixed a bug where read_html() was incorrectly inferring the type of headers (GH5048).
Fixed a bug where DatetimeIndex joined with a PeriodIndex was causing a stack overflow (GH3899).
Fixed a bug where groupby objects didn't allow plots (GH5102).
Fixed a bug where groupby objects weren't tab-completing column names (GH5102).
Fixed a bug where groupby.plot() and friends were duplicating figures multiple times (GH5102).
Provide automatic conversion of object dtypes on fillna, related (GH5103)
Fixed a bug where default options were being overwritten in the option parser cleaning (GH5121).
Treat a list/ndarray identically for iloc indexing with a list-like (GH5006)
Fix MultiIndex.get_level_values() with missing values (GH5074)
Fix bound checking for Timestamp() with datetime64 input (GH4065)
Fix a bug where TestReadHtml wasn't calling the correct read_html() function (GH5150).
Fix a bug with NDFrame.replace() which made replacement appear as though it was (incorrectly) using regular expressions (GH5143).
Fix a better error message for to_datetime (GH4928)
Made sure different locales are tested on travis-ci (GH4918). Also adds a couple of utilities for getting locales and setting locales with a context manager.
Compound dtypes in a constructor raise NotImplementedError (GH5191)
Bug in comparing duplicate frames (GH4421), related
Bug in describe on duplicate frames
Bug in to_datetime with a format and coerce=True not raising (GH5195)
Bug in loc setting with multiple indexers and a rhs of a Series that needs broadcasting (GH5206)
Fixed bug where inplace setting of levels or labels on a MultiIndex would not clear the cached values attribute and therefore return wrong values. (GH5215)
Fixed bug where filtering a grouped DataFrame or Series did not maintain the original ordering (GH4621).
Fixed Period with a business date freq to always roll-forward if on a non-business date. (GH5203)
Fixed bug in Excel writers where frames with duplicate column names weren't written correctly. (GH5235)
Fixed issue with drop and a non-unique index on a Series (GH5248)
Fixed segfault in the C parser caused by passing more names than columns in the file. (GH5156)
Fix Series.isin with date/time-like dtypes (GH5021)
The C and Python parsers can now handle the more common MultiIndex column format which doesn't have a row for index names (GH4702)
Bug when trying to use an out-of-bounds date as an object dtype (GH5312)
Bug when trying to display an embedded PandasObject (GH5324)
Allow operating of Timestamps to return a datetime if the result is out-of-bounds, related (GH5312)
Fix return value/type signature of initObjToJSON() to be compatible with numpy's import_array() (GH5334, GH5326)
Bug in rename followed by set_index on a DataFrame (GH5344)
Test suite no longer leaves around temporary files when testing graphics. (GH5347) (thanks for catching this @yarikoptic!)
Fixed html tests on win32. (GH4580)
Make sure that head/tail are iloc based (GH5370)
Fixed bug for PeriodIndex string representation if there are 1 or 2 elements. (GH5372)
The GroupBy methods transform and filter can be used on Series and DataFrames that have repeated (non-unique) indices (GH4620)
Fix a blank Series not printing its name in repr (GH4651)
Make tests create temporary files in a temporary directory by default. (GH5419)
pd.to_timedelta of a scalar returns a scalar. (GH5410)
pd.to_timedelta accepts NaN and NaT, returning NaT instead of raising (GH5437)
Performance improvements in isnull on larger-sized pandas objects
Fixed various setitem bugs with a 1d ndarray that does not have a matching length to the indexer (GH5508)
Bug in getitem with a MultiIndex and iloc (GH5528)
Bug in delitem on a Series (GH5542)
Bug fix in apply when using a custom function and objects are not mutated (GH5545)
Bug in selecting from a non-unique index with loc (GH5553)
Bug when a user function returns a None (GH5592)
Work around regression in numpy 1.7.0 which erroneously raises IndexError from
ndarray.item
(GH5666)
Bug in repeated indexing of an object with a resultant non-unique index (GH5678)
Bug in fillna with a Series and a passed series/dict (GH5703)
Bug in groupby transform with a datetime-like grouper (GH5712)
Bug in MultiIndex selection in PY3 when using certain keys (GH5725)
Row-wise concat of differing dtypes failing in certain cases (GH5754)
Contributors#
A total of 77 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.
Agustín Herranz +
Alex Gaudio +
Alex Rothberg +
Andreas Klostermann +
Andreas Würl +
Andy Hayden
Ben Alex +
Benedikt Sauer +
Brad Buran
Caleb Epstein +
Chang She
Christopher Whelan
DSM +
Dale Jung +
Dan Birken
David Rasch +
Dieter Vandenbussche
Gabi Davar +
Garrett Drapala
Goyo +
Greg Reda +
Ivan Smirnov +
Jack Kelly +
Jacob Schaer +
Jan Schulz +
Jeff Tratner
Jeffrey Tratner
John McNamara +
John W. O'Brien +
Joris Van den Bossche
Justin Bozonier +
Kelsey Jordahl
Kevin Stone
Kieran O'Mahony
Kyle Hausmann +
Kyle Kelley +
Kyle Meyer
Mike Kelly
Mortada Mehyar +
Nick Foti +
Olivier Harris +
Ondřej Čertík +
PKEuS
Phillip Cloud
Pierre Haessig +
Richard T. Guy +
Roman Pekar +
Roy Hyunjin Han
Skipper Seabold
Sten +
Thomas A Caswell +
Thomas Kluyver
Tiago Requeijo +
TomAugspurger
Trent Hauck
Valentin Haenel +
Viktor Kerkez +
Vincent Arel-Bundock
Wes McKinney
Wes Turner +
Weston Renoud +
Yaroslav Halchenko
Zach Dwiel +
chapman siu +
chappers +
d10genes +
danielballan
daydreamt +
engstrom +
jreback
monicaBee +
prossahl +
rockg +
unutbu +
westurner +
y-p
zach powers