>>> from env_helper import info; info()
Page last updated: 2020-03-08 18:00:05
OS: Linux-4.19.0-8-amd64-x86_64-with-debian-10.3; Python: 3.7.3
7.7. Pandas Caveats & Gotchas¶
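Caveats and gotchas are problems that are easy to overlook: places that need particular care when using Pandas.
Using If/Truth Statements with Pandas¶
When you try to convert something to a boolean, Pandas follows the NumPy convention of raising an error. This happens in an if statement or when using the boolean operations and, or, and not. It is not clear what the result should be. Should it be True because the Series is not zero-length? Or False because it contains False values? Because the answer is ambiguous, Pandas raises a ValueError: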
import pandas as pd
if pd.Series([False, True, False]):
    print('I am True')
Running the code above produces the following result:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
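Inside the if condition, Pandas cannot tell which interpretation you intend. The error message lists the explicit alternatives: .empty, .bool(), .item(), .any(), or .all(). For example, use .any() to test whether at least one element is True: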
>>> import pandas as pd
>>> if pd.Series([False, True, False]).any():
>>>     print("I am any")
I am any
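The choices named in the error message answer different questions, so it helps to see them side by side. The following is a minimal sketch; the variable names (`raised`, `has_any_true`, `all_true`, `is_empty`) are illustrative only and do not come from the original text.

```python
import pandas as pd

s = pd.Series([False, True, False])

# Using the Series directly as a condition raises ValueError.
try:
    if s:
        pass
    raised = False
except ValueError:
    raised = True

# Each explicit reduction answers a different question.
has_any_true = bool(s.any())   # is at least one element True?
all_true = bool(s.all())       # is every element True?
is_empty = s.empty             # does the Series have zero elements?

print(raised, has_any_true, all_true, is_empty)
```

Choosing the reduction explicitly documents the intent of the condition instead of leaving it to a convention the reader has to remember.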
To evaluate a single-element Pandas object in a boolean context, use the .bool() method:
>>> import pandas as pd
>>> print (pd.Series([True]).bool())
True
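The error message shown earlier also names .item(), which extracts the only element of a one-element Series. To the best of our knowledge, Pandas 2.1 and later deprecate .bool(), so .item() may be the safer choice on newer installations; treat that version detail as an assumption and check it against your installed release. A minimal sketch, with illustrative variable names:

```python
import pandas as pd

single = pd.Series([True])
value = single.item()   # returns the one element as a scalar
print(value)

# .item() refuses to guess when there is more than one element.
try:
    pd.Series([True, False]).item()
    multi_ok = True
except ValueError:
    multi_ok = False
print(multi_ok)
```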
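Bitwise Boolean¶
Comparison operators such as == and != are applied element by element and return a boolean Series, which is almost always what you want.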
>>> import pandas as pd
>>>
>>> s = pd.Series(range(5))
>>> print (s==4)
0 False
1 False
2 False
3 False
4 True
dtype: bool
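Boolean Series like the one above are usually combined and then used as a mask. The sketch below assumes the same `s` as in the example; the names `mask` and `and_failed` are illustrative. It uses the element-wise operators `&`, `|`, and `~` rather than Python's `and`, `or`, and `not`, which would trigger the ambiguity error described earlier.

```python
import pandas as pd

s = pd.Series(range(5))

# Combine element-wise comparisons with & (and), | (or), ~ (not).
# Parentheses are required because & and | bind tighter than == or >.
mask = (s > 1) & (s != 4)
print(s[mask].tolist())

# Python's `and` tries to reduce the left Series to a single bool and fails.
try:
    (s > 1) and (s != 4)
    and_failed = False
except ValueError:
    and_failed = True
print(and_failed)
```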
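isin Operator¶
isin returns a boolean Series showing whether each element of the Series exactly matches one of the values in the passed sequence.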
>>> import pandas as pd
>>>
>>> s = pd.Series(list('abc'))
>>> s = s.isin(['a', 'c', 'e'])
>>> print (s)
0 True
1 False
2 True
dtype: bool
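In practice the result of isin is most often used to filter the original Series. A short sketch, keeping the mask in a separate variable (the names `mask`, `kept`, and `dropped` are illustrative) so that the data is not overwritten:

```python
import pandas as pd

s = pd.Series(list('abc'))
mask = s.isin(['a', 'c', 'e'])

kept = s[mask]        # elements found in the list
dropped = s[~mask]    # elements not found in the list

print(kept.tolist())
print(dropped.tolist())
```

Note that 'e' appears in the lookup list but not in the Series; it has no effect on the result.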
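Reindexing vs. the ix Gotcha¶
Many users find themselves using the ix indexer as a concise way of selecting data from a Pandas object (note that ix is deprecated, as the warning in the output shows):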
>>> import pandas as pd
>>> import numpy as np
>>>
>>> df = pd.DataFrame(np.random.randn(6, 4), columns=['one', 'two', 'three',
>>> 'four'],index=list('abcdef'))
>>>
>>> print (df)
>>> print ("=============================================")
>>> print (df.ix[['b', 'c', 'e']])
one two three four
a 0.761727 0.601259 0.691793 -0.430211
b -0.517379 -2.118008 0.466482 -0.168260
c -0.338952 -0.893442 0.020344 -0.814290
d -0.122582 0.428400 0.725127 -1.669588
e -1.045259 -0.822986 -0.803290 -0.937524
f -0.160144 1.229767 1.189295 -1.762160
=============================================
one two three four
b -0.517379 -2.118008 0.466482 -0.168260
c -0.338952 -0.893442 0.020344 -0.814290
e -1.045259 -0.822986 -0.803290 -0.937524
/usr/lib/python3/dist-packages/ipykernel_launcher.py:9: DeprecationWarning:
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing
See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated
if __name__ == '__main__':
In this case, this is of course exactly equivalent to using the reindex method:
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(np.random.randn(6, 4), columns=['one', 'two', 'three',
>>> 'four'],index=list('abcdef'))
>>> print (df)
>>> print("=============================================")
>>> print (df.reindex(['b', 'c', 'e']))
one two three four
a -0.382707 0.490284 -0.757426 -1.072039
b 1.032573 0.446418 0.062161 -0.579593
c 0.330762 0.190814 -0.378788 1.097861
d 0.381762 -1.019551 -0.918567 -1.626412
e -0.970455 0.073398 -2.527512 1.617695
f 0.067697 -0.735713 1.327123 -0.123758
=============================================
one two three four
b 1.032573 0.446418 0.062161 -0.579593
c 0.330762 0.190814 -0.378788 1.097861
e -0.970455 0.073398 -2.527512 1.617695
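The deprecation warning shown earlier recommends .loc for label-based indexing. As a hedged sketch of that replacement, the example below uses deterministic values from np.arange instead of random numbers (an assumption made purely so the result is reproducible) and checks that .loc and reindex agree when every requested label exists:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape(6, 4),
                  columns=['one', 'two', 'three', 'four'],
                  index=list('abcdef'))

by_loc = df.loc[['b', 'c', 'e']]          # label-based selection
by_reindex = df.reindex(['b', 'c', 'e'])  # conform to the given labels

print(by_loc.equals(by_reindex))
```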
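One might conclude from this that ix and reindex are 100% equivalent. That is true except in the case of integer indexing. For example, the operation above could alternatively be written as: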
>>> import pandas as pd
>>> import numpy as np
>>>
>>> df = pd.DataFrame(np.random.randn(6, 4), columns=['one', 'two', 'three',
>>> 'four'],index=list('abcdef'))
>>>
>>> print (df)
>>> print("=====================================")
>>> print (df.ix[[1, 2, 4]])
>>> print("=====================================")
>>> print (df.reindex([1, 2, 4]))
one two three four
a 1.173411 2.031607 0.410409 0.731330
b -2.052936 0.229990 -0.475675 -0.709150
c 1.457306 -2.360150 -1.346016 0.495090
d 0.456730 -1.044787 0.827997 -1.972323
e 0.477892 -0.089661 0.055344 -0.073556
f 1.615016 -1.268165 -0.888479 -0.050170
=====================================
one two three four
b -2.052936 0.229990 -0.475675 -0.709150
c 1.457306 -2.360150 -1.346016 0.495090
e 0.477892 -0.089661 0.055344 -0.073556
=====================================
one two three four
1 NaN NaN NaN NaN
2 NaN NaN NaN NaN
4 NaN NaN NaN NaN
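It is important to remember that reindex performs strict label indexing only. This can lead to some potentially surprising results in pathological cases, such as an index that contains both integers and strings.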
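The difference between positional and label-based lookup can be made explicit with .iloc and reindex. The sketch below again uses np.arange data for reproducibility, and the mixed-index Series `mixed` is a hypothetical example constructed only to illustrate the pathological case:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape(6, 4),
                  columns=['one', 'two', 'three', 'four'],
                  index=list('abcdef'))

# Positional selection: rows at positions 1, 2 and 4.
by_position = df.iloc[[1, 2, 4]]
print(by_position.index.tolist())

# reindex treats 1, 2, 4 as labels; none exist, so every value is NaN.
by_label = df.reindex([1, 2, 4])
print(bool(by_label.isna().all().all()))

# Mixed index: the integer 0 is a label here, not a position.
mixed = pd.Series([10, 20, 30], index=['a', 0, 'b'])
print(mixed.reindex([0]).tolist())   # label 0 sits at position 1
print(mixed.iloc[[0]].tolist())      # position 0 carries label 'a'
```

Using .iloc when a position is meant and .loc or reindex when a label is meant removes the guesswork that made ix error-prone.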