pandas.read_stata

pandas.read_stata(filepath_or_buffer, convert_dates=True, convert_categoricals=True, index_col=None, convert_missing=False, preserve_dtypes=True, columns=None, order_categoricals=True, chunksize=None, iterator=False, compression='infer', storage_options=None)

Read a Stata file into a DataFrame.

Parameters

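filepath_or_buffer : str, path object, or file-like object

Any valid string path is acceptable. The string can be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.dta.

If you want to pass in a path object, pandas accepts any os.PathLike.

By file-like object, we refer to objects with a read() method, such as a file handle (e.g. via the builtin open function) or StringIO.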
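As a minimal sketch of the accepted input types, the snippet below reads the same file through a string path, a pathlib.Path, and an open file handle. The temporary directory and the file name demo.dta are placeholders chosen for illustration, and the handle is opened in binary mode on the assumption that the .dta file is binary data.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"speed": [350, 18], "weight": [1.2, 0.4]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.dta"  # hypothetical file name
    df.to_stata(path, write_index=False)

    from_str = pd.read_stata(str(path))  # plain string path
    from_path = pd.read_stata(path)      # any os.PathLike object
    with open(path, "rb") as handle:     # file-like object with read()
        from_handle = pd.read_stata(handle)
```

All three calls should yield the same frame, so the choice mostly depends on how the file is obtained.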

convert_dates : bool, default True

Convert date variables to DataFrame time values.
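The effect of convert_dates can be sketched by writing a datetime column in Stata's daily date format and reading it back both ways. The column names and the file name dates.dta are illustrative assumptions, not part of the API.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame(
    {"when": pd.to_datetime(["2020-01-01", "2020-01-02"]), "x": [1, 2]}
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "dates.dta"  # hypothetical file name
    # Store "when" using the Stata daily date format ("td").
    df.to_stata(path, convert_dates={"when": "td"}, write_index=False)

    parsed = pd.read_stata(path)                    # default: convert_dates=True
    raw = pd.read_stata(path, convert_dates=False)  # keeps Stata's numeric encoding

is_parsed_datetime = pd.api.types.is_datetime64_any_dtype(parsed["when"])
is_raw_datetime = pd.api.types.is_datetime64_any_dtype(raw["when"])
```

With convert_dates=False the column stays numeric, which can be useful when the raw Stata encoding needs to be inspected.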

convert_categoricals : bool, default True

Read value labels and convert columns to Categorical/Factor variables.

index_col : str, optional

Column to set as index.

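convert_missing : bool, default False

Flag indicating whether to convert missing values to their Stata representations. If False, missing values are replaced with NaN. If True, columns containing missing values are returned with object data types, and missing values are represented by StataMissingValue objects.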
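A small sketch of the two behaviours described above, assuming a float column with one missing entry; the file name missing.dta is a placeholder.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
from pandas.io.stata import StataMissingValue

df = pd.DataFrame({"x": [1.0, np.nan, 3.0]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "missing.dta"  # hypothetical file name
    df.to_stata(path, write_index=False)

    as_nan = pd.read_stata(path)                          # convert_missing=False
    as_stata = pd.read_stata(path, convert_missing=True)  # keep Stata missing codes

# The missing cell is NaN in one frame and a StataMissingValue in the other.
nan_cell = as_nan.loc[1, "x"]
missing_cell = as_stata.loc[1, "x"]
```

Because the second frame holds Python objects, numeric operations on that column are slower and may need an explicit conversion first.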

preserve_dtypes : bool, default True

Preserve Stata data types. If False, numeric data are upcast to the pandas default types for foreign data (float64 or int64).
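To illustrate the upcasting, the sketch below writes an int8 and a float32 column and compares the dtypes that come back. Column names and the file name dtypes.dta are assumptions made for the example.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "small_int": np.array([1, 2, 3], dtype="int8"),
        "small_float": np.array([0.5, 1.5, 2.5], dtype="float32"),
    }
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "dtypes.dta"  # hypothetical file name
    df.to_stata(path, write_index=False)

    kept = pd.read_stata(path)                           # narrow Stata types kept
    upcast = pd.read_stata(path, preserve_dtypes=False)  # widened to 64-bit types

kept_dtypes = {name: str(dtype) for name, dtype in kept.dtypes.items()}
upcast_dtypes = {name: str(dtype) for name, dtype in upcast.dtypes.items()}
```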

columns : list or None

Columns to retain. Columns will be returned in the given order. None returns all columns.
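The following sketch shows column selection with a custom order, and separately the index_col parameter described earlier. The column names and the file name people.dta are illustrative only.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame(
    {"id": [10, 20, 30], "height": [1.5, 1.6, 1.7], "weight": [50.0, 60.0, 70.0]}
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "people.dta"  # hypothetical file name
    df.to_stata(path, write_index=False)

    # Keep two columns, returned in the order requested.
    subset = pd.read_stata(path, columns=["weight", "id"])
    # Use the "id" column as the index of the result.
    indexed = pd.read_stata(path, index_col="id")
```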

order_categoricals : bool, default True

Flag indicating whether converted categorical data are ordered.
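As a hedged sketch of how convert_categoricals and order_categoricals interact, the example below writes a labelled column and reads it back three ways. The column name tier, its labels, and the file name tiers.dta are assumptions for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame(
    {"tier": pd.Categorical(["low", "high", "low"], categories=["low", "high"])}
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "tiers.dta"  # hypothetical file name
    df.to_stata(path, write_index=False)

    ordered = pd.read_stata(path)                              # defaults
    unordered = pd.read_stata(path, order_categoricals=False)  # labels, no order
    codes = pd.read_stata(path, convert_categoricals=False)    # raw stored values

labels = ordered["tier"].tolist()
```

Reading with convert_categoricals=False skips the value labels entirely, so the column holds the underlying stored numbers rather than the label text.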

chunksize : int, default None

Return a StataReader object for iteration; each iteration returns a chunk with the given number of rows.

iterator : bool, default False

Return a StataReader object.
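With iterator=True the call returns a StataReader instead of a DataFrame. A minimal sketch, assuming a five-row file named rows.dta, uses the reader as a context manager and pulls rows on demand with its read method.

```python
import tempfile
from pathlib import Path

import pandas as pd
from pandas.io.stata import StataReader

df = pd.DataFrame({"i": [0, 1, 2, 3, 4]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "rows.dta"  # hypothetical file name
    df.to_stata(path, write_index=False)

    # The context manager closes the underlying file when the block ends.
    with pd.read_stata(path, iterator=True) as reader:
        is_reader = isinstance(reader, StataReader)
        head = reader.read(2)  # first two rows
        rest = reader.read(3)  # next three rows
```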

compression : str or dict, default 'infer'

For on-the-fly decompression of on-disk data. If 'infer' and 'filepath_or_buffer' is path-like, then detect compression from the following extensions: '.gz', '.bz2', '.zip', '.xz', or '.zst' (otherwise no compression). If using 'zip', the ZIP file must contain only one data file to be read in. Set to None for no decompression. Can also be a dict with key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd'}; other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, or zstandard.ZstdDecompressor, respectively. As an example, the following could be passed for Zstandard decompression using a custom compression dictionary: compression={'method': 'zstd', 'dict_data': my_compression_dict}.
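A short sketch of inferred versus explicit decompression, assuming a gzip-compressed file named data.dta.gz. It relies on DataFrame.to_stata also inferring gzip compression from the extension when writing.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"i": [1, 2, 3]})

with tempfile.TemporaryDirectory() as tmp:
    gz_path = Path(tmp) / "data.dta.gz"  # hypothetical file name
    df.to_stata(gz_path, write_index=False)  # written gzip-compressed

    inferred = pd.read_stata(gz_path)  # compression='infer' picks gzip from '.gz'
    explicit = pd.read_stata(gz_path, compression={"method": "gzip"})
```

The dict form is mainly useful when the file extension does not reveal the compression method.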

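storage_options : dict, optional

Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs, the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with "s3://" and "gcs://"), the key-value pairs are forwarded to fsspec.open. See the fsspec and urllib documentation for more details; for more examples of storage options, refer to the pandas I/O user guide.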
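For HTTP(S) sources, a thin wrapper like the one below is one way to pass request headers through storage_options. The helper name read_remote_stata, the URL, and the header value are all hypothetical; the call itself is left commented out because it needs network access.

```python
import pandas as pd


def read_remote_stata(url, headers=None):
    """Read a Stata file from a URL, forwarding ``headers`` as storage options.

    ``read_remote_stata`` is an illustrative helper, not part of pandas.
    """
    return pd.read_stata(url, storage_options=headers)


# Hypothetical usage; the URL and header value are placeholders:
# df = read_remote_stata(
#     "https://example.com/data/animals.dta",
#     headers={"User-Agent": "my-app/0.1"},
# )
```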

Returns

DataFrame or StataReader

See also

io.stata.StataReader: Low-level reader for Stata data files.
DataFrame.to_stata: Export Stata data files.

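Notes

Categorical variables read through an iterator may not have the same categories and dtype. This occurs when a variable stored in a DTA file is associated with an incomplete set of value labels that label only a strict subset of the values.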

Examples

Create a dummy Stata file for this example:

>>> import pandas as pd
>>> df = pd.DataFrame({'animal': ['falcon', 'parrot', 'falcon', 'parrot'],
...                    'speed': [350, 18, 361, 15]})
>>> df.to_stata('animals.dta')

Read a Stata dta file:

>>> df = pd.read_stata('animals.dta')  

Read a Stata dta file in 10,000-line chunks:

>>> import numpy as np
>>> values = np.random.randint(0, 10, size=(20_000, 1), dtype="uint8")
>>> df = pd.DataFrame(values, columns=["i"])
>>> df.to_stata('filename.dta')

>>> itr = pd.read_stata('filename.dta', chunksize=10000)  
>>> for chunk in itr:
...    # Operate on a single chunk, e.g., chunk.mean()
...    pass  
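As one possible way to operate on chunks, the sketch below accumulates a running sum and row count so that the overall mean is computed without loading the whole file at once. The file name chunks.dta and the synthetic data are assumptions for illustration.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# 20,000 rows cycling through the values 0..9.
df = pd.DataFrame({"i": np.arange(20_000, dtype="int32") % 10})

total = 0
rows = 0
n_chunks = 0

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "chunks.dta"  # hypothetical file name
    df.to_stata(path, write_index=False)

    with pd.read_stata(path, chunksize=10_000) as itr:
        for chunk in itr:
            total += int(chunk["i"].sum())  # per-chunk partial sum
            rows += len(chunk)
            n_chunks += 1

overall_mean = total / rows
```

Keeping only the partial sums means memory use is bounded by the chunk size rather than by the file size.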
