pandas.read_json

pandas.read_json(path_or_buf=None, orient=None, typ='frame', dtype=None, convert_axes=None, convert_dates=True, keep_default_dates=True, numpy=False, precise_float=False, date_unit=None, encoding=None, encoding_errors='strict', lines=False, chunksize=None, compression='infer', nrows=None, storage_options=None)

Convert a JSON string to a pandas object.

Parameters

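path_or_buf : a valid JSON str, path object or file-like object

Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.json.

If you want to pass in a path object, pandas accepts any os.PathLike.

By file-like object, we refer to objects with a read() method, such as a file handle (e.g. via the builtin open function) or StringIO.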
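As a minimal sketch of the input kinds described above, the snippet below reads the same JSON once from an in-memory buffer and once from a path object. The file name `table.json` and the sample data are assumptions made for illustration only.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd

# Sample data in the default 'columns' orient: {column -> {index -> value}}.
json_text = '{"a": {"0": 1, "1": 2}, "b": {"0": 3, "1": 4}}'

# File-like object: anything exposing read(), here an in-memory StringIO.
from_buffer = pd.read_json(io.StringIO(json_text))

# Path object: any os.PathLike is accepted.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "table.json"  # hypothetical file name
    path.write_text(json_text, encoding="utf-8")
    from_path = pd.read_json(path)

# Both routes should yield the same frame.
print(from_buffer.equals(from_path))
```

Wrapping literal JSON text in `io.StringIO` is used here because it works the same way as reading from an open file handle.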

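orient : str

Indication of the expected JSON string format. Compatible JSON strings can be produced by to_json() with a corresponding orient value. The set of possible orients is: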

'split' : dict like {index -> [index], columns -> [columns], data -> [values]}
'records' : list like [{column -> value}, ... , {column -> value}]
'index' : dict like {index -> {column -> value}}
'columns' : dict like {column -> {index -> value}}
'values' : just the values array

The allowed and default values depend on the value of the typ parameter.

when typ == 'series',
- allowed orients are {'split','records','index'}
- default is 'index'
- The Series index must be unique for orient 'index'.
when typ == 'frame',
- allowed orients are {'split','records','index','columns','values','table'}
- default is 'columns'
- The DataFrame index must be unique for orients 'index' and 'columns'.
- The DataFrame columns must be unique for orients 'index', 'columns', and 'records'.
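To illustrate how the default orient follows typ, here is a small sketch with made-up data: a Series read with its default 'index' orient, and a DataFrame read from a bare 'values' array, which carries no labels.

```python
import io

import pandas as pd

# typ='series' defaults to orient='index': {index -> value}.
s = pd.read_json(io.StringIO('{"x": 1, "y": 2}'), typ="series")

# orient='values' carries only the data, so row and column labels
# fall back to default integer positions.
df = pd.read_json(io.StringIO("[[1, 2], [3, 4]]"), orient="values")

print(s)
print(df)
```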

typ : {'frame', 'series'}, default 'frame'

The type of object to recover.

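dtype : bool or dict, default None

If True, infer dtypes; if a dict of column to dtype, then use those; if False, then don't infer dtypes at all. Applies only to the data.

For all orient values except 'table', default is True.

Changed in version 0.25.0: Not applicable for orient='table'.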
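The following sketch shows one situation where controlling dtype matters: identifier strings with leading zeros. The column names and values are invented for the example. With default inference such strings may be coerced to numbers, so the sketch forces the column to str, or turns inference off entirely.

```python
import io

import pandas as pd

data = '[{"id": "001", "qty": 5}, {"id": "002", "qty": 7}]'

# Force a specific dtype for one column via a dict of column -> dtype.
as_text = pd.read_json(io.StringIO(data), dtype={"id": str})

# Disable dtype inference for the data altogether.
raw = pd.read_json(io.StringIO(data), dtype=False)

print(as_text["id"].tolist())
print(raw["id"].tolist())
```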

convert_axes : bool, default None

Try to convert the axes to the proper dtypes.

For all orient values except 'table', default is True.

Changed in version 0.25.0: Not applicable for orient='table'.

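convert_dates : bool or list of str, default True

If True then default datelike columns may be converted (depending on keep_default_dates). If False, no dates will be converted. If a list of column names, then those columns will be converted and default datelike columns may also be converted (depending on keep_default_dates).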

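keep_default_dates : bool, default True

If parsing dates (convert_dates is not False), then try to parse the default datelike columns. A column label is datelike if

it ends with '_at',
it ends with '_time',
it begins with 'timestamp',
it is 'modified', or
it is 'date'.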
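The interaction between convert_dates and keep_default_dates can be sketched as follows. The column names `created_at` and `when` and the epoch value are assumptions chosen for the example; `created_at` matches the '_at' rule above, while `when` matches none of the rules.

```python
import io

import pandas as pd

# 1577836800000 ms since the epoch is 2020-01-01 00:00:00 UTC.
data = '[{"created_at": 1577836800000, "when": 1577836800000}]'

# Default: "created_at" ends with "_at", so it is treated as datelike;
# "when" does not match any rule and stays numeric.
default = pd.read_json(io.StringIO(data))

# Switch off the name heuristic and list the date column explicitly.
explicit = pd.read_json(
    io.StringIO(data),
    keep_default_dates=False,
    convert_dates=["when"],
)

print(default.dtypes)
print(explicit.dtypes)
```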

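numpy : bool, default False

Direct decoding to NumPy arrays. Supports numeric data only, but non-numeric column and index labels are supported. Note also that the JSON ordering MUST be the same for each term if numpy=True.

Deprecated since version 1.0.0.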

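precise_float : bool, default False

Set to enable usage of the higher precision (strtod) function when decoding strings to double values. Default (False) is to use fast but less precise builtin functionality.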

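date_unit : str, default None

The timestamp unit to detect if converting dates. The default behaviour is to try and detect the correct precision, but if this is not desired then pass one of 's', 'ms', 'us' or 'ns' to force parsing only seconds, milliseconds, microseconds or nanoseconds respectively.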

encoding : str, default is 'utf-8'

The encoding to use to decode py3 bytes.
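As a hedged illustration of the encoding parameter, the sketch below writes a small file in Latin-1 (an arbitrary non-UTF-8 choice for the example) and reads it back by naming that encoding. The file name and contents are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "latin1.json"  # hypothetical file name
    # Store the bytes as Latin-1 rather than the default UTF-8.
    path.write_bytes('[{"name": "café"}]'.encode("latin-1"))
    # Tell read_json which encoding to use when decoding the bytes.
    df = pd.read_json(path, encoding="latin-1")

print(df)
```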

encoding_errors : str, optional, default 'strict'

How encoding errors are treated. List of possible values.

New in version 1.3.0.

lines : bool, default False

Read the file as one JSON object per line.
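A minimal sketch of line-delimited input, using invented sample records: each line is a complete JSON object and becomes one row.

```python
import io

import pandas as pd

# Two JSON objects, one per line (often called JSON Lines or NDJSON).
ndjson = '{"a": 1, "b": 2}\n{"a": 3, "b": 4}\n'

df = pd.read_json(io.StringIO(ndjson), lines=True)
print(df)
```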

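chunksize : int, optional

Return a JsonReader object for iteration. See the line-delimited json docs for more information on chunksize. This can only be passed if lines=True. If this is None, the file will be read into memory all at once.

Changed in version 1.2: JsonReader is a context manager.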
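Chunked reading might look like the sketch below, which assumes pandas 1.2 or later so that the reader can be used in a with statement. The five sample records are made up; the point is only that each iteration yields a DataFrame of at most chunksize rows.

```python
import io

import pandas as pd

# Five line-delimited records: {"n": 0} ... {"n": 4}.
ndjson = "\n".join(f'{{"n": {i}}}' for i in range(5))

# With chunksize set, read_json returns a JsonReader instead of a DataFrame.
with pd.read_json(io.StringIO(ndjson), lines=True, chunksize=2) as reader:
    chunk_sizes = [len(chunk) for chunk in reader]

print(chunk_sizes)
```

Processing each chunk inside the loop, rather than collecting all chunks, keeps memory use bounded by the chunk size.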

compression : str or dict, default 'infer'

For on-the-fly decompression of on-disk data. If 'infer' and 'path_or_buf' is path-like, then detect compression from the following extensions: '.gz', '.bz2', '.zip', '.xz', or '.zst' (otherwise no compression). If using 'zip', the ZIP file must contain only one data file to be read in. Set to None for no decompression. Can also be a dict with key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd'} and other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, or zstandard.ZstdDecompressor, respectively. As an example, the following could be passed for Zstandard decompression using a custom compression dictionary: compression={'method': 'zstd', 'dict_data': my_compression_dict}.

Changed in version 1.4.0: Zstandard support.
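The sketch below exercises the 'infer' behaviour with gzip, using only the standard library to produce the compressed file. The file name `data.json.gz` is an assumption; what matters is the '.gz' extension.

```python
import gzip
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.json.gz"  # hypothetical file name
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        fh.write('[{"a": 1}, {"a": 2}]')

    # compression='infer' (the default) picks gzip from the '.gz' extension.
    inferred = pd.read_json(path)
    # The method can also be named explicitly.
    explicit = pd.read_json(path, compression="gzip")

print(inferred.equals(explicit))
```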

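nrows : int, optional

The number of lines to read from the line-delimited JSON file. This can only be passed if lines=True. If this is None, all the rows will be returned.

New in version 1.1.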
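For instance, assuming pandas 1.1 or later and some invented line-delimited records, nrows can be used to read only the head of the input:

```python
import io

import pandas as pd

# Ten line-delimited records: {"n": 0} ... {"n": 9}.
ndjson = "\n".join(f'{{"n": {i}}}' for i in range(10))

# Read only the first three lines.
head = pd.read_json(io.StringIO(ndjson), lines=True, nrows=3)
print(head)
```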

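storage_options : dict, optional

Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with "s3://" and "gcs://") the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details, and for more examples on storage options refer here.

New in version 1.2.0.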

Returns

Series or DataFrame: The type returned depends on the value of typ.

See also

DataFrame.to_json: Convert a DataFrame to a JSON string.
Series.to_json: Convert a Series to a JSON string.
json_normalize: Normalize semi-structured JSON data into a flat table.

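Notes

Specific to orient='table', if a DataFrame with a literal Index name of index gets written with to_json(), the subsequent read operation will incorrectly set the Index name to None. This is because index is also used by DataFrame.to_json() to denote a missing Index name, and the subsequent read_json() operation cannot distinguish between the two. The same limitation is encountered with a MultiIndex and any names beginning with 'level_'.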
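The limitation described above can be reproduced with a short round trip. The helper `round_trip` is a hypothetical convenience function defined only for this sketch, and the frames are made-up data.

```python
import io

import pandas as pd


def round_trip(frame):
    """Hypothetical helper: write with orient='table' and read it back."""
    text = frame.to_json(orient="table")
    return pd.read_json(io.StringIO(text), orient="table")


# An Index literally named "index" loses its name on the way back.
literal = pd.DataFrame({"a": [1, 2]}, index=pd.Index(["x", "y"], name="index"))
# Any other name survives the round trip.
named = pd.DataFrame({"a": [1, 2]}, index=pd.Index(["x", "y"], name="id"))

print(round_trip(literal).index.name)
print(round_trip(named).index.name)
```

Renaming the index before writing is one way to avoid the ambiguity.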

Examples

>>> df = pd.DataFrame([['a', 'b'], ['c', 'd']],
...                   index=['row 1', 'row 2'],
...                   columns=['col 1', 'col 2'])

Encoding/decoding a DataFrame using 'split' formatted JSON:

>>> df.to_json(orient='split')
'{"columns":["col 1","col 2"],"index":["row 1","row 2"],"data":[["a","b"],["c","d"]]}'
>>> pd.read_json(_, orient='split')
      col 1 col 2
row 1     a     b
row 2     c     d

Encoding/decoding a DataFrame using 'index' formatted JSON:

>>> df.to_json(orient='index')
'{"row 1":{"col 1":"a","col 2":"b"},"row 2":{"col 1":"c","col 2":"d"}}'

>>> pd.read_json(_, orient='index')
      col 1 col 2
row 1     a     b
row 2     c     d

Encoding/decoding a DataFrame using 'records' formatted JSON. Note that index labels are not preserved with this encoding.

>>> df.to_json(orient='records')
'[{"col 1":"a","col 2":"b"},{"col 1":"c","col 2":"d"}]'
>>> pd.read_json(_, orient='records')
  col 1 col 2
0     a     b
1     c     d

Encoding with Table Schema:

>>> df.to_json(orient='table')
'{"schema":{"fields":[{"name":"index","type":"string"},{"name":"col 1","type":"string"},{"name":"col 2","type":"string"}],"primaryKey":["index"],"pandas_version":"1.4.0"},"data":[{"index":"row 1","col 1":"a","col 2":"b"},{"index":"row 2","col 1":"c","col 2":"d"}]}'
