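Reading and writing files¶

This page covers common applications; for the full collection of I/O routines, see Input and output.

Reading text and CSV files¶

With no missing values¶

Use numpy.loadtxt.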
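
As a minimal sketch, assuming a small comma-separated file with no gaps: the `io.StringIO` buffer below stands in for a real file on disk, and its contents are made up for illustration.

```python
import io

import numpy as np

# Hypothetical file contents; io.StringIO stands in for a file on disk.
text = io.StringIO("1,2,3\n4,5,6\n7,8,9\n")

# loadtxt expects every row to hold the same number of values.
data = np.loadtxt(text, delimiter=",")

print(data.shape)  # (3, 3)
print(data.dtype)  # float64
```

By default the values are parsed as floats; pass `dtype` to request something else.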

With missing values¶

Use numpy.genfromtxt.

numpy.genfromtxt will either

return a masked array masking out missing values (if usemask=True), or

fill in the missing value with the value specified in filling_values (default is np.nan for float, -1 for int).

With non-whitespace delimiters¶

>>> print(open("csv.txt").read())  
1, 2, 3
4,, 6
7, 8, 9

Masked-array output¶

>>> np.genfromtxt("csv.txt", delimiter=",", usemask=True)  
masked_array(
  data=[[1.0, 2.0, 3.0],
        [4.0, --, 6.0],
        [7.0, 8.0, 9.0]],
  mask=[[False, False, False],
        [False,  True, False],
        [False, False, False]],
  fill_value=1e+20)

Array output¶

>>> np.genfromtxt("csv.txt", delimiter=",")  
array([[ 1.,  2.,  3.],
       [ 4., nan,  6.],
       [ 7.,  8.,  9.]])

Array output, specified fill-in value¶

>>> np.genfromtxt("csv.txt", delimiter=",", dtype=np.int8, filling_values=99)  
array([[ 1,  2,  3],
       [ 4, 99,  6],
       [ 7,  8,  9]], dtype=int8)

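Whitespace-delimited¶

numpy.genfromtxt can also parse whitespace-delimited data files that have missing values if

Each field has a fixed width: use the width as the delimiter argument.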

# File with width=4. The data does not have to be justified (for example,
# the 2 in row 1), the last column can be less than width (for example, the 6
# in row 2), and no delimiting character is required (for instance 8888 and 9
# in row 3)

>>> f = open("fixedwidth.txt").read()
>>> print(f)
1   2      3
44      6
7   88889

# Showing spaces as ^
>>> print(f.replace(" ","^"))
1^^^2^^^^^^3
44^^^^^^6
7^^^88889

>>> np.genfromtxt("fixedwidth.txt", delimiter=4)
array([[1.000e+00, 2.000e+00, 3.000e+00],
       [4.400e+01,       nan, 6.000e+00],
       [7.000e+00, 8.888e+03, 9.000e+00]])

A special value (for example, "x") indicates a missing field: use it as the missing_values argument.

>>> print(open("nan.txt").read())  
1 2 3
44 x 6
7  8888 9

>>> np.genfromtxt("nan.txt", missing_values="x")  
array([[1.000e+00, 2.000e+00, 3.000e+00],
       [4.400e+01,       nan, 6.000e+00],
       [7.000e+00, 8.888e+03, 9.000e+00]])

You want to skip the rows with missing values: set invalid_raise=False.

>>> print(open("skip.txt").read())  
1 2   3
44    6
7 888 9

>>> np.genfromtxt("skip.txt", invalid_raise=False)  
__main__:1: ConversionWarning: Some errors were detected !
    Line #2 (got 2 columns instead of 3)
array([[  1.,   2.,   3.],
       [  7., 888.,   9.]])

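The delimiter whitespace character is different from the whitespace that indicates missing data. For instance, if columns are delimited by \t, then missing data will be recognized if it consists of one or more spaces.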

>>> f = open("tabs.txt").read()  
>>> print(f)  
1       2       3
44              6
7       888     9

# Tabs vs. spaces
>>> print(f.replace("\t","^"))  
1^2^3
44^ ^6
7^888^9

>>> np.genfromtxt("tabs.txt", delimiter="\t", missing_values=" +")  
array([[  1.,   2.,   3.],
       [ 44.,  nan,   6.],
       [  7., 888.,   9.]])

Read a file in .npy or .npz format¶

Choices:

Use numpy.load. It can read files generated by any of numpy.save, numpy.savez, or numpy.savez_compressed.

Use memory mapping. See numpy.lib.format.open_memmap.
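
The two choices can be sketched as follows. The file names and array contents are illustrative assumptions, and a temporary directory is used only so the example cleans up after itself.

```python
import os
import tempfile

import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.linspace(0.0, 1.0, 5)

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "a.npy")        # hypothetical file names
    npz_path = os.path.join(tmp, "arrays.npz")

    np.save(npy_path, a)
    np.savez(npz_path, a=a, b=b)

    # A .npy file loads back as a single array.
    a_loaded = np.load(npy_path)

    # A .npz file loads as a dict-like object keyed by the names given to savez.
    with np.load(npz_path) as archive:
        names = sorted(archive.files)
        b_loaded = archive["b"]

    # Memory-mapped access to an existing .npy file; data is read on demand.
    mapped = np.lib.format.open_memmap(npy_path, mode="r")
    first_row = np.array(mapped[0])  # copy the row out of the mapping
    del mapped                       # release the mapping before cleanup
```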

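Write to a file to be read back by NumPy¶

Binary¶

Use numpy.save, or to store multiple arrays numpy.savez or numpy.savez_compressed.

For security and portability, set allow_pickle=False unless the dtype contains Python objects, which requires pickling.

Masked arrays can't currently be saved, nor can other arbitrary array subclasses.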
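
Although a masked array cannot be stored as such, its data and mask are ordinary arrays. One possible workaround, offered as a sketch rather than a recipe taken from this page, is to store the two parts side by side and rebuild the masked array after loading. The file name and key names are assumptions.

```python
import os
import tempfile

import numpy as np

masked = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "masked_parts.npz")  # hypothetical file name

    # Store the plain data and the boolean mask as two ordinary arrays.
    np.savez(path, data=masked.data, mask=np.ma.getmaskarray(masked))

    # allow_pickle=False refuses to load anything that would need unpickling.
    with np.load(path, allow_pickle=False) as archive:
        restored = np.ma.masked_array(archive["data"], mask=archive["mask"])

print(restored)  # [1.0 -- 3.0]
```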

Human-readable¶

numpy.save and numpy.savez create binary files. To write a human-readable file, use numpy.savetxt. The array can only be 1- or 2-dimensional, and there's no `savetxtz` for multiple files.
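
A short sketch of a text round trip, assuming a small 2-D float array; the `io.StringIO` buffer stands in for a text file, and the format string is an arbitrary choice.

```python
import io

import numpy as np

table = np.array([[1.5, 2.0], [3.25, 4.0]])

buffer = io.StringIO()  # stands in for a text file on disk
np.savetxt(buffer, table, fmt="%.2f", delimiter=",")
text = buffer.getvalue()
print(text)

# Reading the text back recovers the values, up to the precision written.
buffer.seek(0)
round_trip = np.loadtxt(buffer, delimiter=",")

# Arrays with more than two dimensions are rejected.
try:
    np.savetxt(io.StringIO(), np.zeros((2, 2, 2)))
    rejected = False
except ValueError:
    rejected = True
```

Because the values are written with a fixed number of digits, the text form is only as precise as the `fmt` string allows.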

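Large arrays¶

See Write or read large arrays.

Read an arbitrarily formatted binary file ("binary blob")¶

Use a structured array.

Example:

The .wav file header is a 44-byte block preceding data_size bytes of the actual sound data: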

chunk_id         "RIFF"
chunk_size       4-byte unsigned little-endian integer
format           "WAVE"
fmt_id           "fmt "
fmt_size         4-byte unsigned little-endian integer
audio_fmt        2-byte unsigned little-endian integer
num_channels     2-byte unsigned little-endian integer
sample_rate      4-byte unsigned little-endian integer
byte_rate        4-byte unsigned little-endian integer
block_align      2-byte unsigned little-endian integer
bits_per_sample  2-byte unsigned little-endian integer
data_id          "data"
data_size        4-byte unsigned little-endian integer

The .wav file header as a NumPy structured dtype:

wav_header_dtype = np.dtype([
    ("chunk_id", (bytes, 4)), # flexible-sized scalar type, item size 4
    ("chunk_size", "<u4"),    # little-endian unsigned 32-bit integer
    ("format", "S4"),         # 4-byte string, alternate spelling of (bytes, 4)
    ("fmt_id", "S4"),
    ("fmt_size", "<u4"),
    ("audio_fmt", "<u2"),     #
    ("num_channels", "<u2"),  # .. more of the same ...
    ("sample_rate", "<u4"),   #
    ("byte_rate", "<u4"),
    ("block_align", "<u2"),
    ("bits_per_sample", "<u2"),
    ("data_id", "S4"),
    ("data_size", "<u4"),
    #
    # the sound data itself cannot be represented here:
    # it does not have a fixed size
])

# f is assumed to be a .wav file opened in binary mode ("rb")
header = np.fromfile(f, dtype=wav_header_dtype, count=1)[0]

This .wav example is for illustration; to read a .wav file in real life, use Python's built-in module wave.

(Adapted from Pauli Virtanen, Advanced NumPy, licensed under CC BY 4.0.)
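
To try the structured dtype without a real audio file, the sketch below packs a made-up 44-byte header with `struct` and parses it from memory with `np.frombuffer`. All header values and the two trailing samples are invented for illustration.

```python
import struct

import numpy as np

wav_header_dtype = np.dtype([
    ("chunk_id", "S4"), ("chunk_size", "<u4"), ("format", "S4"),
    ("fmt_id", "S4"), ("fmt_size", "<u4"), ("audio_fmt", "<u2"),
    ("num_channels", "<u2"), ("sample_rate", "<u4"), ("byte_rate", "<u4"),
    ("block_align", "<u2"), ("bits_per_sample", "<u2"),
    ("data_id", "S4"), ("data_size", "<u4"),
])

# Made-up header for 16-bit mono PCM at 8 kHz, followed by 4 bytes of data.
raw = struct.pack(
    "<4sI4s4sIHHIIHH4sI",
    b"RIFF", 40, b"WAVE", b"fmt ", 16, 1, 1, 8000, 16000, 2, 16, b"data", 4,
) + b"\x00\x00\x01\x00"  # two little-endian 16-bit samples: 0 and 1

# Parse the fixed-size header, then read the samples that follow it.
header = np.frombuffer(raw, dtype=wav_header_dtype, count=1)[0]
n_samples = int(header["data_size"]) // 2
samples = np.frombuffer(raw, dtype="<i2", count=n_samples,
                        offset=wav_header_dtype.itemsize)

print(header["chunk_id"], int(header["sample_rate"]), samples)
```

The variable-length sound data is read in a second step because its size is only known once the header has been parsed.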

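Write or read large arrays¶

Arrays too large to fit in memory can be treated like ordinary in-memory arrays using memory mapping.

Raw array data written with numpy.ndarray.tofile or numpy.ndarray.tobytes can be read with numpy.memmap: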
```
array = np.memmap("mydata/myarray.arr", mode="r", dtype=np.int16, shape=(1024, 1024))
```
Files output by numpy.save (that is, using the numpy format) can be read using numpy.load with the mmap_mode keyword argument:
```
large_array[some_slice] = np.load("path/to/small_array", mmap_mode="r")
```
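
Both approaches can be tried end to end on a small array. This is a sketch with hypothetical file names; a real use case would involve arrays far larger than the one here.

```python
import os
import tempfile

import numpy as np

original = np.arange(12, dtype=np.int16).reshape(3, 4)

with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "raw.arr")    # hypothetical file names
    npy_path = os.path.join(tmp, "saved.npy")

    # Raw bytes only: dtype and shape must be supplied again when mapping.
    original.tofile(raw_path)
    raw_map = np.memmap(raw_path, mode="r", dtype=np.int16, shape=(3, 4))
    raw_row = raw_map[1].tolist()
    del raw_map  # release the mapping before the directory is removed

    # A .npy file stores dtype and shape in its header, so np.load needs neither.
    np.save(npy_path, original)
    npy_map = np.load(npy_path, mmap_mode="r")
    npy_shape = npy_map.shape
    npy_row = npy_map[2].tolist()
    del npy_map

print(raw_row, npy_shape, npy_row)
```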

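Memory mapping lacks features like data chunking and compression; more full-featured formats and libraries usable with NumPy include:

HDF5: h5py or PyTables.
Zarr: the zarr package.
NetCDF: scipy.io.netcdf_file.

For tradeoffs among memmap, Zarr, and HDF5, see pythonspeed.com.

Write files for reading by other (non-NumPy) tools¶

Formats for exchanging data with other tools include HDF5, Zarr, and NetCDF (see Write or read large arrays).

Write or read a JSON file¶

NumPy arrays are not directly JSON serializable.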
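
One common workaround, shown here as a sketch and not as the only option, is to convert the array to nested Python lists with `ndarray.tolist` before encoding. Note that the dtype is not recorded in the JSON text and has to be chosen again when decoding.

```python
import json

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Passing the array itself to json.dumps fails.
try:
    json.dumps(arr)
    direct_ok = True
except TypeError:
    direct_ok = False

# Nested Python lists of plain ints are serializable.
encoded = json.dumps(arr.tolist())
decoded = np.array(json.loads(encoded))

print(encoded)  # [[1, 2, 3], [4, 5, 6]]
```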

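Save/restore using a pickle file¶

Avoid when possible; pickles are not secure against erroneous or maliciously constructed data.

Use numpy.save and numpy.load. Set allow_pickle=False, unless the array dtype includes Python objects, in which case pickling is required.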
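
The effect of allow_pickle can be seen with a small object array. The array contents and file names below are illustrative assumptions.

```python
import os
import tempfile

import numpy as np

objects = np.array(["text", None, 3], dtype=object)  # needs pickling
numbers = np.arange(3)                               # plain numeric data

with tempfile.TemporaryDirectory() as tmp:
    obj_path = os.path.join(tmp, "objects.npy")  # hypothetical file names
    num_path = os.path.join(tmp, "numbers.npy")

    # Numeric arrays never need pickle, so it can be switched off both ways.
    np.save(num_path, numbers, allow_pickle=False)
    numbers_loaded = np.load(num_path, allow_pickle=False)

    # Object arrays are stored as pickles.
    np.save(obj_path, objects, allow_pickle=True)

    # Loading them with allow_pickle=False is refused.
    try:
        np.load(obj_path, allow_pickle=False)
        refused = False
    except ValueError:
        refused = True

    # Only opt in to unpickling for files from a trusted source.
    objects_loaded = np.load(obj_path, allow_pickle=True)
```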

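Convert from a pandas DataFrame to a NumPy array¶

See pandas.DataFrame.to_numpy.

Save/restore using `tofile` and `fromfile`¶

In general, prefer numpy.save and numpy.load.

numpy.ndarray.tofile and numpy.fromfile lose information on endianness and precision and so are unsuitable for anything but scratch storage.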
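
The loss of information is easy to demonstrate. In this sketch (file name hypothetical), a float32 array is written with tofile and then read back, once with the default dtype of fromfile and once with the dtype and shape supplied by hand.

```python
import os
import tempfile

import numpy as np

original = np.arange(6, dtype=np.float32).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scratch.bin")  # hypothetical file name
    original.tofile(path)                    # writes 24 raw bytes, no header

    # fromfile defaults to float64, so the 24 bytes are misread
    # as three 8-byte values with meaningless contents.
    guessed = np.fromfile(path)

    # The reader has to know the dtype and the shape in advance.
    correct = np.fromfile(path, dtype=np.float32).reshape(2, 3)

print(guessed.shape)  # (3,)
print(correct.shape)  # (2, 3)
```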
