>>> from env_helper import info; info()
Page last updated: 2024-01-23 21:51:17
    Linux distribution: Debian GNU/Linux 12 (bookworm)
    OS kernel: Linux-6.1.0-17-amd64-x86_64-with-glibc2.36
    Python version: 3.11.2

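2.3. Creating documents with python-docx

Compared with reading DOCX documents, the writing API is richer and more complete, but what matters most is using it flexibly. The following methods create document content:

  • Add a heading: add_heading(text=u'', level=1)

  • Add a page break: add_page_break()

  • Add a table: add_table(rows, cols, style=None)

  • Add a picture: add_picture(image_path_or_stream, width=None, height=None)

  • Add a paragraph: add_paragraph(text=u'', style=None)

  • Add a section: add_section(start_type=2), where 2 is the value of WD_SECTION.NEW_PAGE

  • Save: save(path_or_stream)

Compared with a DOCX document edited and laid out in Word, a document generated through the API is more likely to be "well-structured". Here "well-structured" refers mainly to organization by headings, paragraphs, and similar structural elements; it says nothing about the document's visual styling.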


>>> from docx import Document
>>> file = Document()
>>> file.add_heading("demo1",0)
<docx.text.paragraph.Paragraph at 0x7fef692d25d0>


>>> core_properties = file.core_properties
>>> core_properties.author = 'Li Ming'
>>> file.core_properties.author
'Li Ming'

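2.3.1. Adding a paragraph

add_paragraph() appends a paragraph to the end of the document, populated with the given text and using the paragraph style style. The text can contain tab characters ( \t ), which are converted to the corresponding XML form for a tab. It can also contain newline ( \n ) or carriage return ( \r ) characters, each of which is converted to a line break.

Paragraphs are the basic building blocks of a DOCX document and are used for body text:

>>> p=file.add_paragraph("This is the first paragraph added.\n")

This method returns a reference to the new paragraph, which is placed at the end of the document. Here the reference is assigned to p, but the examples that follow omit it unless it is needed. In practice you often do nothing further with an item after adding it, so keeping a reference to it has little value.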


>>> a_p=p.insert_paragraph_before("A paragraph inserted before the previous one.")


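2.3.2. Adding a heading


A heading paragraph contains the given text, and its paragraph style is determined by level. If level is 0, the style is set to Title. If level is 1 (or omitted), Heading 1 is used. Otherwise the style is set to Heading {level}. A level outside the range 0-9 raises a ValueError.

add_heading() appends the new heading paragraph to the end of the document.

>>> file.add_heading("This is a default heading")
>>> file.add_heading("This is a level-2 heading",level=2)
<docx.text.paragraph.Paragraph at 0x7fef69339fd0>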

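2.3.3. Adding a page break

Use this method when the content that follows should start on a new page. add_page_break() returns a new Paragraph object that contains only a page break.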

>>> file.add_page_break()
<docx.text.paragraph.Paragraph at 0x7fef49031790>

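2.3.4. Adding a table

add_table(rows, cols, style=None) adds a table with the given number of rows and columns; style specifies the table style. It can be a table style object or a table style name. If style is None, the table inherits the document's default table style.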

>>> tb=file.add_table(rows=3,cols=3)


>>> cell = tb.cell(0, 1)


>>> cell.text = 'This is cell 0,1'


>>> row = tb.rows[1]
>>> row.cells[0].text = '1,0'
>>> row.cells[1].text = '1,1'
>>> row.cells[2].text = '1,2'


>>> row = tb.add_row()

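Adding rows one at a time with add_row(), as above, is convenient when the number of rows is not known in advance. First prepare the content to add: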

>>> records = (
...     (3, '101', 'Spam'),
...     (7, '422', 'Eggs'),
...     (4, '631', 'Spam, spam, eggs, and spam')
... )


>>> table = file.add_table(rows=1, cols=3)
>>> hdr_cells = table.rows[0].cells
>>> hdr_cells[0].text = 'Qty'
>>> hdr_cells[1].text = 'Id'
>>> hdr_cells[2].text = 'Desc'
>>> for qty, id, desc in records:
...     row_cells = table.add_row().cells
...     row_cells[0].text = str(qty)
...     row_cells[1].text = id
...     row_cells[2].text = desc

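2.3.5. Adding a picture

The picture is scaled to the specified width and height. If neither is specified, the picture appears at its native size. If only one is specified, it is used to compute a scaling factor that is then applied to the unspecified dimension, preserving the image's aspect ratio. The native size is calculated from the dots-per-inch (dpi) value stored in the image file, defaulting to 72 dpi if no value is specified.

add_picture() places the new picture shape in its own paragraph at the end of the document.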

>>> file.add_picture('./img/Tulips.jpg')

If no file exists at the given path, the call fails with a FileNotFoundError raised by open() inside docx/image/image.py (traceback abridged):

FileNotFoundError: [Errno 2] No such file or directory: './img/Tulips.jpg'


>>> from docx.shared import Inches
>>> file.add_picture('./img/Tulips.jpg', width=Inches(1.0))

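The Inches and Cm classes are provided so that you can specify measurements in convenient units. Internally, python-docx uses English Metric Units (EMU), 914400 to the inch. These values can also be used in arithmetic, for example width = Inches(3) / thing_count.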

2.3.6. Saving the document

Use the save() method to save the document; its argument is the path of the file to write, or a file-like object.

>>> file.save('xx_write.docx')