>>> from env_helper import info; info()

Page last updated: 2024-04-07 23:44:29
Runtime environment:
    Linux distribution: Debian GNU/Linux 12 (bookworm)
    OS kernel: Linux-6.1.0-18-amd64-x86_64-with-glibc2.36
    Python version: 3.11.2

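6.4. Example: Using HTML as a Typesetting Tool

6.4.1. Scenario

We have a set of images. Each image has a short description (which can be stored in an XLSX file) and attributes such as length and width. The goal is to combine the images and their information into a single document that is easy to print.

Approach: generate HTML from this information, then lay it out and export it with Word.

6.4.2. Solution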

Load the template file (html_base.html) with Jinja2:

>>> from jinja2 import FileSystemLoader,Environment
>>> env = Environment(loader=FileSystemLoader('templates'))
>>> template = env.get_template('html_base.html')
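
The contents of html_base.html are not shown on this page. As a minimal sketch, assuming the template simply wraps a `{{ content }}` placeholder in a `<body>` element, the same loading and rendering steps can be tried without a templates directory by using Jinja2's DictLoader. The template text and the sample content value below are illustrative assumptions, not the original file.

```python
from jinja2 import DictLoader, Environment

# Hypothetical stand-in for templates/html_base.html (the real file is not shown).
TEMPLATE_SOURCE = (
    "<html>\n"
    "<body>\n"
    "{{ content }}\n"
    "</body>\n"
    "</html>"
)

# DictLoader serves templates from an in-memory dict instead of a directory.
env = Environment(loader=DictLoader({"html_base.html": TEMPLATE_SOURCE}))
template = env.get_template("html_base.html")

# Autoescaping is off by default, so the HTML fragment is inserted unchanged.
rendered = template.render(content='<div class="thumb">demo</div>')
print(rendered)
```

Because autoescaping is off in a default Environment, any markup passed as content reaches the page as-is, which is what this workflow relies on.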

openpyxl is a Python library for reading and writing Excel documents.

Create a Workbook object by reading the Excel file with openpyxl's load_workbook function:

>>> import openpyxl
>>> wb = openpyxl.load_workbook('img_contents.xlsx')

Get the active Worksheet:

>>> sheet = wb.active

Iterate over the rows of the sheet, read the relevant cells, and accumulate the generated markup in the variable html_str:

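>>> html_str = ''
>>> for i in range(2, sheet.max_row + 1):  # start at row 2; row 1 is presumably a header row
...     img_title = sheet.cell(row=i, column=1).value   # image file name
...     img_misc = sheet.cell(row=i, column=2).value    # description text
...     img_length = sheet.cell(row=i, column=3).value  # used as the height attribute
...     img_width = sheet.cell(row=i, column=4).value   # used as the width attribute
...
...     # Skip rows that have no image file name
...     if not img_title:
...         continue
...
...     # An empty description cell is read as None, so fall back to ''
...     html_content = ('<div class="thumb"><img src="imgs/' + img_title + '"'
...                     + ' width=' + str(img_width) + ' height=' + str(img_length)
...                     + '/> ' + (img_misc or '') + ' </div>')
...     html_str += html_content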
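
The loop above writes the width and height attributes without quotes and inserts the cell text unescaped. A hedged alternative, assuming each row is available as a (title, misc, length, width) tuple (for example from `sheet.iter_rows(min_row=2, max_col=4, values_only=True)`), is to quote every attribute and escape the text with the standard library's `html.escape`. The function name `build_thumb_html` and the sample rows are illustrative, not part of the original code, and the output differs slightly from the original markup.

```python
from html import escape


def build_thumb_html(rows):
    """Build the thumbnail markup from (title, misc, length, width) tuples."""
    parts = []
    for img_title, img_misc, img_length, img_width in rows:
        if not img_title:  # skip rows without an image file name
            continue
        parts.append(
            '<div class="thumb"><img src="imgs/{src}" width="{w}" height="{h}" /> {text} </div>'.format(
                src=escape(str(img_title), quote=True),
                w=escape(str(img_width), quote=True),
                h=escape(str(img_length), quote=True),
                text=escape(str(img_misc or "")),  # None becomes an empty string
            )
        )
    return "".join(parts)


# Illustrative rows; real data would come from the worksheet.
sample_rows = [
    ("sphinx_1.jpg", "Fig. 1 <draft>", 150, 150),
    (None, "no file name, skipped", 10, 10),
]
html_str = build_thumb_html(sample_rows)
print(html_str)
```

Collecting the fragments in a list and joining them once also avoids repeated string concatenation inside the loop.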

Render the template, passing html_str as the content variable used in html_base.html:

>>> template.render(content=html_str)

'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n<html xmlns="http://www.w3.org/1999/xhtml">\n<head>\n    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>\n    <title> </title>\n</head>\n<body>\n<div class="thumb"><img src="imgs/sphinx_1.jpg" width=150 height=150/> 图1描述 </div><div class="thumb"><img src="imgs/sphinx_2.jpg" width=151 height=151/> 图2描述 </div><div class="thumb"><img src="imgs/sphinx_3.jpg" width=152 height=152/> 图3描述 </div><div class="thumb"><img src="imgs/Tornado_1.jpg" width=153 height=153/> 图4描述 </div><div class="thumb"><img src="imgs/Tornado_2.jpg" width=154 height=154/> 图5描述 </div><div class="thumb"><img src="imgs/Tornado_3.jpg" width=155 height=155/> 图6描述 </div><div class="thumb"><img src="imgs/Tornado_4.jpg" width=156 height=156/> 图7描述 </div>\n</body>\n</html>'
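
The render call only returns a string. To open the result in Word or a browser it has to be written to a file first. The sketch below shows one way to do that with the standard library, assuming UTF-8 output to match the charset declared in the template; the helper name `save_html` and the file name `img_contents.html` are hypothetical, and a temporary directory is used only to keep the example self-contained.

```python
import tempfile
from pathlib import Path


def save_html(rendered, path):
    """Write the rendered page to disk as UTF-8 and return its path."""
    path = Path(path)
    path.write_text(rendered, encoding="utf-8")
    return path


# Stand-in for the string returned by template.render(...)
rendered = '<html><body><div class="thumb">demo</div></body></html>'

out_dir = Path(tempfile.mkdtemp())  # illustrative output location
out_file = save_html(rendered, out_dir / "img_contents.html")
print(out_file.name, out_file.stat().st_size)
```

Since the generated img tags use relative paths such as imgs/sphinx_1.jpg, the HTML file needs to sit next to the imgs directory for the pictures to load.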

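Exporting the text and images to Word

Approach: parse the page 'http://drr.ikcest.org/', save an image from it locally, add that image to a Word document, and finally delete the local image file.

(To save the text and images of the HTML page generated from the Excel file instead, only the url and the parsing rules in the code below need to be changed accordingly.)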
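
As a rough illustration of what adapted parsing rules for the generated page could look like, the sketch below collects (src, description) pairs from `<div class="thumb">` blocks. It deliberately swaps BeautifulSoup for the standard library's `html.parser` so that it runs without extra packages; the class name `ThumbParser` and the sample markup (with quoted attributes) are assumptions made for the example.

```python
from html.parser import HTMLParser


class ThumbParser(HTMLParser):
    """Collect (image src, description) pairs from <div class="thumb"> blocks."""

    def __init__(self):
        super().__init__()
        self.items = []         # finished (src, text) pairs
        self._in_thumb = False  # True while inside a thumb div
        self._src = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "thumb":
            self._in_thumb, self._src, self._text = True, None, []
        elif tag == "img" and self._in_thumb:
            self._src = attrs.get("src")

    def handle_data(self, data):
        if self._in_thumb:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self._in_thumb:
            self.items.append((self._src, "".join(self._text).strip()))
            self._in_thumb = False


# Illustrative markup in the shape produced by the Excel step.
sample = (
    '<div class="thumb"><img src="imgs/a.jpg" width="150" height="150" /> First </div>'
    '<div class="thumb"><img src="imgs/b.jpg" width="151" height="151" /> Second </div>'
)
parser = ThumbParser()
parser.feed(sample)
parser.close()
print(parser.items)
```

Each pair could then be passed to the Word export step, one paragraph and one picture per item.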

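This uses the python-docx library:

The python-docx package can create docx documents and modify existing ones, including paragraphs, page breaks, tables, pictures, headings, and styles.

Install python-docx:

pip install python-docx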

Import the required libraries:

>>> import requests
>>> from bs4 import BeautifulSoup
>>> import os
>>> import docx
>>> from docx import Document
>>> from docx.shared import Inches

Parse the page:

>>> url = 'http://drr.ikcest.org/'
>>> html = requests.get(url).content
>>> soup = BeautifulSoup(html,'html.parser')
>>> imgs_table = soup.find('table',{"class":"table"})
>>> img=str(imgs_table.find('div',{"class":"col-sm-4"})).split('src="')[1].split('"')[0]
>>> img_src='http://drr.ikcest.org'+img
>>> img_title=imgs_table.find('div',{"class":"col-sm-8"}).text
>>> img

'/static/upload/82/827587c0-2fda-11eb-8efe-00163e0618d6_m.jpg'
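
The absolute image URL above is built by plain string concatenation, which works here because the src value starts with a slash. A slightly more general sketch, using the standard library's `urllib.parse.urljoin`, also handles relative paths and src values that are already absolute. The extra sample values are illustrative only.

```python
from urllib.parse import urljoin

base_url = 'http://drr.ikcest.org/'
img = '/static/upload/82/827587c0-2fda-11eb-8efe-00163e0618d6_m.jpg'

# Root-relative path: resolved against the host of base_url
img_src = urljoin(base_url, img)
print(img_src)

# Path without a leading slash: resolved against the base directory
print(urljoin(base_url, 'imgs/a.jpg'))

# Already absolute URL (hypothetical): returned unchanged
print(urljoin(base_url, 'http://example.com/a.jpg'))
```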

Save the image locally:

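>>> img_name = 'xx_drr_img.jpg'
>>> with open(img_name, 'wb') as f:
...     # Download the image bytes and write them to the local file;
...     # the with block closes the file, so no explicit close() is needed
...     response = requests.get(img_src).content
...     f.write(response)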

Create a Document object and add the text and the image to it:

>>> document = Document()
>>> document.add_paragraph(img_title)
>>> document.add_picture(img_name)

<docx.shape.InlineShape at 0x7f56201d6490>

Save the document (python-docx writes the .docx format, so the file name uses that extension):

>>> document.save('xx_tuwen.docx')

Delete the locally saved image:

>>> os.remove(img_name)
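
If an earlier step raises an exception, the os.remove call above is never reached and the downloaded file is left behind. One possible pattern, sketched here with the standard library only, writes the bytes to a temporary file and removes it in a finally block. The helper `with_temp_image` and the `record` callback are hypothetical; in the real workflow the callback would add the picture to the document and save it.

```python
import os
import tempfile


def with_temp_image(data, use_image):
    """Write data to a temporary .jpg, call use_image(path), then always delete the file."""
    fd, path = tempfile.mkstemp(suffix='.jpg')
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(data)
        return use_image(path)
    finally:
        os.remove(path)  # runs even if use_image raises


seen = {}


def record(path):
    """Stand-in for the step that would add the picture to the document."""
    seen['path'] = path
    return os.path.getsize(path)


size = with_temp_image(b'fake image bytes', record)
print(size, os.path.exists(seen['path']))
```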

For more detailed layout control, see the python-docx documentation.
