KEGG

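KEGG (https://www.kegg.jp/) is a database resource for understanding the high-level functions and utilities of biological systems, such as the cell, the organism and the ecosystem, from molecular-level information, especially the large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies.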

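Please note that the KEGG parser implementation in Biopython is incomplete. While the KEGG website lists many flat file formats, only parsers and writers for compound, enzyme, and map are currently implemented. However, a generic parser is implemented to handle the other formats.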

Parsing KEGG records

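Parsing a KEGG record is as simple as using any other file format parser in Biopython. (Before running the following code, please open http://rest.kegg.jp/get/ec:5.4.2.2 with your web browser and save it as ec_5.4.2.2.txt.)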

>>> from Bio.KEGG import Enzyme
>>> records = Enzyme.parse(open("ec_5.4.2.2.txt"))
>>> record = list(records)[0]
>>> record.classname
['Isomerases;', 'Intramolecular transferases;', 'Phosphotransferases (phosphomutases)']
>>> record.entry
'5.4.2.2'

Alternatively, if the input KEGG file has exactly one entry, you can use read:

>>> from Bio.KEGG import Enzyme
>>> record = Enzyme.read(open("ec_5.4.2.2.txt"))
>>> record.classname
['Isomerases;', 'Intramolecular transferases;', 'Phosphotransferases (phosphomutases)']
>>> record.entry
'5.4.2.2'

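The following section shows how to download the above enzyme using the KEGG API, as well as how to use the generic parser with data that does not have a custom parser implemented.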

Querying the KEGG API

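Biopython has full support for querying the KEGG API. All KEGG endpoints can be queried, and all methods documented by KEGG (https://www.kegg.jp/kegg/rest/keggapi.html) are supported. The interface performs some validation of queries, following the rules defined on the KEGG site. However, invalid queries that return a 400 or 404 must be handled by the user.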
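Since invalid queries surface as HTTP errors, one way to handle them is a small wrapper. The sketch below assumes that Bio.KEGG.REST lets the standard urllib.error.HTTPError propagate to the caller. fetch_or_none is a hypothetical helper, not part of Biopython, and it takes the query function as an argument so that it can be tried without network access.

```python
from urllib.error import HTTPError


def fetch_or_none(fetch, *args):
    """Call fetch(*args) and return the response text.

    Returns None when the server answers 400 or 404 (invalid query or
    unknown entry); any other HTTP error is re-raised unchanged.
    Assumption: `fetch` returns a file-like handle and raises HTTPError
    on failure, as urllib-based functions do.
    """
    try:
        handle = fetch(*args)
    except HTTPError as err:
        if err.code in (400, 404):
            return None
        raise
    return handle.read()
```

With Biopython installed, this could be called as fetch_or_none(REST.kegg_get, "ec:5.4.2.2"), and the caller then only has to check for None.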

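First, here is how to extend the parsing example from the previous section by downloading the relevant enzyme and passing it to the Enzyme parser.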

>>> from Bio.KEGG import REST
>>> from Bio.KEGG import Enzyme
>>> request = REST.kegg_get("ec:5.4.2.2")
>>> with open("ec_5.4.2.2.txt", "w") as handle:
...     _ = handle.write(request.read())  # file is closed before it is re-read
...
>>> records = Enzyme.parse(open("ec_5.4.2.2.txt"))
>>> record = list(records)[0]
>>> record.classname
['Isomerases;', 'Intramolecular transferases;', 'Phosphotransferases (phosphomutases)']
>>> record.entry
'5.4.2.2'

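Now, here is a more realistic example that combines several KEGG API queries. It demonstrates how to extract the unique set of gene symbols from all human pathways related to DNA repair. The steps are as follows. First, we need to get a list of all human pathways. Second, we need to filter that list for pathways related to "repair". Last, we need to collect the gene symbols from all of the repair pathways.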

from Bio.KEGG import REST

human_pathways = REST.kegg_list("pathway", "hsa").read()

# Filter all human pathways for repair pathways
repair_pathways = []
for line in human_pathways.rstrip().split("\n"):
    entry, description = line.split("\t")
    if "repair" in description:
        repair_pathways.append(entry)

# Get the genes for pathways and add them to a list
repair_genes = []
for pathway in repair_pathways:
    pathway_file = REST.kegg_get(pathway).read()  # query and read each pathway

    # iterate through each KEGG pathway file, keeping track of which section
    # of the file we're in, only read the gene in each pathway
    current_section = None
    for line in pathway_file.rstrip().split("\n"):
        section = line[:12].strip()  # section names are within 12 columns
        if section:
            current_section = section

        if current_section == "GENE":
            # split only on the first "; " so a description containing "; " still unpacks
            gene_identifiers, gene_description = line[12:].split("; ", 1)
            gene_id, gene_symbol = gene_identifiers.split()

            if gene_symbol not in repair_genes:
                repair_genes.append(gene_symbol)

print(
    "There are %d repair pathways and %d repair genes. The genes are:"
    % (len(repair_pathways), len(repair_genes))
)
print(", ".join(repair_genes))
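The section tracking used in the loop above (section names in the first 12 columns, continuation lines left blank there) can be factored into a small reusable helper. The following is a minimal sketch, not Biopython's generic parser: parse_kegg_sections is a hypothetical name, and the sample text is a hand-written, abridged entry used only so the block runs without network access.

```python
def parse_kegg_sections(text):
    """Group the lines of one KEGG flat-file entry by section name."""
    sections = {}
    current = None
    for line in text.rstrip().split("\n"):
        if line.startswith("///"):  # "///" terminates an entry
            break
        name = line[:12].strip()  # section names sit within 12 columns
        if name:
            current = name
            sections.setdefault(current, [])
        if current is not None:
            value = line[12:].strip()
            if value:
                sections[current].append(value)
    return sections


# Hand-written, abridged sample in the KEGG flat-file layout (assumption).
sample = (
    "ENTRY       hsa03410                    Pathway\n"
    "NAME        Base excision repair - Homo sapiens (human)\n"
    "GENE        4595  MUTYH; mutY DNA glycosylase\n"
    "            4913  NTHL1; nth like DNA glycosylase 1\n"
    "///\n"
)
sections = parse_kegg_sections(sample)
print(sections["GENE"])
```

Keeping each section as a list of raw strings leaves the field-specific splitting (for example on "; " for GENE lines) to the caller, which is all the repair-gene example needs.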

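The KEGG API wrapper is compatible with all endpoints. Usage is essentially a matter of replacing the slashes in the URL with commas and passing the resulting items as arguments to the corresponding function in the KEGG module; identifiers joined with "+" in the URL are passed as a list. Here are a few examples from the API documentation (https://www.kegg.jp/kegg/docs/keggapi.html).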

/list/hsa:10458+ece:Z5100            -> REST.kegg_list(["hsa:10458", "ece:Z5100"])
/find/compound/300-310/mol_weight    -> REST.kegg_find("compound", "300-310", "mol_weight")
/get/hsa:10458+ece:Z5100/aaseq       -> REST.kegg_get(["hsa:10458", "ece:Z5100"], "aaseq")
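
The mapping can be made concrete with a small sketch that goes the other way, from wrapper-style arguments back to a URL path. kegg_url_path is a hypothetical helper for illustration only; it is not part of Biopython and does not claim to reflect how Bio.KEGG.REST builds its requests internally.

```python
def kegg_url_path(operation, *args):
    """Rebuild the KEGG REST path that a wrapper-style call corresponds to.

    Each positional argument becomes one slash-separated URL component;
    a list or tuple of identifiers is joined with "+".
    """
    parts = [operation]
    for arg in args:
        if isinstance(arg, (list, tuple)):
            arg = "+".join(arg)
        parts.append(arg)
    return "/" + "/".join(parts)


print(kegg_url_path("list", ["hsa:10458", "ece:Z5100"]))
print(kegg_url_path("find", "compound", "300-310", "mol_weight"))
print(kegg_url_path("get", ["hsa:10458", "ece:Z5100"], "aaseq"))
```

The three calls print the three URL paths listed above, which is a quick way to check that a planned call matches the endpoint described in the KEGG documentation.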