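Bio.UniGene package
Module contents
Parse UniGene flat file format files, such as the Hs.data file.
Here is an overview of the flat file format that this parser deals with:
Line types/qualifiers::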
    ID           UniGene cluster ID
    TITLE        Title for the cluster
    GENE         Gene symbol
    CYTOBAND     Cytological band
    EXPRESS      Tissues of origin for ESTs in cluster
    RESTR_EXPR   Single tissue or development stage contributes
                 more than half the total EST frequency for this gene.
    GNM_TERMINUS Genomic confirmation of presence of a 3' terminus;
                 T if a non-templated polyA tail is found among
                 a cluster's sequences; else
                 I if templated As are found in genomic sequence or
                 S if a canonical polyA signal is found on
                 the genomic sequence
    GENE_ID      Entrez Gene identifier associated with at least one
                 sequence in this cluster; to be used instead of LocusLink.
    LOCUSLINK    LocusLink identifier associated with at least one
                 sequence in this cluster; deprecated in favor of GENE_ID
    HOMOL        Homology
    CHROMOSOME   Chromosome. For plants, CHROMOSOME refers to mapping
                 on the Arabidopsis genome.
    STS          STS
        ACC=         GenBank/EMBL/DDBJ accession number of STS
                     [optional field]
        UNISTS=      Identifier in NCBI's UNISTS database
    TXMAP        Transcript map interval
        MARKER=      Marker found on at least one sequence in this cluster
        RHPANEL=     Radiation Hybrid panel used to place marker
    PROTSIM      Protein similarity data for the sequence with the
                 highest-scoring protein similarity in this cluster
        ORG=         Organism
        PROTGI=      Sequence GI of protein
        PROTID=      Sequence ID of protein
        PCT=         Percent alignment
        ALN=         Length of aligned region (aa)
    SCOUNT       Number of sequences in the cluster
    SEQUENCE     Sequence
        ACC=         GenBank/EMBL/DDBJ accession number of sequence
        NID=         Unique nucleotide sequence identifier (gi)
        PID=         Unique protein sequence identifier (used for non-ESTs)
        CLONE=       Clone identifier (used for ESTs only)
        END=         End (5'/3') of clone insert read (used for ESTs only)
        LID=         Library ID; see Hs.lib.info for library name and tissue
        MGC=         5' CDS-completeness indicator; if present, the clone
                     associated with this sequence is believed CDS-complete.
                     A value greater than 511 is the gi of the CDS-complete
                     mRNA matched by the EST, otherwise the value is an
                     indicator of the reliability of the test indicating
                     CDS completeness; higher values indicate more reliable
                     CDS-completeness predictions.
        SEQTYPE=     Description of the nucleotide sequence. Possible values
                     are mRNA, EST and HTC.
        TRACE=       The Trace ID of the EST sequence, as provided by the
                     NCBI Trace Archive
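To make the layout above concrete, here is a minimal standard-library sketch that splits one flat-file line into its line type and its value. It is an illustration, not Biopython's implementation. It assumes each line begins with the line-type keyword followed by whitespace. The function name `split_line` and the sample lines (for example `Hs.999999`) are invented.

```python
def split_line(line: str) -> tuple:
    """Split a UniGene flat-file line into (line type, value).

    Assumption: the line type is the first whitespace-delimited token,
    and everything after the following run of whitespace is the value.
    """
    parts = line.rstrip("\n").split(None, 1)
    if not parts:
        return "", ""  # blank line
    tag = parts[0]
    value = parts[1].strip() if len(parts) > 1 else ""
    return tag, value


# Invented sample lines in the layout described above.
sample = [
    "ID          Hs.999999\n",
    "GENE        EXAMPLE1\n",
    "STS         ACC=XX000001 UNISTS=123456\n",
    "SCOUNT      2\n",
]

for line in sample:
    print(split_line(line))
```

Qualifiers such as `ACC=` and `UNISTS=` stay inside the value string. They still need to be decoded separately, per line type.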
- class Bio.UniGene.SequenceLine(text=None)
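Bases: object
Store the information for one SEQUENCE line from a UniGene file.
Initialize with the text part of the SEQUENCE line, or nothing.
Attributes and descriptions (accessed in lower case):
ACC= GenBank/EMBL/DDBJ accession number of sequence
NID= Unique nucleotide sequence identifier (gi)
PID= Unique protein sequence identifier (used for non-ESTs)
CLONE= Clone identifier (used for ESTs only)
END= End (5'/3') of clone insert read (used for ESTs only)
LID= Library ID; see Hs.lib.info for library name and tissue
MGC= 5' CDS-completeness indicator; if present, the clone associated with this sequence is believed to be CDS-complete. A value greater than 511 is the gi of the CDS-complete mRNA matched by the EST; otherwise the value indicates the reliability of the test for CDS completeness, with higher values indicating more reliable CDS-completeness predictions.
SEQTYPE= Description of the nucleotide sequence. Possible values are mRNA, EST and HTC.
TRACE= The Trace ID of the EST sequence, as provided by the NCBI Trace Archive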
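The following sketch shows one way to turn the SEQUENCE qualifiers above into lower-case attributes. It also applies the MGC= rule (a value above 511 is a gi; otherwise it is a reliability score). This is a standalone illustration, not Biopython's `SequenceLine` code. It assumes qualifiers are separated by `"; "` and written as `KEY=value`. The class `ParsedSequenceLine`, the helper `describe_mgc`, and the sample text are hypothetical.

```python
from typing import Optional


class ParsedSequenceLine:
    """Hypothetical stand-in for a parsed SEQUENCE line."""

    FIELDS = ("acc", "nid", "pid", "clone", "end", "lid", "mgc", "seqtype", "trace")

    def __init__(self, text: Optional[str] = None):
        # Every documented qualifier defaults to an empty string.
        for name in self.FIELDS:
            setattr(self, name, "")
        if text is not None:
            # Assumption: qualifiers are separated by "; ".
            for part in text.split("; "):
                key, _, value = part.partition("=")
                # Expose attributes in lower case, e.g. ACC= -> .acc
                setattr(self, key.strip().lower(), value.strip())


def describe_mgc(mgc: str) -> str:
    """Interpret an MGC= value using the documented 511 threshold."""
    if not mgc:
        return "no MGC= qualifier present"
    value = int(mgc)
    if value > 511:
        return f"matches CDS-complete mRNA gi {value}"
    return f"CDS-completeness reliability score {value}"


line = ParsedSequenceLine(
    "ACC=XX000001.1; NID=g1000; CLONE=IMAGE:100; END=3'; LID=42; MGC=600; SEQTYPE=EST"
)
print(line.acc, line.end, line.seqtype)  # XX000001.1 3' EST
print(describe_mgc(line.mgc))            # matches CDS-complete mRNA gi 600
```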
- __init__(text=None)
Initialize the class.
- __repr__()
Return the UniGene SequenceLine object as a string.
- __static_attributes__ = ('acc', 'clone', 'end', 'image', 'is_image', 'lid', 'mgc', 'nid', 'pid', 'seqtype', 'text', 'trace')
- class Bio.UniGene.ProtsimLine(text=None)
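Bases: object
Store the information for one PROTSIM line from a UniGene file.
Initialize with the text part of the PROTSIM line, or nothing.
Attributes and descriptions (accessed in lower case):
ORG= Organism
PROTGI= Sequence GI of protein
PROTID= Sequence ID of protein
PCT= Percent alignment
ALN= Length of aligned region (aa)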
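A minimal sketch of decoding a PROTSIM value into the documented fields. It converts PCT= to a float and ALN= to an integer so that hits can be ranked. The conversion is this sketch's own choice, and no claim is made about how Biopython's `ProtsimLine` stores the values. It assumes qualifiers are separated by `"; "`. The function `parse_protsim` and the sample values are hypothetical.

```python
from typing import Dict, Union

Value = Union[str, float, int]


def parse_protsim(text: str) -> Dict[str, Value]:
    """Decode a value such as 'ORG=...; PROTGI=...; PCT=...; ALN=...'."""
    fields: Dict[str, Value] = {}
    for part in text.split("; "):
        key, _, value = part.partition("=")
        key = key.strip().lower()  # lower-case names, as documented above
        value = value.strip()
        if key == "pct":
            fields[key] = float(value)  # percent alignment
        elif key == "aln":
            fields[key] = int(value)    # aligned length in amino acids
        else:
            fields[key] = value
    return fields


# Invented PROTSIM values; pick the hit with the highest percent alignment.
protsim_lines = [
    "ORG=Species A; PROTGI=111; PROTID=XP_000001.1; PCT=80.5; ALN=250",
    "ORG=Species B; PROTGI=222; PROTID=XP_000002.1; PCT=92.0; ALN=240",
]
hits = [parse_protsim(t) for t in protsim_lines]
best = max(hits, key=lambda h: h["pct"])
print(best["protid"], best["aln"])  # XP_000002.1 240
```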
- __init__(text=None)
Initialize the class.
- __repr__()
Return the UniGene ProtsimLine object as a string.
- __static_attributes__ = ('aln', 'org', 'pct', 'protgi', 'protid', 'text')
- class Bio.UniGene.STSLine(text=None)
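Bases: object
Store the information for one STS line from a UniGene file.
Initialize with the text part of the STS line, or nothing.
Attributes and descriptions (accessed in lower case):
ACC= GenBank/EMBL/DDBJ accession number of STS [optional field]
UNISTS= Identifier in NCBI's UNISTS database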
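Because ACC= is optional on STS lines, consuming code should not assume it is present. The sketch below defaults any missing qualifier to an empty string. It is a hypothetical helper (`parse_sts`) built on the standard library. It assumes the STS qualifiers are separated by whitespace; that separator is an assumption, not confirmed by the text above.

```python
def parse_sts(text: str) -> dict:
    """Return {'acc': ..., 'unists': ...} from an STS value."""
    result = {"acc": "", "unists": ""}  # ACC= may be absent
    for part in text.split():
        key, _, value = part.partition("=")
        result[key.lower()] = value
    return result


print(parse_sts("ACC=XX000001 UNISTS=123456"))  # {'acc': 'XX000001', 'unists': '123456'}
print(parse_sts("UNISTS=654321"))               # {'acc': '', 'unists': '654321'}
```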
- __init__(text=None)
Initialize the class.
- __repr__()
Return the UniGene STSLine object as a string.
- __static_attributes__ = ('acc', 'text', 'unists')
- class Bio.UniGene.Record
Bases: object
Store a UniGene record.
Here is what is stored::
    self.ID           = ''  # ID line
    self.species      = ''  # Hs, Bt, etc.
    self.title        = ''  # TITLE line
    self.symbol       = ''  # GENE line
    self.cytoband     = ''  # CYTOBAND line
    self.express      = []  # EXPRESS line, parsed on ';'
                            # Will be an array of strings
    self.restr_expr   = ''  # RESTR_EXPR line
    self.gnm_terminus = ''  # GNM_TERMINUS line
    self.gene_id      = ''  # GENE_ID line
    self.locuslink    = ''  # LOCUSLINK line
    self.homol        = ''  # HOMOL line
    self.chromosome   = ''  # CHROMOSOME line
    self.protsim      = []  # PROTSIM entries, array of Protsims
                            # Type ProtsimLine
    self.sequence     = []  # SEQUENCE entries, array of Sequence entries
                            # Type SequenceLine
    self.sts          = []  # STS entries, array of STS entries
                            # Type STSLine
    self.txmap        = []  # TXMAP entries, array of TXMap entries
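The field list above maps line types onto attributes. Single-valued lines such as ID, TITLE and GENE (stored as `symbol`) fill strings. Repeatable lines such as SEQUENCE, PROTSIM and STS accumulate in lists. The sketch below shows that mapping with a reduced, hypothetical `SketchRecord` dataclass and `build_record` function; it is not Biopython's `Record`. It also assumes the species code is the part of the cluster ID before the first dot (for example `Hs` in `Hs.999999`), which is consistent with the "Hs, Bt, etc." comment above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SketchRecord:
    """Hypothetical, reduced mirror of the fields listed above."""
    ID: str = ""
    species: str = ""
    title: str = ""
    symbol: str = ""  # filled from the GENE line
    chromosome: str = ""
    sequence: List[str] = field(default_factory=list)  # raw SEQUENCE values
    protsim: List[str] = field(default_factory=list)   # raw PROTSIM values
    sts: List[str] = field(default_factory=list)       # raw STS values


def build_record(lines: List[str]) -> SketchRecord:
    record = SketchRecord()
    for line in lines:
        parts = line.rstrip("\n").split(None, 1)
        if not parts:
            continue  # skip blank lines
        tag = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if tag == "ID":
            record.ID = value
            # Assumption: species code is the prefix before the first dot.
            record.species = value.split(".")[0]
        elif tag == "TITLE":
            record.title = value
        elif tag == "GENE":
            record.symbol = value
        elif tag == "CHROMOSOME":
            record.chromosome = value
        elif tag == "SEQUENCE":
            record.sequence.append(value)
        elif tag == "PROTSIM":
            record.protsim.append(value)
        elif tag == "STS":
            record.sts.append(value)
        # Other line types are ignored in this reduced sketch.
    return record


rec = build_record([
    "ID          Hs.999999",
    "TITLE       Example cluster",
    "GENE        EXAMPLE1",
    "SEQUENCE    ACC=XX000001.1; SEQTYPE=mRNA",
    "SEQUENCE    ACC=XX000002.1; SEQTYPE=EST",
])
print(rec.species, rec.symbol, len(rec.sequence))  # Hs EXAMPLE1 2
```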
- __init__()
Initialize the class.
- __repr__()
Represent the UniGene Record object as a string for debugging.
- __static_attributes__ = ('ID', 'chromosome', 'cytoband', 'express', 'gene_id', 'gnm_terminus', 'homol', 'locuslink', 'protsim', 'restr_expr', 'sequence', 'species', 'sts', 'symbol', 'title', 'txmap')
- Bio.UniGene.parse(handle)
Read and load UniGene records, for files containing multiple records.
- Bio.UniGene.read(handle)
Read and load a UniGene record, for files containing exactly one record.
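The contract difference between `parse` (any number of records per handle) and `read` (exactly one) can be sketched with plain generators. This is not Biopython's code, and the exact error messages Biopython raises are not reproduced here. The sketch assumes each record ends with a line containing only `//` and that a handle is any iterable of text lines. The names `parse_records` and `read_record` are hypothetical. With Biopython installed, the documented entry points are `Bio.UniGene.parse(handle)` and `Bio.UniGene.read(handle)`.

```python
import io
from typing import Iterator, List, TextIO


def parse_records(handle: TextIO) -> Iterator[List[str]]:
    """Yield the non-blank lines of each record; assumes '//' ends a record."""
    current: List[str] = []
    for line in handle:
        line = line.rstrip("\n")
        if line.strip() == "//":
            yield current
            current = []
        elif line.strip():
            current.append(line)
    if current:
        # Design choice: yield a trailing record that lacks '//' rather than drop it.
        yield current


def read_record(handle: TextIO) -> List[str]:
    """Return the single record in handle; raise ValueError for 0 or 2+ records."""
    records = parse_records(handle)
    first = next(records, None)
    if first is None:
        raise ValueError("No record found")
    if next(records, None) is not None:
        raise ValueError("More than one record found")
    return first


data = "ID          Hs.1\n//\nID          Hs.2\n//\n"
print([rec[0] for rec in parse_records(io.StringIO(data))])
# ['ID          Hs.1', 'ID          Hs.2']
```

Using a generator for the multi-record case keeps memory use flat on large files such as Hs.data. The single-record reader simply checks that the generator produces exactly one item.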