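Bio.UniGene package

Module contents

Parse UniGene flat file format files such as the Hs.data file.

Here is an overview of the flat file format that this parser deals with:

Line types/qualifiers::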

ID           UniGene cluster ID
TITLE        Title for the cluster
GENE         Gene symbol
CYTOBAND     Cytological band
EXPRESS      Tissues of origin for ESTs in cluster
RESTR_EXPR   Single tissue or development stage contributes
             more than half the total EST frequency for this gene.
GNM_TERMINUS genomic confirmation of presence of a 3' terminus;
             T if a non-templated polyA tail is found among
             a cluster's sequences; else
             I if templated As are found in genomic sequence or
             S if a canonical polyA signal is found on
               the genomic sequence
GENE_ID      Entrez gene identifier associated with at least one
             sequence in this cluster;
             to be used instead of LocusLink.
LOCUSLINK    LocusLink identifier associated with at least one
             sequence in this cluster;
             deprecated in favor of GENE_ID
HOMOL        Homology;
CHROMOSOME   Chromosome.  For plants, CHROMOSOME refers to mapping
             on the arabidopsis genome.
STS          STS
     ACC=         GenBank/EMBL/DDBJ accession number of STS
                  [optional field]
     UNISTS=      identifier in NCBI's UNISTS database
TXMAP        Transcript map interval
     MARKER=      Marker found on at least one sequence in this
                  cluster
     RHPANEL=     Radiation Hybrid panel used to place marker
PROTSIM      Protein Similarity data for the sequence with
             highest-scoring protein similarity in this cluster
     ORG=         Organism
     PROTGI=      Sequence GI of protein
     PROTID=      Sequence ID of protein
     PCT=         Percent alignment
     ALN=         length of aligned region (aa)
SCOUNT       Number of sequences in the cluster
SEQUENCE     Sequence
     ACC=         GenBank/EMBL/DDBJ accession number of sequence
     NID=         Unique nucleotide sequence identifier (gi)
     PID=         Unique protein sequence identifier (used for
                  non-ESTs)
     CLONE=       Clone identifier (used for ESTs only)
     END=         End (5'/3') of clone insert read (used for
                  ESTs only)
     LID=         Library ID; see Hs.lib.info for library name
                  and tissue
     MGC=         5' CDS-completeness indicator; if present, the
                  clone associated with this sequence is believed
                  CDS-complete. A value greater than 511 is the gi
                  of the CDS-complete mRNA matched by the EST,
                  otherwise the value is an indicator of the
                  reliability of the test indicating CDS
                  completeness; higher values indicate more
                  reliable CDS-completeness predictions.
     SEQTYPE=     Description of the nucleotide sequence.
                  Possible values are mRNA, EST and HTC.
     TRACE=       The Trace ID of the EST sequence, as provided by
                  NCBI Trace Archive
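
To make the layout above concrete, here is a minimal pure-Python sketch of how one line could be split into its line type and its `KEY=value` qualifiers. It is not the code used by Bio.UniGene. The helper names `split_line` and `split_qualifiers`, the `"; "` separator between qualifiers, and the sample line are all assumptions made for illustration.

```python
def split_line(line):
    """Split a flat-file line into (tag, value) at the first space."""
    tag, _, value = line.rstrip("\n").partition(" ")
    return tag, value.strip()


def split_qualifiers(value, sep="; "):
    """Split 'KEY=val; KEY=val' into a dict keyed by qualifier name.

    The separator is an assumption; pass a different one if needed.
    """
    fields = {}
    for part in value.split(sep):
        key, _, val = part.partition("=")
        fields[key] = val
    return fields


# Hypothetical SEQUENCE line, not taken from a real UniGene file.
tag, value = split_line("SEQUENCE    ACC=XX000001.1; NID=g1; SEQTYPE=mRNA")
fields = split_qualifiers(value)

print(tag)     # SEQUENCE
print(fields)  # {'ACC': 'XX000001.1', 'NID': 'g1', 'SEQTYPE': 'mRNA'}
```

Keeping the separator as a parameter lets the same helper be tried on line types that delimit their qualifiers differently.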

class Bio.UniGene.SequenceLine(text=None)

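Bases: object

Store the information for one SEQUENCE line from a UniGene file.

Initialize with the text part of the SEQUENCE line, or nothing.

Attributes and descriptions (access as LOWER CASE):

ACC= GenBank/EMBL/DDBJ accession number of sequence
NID= Unique nucleotide sequence identifier (gi)
PID= Unique protein sequence identifier (used for non-ESTs)
CLONE= Clone identifier (used for ESTs only)
END= End (5'/3') of clone insert read (used for ESTs only)
LID= Library ID; see Hs.lib.info for library name and tissue
MGC= 5' CDS-completeness indicator; if present, the clone associated with this sequence is believed CDS-complete. A value greater than 511 is the gi of the CDS-complete mRNA matched by the EST, otherwise the value is an indicator of the reliability of the test indicating CDS completeness; higher values indicate more reliable CDS-completeness predictions.
SEQTYPE= Description of the nucleotide sequence. Possible values are mRNA, EST and HTC.
TRACE= The Trace ID of the EST sequence, as provided by NCBI Trace Archive

__init__(text=None): Initialize the class.

__repr__(): Return UniGene SequenceLine object as a string.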

__firstlineno__ = 78

__static_attributes__ = ('acc', 'clone', 'end', 'image', 'is_image', 'lid', 'mgc', 'nid', 'pid', 'seqtype', 'text', 'trace')
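
As a usage sketch, a `SequenceLine` can be built directly from the text that follows the SEQUENCE tag, and its qualifiers are then read as lower-case attributes. The line text below is hypothetical, and the `"; "` separator between qualifiers is an assumption about the input format.

```python
from Bio.UniGene import SequenceLine

# Hypothetical text part of a SEQUENCE line (everything after the tag).
text = "ACC=XX000002.1; NID=g2; CLONE=IMAGE:12345; END=5'; LID=1; SEQTYPE=EST"
line = SequenceLine(text)

# Qualifier names are upper case in the file, lower case on the object.
print(line.acc)
print(line.seqtype)
print(line.clone)
print(line.end)
```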

class Bio.UniGene.ProtsimLine(text=None)

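Bases: object

Store the information for one PROTSIM line from a UniGene file.

Initialize with the text part of the PROTSIM line, or nothing.

Attributes and descriptions (access as LOWER CASE):

ORG= Organism
PROTGI= Sequence GI of protein
PROTID= Sequence ID of protein
PCT= Percent alignment
ALN= length of aligned region (aa)

__init__(text=None): Initialize the class.

__repr__(): Return UniGene ProtsimLine object as a string.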

__firstlineno__ = 137

__static_attributes__ = ('aln', 'org', 'pct', 'protgi', 'protid', 'text')
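
The sketch below builds a `ProtsimLine` from a hypothetical PROTSIM text and converts the percent alignment and aligned length to numbers before using them. The explicit `float()` and `int()` calls are a precaution, on the assumption that the parser may leave these values as strings. The sample values and the `"; "` separator are also assumptions.

```python
from Bio.UniGene import ProtsimLine

# Hypothetical text part of a PROTSIM line (everything after the tag).
line = ProtsimLine("ORG=10090; PROTGI=1; PROTID=XP_000001.1; PCT=76.55; ALN=288")

# Convert explicitly so that comparisons and arithmetic behave numerically.
percent_alignment = float(line.pct)
aligned_length = int(line.aln)

print(line.org, line.protid, percent_alignment, aligned_length)
```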

class Bio.UniGene.STSLine(text=None)

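Bases: object

Store the information for one STS line from a UniGene file.

Initialize with the text part of the STS line, or nothing.

Attributes and descriptions (access as LOWER CASE):

ACC= GenBank/EMBL/DDBJ accession number of STS [optional field]
UNISTS= identifier in NCBI's UNISTS database

__init__(text=None): Initialize the class.

__repr__(): Return UniGene STSLine object as a string.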

__firstlineno__ = 173

__static_attributes__ = ('acc', 'text', 'unists')

class Bio.UniGene.Record

Bases: object

Store a UniGene record.

Here is what is stored::

self.ID           = ''  # ID line
self.species      = ''  # Hs, Bt, etc.
self.title        = ''  # TITLE line
self.symbol       = ''  # GENE line
self.cytoband     = ''  # CYTOBAND line
self.express      = []  # EXPRESS line, parsed on ';'
                        # Will be an array of strings
self.restr_expr   = ''  # RESTR_EXPR line
self.gnm_terminus = ''  # GNM_TERMINUS line
self.gene_id      = ''  # GENE_ID line
self.locuslink    = ''  # LOCUSLINK line
self.homol        = ''  # HOMOL line
self.chromosome   = ''  # CHROMOSOME line
self.protsim      = []  # PROTSIM entries, array of Protsims
                        # Type ProtsimLine
self.sequence     = []  # SEQUENCE entries, array of Sequence entries
                        # Type SequenceLine
self.sts          = []  # STS entries, array of STS entries
                        # Type STSLine
self.txmap        = []  # TXMAP entries, array of TXMap entries

__init__(): 初始化课程。

__repr__(): 将UniGene Record对象表示为字符串以进行调试。

__firstlineno__ = 204

__static_attributes__ = ('ID', 'chromosome', 'cytoband', 'express', 'gene_id', 'gnm_terminus', 'homol', 'locuslink', 'protsim', 'restr_expr', 'sequence', 'species', 'sts', 'symbol', 'title', 'txmap')

Bio.UniGene.parse(handle): Read and load UniGene records, for files containing multiple records.

Bio.UniGene.read(handle): Read and load a UniGene record, one record per file.
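
The following sketch contrasts the two functions using in-memory handles instead of a real Hs.data file. The records are hypothetical. The helper `make_record`, the 12-column tag padding, the `//` record terminator, and the requirement that SCOUNT match the number of SEQUENCE lines are assumptions about the input format rather than facts stated above.

```python
from io import StringIO

from Bio import UniGene


def make_record(cluster_id, title, accession):
    """Build one hypothetical record as flat-file text."""
    rows = [
        ("ID", cluster_id),
        ("TITLE", title),
        ("SCOUNT", "1"),
        ("SEQUENCE", f"ACC={accession}; SEQTYPE=mRNA"),
    ]
    # Assumption: tags are left-justified in a 12-column field.
    lines = [f"{tag:<12}{value}" for tag, value in rows]
    lines.append("//")  # Assumption: '//' closes each record.
    return "\n".join(lines) + "\n"


one = make_record("Xx.1", "hypothetical cluster A", "XX000001.1")
two = make_record("Xx.2", "hypothetical cluster B", "XX000002.1")

# parse() walks through every record in a multi-record handle.
records = list(UniGene.parse(StringIO(one + two)))
for record in records:
    print(record.ID, record.title, [seq.acc for seq in record.sequence])

# read() is meant for a handle that holds exactly one record.
single = UniGene.read(StringIO(one))
print(single.ID, single.species)
```

With a real file, the same calls would take an open file handle, for example `with open("Hs.data") as handle:` followed by a loop over `UniGene.parse(handle)`.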