skbio.alignment.TabularMSA.join

TabularMSA.join(other, how='strict')

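Join this MSA with another MSA by sequence (horizontally).

Sequences are joined by index label. MSA positional_metadata is joined by column. Use how to control the join behavior.

The alignment is not recomputed during the join operation (see Notes for details).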

Parameters:

other (TabularMSA) -- MSA to join with. Must have the same dtype as this MSA.
how ({'strict', 'inner', 'outer', 'left', 'right'}, optional) -- How to join the sequences and the MSA positional_metadata:
* 'strict': MSA indexes and positional_metadata columns must match.
* 'inner': an inner join of the MSA indexes and positional_metadata columns (only the shared set of index labels and columns is used).
* 'outer': an outer join of the MSA indexes and positional_metadata columns (all index labels and columns are used). Unshared sequences are padded with the MSA's default gap character (TabularMSA.dtype.default_gap_char). Unshared columns are padded with NaN.
* 'left': a left outer join of the MSA indexes and positional_metadata columns (this MSA's index labels and columns are used). Unshared data is padded the same way as for 'outer'.
* 'right': a right outer join of the MSA indexes and positional_metadata columns (the index labels and columns of other are used). Unshared data is padded the same way as for 'outer'.
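
As a rough illustration of the five how modes, the sketch below models each MSA as a plain dict that maps an index label to an aligned string. It is a simplified model under stated assumptions, not scikit-bio's implementation: the function name join_rows and the dict representation are hypothetical, and positional_metadata is ignored.

```python
def join_rows(left, right, how="strict", gap="-"):
    """Hypothetical model of a horizontal join.

    `left` and `right` map index labels to equal-length aligned strings.
    Dict keys are unique by construction, which mirrors the requirement
    that index labels contain no duplicates.
    """
    left_width = len(next(iter(left.values()), ""))
    right_width = len(next(iter(right.values()), ""))

    if how == "strict":
        if set(left) != set(right):
            raise ValueError("index labels must match when how='strict'")
        labels = list(left)
    elif how == "inner":
        # Only labels present in both inputs.
        labels = [label for label in left if label in right]
    elif how == "outer":
        # Every label from either input.
        labels = list(left) + [label for label in right if label not in left]
    elif how == "left":
        labels = list(left)
    elif how == "right":
        labels = list(right)
    else:
        raise ValueError(f"invalid how: {how!r}")

    # A label missing on one side is padded with gap characters on that side.
    return {
        label: left.get(label, gap * left_width)
        + right.get(label, gap * right_width)
        for label in labels
    }


msa1 = {"a": "AC", "b": "A-", "c": "-C"}
msa2 = {"b": "G-T", "a": "T--", "z": "ACG"}

for mode in ("inner", "outer", "left", "right"):
    print(mode, join_rows(msa1, msa2, how=mode))
```

With these inputs, the 'inner' and 'outer' rows agree with the sequences shown in the Examples for this method. The order of the returned labels is an artifact of the sketch and carries no meaning.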

Returns:

The joined MSA. There is no guaranteed ordering to its index (call sort to define one).

Return type:

TabularMSA

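Raises:

ValueError -- If how is invalid.
ValueError -- If either the index of this MSA or the index of other contains duplicates.
ValueError -- If how='strict' and this MSA's index does not match the index of other.
ValueError -- If how='strict' and this MSA's positional_metadata columns do not match those of other.
TypeError -- If other is not a subclass of TabularMSA.
TypeError -- If the dtype of other does not match this MSA's dtype.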

See also

extend, sort, skbio.sequence.Sequence.concat

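Notes

The join operation does not automatically perform re-alignment; sequences are simply joined together. Therefore, this operation is not necessarily meaningful on its own.

The index labels of this MSA must be unique. Likewise, the index labels of other must be unique.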
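
Because duplicate labels cause a ValueError, it can help to check for them before joining. The following is a minimal sketch in plain Python; the helper name duplicated_labels is hypothetical and not part of scikit-bio. It could be applied to a list of labels such as list(msa.index).

```python
from collections import Counter


def duplicated_labels(labels):
    """Return the labels that occur more than once, in first-seen order."""
    counts = Counter(labels)
    return [label for label, n in counts.items() if n > 1]


print(duplicated_labels(["a", "b", "a", "c", "b"]))  # ['a', 'b']
print(duplicated_labels(["a", "b", "c"]))            # []
```

An empty result means the labels satisfy the uniqueness requirement described above.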

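MSA-wide and per-sequence metadata (TabularMSA.metadata and Sequence.metadata) are not retained on the joined TabularMSA.

The positional metadata of the sequences is outer-joined regardless of how (using Sequence.concat(how='outer')).

If the join operation results in a TabularMSA without any sequences, the MSA's positional_metadata is not set.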

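Examples

Note

The following examples call .sort() on the joined MSA because there is no guaranteed ordering to its index. The joined MSA is sorted in these examples to make the output reproducible. When using this method with your own data, sorting the joined MSA is not necessary.

Join MSAs by sequence: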

>>> from skbio import DNA, TabularMSA
>>> msa1 = TabularMSA([DNA('AC'),
...                    DNA('A-')])
>>> msa2 = TabularMSA([DNA('G-T'),
...                    DNA('T--')])
>>> joined = msa1.join(msa2)
>>> joined.sort()  # unnecessary in practice, see note above
>>> joined
TabularMSA[DNA]
---------------------
Stats:
    sequence count: 2
    position count: 5
---------------------
ACG-T
A-T--

Sequences are joined based on MSA index labels:

>>> msa1 = TabularMSA([DNA('AC'),
...                    DNA('A-')], index=['a', 'b'])
>>> msa2 = TabularMSA([DNA('G-T'),
...                    DNA('T--')], index=['b', 'a'])
>>> joined = msa1.join(msa2)
>>> joined.sort()  # unnecessary in practice, see note above
>>> joined
TabularMSA[DNA]
---------------------
Stats:
    sequence count: 2
    position count: 5
---------------------
ACT--
A-G-T
>>> joined.index
Index(['a', 'b'], dtype='object')

By default, both MSA indexes must match. Use how to specify an inner join:

>>> msa1 = TabularMSA([DNA('AC'),
...                    DNA('A-'),
...                    DNA('-C')], index=['a', 'b', 'c'],
...                   positional_metadata={'col1': [42, 43],
...                                        'col2': [1, 2]})
>>> msa2 = TabularMSA([DNA('G-T'),
...                    DNA('T--'),
...                    DNA('ACG')], index=['b', 'a', 'z'],
...                   positional_metadata={'col2': [3, 4, 5],
...                                        'col3': ['f', 'o', 'o']})
>>> joined = msa1.join(msa2, how='inner')
>>> joined.sort()  # unnecessary in practice, see note above
>>> joined
TabularMSA[DNA]
--------------------------
Positional metadata:
    'col2': <dtype: int64>
Stats:
    sequence count: 2
    position count: 5
--------------------------
ACT--
A-G-T
>>> joined.index
Index(['a', 'b'], dtype='object')
>>> joined.positional_metadata
   col2
0     1
1     2
2     3
3     4
4     5

When performing an outer join ('outer', 'left', or 'right'), unshared sequences are padded with gaps and unshared positional_metadata columns are padded with NaN:

>>> joined = msa1.join(msa2, how='outer')
>>> joined.sort()  # unnecessary in practice, see note above
>>> joined
TabularMSA[DNA]
----------------------------
Positional metadata:
    'col1': <dtype: float64>
    'col2': <dtype: int64>
    'col3': <dtype: object>
Stats:
    sequence count: 4
    position count: 5
----------------------------
ACT--
A-G-T
-C---
--ACG
>>> joined.index
Index(['a', 'b', 'c', 'z'], dtype='object')
>>> joined.positional_metadata
   col1  col2 col3
0  42.0     1  NaN
1  43.0     2  NaN
2   NaN     3    f
3   NaN     4    o
4   NaN     5    o
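
The column behavior of positional_metadata shown above resembles a row-wise concatenation of two pandas DataFrames. The sketch below reproduces the same tables with plain pandas as an analogy only; it makes no claim about how join is implemented internally, and the variable names pm1 and pm2 are illustrative.

```python
import pandas as pd

# Positional metadata of the two MSAs from the examples above.
pm1 = pd.DataFrame({"col1": [42, 43], "col2": [1, 2]})
pm2 = pd.DataFrame({"col2": [3, 4, 5], "col3": ["f", "o", "o"]})

# Outer join on columns: every column is kept, missing cells become NaN.
outer = pd.concat([pm1, pm2], join="outer", ignore_index=True)

# Inner join on columns: only the shared column ('col2') survives.
inner = pd.concat([pm1, pm2], join="inner", ignore_index=True)

print(outer)
print(inner)
```

The NaN padding is why col1 is reported as float64 in the outer-joined MSA, while col2, which has a value at every position, keeps its integer dtype.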