computeMatrixOperations

This tool performs a variety of operations on files produced by computeMatrix.

detailed help:

computeMatrixOperations info -h

or

computeMatrixOperations relabel -h

or

computeMatrixOperations subset -h

or

computeMatrixOperations filterStrand -h

or

computeMatrixOperations filterValues -h

or

computeMatrixOperations rbind -h

or

computeMatrixOperations cbind -h

or

computeMatrixOperations sort -h

or

computeMatrixOperations dataRange -h

usage: computeMatrixOperations [-h] [--version]  ...

Named Arguments

--version

show program's version number and exit

Commands

Possible choices: info, relabel, subset, filterStrand, filterValues, rbind, cbind, sort, dataRange

Sub-commands

info

Print group and sample information

An example usage is:
  computeMatrixOperations info -m input.mat.gz

Required arguments

--matrixFile, -m

Matrix file from the computeMatrix tool.
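
As a rough illustration of what info reports, the sketch below reads the group and sample labels directly from a matrix file. It assumes, beyond what this page states, that the file is gzip-compressed text whose first line is an "@" followed by a JSON object with group_labels and sample_labels keys. The function read_labels and the demo file are hypothetical and not part of deepTools.

```python
import gzip
import json
import os
import tempfile


def read_labels(path):
    """Return (group_labels, sample_labels) from the header of a matrix file.

    Assumption: the first line is '@' followed by a JSON object.
    """
    with gzip.open(path, "rt") as handle:
        first_line = handle.readline()
    if not first_line.startswith("@"):
        raise ValueError("no '@' header line found")
    header = json.loads(first_line[1:])
    return header["group_labels"], header["sample_labels"]


# Build a tiny stand-in file so the sketch runs without real data.
demo_header = {"group_labels": ["genes"],
               "sample_labels": ["s1.forward", "s1.reverse"]}
demo_path = os.path.join(tempfile.mkdtemp(), "demo.mat.gz")
with gzip.open(demo_path, "wt") as handle:
    handle.write("@" + json.dumps(demo_header) + "\n")
    handle.write("chr1\t100\t200\ttx1\t0\t+\t1.0\t2.0\n")

groups, samples = read_labels(demo_path)
print("Groups:", groups)
print("Samples:", samples)
```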

relabel

Change sample and/or group label information

An example usage is:
  computeMatrixOperations relabel -m input.mat.gz -o output.mat.gz --sampleLabels "sample 1" "sample 2"

Required arguments

--matrixFile, -m

Matrix file from the computeMatrix tool.

--outFileName, -o

Output file name

Optional arguments

--groupLabels

Groups labels. If none are specified then the current labels will be kept.

--sampleLabels

Sample labels. If none are specified then the current labels will be kept.

subset

Subset the matrix. The group and sample orders given on the command line are honored, so this command can also be used to reorder groups and samples.

An example usage is:
  computeMatrixOperations subset -m input.mat.gz -o output.mat.gz --groups "group 1" "group 2" --samples "sample 3" "sample 10"

Required arguments

--matrixFile, -m

Matrix file from the computeMatrix tool.

--outFileName, -o

Output file name

Optional arguments

--groups

Groups to include. If none are specified then all will be included.

--samples

Samples to include. If none are specified then all will be included.
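
The "order is honored" behavior can be pictured with a small sketch. It assumes, for illustration only, that each row holds one block of values per sample; subset_samples is a hypothetical helper, not the tool's implementation.

```python
def subset_samples(sample_labels, row_blocks, wanted):
    """Pick per-sample blocks in the order given by `wanted`.

    sample_labels: labels in file order
    row_blocks:    one block of values per sample, same order as sample_labels
    wanted:        labels to keep, in the desired output order (empty = keep all)
    """
    if not wanted:
        return list(sample_labels), list(row_blocks)
    index = {label: i for i, label in enumerate(sample_labels)}
    missing = [label for label in wanted if label not in index]
    if missing:
        raise KeyError(f"unknown sample label(s): {missing}")
    return list(wanted), [row_blocks[index[label]] for label in wanted]


labels = ["sample 1", "sample 3", "sample 10"]
blocks = [[0.1, 0.2], [3.0, 3.1], [10.0, 10.5]]

# Requesting "sample 10" before "sample 3" both subsets and reorders.
new_labels, new_blocks = subset_samples(labels, blocks, ["sample 10", "sample 3"])
print(new_labels, new_blocks)
```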

filterStrand

Filter entries by strand.

Example usage:
  computeMatrixOperations filterStrand -m input.mat.gz -o output.mat.gz --strand +

Required arguments

--matrixFile, -m

Matrix file from the computeMatrix tool.

--outFileName, -o

Output file name

--strand, -s

Possible choices: +, -, .

Strand
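
A minimal sketch of the idea, assuming each row starts with the six BED-like columns so that the strand is the sixth field. The helper filter_by_strand and the example rows are hypothetical. How the real tool treats unusual strand values is not modelled here.

```python
def filter_by_strand(rows, strand):
    """Keep rows whose sixth column (the BED strand field) equals `strand`."""
    if strand not in {"+", "-", "."}:
        raise ValueError("strand must be '+', '-' or '.'")
    return [row for row in rows if row[5] == strand]


rows = [
    ["chr1", "100", "200", "tx1", "0", "+", 1.0, 2.0],
    ["chr1", "300", "400", "tx2", "0", "-", 0.5, 0.0],
    ["chr2", "100", "900", "tx3", "0", "+", 4.0, 4.5],
]

plus_rows = filter_by_strand(rows, "+")
print([row[3] for row in plus_rows])
```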

filterValues

Filter entries by min/max value.

Example usage:
  computeMatrixOperations filterValues -m input.mat.gz -o output.mat.gz --min 10 --max 1000

Required arguments

--matrixFile, -m

Matrix file from the computeMatrix tool.

--outFileName, -o

Output file name

Optional arguments

--min

Minimum value. Any row having a single entry less than this will be excluded. The default is no minimum.

--max

Maximum value. Any row having a single entry more than this will be excluded. The default is no maximum.
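
The rule that a single out-of-range entry excludes the whole row can be sketched as follows. filter_by_values is a hypothetical helper, and the handling of missing values is not stated on this page and is not modelled.

```python
def filter_by_values(rows, min_value=None, max_value=None):
    """Drop any row containing a value below min_value or above max_value.

    rows: list of (name, values) pairs. None means "no limit".
    """
    kept = []
    for name, values in rows:
        if min_value is not None and any(v < min_value for v in values):
            continue  # one entry below the minimum excludes the row
        if max_value is not None and any(v > max_value for v in values):
            continue  # one entry above the maximum excludes the row
        kept.append((name, values))
    return kept


rows = [
    ("a", [10, 50, 1000]),   # all within [10, 1000]
    ("b", [9, 50, 60]),      # one value below 10
    ("c", [20, 1001, 30]),   # one value above 1000
    ("d", [15, 20, 25]),
]

kept = filter_by_values(rows, min_value=10, max_value=1000)
print([name for name, _ in kept])
```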

rbind

Merge multiple matrices by concatenating them head to tail. This assumes that the same samples are present in each matrix, in the same order.

Example usage:
  computeMatrixOperations rbind -m input1.mat.gz input2.mat.gz -o output.mat.gz

Required arguments

--matrixFile, -m

Matrix files from the computeMatrix tool.

--outFileName, -o

Output file name
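
Head-to-tail concatenation can be sketched as below. The explicit label check is an addition made for this sketch; the page only says that matching samples are assumed. The function name rbind and the dictionary layout are hypothetical.

```python
def rbind(matrices):
    """Concatenate the rows of several matrices, top to bottom.

    Each matrix is a dict with 'sample_labels' and 'rows'.
    """
    first_labels = matrices[0]["sample_labels"]
    merged_rows = []
    for matrix in matrices:
        # The same samples, in the same order, are expected in every input.
        if matrix["sample_labels"] != first_labels:
            raise ValueError("sample labels differ between matrices")
        merged_rows.extend(matrix["rows"])
    return {"sample_labels": list(first_labels), "rows": merged_rows}


top = {"sample_labels": ["s1", "s2"], "rows": [["tx1", 1.0, 2.0]]}
bottom = {"sample_labels": ["s1", "s2"], "rows": [["tx2", 3.0, 4.0], ["tx3", 5.0, 6.0]]}

merged = rbind([top, bottom])
print([row[0] for row in merged["rows"]])
```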

cbind

Merge multiple matrices by concatenating them left to right. No assumptions are made about the row order. Regions not present in the first file specified are ignored. Regions missing in subsequent files will result in NAs. Regions are matched based on the first 6 columns of the computeMatrix output (essentially the columns in a BED file).

Example usage:
  computeMatrixOperations cbind -m input1.mat.gz input2.mat.gz -o output.mat.gz

Required arguments

--matrixFile, -m

Matrix files from the computeMatrix tool.

--outFileName, -o

Output file name
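
The matching rules described for cbind can be sketched as follows: the first file decides which regions appear, and gaps in later files are filled with missing values. Representing each matrix as a dictionary keyed by the six region columns is an assumption of this sketch, and the function name cbind is hypothetical.

```python
import math


def cbind(first, others):
    """Join matrices left to right, keyed on the 6 region columns.

    first, others[i]: dict mapping a 6-tuple region key to a list of values.
    """
    merged = {}
    for key, values in first.items():
        row = list(values)
        for other in others:
            # Number of value columns contributed by this matrix.
            width = len(next(iter(other.values()), []))
            row.extend(other.get(key, [math.nan] * width))
        merged[key] = row
    return merged


tx1 = ("chr1", "100", "200", "tx1", "0", "+")
tx2 = ("chr1", "300", "400", "tx2", "0", "-")
tx3 = ("chr2", "100", "900", "tx3", "0", "+")

first = {tx1: [1.0, 2.0], tx2: [3.0, 4.0]}
second = {tx1: [5.0], tx3: [9.0]}   # tx3 is absent from the first file

merged = cbind(first, [second])
print(merged)
```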

sort

Sort a matrix file to correspond to the order of entries in the desired input file(s). The groups of regions designated by the files must be present in the order found in the output of computeMatrix (otherwise, use the subset command first). Note that this subcommand can also be used to remove unwanted regions, since regions not present in the input file(s) will be omitted from the output.

Example usage:
  computeMatrixOperations sort -m input.mat.gz -R regions1.bed regions2.bed regions3.gtf -o input.sorted.mat.gz

Required arguments

--matrixFile, -m

Matrix file from the computeMatrix tool.

--outFileName, -o

Output file name

--regionsFileName, -R

File name(s), in BED or GTF format, containing the regions. If multiple bed files are given, each one is considered a group that can be plotted separately. Also, adding a "#" symbol in the bed file causes all the regions until the previous "#" to be considered one group. Alternatively for BED files, putting deepTools_group in the header can be used to indicate a column with group labels. Note that these should be sorted such that all group entries are together.

Optional arguments

--transcriptID

When a GTF file is used to provide regions, only entries with this value as their feature (column 3) will be processed as transcripts. (Default: "transcript")

--transcript_id_designator

Each region has an ID (e.g., ACTB) assigned to it, which for BED files is either column 4 (if it exists) or the interval bounds. For GTF files this is instead stored in the last column as a key:value pair (e.g., as 'transcript_id "ACTB"', for a key of transcript_id and a value of ACTB). In some cases it can be convenient to use a different identifier. To do so, set this to the desired key. (Default: "transcript_id")
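
The reorder-and-drop behavior of sort can be sketched as follows. Matching by region ID alone (column 4) is a simplifying assumption of this sketch, and sort_rows is a hypothetical helper.

```python
def sort_rows(matrix_rows, wanted_ids):
    """Return rows in the order of wanted_ids, omitting rows that are not listed.

    Assumption: the region ID is the fourth column of each row.
    """
    by_id = {row[3]: row for row in matrix_rows}
    return [by_id[region_id] for region_id in wanted_ids if region_id in by_id]


matrix_rows = [
    ["chr1", "100", "200", "ACTB", "0", "+", 1.0],
    ["chr1", "300", "400", "GAPDH", "0", "-", 2.0],
    ["chr2", "100", "900", "TP53", "0", "+", 3.0],
]

# TP53 comes first in the regions file and GAPDH is absent from it.
sorted_rows = sort_rows(matrix_rows, ["TP53", "ACTB"])
print([row[3] for row in sorted_rows])
```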

dataRange

Returns the min, max, median, 10th and 90th percentile of the matrix values per sample.

Example usage:
  computeMatrixOperations dataRange -m input.mat.gz

Required arguments

--matrixFile, -m

Matrix file from the computeMatrix tool.
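
The five statistics reported per sample can be reproduced on toy data with the Python standard library. This is only a sketch: the interpolation method used for the percentiles is an assumption (linear interpolation, "inclusive" in the statistics module), and data_range is a hypothetical helper.

```python
import statistics


def data_range(values):
    """Return min, max, median, 10th and 90th percentile of a list of values."""
    ordered = sorted(values)
    # n=10 yields the nine deciles; the first and last are the 10th and 90th percentiles.
    deciles = statistics.quantiles(ordered, n=10, method="inclusive")
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "median": statistics.median(ordered),
        "10th": deciles[0],
        "90th": deciles[-1],
    }


per_sample = {
    "sample 1": list(range(1, 12)),        # the values 1 to 11
    "sample 2": [0.0, 0.0, 2.5, 5.0, 40.0],
}

summary = {label: data_range(values) for label, values in per_sample.items()}
for label, stats in summary.items():
    print(label, stats)
```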

example usage: computeMatrixOperations subset -m input.mat.gz -o output.mat.gz --groups "group 1" "group 2" --samples "sample 3" "sample 10"

Details

computeMatrixOperations can perform a variety of operations on one or more files produced by computeMatrix (note that the output is always written to a new file):

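Subcommand      What it does
--------------  ------------------------------------------------------------
info            Prints the sample and region group names in the order in which they appear.
subset          Subsets a file by the desired sample/region group names. This can also change the order of these samples/region groups.
filterStrand    Filters a file to include only regions annotated as being on a particular strand.
rbind           Concatenates multiple matrices together, top to bottom.
cbind           Merges multiple matrices, left to right.
sort            Sorts the given file so that regions are in the order in which they occur in the input BED/GTF file(s).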

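These operations are useful when you want to run computeMatrix on multiple files (thereby keeping all of the values together) and later exclude regions/samples or add new ones. Another common use is when the output of computeMatrix needs to be sorted to match the order of regions in the input file.

Note

As of version 3.0, computeMatrix (and therefore computeMatrixOperations) produces output that carries a label for each sample. If you perform any operation on a matrix produced by an older version, the result will be converted to the new output format, which is not backward compatible!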

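Examples

Suppose we have a stranded RNA-seq dataset and would like to plot only the strand-specific signal across spliced transcripts. The general steps are as follows:

  1. Run bamCoverage on each sample twice, once with --filterRNAstrand forward and again with --filterRNAstrand reverse. This results in twice as many bigWig files as samples.

  2. Run computeMatrix scale-regions on all of these bigWig files, including the --metagene option and a BED12 and/or GTF file. This produces a single file containing the signal for each transcript, separated by strand.

  3. Obtain the list of sample names stored in the matrix file: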

$ computeMatrixOperations info -m foo.mat.gz
Groups:
    genes
Samples:
    SRR648667.forward
    SRR648668.forward
    SRR648669.forward
    SRR648670.forward
    SRR648667.reverse
    SRR648668.reverse
    SRR648669.reverse
    SRR648670.reverse
  4. Create two new files, each containing only the samples that hold signal from a given strand.

$ computeMatrixOperations subset -m foo.mat.gz -o forward.mat.gz --samples SRR648667.forward SRR648668.forward SRR648669.forward SRR648670.forward
$ computeMatrixOperations subset -m foo.mat.gz -o reverse.mat.gz --samples SRR648667.reverse SRR648668.reverse SRR648669.reverse SRR648670.reverse
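  5. These files can then be subset to contain only transcripts on a particular strand. Note that it is a good idea to double check that the --strand - setting produces the expected results. There are many idiosyncratic variants of RNA-seq library preparation, and settings appropriate for one type may not suit another. To check this, use different --strand options on the same matrix and then run plotHeatmap: one result should be clearly correct and the other essentially blank.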

$ computeMatrixOperations filterStrand -m forward.mat.gz -o forward.subset.mat.gz --strand -
$ computeMatrixOperations filterStrand -m reverse.mat.gz -o reverse.subset.mat.gz --strand +
  6. Finally, these files can be merged together, head to tail. As shown in step 3, the samples are already in the correct order.

$ computeMatrixOperations rbind -m forward.subset.mat.gz reverse.subset.mat.gz -o merged.mat.gz
  7. If desired, the transcripts can be sorted to match the order of the input GTF file.

$ computeMatrixOperations sort -m merged.mat.gz -o sorted.mat.gz -R genes.gtf

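The resulting file can then be used with plotHeatmap or plotProfile. Note that we could have skipped the subset step and run computeMatrix independently on the forward and reverse bigWig files.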
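
For many samples, typing the sample lists by hand becomes error-prone. The sketch below builds the command lines for steps 4 to 7 from the sample names reported in step 3. It only prints the commands. The helper build_commands is hypothetical, and splitting samples by their ".forward"/".reverse" suffix is an assumption based on the names in this example. All subcommands, flags and file names are the ones used in the steps above.

```python
import shlex

TOOL = "computeMatrixOperations"

samples = [
    "SRR648667.forward", "SRR648668.forward", "SRR648669.forward", "SRR648670.forward",
    "SRR648667.reverse", "SRR648668.reverse", "SRR648669.reverse", "SRR648670.reverse",
]


def build_commands(matrix, sample_names, regions="genes.gtf"):
    """Return the argument lists for the subset, filterStrand, rbind and sort steps."""
    forward = [s for s in sample_names if s.endswith(".forward")]
    reverse = [s for s in sample_names if s.endswith(".reverse")]
    return [
        [TOOL, "subset", "-m", matrix, "-o", "forward.mat.gz", "--samples", *forward],
        [TOOL, "subset", "-m", matrix, "-o", "reverse.mat.gz", "--samples", *reverse],
        [TOOL, "filterStrand", "-m", "forward.mat.gz",
         "-o", "forward.subset.mat.gz", "--strand", "-"],
        [TOOL, "filterStrand", "-m", "reverse.mat.gz",
         "-o", "reverse.subset.mat.gz", "--strand", "+"],
        [TOOL, "rbind", "-m", "forward.subset.mat.gz", "reverse.subset.mat.gz",
         "-o", "merged.mat.gz"],
        [TOOL, "sort", "-m", "merged.mat.gz", "-o", "sorted.mat.gz", "-R", regions],
    ]


commands = build_commands("foo.mat.gz", samples)
for argv in commands:
    print(shlex.join(argv))
```

To actually run the steps, each list could be passed to subprocess.run(argv, check=True) on a machine where deepTools is installed.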

`deepTools Galaxy <http://deeptools.ie-freiburg.mpg.de>`_.

`code @ github <https://github.com/deeptools/deepTools/>`_.