alignmentSieve

This tool filters alignments in a BAM/CRAM file according to the specified parameters. It can optionally output to BEDPE format.

usage: alignmentSieve -b sample1.bam -o sample1.filtered.bam --minMappingQuality 10 --filterMetrics log.txt
help: alignmentSieve -h / alignmentSieve --help

Required arguments

--bam, -b

An indexed BAM file.

--outFile, -o

The file to write results to. These are the alignments or fragments that pass the filtering criteria.

General arguments

--numberOfProcessors, -p

Number of processors to use. Type "max/2" to use half the maximum number of processors or "max" to use all available processors. (Default: 1)

--filterMetrics

The total number of entries and the number of entries remaining after filtering are saved to this file.

--filteredOutReads

If desired, all reads NOT passing the filtering criteria can be written to this file.

--label, -l

User-defined label instead of the default label (file name).

--smartLabels

Instead of manually specifying a label for the input file, this causes deepTools to use the file name after removing the path and extension.

--verbose, -v

Set to see processing messages.

--version

Show the program's version number and exit.

--shift

Shift the left and right end of a read (for BAM files) or a fragment (for BED files). A positive value shifts an end to the right (on the + strand) and a negative value shifts it to the left. Either 2 or 4 integers can be provided. For example, "2 -3" will shift the left-most fragment end two bases to the right and the right-most end 3 bases to the left. If 4 integers are provided, then the first and last two refer to fragments whose read 1 is on the left or right, respectively. Consequently, it is possible to take strand into consideration for strand-specific protocols. A fragment whose length falls below 1 due to shifting will not be written to the output. See the online documentation for graphical examples. Note that reads that are not properly paired will be filtered out.

--ATACshift

Shift the produced BAM file or BEDPE regions as commonly done for ATAC-seq. This is equivalent to --shift 4 -5 5 -4.

--genomeChunkLength

Size of the genomic chunk (in bp) to be processed per thread. (Default: 1000000)
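Conceptually, each thread is handed fixed-size pieces of the genome. A minimal sketch of such a split, assuming half-open [start, end) coordinates, a toy chromosome and a hypothetical `genome_chunks` helper; deepTools' own work distribution may differ in detail.

```python
from typing import Dict, Iterator, Tuple


def genome_chunks(chrom_sizes: Dict[str, int],
                  chunk_length: int = 1_000_000) -> Iterator[Tuple[str, int, int]]:
    """Hypothetical helper: yield (chrom, start, end) pieces of at most chunk_length bp."""
    for chrom, size in chrom_sizes.items():
        for start in range(0, size, chunk_length):
            # The last piece of a chromosome is usually shorter than chunk_length.
            yield chrom, start, min(start + chunk_length, size)


# "chrA" is a made-up 2.5 Mb chromosome used only for illustration.
for chunk in genome_chunks({"chrA": 2_500_000}):
    print(chunk)
```

With the default of 1000000 bp this yields three pieces, the last one 500 kb long.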

Output arguments

--BED

Instead of producing BAM files, write output in BEDPE format (as defined by MACS2). Note that only reads/fragments passing the filtering criteria are written in BEDPE format.

Optional arguments

--filterRNAstrand

Possible choices: forward, reverse

Selects RNA-seq reads (single-end or paired-end) on the given strand. (Default: None)

--ignoreDuplicates

If set, reads that have the same orientation and start position will be considered only once. If reads are paired, the mate's position also has to coincide to ignore a read.
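The duplicate rule can be expressed as a lookup key. The sketch below uses a hypothetical `Alignment` record whose fields are assumptions (not pysam attributes): reads sharing chromosome, orientation and start position, and for pairs also the mate's start, collapse to one key, and only the first read seen with a given key is kept.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Alignment:
    """Hypothetical minimal alignment record (not a pysam type)."""
    name: str
    chrom: str
    start: int
    is_reverse: bool
    mate_start: Optional[int] = None  # None for single-end reads


def dedup_key(a: Alignment) -> Tuple:
    # Same chromosome, orientation and start position; for paired reads the
    # mate's position must coincide as well.
    return (a.chrom, a.start, a.is_reverse, a.mate_start)


def drop_duplicates(alignments: Iterable[Alignment]) -> List[Alignment]:
    """Keep the first read for each key and skip later reads with the same key."""
    seen = set()
    kept = []
    for a in alignments:
        key = dedup_key(a)
        if key not in seen:
            seen.add(key)
            kept.append(a)
    return kept
```

Two pairs that start at the same position but whose mates land at different positions are therefore both kept.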

--minMappingQuality

If set, only reads that have a mapping quality score of at least this value are considered.

--samFlagInclude

Include reads based on the SAM flag. For example, to get only reads that are the first mate, use a flag of 64. This is useful to count properly paired reads only once, as otherwise the second mate will also be considered for the coverage.

--samFlagExclude

Exclude reads based on the SAM flag. For example, to get only reads that map to the forward strand, use --samFlagExclude 16, where 16 is the SAM flag for reads that map to the reverse strand.
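Both SAM-flag options act as bit masks on the FLAG field. A sketch assuming the common semantics that a read must carry every bit of the include mask and none of the exclude mask; `passes_flags` is a hypothetical name, while the bit values are the standard ones from the SAM specification.

```python
# Standard SAM FLAG bits (SAM specification).
PAIRED = 0x1        # 1: template has multiple segments
PROPER_PAIR = 0x2   # 2: each segment properly aligned
REVERSE = 0x10      # 16: read maps to the reverse strand
FIRST_MATE = 0x40   # 64: read 1 of a pair
SECONDARY = 0x100   # 256: secondary alignment


def passes_flags(flag: int, include: int = 0, exclude: int = 0) -> bool:
    """Keep a read only if it has every `include` bit and no `exclude` bit."""
    if include and (flag & include) != include:
        return False
    if exclude and (flag & exclude) != 0:
        return False
    return True


# Flag 147 = paired, proper pair, reverse strand, second mate.
print(passes_flags(147, include=REVERSE, exclude=SECONDARY))  # True
```

The last line mirrors `--samFlagInclude 16 --samFlagExclude 256` from the usage example: a reverse-strand primary alignment is kept, while the same read flagged as secondary (147 + 256) would be rejected.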

--blackListFileName, -bl

A BED or GTF file containing regions that should be excluded from all analyses. Currently this works by rejecting genomic chunks that happen to overlap an entry. Consequently, for BAM files, if a read partially overlaps a blacklisted region or a fragment spans over it, then the read/fragment might still be considered. Please note that you should adjust the effective genome size, if relevant.
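The caveat about partial overlaps follows from working on whole chunks rather than individual reads. The toy model below (hypothetical names, half-open intervals, reads assigned to chunks by start position, all assumptions) rejects every chunk that touches a blacklisted interval and shows how a read starting in a kept chunk survives even though it extends into the blacklist.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def kept_chunks(chrom_size: int, chunk: int, blacklist: List[Interval]) -> List[Interval]:
    """Split a chromosome into chunks and drop any chunk touching the blacklist."""
    chunks = [(s, min(s + chunk, chrom_size)) for s in range(0, chrom_size, chunk)]
    return [c for c in chunks if not any(overlaps(c, bl) for bl in blacklist)]


def read_is_kept(read: Interval, chunks: List[Interval]) -> bool:
    # Reads are assigned to chunks by their start position only, so a read
    # that starts in a kept chunk is kept even if it runs into a rejected one.
    return any(c[0] <= read[0] < c[1] for c in chunks)


chunks = kept_chunks(1000, 100, blacklist=[(250, 300)])
print(read_is_kept((180, 260), chunks))  # True, despite overlapping 250-260
print(read_is_kept((220, 240), chunks))  # False, starts in the rejected chunk
```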

--minFragmentLength

The minimum fragment length needed for read/pair inclusion. This option is primarily useful in ATAC-seq experiments, for filtering mono- or di-nucleosome fragments. (Default: 0)

--maxFragmentLength

The maximum fragment length needed for read/pair inclusion. A value of 0 indicates no limit. (Default: 0)
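Together, the two options define a length window in which a maximum of 0 means no upper bound. A sketch assuming inclusive bounds and absolute template lengths (SAM TLEN is signed); `passes_length` is a hypothetical helper.

```python
def passes_length(length: int, min_len: int = 0, max_len: int = 0) -> bool:
    """Inclusive [min_len, max_len] window; max_len == 0 means no upper limit."""
    length = abs(length)  # template lengths are negative for the right-most mate
    if length < min_len:
        return False
    if max_len > 0 and length > max_len:
        return False
    return True


# --minFragmentLength 140, as in the BEDPE example:
print(passes_length(150, min_len=140))   # True
print(passes_length(-150, min_len=140))  # True, sign ignored
print(passes_length(100, min_len=140))   # False
```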

Background

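This tool filters alignments in a BAM/CRAM file according to the specified parameters. It can optionally output to BEDPE format, possibly with the fragment ends shifted in a custom manner.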

Usage example

alignmentSieve requires a sorted and indexed BAM file and the desired filtering criteria.

$ alignmentSieve -b paired_chr2L.bam \
--minMappingQuality 5 --samFlagInclude 16 \
--samFlagExclude 256 --ignoreDuplicates \
-o filtered.bam --filterMetrics metrics.txt

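Alignments passing the filtering criteria are then written to the file specified by -o. You can also save the alignments NOT passing the filtering criteria with --filteredOutReads. If you want to store metrics on the number of reads seen and the number remaining after filtering, use --filterMetrics. An example metrics file is shown below: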

#bamFilterReads --filterMetrics
#File	Reads Remaining	Total Initial Reads
paired_chr2L.bam	8440	12644
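The metrics file is small and tab-separated, so it is easy to post-process. A sketch that reads it and reports the fraction of reads retained, assuming comment lines start with `#` and each data line holds the file name, reads remaining and total initial reads separated by tabs; `read_filter_metrics` is a hypothetical helper.

```python
import io
from typing import List, TextIO, Tuple


def read_filter_metrics(handle: TextIO) -> List[Tuple[str, int, int]]:
    """Return (file, reads_remaining, total_initial_reads) for each data line."""
    rows = []
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and header/comment lines
        name, remaining, total = line.rstrip("\n").split("\t")
        rows.append((name, int(remaining), int(total)))
    return rows


example = (
    "#bamFilterReads --filterMetrics\n"
    "#File\tReads Remaining\tTotal Initial Reads\n"
    "paired_chr2L.bam\t8440\t12644\n"
)
for name, remaining, total in read_filter_metrics(io.StringIO(example)):
    print(f"{name}: {remaining / total:.1%} of reads retained")
    # paired_chr2L.bam: 66.8% of reads retained
```

In practice `io.StringIO(example)` would be replaced by `open("metrics.txt")`.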

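Instead of a BAM file, a BEDPE file (suitable for input into MACS2) can be produced. As with BAM/CRAM output, BEDPE output allows the fragment ends to be shifted, which is often desirable in ATAC-seq and related protocols: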

$ alignmentSieve -b paired_chr2L.bam \
--minFragmentLength 140 --BED \
--shift -5 3 -o fragments.bedpe

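The --shift option can take 2 or 4 integers. If two integers are given, the first value shifts the left-most end of a fragment and the second value shifts the right-most end. Positive values shift to the right and negative values shift to the left. See below for how the above setting would shift a single fragment: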

     ----> read 1
                 read 2 <----

     ------------------------ fragment

-------------------------------- shifted fragment
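The two-integer rule amounts to adding one offset to each end of the fragment and discarding fragments that end up shorter than 1 bp. A sketch on half-open [start, end) coordinates (an assumption about the coordinate convention), using a hypothetical `shift_fragment` helper; the first call reproduces the diagram above, where a 24 bp fragment is widened by 5 bp on the left and 3 bp on the right.

```python
from typing import Optional, Tuple


def shift_fragment(start: int, end: int, left: int, right: int) -> Optional[Tuple[int, int]]:
    """Shift both ends of a half-open fragment [start, end).

    Positive offsets move an end to the right, negative offsets to the left.
    Returns None when the shifted fragment would be shorter than 1 bp.
    """
    new_start, new_end = start + left, end + right
    if new_end - new_start < 1:
        return None  # such fragments are not written to the output
    return new_start, new_end


print(shift_fragment(5, 29, -5, 3))     # (0, 32): --shift -5 3, as in the diagram
print(shift_fragment(100, 150, 2, -3))  # (102, 147): --shift 2 -3
print(shift_fragment(100, 104, 2, -3))  # None: fragment collapses and is dropped
```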

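The same result would be produced if read 1 and read 2 were swapped. If, instead, the protocol is strand-specific, four integers can be given: the first pair is applied to fragments where read 1 precedes read 2, and the second pair to fragments where read 2 precedes read 1. In each pair, the first value is applied to the end of read 1 and the second value to the end of read 2. Take the following command as an example: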

$ alignmentSieve -b paired_chr2L.bam \
--minFragmentLength 140 --BED \
--shift -5 3 -1 4 -o fragments.bedpe

Given this, the -5 3 set would produce the following:

     ----> read 1
                 read 2 <----

     ------------------------ fragment

-------------------------------- shifted fragment

and the -1 4 set would produce the following:

----> read 2
            read 1 <----

------------------------ fragment

    --------------------- shifted fragment

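As can be seen, such fragments are treated as being on the - strand, so a negative value shifts an end to the left in that frame of reference (that is, to the right relative to the + strand).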
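The strand-aware rule can be sketched as follows: when read 1 is on the left, the first pair is applied directly to the (left, right) ends; when read 1 is on the right, the fragment is treated as being on the - strand, so the third value moves the read 1 (right) end and the fourth value the read 2 (left) end, with signs flipped relative to the + strand. This is a sketch of the documented behaviour on half-open coordinates with a hypothetical function name, not deepTools' own code. It is consistent with --ATACshift being equivalent to --shift 4 -5 5 -4: in both orientations the fragment is trimmed by 4 bp on the left and 5 bp on the right.

```python
from typing import Optional, Sequence, Tuple


def shift_fragment_stranded(start: int, end: int, read1_on_left: bool,
                            shift: Sequence[int]) -> Optional[Tuple[int, int]]:
    """Apply a 4-integer shift to a half-open fragment [start, end).

    shift[0:2] is used when read 1 is the left-most read (+ strand fragment),
    shift[2:4] when read 1 is the right-most read (- strand fragment).
    In both pairs the first value moves the read 1 end, the second the read 2 end.
    """
    if len(shift) != 4:
        raise ValueError("expected 4 integers")
    if read1_on_left:
        # + strand: the read 1 end is the left end; offsets apply as given.
        new_start, new_end = start + shift[0], end + shift[1]
    else:
        # - strand: the read 1 end is the right end, and offsets are negated
        # because "left" in the - strand frame is "right" on the + strand.
        new_start, new_end = start - shift[3], end - shift[2]
    if new_end - new_start < 1:
        return None  # too short after shifting; not written to the output
    return new_start, new_end


ATAC_SHIFT = (4, -5, 5, -4)  # documented equivalent of --ATACshift
print(shift_fragment_stranded(100, 300, True, ATAC_SHIFT))   # (104, 295)
print(shift_fragment_stranded(100, 300, False, ATAC_SHIFT))  # (104, 295)
```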

Note

If the --shift or --ATACshift option is used, only properly paired reads are used.

deepTools Galaxy: http://deeptools.ie-freiburg.mpg.de

code @ github: https://github.com/deeptools/deepTools/