correctGCBias

Hint

For background information about the assessment and correction of GC bias, see computeGCBias.

This tool corrects the GC-bias using the method proposed by [Benjamini & Speed (2012). Nucleic Acids Research, 40(10)]. It will remove reads from regions with too high coverage compared to the expected values (typically GC-rich regions) and will add reads to regions where too few reads are seen (typically AT-rich regions). The tool computeGCBias needs to be run first to generate the frequency table needed here.
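One way to picture the removal and addition of reads is as a per-read copy count derived from the observed/expected ratio at the read's GC content. The Python sketch below is only an illustration under that assumption: `copies_for_read` is a hypothetical helper, and the rounding scheme is not claimed to be what correctGCBias does internally.

```python
import random


def copies_for_read(ratio, rng=random):
    """Return how many copies of a read to write (hypothetical helper).

    ratio is observed/expected coverage at the read's GC content:
      ratio > 1  -> over-covered region, reads are dropped at random
      ratio < 1  -> under-covered region, reads are written more than once
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    weight = 1.0 / ratio        # average number of copies per read
    whole = int(weight)         # copies that are always written
    fraction = weight - whole   # probability of writing one extra copy
    return whole + (1 if rng.random() < fraction else 0)
```

With a ratio of 0.5 every read is written twice, and with a ratio of 2.0 roughly half of the reads are kept, so the average coverage moves toward the expected value in both cases.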

usage: correctGCBias -b file.bam --effectiveGenomeSize 2150570000 -g mm9.2bit --GCbiasFrequenciesFile freq.txt -o gc_corrected.bam
help: correctGCBias -h / correctGCBias --help

Required arguments

--bamfile, -b

Sorted BAM file to correct.

--effectiveGenomeSize

The effective genome size is the portion of the genome that is mappable. Large fractions of the genome are stretches of NNNN that should be discarded. Also, if repetitive regions were not included in the mapping of reads, the effective genome size needs to be adjusted accordingly. A table of values is available here: http://deeptools.readthedocs.io/en/latest/content/feature/effectiveGenomeSize.html .
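If no tabulated value fits your genome, a rough upper bound can be obtained by counting the bases that are not N. The sketch below assumes a plain-text FASTA input; `count_non_n_bases` is a hypothetical helper, and it does not account for repetitive regions excluded during mapping, so the tabulated values remain preferable where available.

```python
def count_non_n_bases(fasta_lines):
    """Count bases other than N/n in an iterable of FASTA lines."""
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and sequence headers
        total += sum(1 for base in line if base not in "Nn")
    return total


# Example usage (the file name is an assumption):
# with open("genome.fa") as handle:
#     print(count_non_n_bases(handle))
```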

--genome, -g

Genome in two bit format. Most genomes can be found here: http://hgdownload.cse.ucsc.edu/gbdb/ Search for the .2bit ending. Otherwise, fasta files can be converted to 2bit using faToTwoBit available here: http://hgdownload.cse.ucsc.edu/admin/exe/

--GCbiasFrequenciesFile, -freq

Indicate the output file from computeGCBias containing the observed and expected read frequencies per GC-content.

Output options

--correctedFile, -o

Name of the corrected file. The ending will be used to decide the output file format. The options are ".bam", ".bw" for a bigWig file, ".bg" for a bedGraph file.
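The mapping from file ending to output format described above can be written down as a small lookup. This is a sketch only: `output_format` is a hypothetical helper, and the case-insensitive matching is an assumption rather than documented behaviour.

```python
import os

# Endings and formats as listed in the option description.
OUTPUT_FORMATS = {".bam": "BAM", ".bw": "bigWig", ".bg": "bedGraph"}


def output_format(filename):
    """Return the output format implied by the file ending."""
    ending = os.path.splitext(filename)[1].lower()  # assumption: ignore case
    if ending not in OUTPUT_FORMATS:
        raise ValueError("unsupported file ending: %r" % ending)
    return OUTPUT_FORMATS[ending]
```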

Optional arguments

--version

show program's version number and exit

--binSize, -bs

Size of the bins, in bases, for the output of the bigwig/bedgraph file. (Default: 50)

--region, -r

Region of the genome to limit the operation to - this is useful when testing parameters to reduce the computing time. The format is chr:start:end, for example --region chr10 or --region chr10:456700:891000.
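To make the region format concrete, the following sketch splits a region string into its parts. `parse_region` is a hypothetical helper written for this page; the tool itself may accept or reject inputs differently.

```python
def parse_region(region):
    """Split 'chr' or 'chr:start:end' into (chrom, start, end).

    start and end are None when only a chromosome name is given.
    """
    parts = region.split(":")
    chrom = parts[0]
    if not chrom:
        raise ValueError("missing chromosome name")
    if len(parts) == 1:
        return chrom, None, None
    if len(parts) != 3:
        raise ValueError("expected chr or chr:start:end")
    start, end = int(parts[1]), int(parts[2])
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    return chrom, start, end
```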

--numberOfProcessors, -p

Number of processors to use. Type "max/2" to use half the maximum number of processors or "max" to use all available processors. (Default: 1)
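The three accepted forms ("max", "max/2", or a number) could be resolved to a worker count as sketched below. `resolve_processors` is a hypothetical helper, and the lower bound of one processor for "max/2" is an assumption.

```python
import multiprocessing


def resolve_processors(value):
    """Turn 'max', 'max/2' or a number string into a processor count."""
    available = multiprocessing.cpu_count()
    if value == "max":
        return available
    if value == "max/2":
        return max(1, available // 2)  # assumption: never fall below 1
    count = int(value)
    if count < 1:
        raise ValueError("number of processors must be at least 1")
    return count
```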

--verbose, -v

Set to see processing messages.

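Usage example

Note

correctGCBias requires the output of computeGCBias as well as a genome file in 2bit format. Most genomes can be found here: http://hgdownload.cse.ucsc.edu/gbdb/ . Search for the .2bit ending. Otherwise, fasta files can be converted to 2bit using faToTwoBit, available here: http://hgdownload.cse.ucsc.edu/admin/exe/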

# freq_test.txt is the output of computeGCBias
$ correctGCBias -b H3K27Me3.bam \
   --effectiveGenomeSize 2695000000 \
   --genome genome.2bit \
   --GCbiasFrequenciesFile freq_test.txt \
   -o gc_corrected.bam
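Because computeGCBias has to run before correctGCBias, the two calls are often scripted together. The sketch below only builds the two argument lists. `gc_correction_commands` is a hypothetical helper, and the computeGCBias arguments are assumed to mirror those shown for correctGCBias on this page; check `computeGCBias --help` before relying on them.

```python
import shlex


def gc_correction_commands(bam, genome_2bit, effective_genome_size,
                           freq_file, corrected):
    """Build argument lists for computeGCBias followed by correctGCBias."""
    shared = [
        "-b", bam,
        "--effectiveGenomeSize", str(effective_genome_size),
        "-g", genome_2bit,
        "--GCbiasFrequenciesFile", freq_file,
    ]
    compute = ["computeGCBias"] + shared  # assumed to take the same flags
    correct = ["correctGCBias"] + shared + ["-o", corrected]
    return compute, correct


if __name__ == "__main__":
    commands = gc_correction_commands(
        "H3K27Me3.bam", "genome.2bit", 2695000000,
        "freq_test.txt", "gc_corrected.bam")
    for command in commands:
        # Print only; pass each list to subprocess.run(command, check=True)
        # to actually execute it.
        print(" ".join(shlex.quote(arg) for arg in command))
```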

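Warning

The GC-corrected BAM file will most likely contain several duplicated reads in regions where the coverage had to be increased to match the expected read density. This means that you should absolutely avoid any filtering of duplicate reads during your downstream analyses!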

deepTools Galaxy: http://deeptools.ie-freiburg.mpg.de

code @ github: https://github.com/deeptools/deepTools/