bamPEFragmentSize
This tool calculates the fragment sizes for read pairs given a BAM file from paired-end sequencing. Several regions are sampled, depending on the size of the genome and the number of processors, to estimate summary statistics on the fragment lengths. Properly paired reads are preferred for the computation, i.e., discordant pairs are only used if no concordant alignments overlap a given region. By default, the tool simply prints the summary statistics to the screen.
usage: bamPEFragmentSize -b sample1.bam sample2.bam -o hist.png
help: bamPEFragmentSize -h / bamPEFragmentSize --help
Named Arguments
- --bamfiles, -b
List of BAM files to process
- --histogram, -hist, -o
Save a .png file with a histogram of the fragment length distribution.
- --plotFileFormat
Possible choices: png, pdf, svg, eps, plotly
Image format type. If given, this option overrides the image format based on the plotFile ending. The available options are: png, eps, pdf, svg and plotly.
- --numberOfProcessors, -p
Number of processors to use. (Default: 1)
- --samplesLabel
Labels for the samples plotted. The default is to use the file name of the sample. The sample labels should be separated by spaces and quoted if a label itself contains a space, e.g. --samplesLabel label-1 "label 2"
- --plotTitle, -T
Title of the plot, to be printed on top of the generated image. Leave blank for no title. (Default: empty, i.e. no title)
- --maxFragmentLength
The maximum fragment length in the histogram. A value of 0 (the default) means that twice the mean fragment length is used. (Default: 0)
- --logScale
Plot on the log scale
- --binSize, -bs
Length in bases of the window used to sample the genome. (Default: 1000)
- --distanceBetweenBins, -n
To reduce the computation time, not every possible genomic bin is sampled. This option allows you to set the distance between bins actually sampled from. Larger numbers are sufficient for high coverage samples, while smaller values are useful for lower coverage samples. Note that if you specify a value that results in too few (<1000) reads sampled, the value will be decreased. (Default: 1000000)
- --blackListFileName, -bl
A BED file containing regions that should be excluded from all analyses. Currently this works by rejecting genomic chunks that happen to overlap an entry. Consequently, for BAM files, if a read partially overlaps a blacklisted region or a fragment spans over it, then the read/fragment might still be considered.
- --table
In addition to printing read and fragment length metrics to the screen, write them to the given file in tabular format.
- --outRawFragmentLengths
Save the fragment (or read if the input is single-end) length and their associated number of occurrences to a tab-separated file. Columns are length, number of occurrences, and the sample label.
- --verbose
Set if processing data messages are wanted.
- --version
show program's version number and exit
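When the tool is driven from a script, the options above can be assembled into an argument list. The following is a minimal sketch, not part of deepTools: the helper name `build_fragment_size_command` and its parameters are hypothetical, and it only uses flags listed in this section.

```python
import shlex


def build_fragment_size_command(bam_files, histogram=None, table=None,
                                raw_lengths=None, processors=1,
                                labels=None, max_fragment_length=0):
    """Return a bamPEFragmentSize argument list (hypothetical helper)."""
    if not bam_files:
        raise ValueError("at least one BAM file is required")
    if labels is not None and len(labels) != len(bam_files):
        raise ValueError("need exactly one label per BAM file")

    cmd = ["bamPEFragmentSize", "--bamfiles", *bam_files,
           "--numberOfProcessors", str(processors),
           "--maxFragmentLength", str(max_fragment_length)]
    # Optional outputs are only added when a file name was given.
    if histogram:
        cmd += ["--histogram", histogram]
    if table:
        cmd += ["--table", table]
    if raw_lengths:
        cmd += ["--outRawFragmentLengths", raw_lengths]
    if labels:
        cmd += ["--samplesLabel", *labels]
    return cmd


cmd = build_fragment_size_command(
    ["sample1.bam", "sample2.bam"],
    histogram="hist.png",
    labels=["label-1", "label 2"],
)
# shlex.join quotes the label that contains a space for display purposes.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell quoting issues with labels that contain spaces; this assumes `bamPEFragmentSize` is on the `PATH`.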
Example usage
$ deepTools2.0/bin/bamPEFragmentSize \
-hist fragmentSize.png \
-T "Fragment size of PE RNA-seq data" \
--maxFragmentLength 1000 \
-b testFiles/RNAseq_sample1.bam testFiles/RNAseq_sample2.bam \
testFiles/RNAseq_sample3.bam testFiles/RNAseq_sample4.bam \
--samplesLabel sample1 sample2 sample3 sample4
## Output
BAM file : testFiles/RNAseq_sample1.bam
Sample size: 10815
Fragment lengths:
Min.: 0.0
1st Qu.: 311.0
Mean: 8960.68987517
Median: 331.0
3rd Qu.: 362.0
Max.: 53574842.0
Std: 572421.46625
Read lengths:
Min.: 20.0
1st Qu.: 101.0
Mean: 99.1621821544
Median: 101.0
3rd Qu.: 101.0
Max.: 101.0
Std: 9.16567362755
BAM file : testFiles/RNAseq_sample2.bam
Sample size: 6771
Fragment lengths:
Min.: 43.0
1st Qu.: 148.0
Mean: 176.465071629
Median: 164.0
3rd Qu.: 185.0
Max.: 500.0
Std: 53.733877263
......(output truncated)

If the --table option is specified, the summary statistics are additionally written in tabular format:
	Frag. Len. Min.	Frag. Len. 1st. Qu.	Frag. Len. Mean	Frag. Len. Median	Frag. Len. 3rd Qu.	Frag. Len. Max	Frag. Len. Std.	Read Len. Min.	Read Len. 1st. Qu.	Read Len. Mean	Read Len. Median	Read Len. 3rd Qu.	Read Len. Max	Read Len. Std.
bowtie2 test1.bam	241.0	241.5	244.666666667	242.0	246.5	251.0	4.49691252108	251.0	251.0	251.0	251.0	251.0	251.0	0.0
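For downstream use, the table can be read into a dictionary keyed by sample label. This is a sketch under two assumptions that are not stated in this section: that the file is tab-delimited, and that the first header cell (above the sample labels) is blank. The function name `read_metrics_table` is hypothetical; the example data repeats the row shown above.

```python
import csv
import io


def read_metrics_table(handle):
    """Map sample label -> {column name: float} for a --table file.

    Assumes tab-delimited text whose first column holds the sample label.
    """
    rows = csv.reader(handle, delimiter="\t")
    header = next(rows)
    columns = header[1:]  # skip the (assumed blank) label column header
    metrics = {}
    for row in rows:
        if not row:
            continue  # tolerate blank lines
        metrics[row[0]] = dict(zip(columns, map(float, row[1:])))
    return metrics


# Rebuild the example table from this section as tab-delimited text.
STATS = ["Min.", "1st. Qu.", "Mean", "Median", "3rd Qu.", "Max", "Std."]
header = "\t".join(
    [""] + [f"{kind} Len. {stat}" for kind in ("Frag.", "Read") for stat in STATS]
)
values = ("241.0 241.5 244.666666667 242.0 246.5 251.0 4.49691252108 "
          "251.0 251.0 251.0 251.0 251.0 251.0 0.0").split()
example = header + "\n" + "\t".join(["bowtie2 test1.bam"] + values) + "\n"

metrics = read_metrics_table(io.StringIO(example))
print(metrics["bowtie2 test1.bam"]["Frag. Len. Median"])
```

For a real file, open it with `open(path, newline="")` and pass the handle to the same function.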
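If the --outRawFragmentLengths option is provided, an additional output file (a separate history item in Galaxy) is produced, containing the raw data underlying the histogram. It has the following format: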
#bamPEFragmentSize
Size Occurrences Sample
241 1 bowtie2 test1.bam
242 1 bowtie2 test1.bam
251 1 bowtie2 test1.bam
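"Size" is the fragment (or, for single-end datasets, read) size, and "Occurrences" is the number of times reads/fragments of that length were observed. To ease downstream processing, the sample name is also included on each line.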
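Because each line carries a length, a count, and a sample label, per-sample statistics can be recomputed from this file without expanding the counts. The sketch below assumes the tab-separated layout described for --outRawFragmentLengths, a leading comment line starting with `#`, and integer sizes as in the example; the function names are hypothetical.

```python
from collections import defaultdict


def read_raw_fragment_lengths(lines):
    """Return {sample label: {length: occurrences}} from the raw file."""
    counts = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, the '#bamPEFragmentSize' comment and the header.
        if not line or line.startswith("#") or line.startswith("Size\t"):
            continue
        size, occurrences, sample = line.split("\t", 2)
        counts[sample][int(size)] = int(occurrences)
    return dict(counts)


def weighted_mean(hist):
    """Mean length, weighting each length by its number of occurrences."""
    total = sum(hist.values())
    return sum(size * n for size, n in hist.items()) / total


def weighted_median(hist):
    """Median length computed from the cumulative occurrence counts."""
    total = sum(hist.values())
    ordered = sorted(hist.items())
    running = 0
    for i, (size, n) in enumerate(ordered):
        running += n
        if 2 * running > total:
            return float(size)
        if 2 * running == total:
            # Even total: average the two middle observations.
            return (size + ordered[i + 1][0]) / 2


example = (
    "#bamPEFragmentSize\n"
    "Size\tOccurrences\tSample\n"
    "241\t1\tbowtie2 test1.bam\n"
    "242\t1\tbowtie2 test1.bam\n"
    "251\t1\tbowtie2 test1.bam\n"
)
hists = read_raw_fragment_lengths(example.splitlines())
for sample, hist in hists.items():
    print(sample, weighted_mean(hist), weighted_median(hist))
```

For the three example rows, the mean and median computed this way agree with the fragment length mean (244.67) and median (242.0) in the --table example above.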
deepTools Galaxy: http://deeptools.ie-freiburg.mpg.de
Code on GitHub: https://github.com/deeptools/deepTools/