PacBioToCA

Revision as of 20 August 2013 20:51 by admin

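When running pacBioToCA, we found that the amount of PBcR (PacBio corrected reads) produced depended on the genomeSize parameter. The effect was strongest with the 200X long-read input: 752.54 Mb of PBcR without genomeSize versus 186.66 Mb with genomeSize=4650000.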

Short reads: 100X; long reads: 200X; with and without genomeSize:

# Without genomeSize
pacBioToCA -l viaMiseq -s pacbio.spec -t 10 -partitions 200 fastqFile=filtered_subreads.200X.fastq.bz2 miseq.100X.frg.bz2
# With genomeSize=4650000
pacBioToCA -l viaMiseq -s pacbio.spec -t 10 -partitions 200 fastqFile=filtered_subreads.200X.fastq.bz2 genomeSize=4650000 miseq.100X.frg.bz2

Short reads: 118X; long reads: one to four SMRT cells; with and without genomeSize:


| Name | 200X filtered long reads | m120228_192221 | m120228_210845 | Two SMRT cells | Three SMRT cells | Four SMRT cells |
|---|---|---|---|---|---|---|
| Filtered_subreads (input) | | | | | | |
| Number of reads | 383482 | 38542 | 44794 | 77117 | 113284 | 136333 |
| Mean length (bp) | 2422.877720 | 2322.679985 | 2334.414140 | 2184.208709 | 2333.977711 | 2386.664674 |
| Total (Mb) | 929.13 | 89.52 | 104.57 | 168.44 | 264.40 | 325.38 |
| Depth (X) | 199.81 | 19.25 | 22.49 | 36.22 | 56.86 | 69.97 |
| PBcR, without genomeSize | | | | | | |
| Number of reads | 332880 | 35199 | 40811 | 64201 | 99285 | 120296 |
| Mean length (bp) | 2260.68262 | 2095.143186 | 2086.568670 | 2150.165184 | 2221.782394 | 2252.656963 |
| Total (Mb) | 752.54 | 73.75 | 85.15 | 138.04 | 220.59 | 270.99 |
| Depth (X) | 161.84 | 15.86 | 18.31 | 29.69 | 47.44 | 58.28 |
| PBcR, genomeSize=4650000 | | | | | | |
| Number of reads | 37879 | 34852 | 40486 | 63411 | 70468 | 56298 |
| Mean length (bp) | 4927.683492 | 2130.841559 | 2120.237712 | 2198.455315 | 2815.903020 | 3495.604515 |
| Total (Mb) | 186.66 | 74.26 | 85.84 | 139.41 | 198.43 | 196.80 |
| Depth (X) | 40.14 | 15.97 | 18.46 | 29.98 | 42.67 | 42.32 |
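The statistics in the table can be recomputed from a read file with a few lines of awk. The sketch below is an assumption, not the script used for this page: it expects an uncompressed FASTQ with exactly four lines per record, treats 1 Mb as 10^6 bases, and computes depth as total bases divided by the genome size (for example, 929.13 Mb / 4.65 Mb = 199.81X). The function name read_stats is hypothetical.

```shell
# Hypothetical helper (not part of pacBioToCA): summarise reads the way the
# table above does. Assumes uncompressed FASTQ, four lines per record, on stdin.
# Usage: read_stats GENOME_SIZE_BP < reads.fastq
read_stats() {
    awk -v gsize="$1" '
        # Line 2 of every 4-line record holds the sequence.
        NR % 4 == 2 { n++; bases += length($0) }
        END {
            if (n == 0) { print "no reads"; exit 1 }
            printf "seqs amount:%d\n", n
            printf "seq avg len:%.6f\n", bases / n
            printf "total:%.2f Mb\n", bases / 1000000   # 1 Mb = 10^6 bases
            printf "depth: %.2fX\n", bases / gsize      # total bases / genome size
        }'
}
```

For a bzip2-compressed input such as the one above, decompress on the fly: bzcat filtered_subreads.200X.fastq.bz2 | read_stats 4650000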
# Assemble the PBcR (viaMiseq.frg) with runCA; choose <ovl value> as described below.
runCA unitigger=bogart merSize=14 ovlMinLen=<ovl value> utgErrorRate=0.015 utgGraphErrorRate=0.015 utgGraphErrorLimit=0 utgMergeErrorRate=0.03 utgMergeErrorLimit=0 -p asm -d asm viaMiseq.frg

The <ovl value> (ovlMinLen) was set to approximately 40% of the average corrected read length (ref). As a rule of thumb: if the average corrected length is less than 2.5 kbp, set it to 1000; from 2.5 to 3 kbp, 1500; from 3 to 5.5 kbp, 2000; from 5.5 to 6.5 kbp, 2500; and above 6.5 kbp, 3000.
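The rule of thumb above can be written as a small shell function. ovl_min_len is a hypothetical name; it takes the average corrected length in bp, and because the rule leaves its boundaries open, this sketch assumes that exactly 5.5 kbp and exactly 6.5 kbp both map to 2500.

```shell
# Hypothetical helper: map an average corrected read length (bp) to ovlMinLen
# using the rule of thumb above. Assumed boundary handling: exactly 5500 bp
# and exactly 6500 bp both map to 2500.
ovl_min_len() {
    awk -v len="$1" 'BEGIN {
        l = len + 0                     # force numeric comparison
        if      (l < 2500)  v = 1000
        else if (l < 3000)  v = 1500
        else if (l < 5500)  v = 2000
        else if (l <= 6500) v = 2500
        else                v = 3000
        print v
    }'
}
```

Applied to the mean lengths of the 25X subsets listed further down this page, ovl_min_len 2878.787529 prints 1500 and ovl_min_len 4754.996196 prints 2000; the result can be substituted into the runCA command as ovlMinLen=$(ovl_min_len 2878.787529).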


The PBcR were then filtered to 25X coverage and assembled with runCA.

| PBcR, genomeSize=4650000, filtered to 25X | Two SMRT cells | Three SMRT cells | Four SMRT cells |
|---|---|---|---|
| Number of reads | 40382 | 24448 | 21641 |
| Mean length (bp) | 2878.787529 | 4754.996196 | 5371.762719 |
| Total (Mb) | 116.25 | 116.25 | 116.25 |
| Depth (X) | 25.00 | 25.00 | 25.00 |
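The page does not say how the 25X subsets were chosen. The sketch below assumes the longest corrected reads were kept first until their combined length reached 25 × 4,650,000 = 116,250,000 bases (116.25 Mb). That is consistent with the subsets' mean lengths being higher than those of the full PBcR sets, but it remains an assumption, not the method used here. longest_depth is a hypothetical helper; it expects 4-line FASTQ records whose header lines contain no tab characters.

```shell
# Hypothetical helper: keep the longest reads of a 4-line FASTQ until their
# combined length reaches DEPTH x GENOME_SIZE bases (longest-first is an assumption).
# Stage 1 flattens each record to one tab-separated line prefixed by its length,
# stage 2 sorts longest first, stage 3 prints records until the target is reached.
# Usage: longest_depth DEPTH GENOME_SIZE_BP < in.fastq > out.fastq
longest_depth() {
    awk '
        { rec[NR % 4] = $0 }
        NR % 4 == 0 { printf "%d\t%s\t%s\t%s\t%s\n", length(rec[2]), rec[1], rec[2], rec[3], rec[0] }
    ' |
    sort -t "$(printf '\t')" -k1,1nr |
    awk -F '\t' -v depth="$1" -v gsize="$2" '
        BEGIN { target = depth * gsize }
        sum >= target { exit }
        { sum += $1; print $2; print $3; print $4; print $5 }
    '
}
```

For example (file names hypothetical): longest_depth 25 4650000 < pbcr.fastq > pbcr.25X.fastq. The last read kept is the one that pushes the running total past the target, so the subset can slightly exceed 25X.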