How to

pacBioToCA

[Summary]

pacBioToCA estimates the genome size on its own when the genomeSize parameter is omitted. With only one or two SMRT cells, the corrected output is almost the same with or without genomeSize. With three or more SMRT cells, supplying genomeSize (4650000 here) changes the result noticeably: fewer but longer corrected reads are kept, and the output depth levels off at about 42X.


Randomly pick one SMRT cell

pacBioToCA -l viaMiseq -s pacbio.spec -t 10 -partitions 200 fastqFile=m120228_192221.fastq genomeSize=4650000 ../../tMiSeq_PE.frg

m120228_192221_42129_c100298890010000001523009207231260_s1_p0.fastq
seqs amount:38542
seq avg len:2322.679985
total:89.52 Mb
depth: 19.25X

(without genomeSize) PacBio_Illumia.fasta
seqs amount:34981
seq avg len:2133.783826
total:74.64 Mb
depth: 16.05X

(with genomeSize=4650000) viaMiseq.fasta
seqs amount:34852
seq avg len:2130.841559
total:74.26 Mb
depth: 15.97X => With only one SMRT cell, genomeSize appears to make little difference.
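
The statistics in these listings (seqs amount, seq avg len, total, depth) can be reproduced with a short awk script. The sketch below is only an assumption about how such numbers could be computed, not the script actually used for this page: the function name seq_stats is made up, FASTQ input is assumed to have strict 4-line records, total is reported in units of 10^6 bases, and depth is total bases divided by the genome size.

```shell
#!/bin/sh
# seq_stats <fastq|fasta> <file> <genome_size_bp>
# Hypothetical helper: prints read count, mean length, total bases (Mb)
# and depth (total bases / genome size) in the layout used on this page.
seq_stats() {
    fmt=$1; file=$2; genome=$3
    awk -v fmt="$fmt" -v genome="$genome" '
        # FASTQ: assume 4 lines per record; the sequence is line 2 of each.
        fmt == "fastq" && NR % 4 == 2 { n++; total += length($0) }
        # FASTA: a ">" line starts a record; every other line is sequence.
        fmt == "fasta" && /^>/        { n++; next }
        fmt == "fasta"                { total += length($0) }
        END {
            if (n == 0) { print "no sequences found"; exit 1 }
            printf "seqs amount:%d\n", n
            printf "seq avg len:%f\n", total / n
            printf "total:%.2f Mb\n", total / 1e6
            printf "depth: %.2fX\n", total / genome
        }' "$file"
}
```

Possible usage: seq_stats fastq Filtered_two.fastq 4650000 for the input reads, and seq_stats fasta viaMiseq.fasta 4650000 for the corrected output.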

Randomly pick two SMRT cells

pacBioToCA -l viaMiseq -s pacbio.spec -t 10 -partitions 200 fastqFile=Filtered_two.fastq genomeSize=4650000 ../tMiSeq_PE.frg
Filtered_two.fastq
seqs amount:77117
seq avg len:2184.208709
total:168.44 Mb
depth: 36.22X

(without genomeSize) PacBio_Illumia.fasta
seqs amount:63760
seq avg len:2199.845561
total:140.26 Mb
depth: 30.16X

(with genomeSize=4650000) viaMiseq.fasta
seqs amount:63411
seq avg len:2198.455315
total:139.41 Mb
depth: 29.98X => With only two SMRT cells, genomeSize appears to make little difference.

Randomly pick three SMRT cells

pacBioToCA -l viaMiseq -s pacbio.spec -t 10 -partitions 200 fastqFile=Filtered_three.fastq genomeSize=4650000 ../tMiSeq_PE.frg

Filtered_three.fastq
seqs amount:113284
seq avg len:2333.977711
total:264.40 Mb
depth: 56.86X

(without genomeSize) PacBio_Illumia.fasta
seqs amount:98165
seq avg len:2286.482249
total:224.45 Mb
depth: 48.27X

(with genomeSize=4650000) viaMiseq.fasta
seqs amount:70468
seq avg len:2815.903020
total:198.43 Mb
depth: 42.67X => It appears that genomeSize only has an effect with three or more SMRT cells.

Randomly pick four SMRT cells

pacBioToCA -l viaMiseq -s pacbio.spec -t 10 -partitions 200 fastqFile=Filtered_four.fastq genomeSize=4650000 ../tMiSeq_PE.frg

Filtered_four.fastq
seqs amount:136333
seq avg len:2386.664674
total:325.38 Mb
depth: 69.97X

(without genomeSize) PacBio_Illumia.fasta
seqs amount:118901
seq avg len:2320.548322
total:275.92 Mb
depth: 59.34X

(with genomeSize=4650000) viaMiseq.fasta
seqs amount:56298
seq avg len:3495.604515
total:196.80 Mb
depth: 42.32X => It appears that genomeSize only has an effect with three or more SMRT cells.
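
Every depth value above is the total divided by the genome size (4650000). As a quick check, the sketch below recomputes the depths from the totals reported on this page, which makes the plateau near 42X for three and four SMRT cells easy to see. The helper names depth and summary are illustrative only.

```shell
#!/bin/sh
GENOME_SIZE=4650000   # genomeSize used in the runs above

# depth <total_Mb>: coverage depth for GENOME_SIZE (1 Mb taken as 10^6 bases)
depth() {
    awk -v mb="$1" -v g="$GENOME_SIZE" 'BEGIN { printf "%.2f", mb * 1e6 / g }'
}

# Columns: cells, input Mb, output Mb without genomeSize, output Mb with genomeSize.
# The totals are copied from the listings on this page.
summary() {
    while read -r cells input without with; do
        printf 'cells=%s: input %sX, without genomeSize %sX, with genomeSize %sX\n' \
            "$cells" "$(depth "$input")" "$(depth "$without")" "$(depth "$with")"
    done <<EOF
1 89.52 74.64 74.26
2 168.44 140.26 139.41
3 264.40 224.45 198.43
4 325.38 275.92 196.80
EOF
}

summary
```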

runCA

The asm.spec we used.

1

Basic usage: runCA with asm.spec

   runCA -p asm -d asm -s asm.spec PBcR.viaMiseq.frg > asm.out 2>&1

2

From the web: runCA with parameters (Celera_Assembler_Parameters):

   runCA unitigger=bogart merSize=14 ovlMinLen=2000 utgErrorRate=0.015 utgGraphErrorRate=0.015 \
   utgGraphErrorLimit=0 utgMergeErrorRate=0.03 utgMergeErrorLimit=0 -p asm -d asm asm.overCov.frg

3

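From the paper's script (asmCorrected.sh): runCA with asm.spec and parameters. The argument order used in asmCorrected.sh did not work. After reordering the arguments to follow the runCA usage message (usage: runCA -d <dir> -p <prefix> [options] <frg>), it ran fine. We used the asm.spec shipped with wgs-package, found in D:\Boss Jade\201306\20130614_Hybrid assembly to_do_list\Filter_good_long_read\wgs-package\doc, but remember to set the grid options to 0 and turn off (comment out) the SGE settings.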

   runCA -p asm -d asm -s asm.spec unitigger=bogart utgErrorRate=0.015 ovlMinLen=2000 ovlErrorRate=0.03 cgwErrorRate=0.10 cnsErrorRate=0.10 \
   utgGraphErrorLimit=0 utgGraphErrorRate=0.015 utgMergeErrorLimit=0 utgMergeErrorRate=0.03 asm.overCov.frg
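
Setting the grid options to 0 and commenting out the SGE lines in asm.spec can be scripted. A minimal sketch, assuming the grid switches are keys named useGrid or ending in OnGrid (for example scriptOnGrid) and that SGE settings are keys starting with sge. The function name prepare_spec and the output file name are hypothetical, so compare the result against your own asm.spec before using it.

```shell
#!/bin/sh
# prepare_spec <in.spec> <out.spec>
# Hypothetical helper. ASSUMPTION: the spec uses "key = value" lines,
# grid switches are "useGrid" or "<something>OnGrid", SGE keys start with "sge".
prepare_spec() {
    awk '
        # Comment out SGE settings.
        /^[ \t]*sge[A-Za-z]*[ \t]*=/ { print "#" $0; next }
        # Force grid switches to 0 (everything after "=" is replaced).
        /^[ \t]*(useGrid|[A-Za-z]+OnGrid)[ \t]*=/ { sub(/=.*/, "= 0") }
        # All other lines pass through unchanged.
        { print }
    ' "$1" > "$2"
}
```

Possible usage: prepare_spec asm.spec asm.local.spec, then pass asm.local.spec to runCA with -s.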