RunCA

Revision as of 27 August 2013 20:30 by admin

We re-ran correction and assembly using the data provided by the PBcR closure project.

We corrected the long-read sequence data (200X) with Illumina short reads (100X), once without and once with the genome size specified:

pacBioToCA -l viaMiseq -s pacbio.spec  -t 10 -partitions 200 fastqFile=filtered_subreads.200X.fastq.bz2 miseq.100X.frg.bz2
pacBioToCA -l viaMiseq -s pacbio.spec  -t 10 -partitions 200 fastqFile=filtered_subreads.200X.fastq.bz2 genomeSize=4650000 miseq.100X.frg.bz2
                      200X filtered long reads   Without genomeSize   genomeSize=4650000
Number of sequences   383482                     332880               37879
Average length (bp)   2422.877720                2260.68262           4927.683492
Total bases           929.13 Mb                  752.54 Mb            186.66 Mb
Depth                 199.81X                    161.84X              40.14X
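
The totals and depths above follow from the number of sequences, the average length, and the genome size. As a quick sanity check, here is a minimal Python sketch that recomputes them. The function name `coverage_stats` is ours, and the sketch assumes that the 4.65 Mb genome size from the `genomeSize=4650000` run is the denominator for all three columns.

```python
def coverage_stats(num_reads, mean_len, genome_size=4_650_000):
    """Return (total bases in Mb, sequencing depth) for a read set.

    genome_size defaults to 4,650,000 bp, taken from the genomeSize
    option used above (assumed to apply to every column).
    """
    total_bases = num_reads * mean_len
    return total_bases / 1e6, total_bases / genome_size


# Read counts and average lengths copied from the figures above.
read_sets = [
    ("200X filtered long reads", 383482, 2422.877720),
    ("Without genomeSize", 332880, 2260.68262),
    ("genomeSize=4650000", 37879, 4927.683492),
]

for label, num_reads, mean_len in read_sets:
    total_mb, depth = coverage_stats(num_reads, mean_len)
    print(f"{label}: total {total_mb:.2f} Mb, depth {depth:.2f}X")
```

Rounded to two decimals, the recomputed values agree with the reported totals and depths.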

In addition to filtering the corrected reads (PBcR) down to 25X for assembly, we ran Celera Assembler with different parameters, as described in ref.

runCA -p asm -d asm -s asm.spec viaMiseq.frg
runCA unitigger=bogart merSize=14 ovlMinLen=<ovl value> utgErrorRate=0.015 utgGraphErrorRate=0.015 utgGraphErrorLimit=0 utgMergeErrorRate=0.03 utgMergeErrorLimit=0 -p asm -d asm viaMiseq.frg
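
The 25X filtering step mentioned above can be illustrated with a short Python sketch. The page does not say how the 25X subset was chosen, so the longest-first selection below is an assumption, and `select_longest` is a hypothetical helper, not part of Celera Assembler or pacBioToCA.

```python
def select_longest(read_lengths, genome_size, target_depth):
    """Pick read indices, longest first, until target_depth is reached.

    Longest-first selection is an assumption made for illustration;
    the original page does not state the selection rule.
    """
    target_bases = genome_size * target_depth
    # Indices of reads ordered from longest to shortest.
    order = sorted(range(len(read_lengths)),
                   key=lambda i: read_lengths[i], reverse=True)
    chosen, total = [], 0
    for i in order:
        if total >= target_bases:
            break
        chosen.append(i)
        total += read_lengths[i]
    return chosen, total


# Toy example: a 1,000 bp "genome" and a 3X target (not real data).
lengths = [500, 1200, 300, 900, 1100, 400]
chosen, total = select_longest(lengths, genome_size=1000, target_depth=3)
print(chosen, total)
```

For the 4,650,000 bp genome used here, a 25X target corresponds to 116.25 Mb of corrected sequence.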