RunCA

Revision as of 28 August 2013 00:45 by admin

We re-ran correction and assembly using the data provided in the PBcR closure project.

We corrected the PacBio long-read data (200X) with Illumina MiSeq short reads (100X), once without and once with a specified genome size:

pacBioToCA -l viaMiseq -s pacbio.spec  -t 10 -partitions 200 fastqFile=filtered_subreads.200X.fastq.bz2 miseq.100X.frg.bz2

pacBioToCA -l viaMiseq -s pacbio.spec  -t 10 -partitions 200 fastqFile=filtered_subreads.200X.fastq.bz2 genomeSize=4650000 miseq.100X.frg.bz2
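A minimal Python sketch that rebuilds the two correction commands as argument lists, so they can be printed or passed to `subprocess.run`. The helper `pacbiotoca_command` is hypothetical; every flag and file name is copied from the two commands above and nothing is added. Both commands use the same `-l viaMiseq` library name, so we assume each run was made in its own working directory.

```python
import shlex

# Flags and file names copied verbatim from the two pacBioToCA commands.
BASE_ARGS = [
    "pacBioToCA", "-l", "viaMiseq", "-s", "pacbio.spec",
    "-t", "10", "-partitions", "200",
    "fastqFile=filtered_subreads.200X.fastq.bz2",
]
SHORT_READS = "miseq.100X.frg.bz2"


def pacbiotoca_command(genome_size=None):
    """Return the pacBioToCA argument list, optionally adding genomeSize=."""
    args = list(BASE_ARGS)
    if genome_size is not None:
        # Placed before the .frg file, as in the second command.
        args.append(f"genomeSize={genome_size}")
    args.append(SHORT_READS)
    return args


if __name__ == "__main__":
    for size in (None, 4650000):
        # Only prints the command; passing the list to
        # subprocess.run(..., check=True) would execute it
        # (requires Celera Assembler on PATH).
        print(shlex.join(pacbiotoca_command(size)))
```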

Read statistics before and after correction (depth = total bases / 4.65 Mb genome size):

Read set	Reads	Mean length (bp)	Total bases	Depth
200X filtered subreads (input)	383482	2422.877720	929.13 Mb	199.81X
Corrected, without genomeSize	332880	2260.68262	752.54 Mb	161.84X
Corrected, genomeSize=4650000	37879	4927.683492	186.66 Mb	40.14X
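The table columns can be reproduced with a short Python sketch. It assumes FASTQ records of exactly four lines (no wrapped sequences) and defines Mb as 10^6 bp and depth as total bases divided by the 4,650,000 bp genome size; the helper names are hypothetical.

```python
import bz2
import gzip

GENOME_SIZE = 4_650_000  # bp, the genomeSize value used for correction


def open_maybe_compressed(path):
    """Open a plain, .gz or .bz2 text file for reading."""
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def fastq_lengths(lines):
    """Yield read lengths from 4-line FASTQ records (sequence is line 2)."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield len(line.rstrip("\n"))


def read_stats(lengths, genome_size=GENOME_SIZE):
    """Return read count, mean length, total (Mb) and depth (X)."""
    lengths = list(lengths)
    total = sum(lengths)
    return {
        "seqs": len(lengths),
        "avg_len": total / len(lengths) if lengths else 0.0,
        "total_mb": total / 1e6,
        "depth": total / genome_size,
    }


if __name__ == "__main__":
    import sys
    with open_maybe_compressed(sys.argv[1]) as handle:
        print(read_stats(fastq_lengths(handle)))
```

For the input row, 929.13 Mb / 4.65 Mb gives the 199.81X shown in the table.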

For assembly, we used either all corrected (PBcR) reads or a 25X subset of them, and ran Celera Assembler with two different parameter sets, runCA1 and runCA2, as described in ref.
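The page does not say how the 25X subset was chosen. A minimal Python sketch, assuming it consists of the longest corrected reads whose combined length first reaches 25 times the genome size; the function `select_longest` and its arguments are hypothetical.

```python
def select_longest(lengths_by_id, genome_size, coverage=25):
    """Pick the longest reads until their total reaches coverage * genome_size.

    Returns the chosen read IDs, longest first.
    """
    target = coverage * genome_size
    chosen, total = [], 0
    # Longest first; ties broken by read ID so the result is deterministic.
    for read_id, length in sorted(lengths_by_id.items(),
                                  key=lambda kv: (-kv[1], kv[0])):
        if total >= target:
            break
        chosen.append(read_id)
        total += length
    return chosen
```

With the genomeSize=4650000 correction (40.14X), this would keep roughly the longest 62% of the corrected bases.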

runCA -p asm -d asm -s asm.spec viaMiseq.frg

runCA unitigger=bogart merSize=14 ovlMinLen=<ovl value> utgErrorRate=0.015 utgGraphErrorRate=0.015 utgGraphErrorLimit=0 utgMergeErrorRate=0.03 utgMergeErrorLimit=0 -p asm -d asm viaMiseq.frg
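The second parameter set can be kept as data and turned into an argument list, which makes it easier to vary one option at a time. A minimal Python sketch: the option values are copied from the command above, `ovlMinLen` stays a required argument because the page gives only `<ovl value>`, and `runca2_command` with its `prefix`/`outdir` arguments is hypothetical.

```python
# Options copied from the runCA command above (ovlMinLen supplied by caller).
RUNCA2_OPTIONS = {
    "unitigger": "bogart",
    "merSize": "14",
    "utgErrorRate": "0.015",
    "utgGraphErrorRate": "0.015",
    "utgGraphErrorLimit": "0",
    "utgMergeErrorRate": "0.03",
    "utgMergeErrorLimit": "0",
}


def runca2_command(ovl_min_len, frg="viaMiseq.frg", prefix="asm", outdir="asm"):
    """Build the runCA argument list for the second parameter set."""
    opts = dict(RUNCA2_OPTIONS, ovlMinLen=str(ovl_min_len))
    args = ["runCA"] + [f"{key}={value}" for key, value in opts.items()]
    return args + ["-p", prefix, "-d", outdir, frg]
```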

This gave eight assemblies for comparison (two correction runs × two read sets × two runCA parameter sets):

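Correction	Reads for assembly	runCA parameters	Assembly
without genomeSize	all PBcR	runCA1	wo_all_runCA1
without genomeSize	all PBcR	runCA2	wo_all_runCA2
without genomeSize	25X PBcR	runCA1	wo_25X_runCA1
without genomeSize	25X PBcR	runCA2	wo_25X_runCA2
genomeSize=4650000	all PBcR	runCA1	w_all_runCA1
genomeSize=4650000	all PBcR	runCA2	w_all_runCA2
genomeSize=4650000	25X PBcR	runCA1	w_25X_runCA1
genomeSize=4650000	25X PBcR	runCA2	w_25X_runCA2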
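The eight names follow a `<correction>_<reads>_<parameter set>` pattern, so they can be generated as a Cartesian product. A minimal Python sketch; the dictionaries and `assembly_names` are hypothetical and only restate the table.

```python
from itertools import product

# Name components restated from the table (prefix -> description).
CORRECTION = {"wo": "without genomeSize", "w": "genomeSize=4650000"}
READ_SET = {"all": "all PBcR", "25X": "25X PBcR"}
PARAMS = ["runCA1", "runCA2"]


def assembly_names():
    """Return the eight assembly names in the same order as the table."""
    return [f"{c}_{r}_{p}" for c, r, p in product(CORRECTION, READ_SET, PARAMS)]


if __name__ == "__main__":
    for name in assembly_names():
        print(name)
```

Using each name as its own runCA `-d` directory (an assumption; the commands above all write to `asm`) would keep the eight runs from overwriting each other.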