Wgs-8.2

We used PBcR, the PacBio correction pipeline in the latest Celera Assembler, to run hybrid assemblies that combine different numbers of SMRT cells from Dataset 5 with the short reads from Dataset 4.

We arbitrarily chose subsets of one to four SMRT cells:
One single SMRT cell: m120208_071634
Two SMRT cells: m120228_210845 + m120208_122534
Three SMRT cells: m120228_115504 + m120228_152936 + m120228_100807
Four SMRT cells: m120228_171636 + m120228_223624 + m120228_100807 + m120228_190630
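
This page does not say how the reads of the selected cells were merged into a single input file. A minimal sketch of one way to do it is shown here. The combine_cells helper, the SUBREAD_DIR variable and the "<cell>.filtered_subreads.fastq" naming are all hypothetical and need adjusting to the real directory layout.

```shell
#!/usr/bin/env bash
# Hypothetical layout: each SMRT cell's reads live in
# "$SUBREAD_DIR/<cell>.filtered_subreads.fastq". Adjust to the real paths.
SUBREAD_DIR=${SUBREAD_DIR:-.}

# Concatenate the subreads of the given cells into one FASTQ file.
combine_cells() {
    local out=$1 cell
    shift
    : > "$out"    # start from an empty output file
    for cell in "$@"; do
        cat "$SUBREAD_DIR/$cell.filtered_subreads.fastq" >> "$out" || return 1
    done
}

# Usage, for the two-cell subset listed above:
#   combine_cells 2cell.fastq m120228_210845 m120208_122534
```

The function stops at the first missing cell file, so a mistyped cell name does not silently produce a smaller input.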

1. Generate short reads FRG file

fastqToCA -libraryname ecoli -technology illumina -insertsize 300 30 -mates read_1.fastq,read_2.fastq > short_reads.frg
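
Since the two mate files are expected to pair up read for read, it can be worth checking that they hold the same number of records before building the FRG file. The sketch below is one way to do that. The helper names count_fastq_reads and check_mates are illustrative and not part of Celera Assembler, and the count assumes the usual four lines per FASTQ record.

```shell
#!/usr/bin/env bash
# Count records in a FASTQ file, assuming the common 4-lines-per-record
# layout (no wrapped sequence lines).
count_fastq_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Succeed only if both mate files hold the same number of records.
check_mates() {
    local n1 n2
    n1=$(count_fastq_reads "$1")
    n2=$(count_fastq_reads "$2")
    if [ "$n1" -ne "$n2" ]; then
        echo "mate files differ: $n1 vs $n2 reads" >&2
        return 1
    fi
    echo "$n1 read pairs"
}

# Usage (file names as in the command above):
#   check_mates read_1.fastq read_2.fastq
```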

2. Create pacbio.spec file

merSize=14

3. PBcR

PBcR -length 500 -partitions 200 -l eclo-illumina -s pacbio.spec -fastq filtered_subreads.fastq genomeSize=4650000 short_reads.frg
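
Because each subset contributes a different amount of PacBio sequence, a rough fold-coverage figure helps when comparing the runs. The sketch below sums the read lengths in a FASTQ file and divides by the genome size passed to PBcR. The estimate_coverage helper is illustrative only. Its optional third argument skips reads shorter than a cutoff, on the assumption that this roughly mirrors the -length 500 setting.

```shell
#!/usr/bin/env bash
# Sum the sequence lengths in a FASTQ file (4-line records assumed) and
# divide by the genome size to get an approximate fold coverage.
# Optional third argument: ignore reads shorter than this many bases.
estimate_coverage() {
    local fastq=$1 genome_size=$2 min_len=${3:-0}
    awk -v g="$genome_size" -v m="$min_len" '
        NR % 4 == 2 && length($0) >= m { bases += length($0) }
        END { printf "%.2f\n", bases / g }
    ' "$fastq"
}

# Usage, with the values from the PBcR command above:
#   estimate_coverage filtered_subreads.fastq 4650000 500
```

The result is raw coverage before correction, so the corrected reads that reach the assembler will usually cover less.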

We evaluated the assemblies with QUAST 2.3, using NC_000913 as the reference genome and Ec_gene_list as the gene list.

Metric                       PBcR_1cell      PBcR_2cell      PBcR_3cell      PBcR_4cell      PBcR_17cell

Statistics without reference
# contigs                    24              9               6               2               3
Largest contig               699206          2229824         3385118         4461262         4649343
Total length                 4686657         4685972         4678733         4653153         4674706
N50                          564692          981448          3385118         4461262         4649343

Misassemblies
# misassemblies              57              11              10              9               11
Misassembled contigs length  1091536         3501620         4397319         4461262         4661093

Mismatches
# mismatches per 100 kbp     2.26            1.64            0.91            2.07            2
# indels per 100 kbp         0.63            0.32            0.24            0.3             0.47
# N's per 100 kbp            0               0               0               0               0

Genome statistics
Genome fraction (%)          99.922          100             100             100             100
Duplication ratio            1.011           1.01            1.008           1.003           1.008
# genes                      4482 + 13 part  4495 + 2 part   4495 + 2 part   4494 + 3 part   4495 + 2 part
NGA50                        293226          947955          949288          3026415         3026412

Running time                 6hr 8m          9hr 52m         10hr 24m        11hr 57m        12hr 52m
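
To make the N50 row easier to interpret, here is a small sketch of how the statistic is computed from a list of contig lengths. The n50 helper is illustrative and is not how QUAST itself is invoked. It follows the standard definition: the length L such that contigs of length L or longer cover at least half of the total assembly length.

```shell
#!/usr/bin/env bash
# Print the N50 of a list of contig lengths (one per line on stdin).
n50() {
    sort -rn | awk '
        { len[NR] = $1; total += $1 }
        END {
            # Walk from the longest contig down until half the total is covered.
            for (i = 1; i <= NR; i++) {
                run += len[i]
                if (run * 2 >= total) { print len[i]; exit }
            }
        }'
}

# Usage: printf '%s\n' 80 70 50 40 30 20 | n50
```

This explains why N50 equals the largest contig for the 3-cell, 4-cell and 17-cell assemblies: in each case that single contig is already longer than half of the total length. NGA50 differs because QUAST computes it from blocks aligned to the reference, split at misassemblies, and relative to the reference length.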