We used PBcR, from the latest Celera Assembler, to run hybrid assemblies with different numbers of SMRT cells from Dataset 5 combined with the Dataset 4 short reads.
We arbitrarily chose one to four SMRT cells:
One SMRT cell: m120208_071634
Two SMRT cells: m120228_210845 + m120208_122534
Three SMRT cells: m120228_115504 + m120228_152936 + m120228_100807
Four SMRT cells: m120228_171636 + m120228_223624 + m120228_100807 + m120228_190630
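The PBcR command below takes a single filtered_subreads.fastq, and this write-up does not say how reads from several cells were merged. A minimal sketch of one way to do it, assuming each cell has its own directory containing a filtered_subreads.fastq (the directory layout and the helper name `combine_cells` are hypothetical):

```shell
# combine_cells OUT CELL_DIR...
# Concatenates CELL_DIR/filtered_subreads.fastq for every listed cell into OUT.
# Assumption: one directory per SMRT cell, named after the cell.
combine_cells() {
    out=$1
    shift
    : > "$out"                      # start from an empty output file
    for cell in "$@"; do
        cat "$cell/filtered_subreads.fastq" >> "$out"
    done
}

# Example for the two-cell run:
# combine_cells filtered_subreads.fastq m120228_210845 m120208_122534
```

FASTQ records are self-contained, so plain concatenation keeps the file valid as long as every input ends with a newline.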
1. Generate the short-read FRG file
fastqToCA -libraryname ecoli -technology illumina -insertsize 300 30 -mates read_1.fastq,read_2.fastq > short_reads.frg
2. Create the pacbio.spec file
merSize=14
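The one-line spec file can be written directly from the shell. A minimal sketch, assuming merSize=14 is the only setting needed, since it is the only one listed here:

```shell
# Write the spec file that is passed to PBcR with -s.
printf 'merSize=14\n' > pacbio.spec

# Show the result to confirm the content.
cat pacbio.spec
```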
3. Run PBcR
PBcR -length 500 -partitions 200 -l eclo-illumina -s pacbio.spec -fastq filtered_subreads.fastq genomeSize=4650000 short_reads.frg
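Because the runs differ only in how many SMRT cells feed filtered_subreads.fastq, it can help to estimate the raw PacBio coverage of each input against the genomeSize value. A minimal sketch, assuming unwrapped four-line FASTQ records (the helper name `estimate_coverage` is hypothetical and not part of Celera Assembler):

```shell
# estimate_coverage FASTQ GENOME_SIZE
# Sums the lengths of the sequence lines (line 2 of each 4-line record)
# and divides by the genome size to give fold coverage.
estimate_coverage() {
    awk -v g="$2" 'NR % 4 == 2 { bases += length($0) }
                   END { printf "%.2f\n", bases / g }' "$1"
}

# Usage with the file and genome size from the PBcR command:
# estimate_coverage filtered_subreads.fastq 4650000
```

This counts all subreads, including those shorter than the -length 500 cutoff, so it gives an upper bound on the coverage PBcR actually uses.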
We evaluated the assemblies with QUAST 2.3 (reference genome NC_000913 and gene list Ec_gene_list).
Statistics without reference | PBcR_1cell | PBcR_2cell | PBcR_3cell | PBcR_4cell | PBcR_17cell |
# contigs | 24 | 9 | 6 | 2 | 3 |
Largest contig | 699206 | 2229824 | 3385118 | 4461262 | 4649343 |
Total length | 4686657 | 4685972 | 4678733 | 4653153 | 4674706 |
N50 | 564692 | 981448 | 3385118 | 4461262 | 4649343 |
Misassemblies | | | | | |
# misassemblies | 57 | 11 | 10 | 9 | 11 |
Misassembled contigs length | 1091536 | 3501620 | 4397319 | 4461262 | 4661093 |
Mismatches | | | | | |
# mismatches per 100 kbp | 2.26 | 1.64 | 0.91 | 2.07 | 2 |
# indels per 100 kbp | 0.63 | 0.32 | 0.24 | 0.3 | 0.47 |
# N's per 100 kbp | 0 | 0 | 0 | 0 | 0 |
Genome statistics | | | | | |
Genome fraction (%) | 99.922 | 100 | 100 | 100 | 100 |
Duplication ratio | 1.011 | 1.01 | 1.008 | 1.003 | 1.008 |
# genes | 4482 + 13 part | 4495 + 2 part | 4495 + 2 part | 4494 + 3 part | 4495 + 2 part |
NGA50 | 293226 | 947955 | 949288 | 3026415 | 3026412 |
Running Time | 6hr 8m | 9hr 52m | 10hr 24m | 11hr 57m | 12hr 52m |
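For readers unfamiliar with the N50 row: it is the length of the contig at which the cumulative length of the contigs, sorted longest first, reaches at least half of the total assembly length. A minimal sketch of that calculation, assuming one contig length per line on standard input (the helper name `n50` is hypothetical, and this is not QUAST's own code):

```shell
# n50: read one contig length per line on stdin and print the N50.
n50() {
    sort -rn | awk '{ len[NR] = $1; total += $1 }
        END {
            for (i = 1; i <= NR; i++) {
                cum += len[i]
                # Stop at the first contig that brings the running sum to half the total.
                if (cum * 2 >= total) { print len[i]; exit }
            }
        }'
}

# Example with made-up lengths (not taken from the assemblies above):
printf '%s\n' 100 400 200 300 | n50
```

NGA50 follows the same idea, but is computed on blocks aligned to the reference and uses the reference length rather than the assembly length, which is why it can be much lower than N50 when misassemblies break long contigs.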