SPAdes - hybrid

We used SPAdes to assemble different numbers of SMRT cells from Dataset 5 and Dataset 9 together with Dataset 4 short reads. Two short-read inputs were tried: all reads, and a randomly selected subset at 76X coverage.
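The page does not say how the 76X subset was drawn. The sketch below shows one possible approach: work out how many read pairs correspond to a target coverage, then reservoir-sample that many pairs while keeping both mates together. The function names, the file paths and the seed are hypothetical, and the read length has to be supplied by the user because it is not stated here.

```python
import random


def pairs_for_coverage(coverage, genome_size, read_length):
    """Number of read pairs giving `coverage`-fold depth (two reads per pair)."""
    return round(coverage * genome_size / (2 * read_length))


def read_fastq(path):
    """Yield FASTQ records as 4-tuples of raw lines (assumes 4-line records)."""
    with open(path) as handle:
        while True:
            record = tuple(handle.readline() for _ in range(4))
            if not record[0]:  # end of file
                return
            yield record


def subsample_pairs(fq1, fq2, out1, out2, n_pairs, seed=0):
    """Reservoir-sample n_pairs read pairs so that mates stay in sync."""
    rng = random.Random(seed)
    reservoir = []
    for i, pair in enumerate(zip(read_fastq(fq1), read_fastq(fq2))):
        if i < n_pairs:
            reservoir.append(pair)
        else:
            # Keep the new pair with probability n_pairs / (i + 1).
            j = rng.randint(0, i)
            if j < n_pairs:
                reservoir[j] = pair
    with open(out1, "w") as o1, open(out2, "w") as o2:
        for rec1, rec2 in reservoir:
            o1.writelines(rec1)
            o2.writelines(rec2)
    return len(reservoir)
```

For example, `pairs_for_coverage(76, genome_size, read_length)` gives the pair count to pass as `n_pairs`, with `genome_size` set to the length of the NC_000913 reference. The reservoir holds all selected pairs in memory, which is acceptable for a bacterial genome but would need rethinking for much larger data sets.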

We arbitrarily chose sets of one to four SMRT cells:
One single SMRT cell: m120208_071634
Two SMRT cells: m120228_210845 + m120208_122534
Three SMRT cells: m120228_115504 + m120228_152936 + m120228_100807
Four SMRT cells: m120228_171636 + m120228_223624 + m120228_100807 + m120228_190630

spades.py -1 reads_1.fastq -2 reads_2.fastq --pacbio filtered_subreads.fasta -o output
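The command above takes a single subread FASTA file, so each multi-cell run needs its subreads combined first. The following is a minimal sketch of how the four runs could be scripted, assuming one filtered subread FASTA per SMRT cell. The per-cell file naming (`<cell>.filtered_subreads.fasta`), the directory layout and the output folder names are hypothetical. Only the `spades.py` options already shown above are used.

```python
import shutil
import subprocess
from pathlib import Path

# SMRT cell sets as listed above.
CELL_SETS = {
    "1cell": ["m120208_071634"],
    "2cell": ["m120228_210845", "m120208_122534"],
    "3cell": ["m120228_115504", "m120228_152936", "m120228_100807"],
    "4cell": ["m120228_171636", "m120228_223624",
              "m120228_100807", "m120228_190630"],
}


def merge_subreads(cells, subread_dir, merged_path):
    """Concatenate per-cell FASTA files into one file for --pacbio.

    Assumes every input file ends with a newline.
    """
    with open(merged_path, "wb") as out:
        for cell in cells:
            # Hypothetical naming scheme for the per-cell subread files.
            src_path = Path(subread_dir) / f"{cell}.filtered_subreads.fasta"
            with open(src_path, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(merged_path)


def spades_command(reads_1, reads_2, pacbio_fasta, out_dir):
    """Build the same spades.py invocation as shown above."""
    return ["spades.py", "-1", str(reads_1), "-2", str(reads_2),
            "--pacbio", str(pacbio_fasta), "-o", str(out_dir)]


def run_all(reads_1, reads_2, subread_dir, work_dir, dry_run=True):
    """Merge subreads for each cell set and build (optionally run) SPAdes."""
    commands = []
    for label, cells in CELL_SETS.items():
        merged = merge_subreads(cells, subread_dir,
                                Path(work_dir) / f"{label}.fasta")
        cmd = spades_command(reads_1, reads_2, merged,
                             Path(work_dir) / f"Miseq_{label}")
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands
```

With `dry_run=True` the function only prepares the merged FASTA files and returns the command lines, which makes it easy to inspect them before starting the assemblies.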

Evaluation

We evaluated the assemblies with QUAST 2.3 (reference genome NC_000913 and Ec_gene_list).

| Statistics without reference | Miseq_only | Miseq_1cell | Miseq_2cell | Miseq_3cell | Miseq_4cell | Miseq_17cell | Miseq_d9 |
|---|---|---|---|---|---|---|---|
| # contigs | 86 | 15 | 14 | 12 | 11 | 6 | 6 |
| Largest contig | 285889 | 2498709 | 3196491 | 1522255 | 2216997 | 4644452 | 4644455 |
| Total length | 4577132 | 4626443 | 4639784 | 4651072 | 4649795 | 4652420 | 4652423 |
| N50 | 139882 | 2498709 | 3196491 | 1241619 | 1750947 | 4644452 | 4644455 |
| Misassemblies | | | | | | | |
| # misassemblies | 2 | 6 | 6 | 7 | 7 | 7 | 7 |
| Misassembled contigs length | 215581 | 2498709 | 3196491 | 2501706 | 3967944 | 4644452 | 4644455 |
| Mismatches | | | | | | | |
| # mismatches per 100 kbp | 3.02 | 5.16 | 6.16 | 6.79 | 6.81 | 6.79 | 6.79 |
| # indels per 100 kbp | 0.46 | 0.54 | 0.54 | 0.58 | 0.58 | 0.67 | 0.58 |
| # N's per 100 kbp | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Genome statistics | | | | | | | |
| Genome fraction (%) | 98.451 | 99.411 | 99.756 | 100 | 99.976 | 100 | 100 |
| Duplication ratio | 1.001 | 1.001 | 1.001 | 1.001 | 1.001 | 1.001 | 1.001 |
| # genes | 4399 + 32 part | 4479 + 3 part | 4485 + 4 part | 4492 + 5 part | 4492 + 5 part | 4495 + 2 part | 4495 + 2 part |
| NGA50 | 133059 | 692561 | 1041186 | 889325 | 1278976 | 3026419 | 3026422 |
| Running Time | 2hr 28m | 2hr 34m | 2hr 36m | 2hr 40m | 2hr 35m | 2hr 46m | 2hr 33m |
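For readers unfamiliar with the N50 row: N50 is the contig length L such that contigs of length L or longer cover at least half of the total assembly length. The sketch below is an illustrative implementation of that definition, not QUAST's own code. The optional `total` argument is an addition for computing the NG50-style variant against a reference length. NGA50 additionally breaks contigs at misassemblies and uses aligned blocks, which is not reproduced here.

```python
def n50(lengths, total=None):
    """Length L such that contigs >= L cover at least half of `total`.

    `total` defaults to the sum of the contig lengths (plain N50); pass a
    reference genome length instead to get an NG50-style value.
    Returns 0 if the contigs never reach half of `total`.
    """
    ordered = sorted(lengths, reverse=True)
    half = (total if total is not None else sum(ordered)) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


# Miseq_17cell has 6 contigs, a largest contig of 4644452 and a total
# length of 4652420. The five smaller lengths below are made up so that
# they sum to the remaining 7968 bp; only the largest contig matters here.
contigs = [4644452, 3000, 2000, 1500, 1000, 468]
print(n50(contigs))  # 4644452
```

This shows why N50 equals the largest contig in several columns: whenever one contig alone exceeds half of the total length, it determines N50 regardless of how the remaining sequence is split.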

Misassemblies for Adobe reader.

Randomly selected 76X coverage short reads

We evaluated the assemblies with QUAST 2.3 (reference genome NC_000913 and Ec_gene_list).

| Statistics without reference | Miseq_only | Miseq_1cell | Miseq_2cell | Miseq_3cell | Miseq_4cell | Miseq_17cell | Miseq_d9 |
|---|---|---|---|---|---|---|---|
| # contigs | 86 | 23 | 18 | 19 | 15 | 15 | 15 |
| Largest contig | 285889 | 2498544 | 3195523 | 2501177 | 3893601 | 3893600 | 3893600 |
| Total length | 4577132 | 4624001 | 4630238 | 4637837 | 4635269 | 4635268 | 4635268 |
| N50 | 139882 | 2498544 | 3195523 | 2501177 | 3893601 | 3893600 | 3893600 |
| Misassemblies | | | | | | | |
| # misassemblies | 2 | 6 | 6 | 6 | 6 | 6 | 6 |
| Misassembled contigs length | 215581 | 2498544 | 3195523 | 2501177 | 3893601 | 3893600 | 3893600 |
| Mismatches | | | | | | | |
| # mismatches per 100 kbp | 3.02 | 8.83 | 9.25 | 9.62 | 9.45 | 9.45 | 9.47 |
| # indels per 100 kbp | 0.46 | 0.48 | 0.48 | 0.43 | 0.48 | 0.5 | 0.5 |
| # N's per 100 kbp | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Genome statistics | | | | | | | |
| Genome fraction (%) | 98.451 | 99.382 | 99.531 | 99.696 | 99.645 | 99.645 | 99.645 |
| Duplication ratio | 1.001 | 1.001 | 1.001 | 1.001 | 1.001 | 1.001 | 1.001 |
| # genes | 4399 + 32 part | 4479 + 2 part | 4474 + 3 part | 4475 + 2 part | 4482 + 2 part | 4479 + 2 part | 4479 + 2 part |
| NGA50 | 133059 | 691983 | 1040999 | 693363 | 1040814 | 1040814 | 1040814 |
| Running Time | 2hr 28m | 52m 30s | 49m 54s | 54m 55s | 57m 49s | 1hr 20m | 1hr 4m |