We assembled the Dataset 4 raw data with SPAdes, then scaffolded the assembly with SSPACE-longread using different subread depths from Dataset 5 and Dataset 9.
We arbitrarily chose one to four SMRT cells:
One single SMRT cell: m120208_071634
Two SMRT cells: m120228_210845 + m120208_122534
Three SMRT cells: m120228_115504 + m120228_152936 + m120228_100807
Four SMRT cells: m120228_171636 + m120228_223624 + m120228_100807 + m120228_190630
We evaluated the assemblies with QUAST 2.3 (reference genome NC_000913 and gene list Ec_gene_list).
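As a rough illustration of such an evaluation run, the sketch below builds a QUAST command line with a reference (`-R`), a gene list (`-G`) and an output directory (`-o`). The file names, the output directory and the helper `build_quast_command` are assumptions made for the example; the exact command used for the tables here is not recorded on this page.

```python
import shlex


def build_quast_command(assemblies, reference, genes, out_dir):
    """Return the argument list for a reference-based QUAST run."""
    return ["quast.py", "-R", reference, "-G", genes, "-o", out_dir, *assemblies]


# All paths below are hypothetical placeholders.
cmd = build_quast_command(
    ["Miseq_only.fasta", "Miseq_1cell.fasta"],
    reference="NC_000913.fasta",
    genes="Ec_gene_list.txt",
    out_dir="quast_out",
)
print(shlex.join(cmd))
```

Passing several assemblies to one run places them side by side in a single report, which matches the column layout of the tables on this page.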
Statistics without reference | Miseq_only | Miseq_1cell | Miseq_2cell | Miseq_3cell | Miseq_4cell | Miseq_17cell | Miseq_d9 |
# contigs | 86 | 15 | 18 | 16 | 15 | 17 | 14 |
Largest contig | 285889 | 2497845 | 1260980 | 2501081 | 3194637 | 1954649 | 3392211 |
Total length | 4577132 | 4632009 | 4633058 | 4636174 | 4638657 | 4633857 | 4632677 |
N50 | 139882 | 2497845 | 1238868 | 2501081 | 3194637 | 1238635 | 3392211 |
Misassemblies | |||||||
# misassemblies | 2 | 9 | 10 | 10 | 10 | 8 | 8 |
Misassembled contigs length | 215581 | 3193893 | 3244631 | 2705788 | 3657574 | 3243566 | 4050696 |
Mismatches | |||||||
# mismatches per 100 kbp | 3.02 | 7.15 | 6.2 | 6.9 | 7.17 | 6.05 | 6.43 |
# indels per 100 kbp | 0.46 | 1.06 | 1 | 1.32 | 1.27 | 0.95 | 1.06 |
# N's per 100 kbp | 0 | 97.89 | 67.67 | 77.33 | 77.37 | 91.03 | 123.88 |
Genome statistics | |||||||
Genome fraction (%) | 98.451 | 99.498 | 99.483 | 99.664 | 99.748 | 99.432 | 99.587 |
Duplication ratio | 1.001 | 1.002 | 1.002 | 1.002 | 1.002 | 1.003 | 1.001 |
# genes | 4399 + 32 part | 4467 + 14 part | 4465 + 13 part | 4476 + 11 part | 4477 + 11 part | 4467 + 11 part | 4470 + 13 part |
NGA50 | 133059 | 571664 | 425173 | 852639 | 1039467 | 1039472 | 1039654 |
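For readers unfamiliar with the contiguity metrics above, here is a minimal sketch of how N50 is computed from a list of contig lengths. The function name `n50` and the toy lengths are assumptions for illustration. QUAST's NGA50 additionally breaks contigs at misassemblies and measures aligned blocks against the reference length, which this sketch does not attempt.

```python
def n50(lengths, target_total=None):
    """Return the length L such that contigs of length >= L
    cover at least half of the total assembly length.

    If target_total is given (for example a reference genome length),
    half of that value is used instead, which yields NG50.
    """
    if not lengths:
        return 0
    total = target_total if target_total is not None else sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # the contigs never reach half of target_total


toy_contigs = [8, 8, 4, 3, 3, 2, 2, 2]  # made-up lengths, not real data
print(n50(toy_contigs))
print(n50(toy_contigs, target_total=40))
```

This also explains why N50 equals the largest contig in several columns: in Miseq_1cell, for instance, the largest contig (2497845 bp) alone exceeds half of the total length (4632009 bp).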
We discarded the contigs to which fewer than 100 reads aligned.
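The filtering step can be sketched as follows, assuming the per-contig aligned read counts have already been collected into a dictionary. The names `filter_contigs`, `contigs` and `read_counts`, and the example values, are hypothetical; how the counts were actually obtained is not described here.

```python
MIN_ALIGNED_READS = 100  # threshold stated in the text


def filter_contigs(contigs, read_counts, min_reads=MIN_ALIGNED_READS):
    """Keep only contigs with at least min_reads aligned reads.

    contigs: dict mapping contig name to sequence
    read_counts: dict mapping contig name to number of aligned reads
    Contigs missing from read_counts are treated as having zero reads.
    """
    return {
        name: seq
        for name, seq in contigs.items()
        if read_counts.get(name, 0) >= min_reads
    }


# Made-up example data.
contigs = {"ctg1": "ACGT" * 10, "ctg2": "GGCC" * 5, "ctg3": "TTAA" * 3}
read_counts = {"ctg1": 2500, "ctg2": 99}
kept = filter_contigs(contigs, read_counts)
print(sorted(kept))
```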
Statistics without reference | 071634_raw_asm.ctg | 192221_raw_asm.ctg | 210845_raw_asm.ctg | 071634_100X_asm.ctg | 071634_118X_asm.ctg | 192221_118X_asm.ctg | 210845_118X_asm.ctg |
# contigs | 19 | 24 | 21 | 28 | 38 | 29 | 31 |
Largest contig | 745120 | 664876 | 592203 | 663399 | 434084 | 345313 | 437164 |
Total length | 4669108 | 4675696 | 4700617 | 4636263 | 4644391 | 4603072 | 4578972 |
N50 | 356974 | 222559 | 399011 | 295449 | 180706 | 207976 | 191458 |
Misassemblies | |||||||
# misassemblies | 7 | 6 | 11 | 6 | 5 | 7 | 6 |
Misassembled contigs length | 1539749 | 936587 | 2058922 | 1200212 | 727024 | 1097466 | 478971 |
Mismatches | |||||||
# mismatches per 100 kbp | 2.75 | 2.75 | 3.04 | 7.08 | 5.85 | 8.82 | 3.69 |
# indels per 100 kbp | 2.23 | 1.1 | 1.17 | 13.46 | 5.83 | 2.49 | 2.37 |
# N's per 100 kbp | 0.19 | 0.02 | 0.04 | 0.26 | 0.15 | 0.07 | 0.02 |
Genome statistics | |||||||
Genome fraction (%) | 99.639 | 99.699 | 99.834 | 99.984 | 99.051 | 98.78 | 98.159 |
Duplication ratio | 1.011 | 1.011 | 1.017 | 1.01 | 1.015 | 1.005 | 1.006 |
# genes | 4473 + 15 part | 4465 + 18 part | 4480 + 10 part | 4435 + 29 part | 4431 + 36 part | 4413 + 36 part | 4380 + 34 part |
NGA50 | 357183 | 221098 | 279423 | 226118 | 179662 | 194634 | 191457 |