We assembled the Dataset 4 raw data with SPAdes and scaffolded the resulting assembly with SSPACE-LongRead, using subreads from 1-4 SMRT cells of Dataset 5.
We arbitrarily chose 1-4 SMRT cells:
- Three single SMRT cells: m120208_071634
- Two SMRT cells: m120228_210845 + m120208_122534
- Three SMRT cells: m120228_115504 + m120228_152936 + m120228_100807
- Four SMRT cells: m120228_171636 + m120228_223624 + m120228_100807 + m120228_190630
We evaluated the assemblies with QUAST 2.3 (reference genome NC_000913 and gene list Ec_gene_list).
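As a rough illustration of the evaluation step, the sketch below builds a QUAST command line with a reference (`-R`) and a gene list (`-G`). The file names and the output directory are hypothetical placeholders, not the paths used for this page, and the exact invocation used here is not recorded.

```python
import shlex


def quast_command(contig_files, reference, genes, out_dir):
    """Build an argv list for a QUAST run against a reference and a gene list."""
    return ["quast.py", *contig_files, "-R", reference, "-G", genes, "-o", out_dir]


# All paths below are hypothetical placeholders.
cmd = quast_command(
    ["071634_raw_asm.ctg.fasta"],
    "NC_000913.fasta",
    "Ec_gene_list.txt",
    "quast_out",
)
print(shlex.join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to execute
```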
Reads from each single SMRT cell were corrected with raw, 100X, and 118X short reads.
Statistics without reference | 071634_raw_asm.ctg | 192221_raw_asm.ctg | 210845_raw_asm.ctg | 071634_100X_asm.ctg | 071634_118X_asm.ctg | 192221_118X_asm.ctg | 210845_118X_asm.ctg |
# contigs | 80 | 93 | 83 | 61 | 69 | 68 | 66 |
Largest contig | 745120 | 664876 | 562203 | 663399 | 434084 | 345313 | 437164 |
Total length | 4975695 | 5031560 | 5043217 | 4804004 | 4805579 | 4801310 | 4733683 |
N50 | 356974 | 221472 | 324225 | 295449 | 179662 | 207976 | 186993 |
Misassemblies | |||||||
# misassemblies | 11 | 17 | 21 | 10 | 13 | 20 | 15 |
Misassembled contigs length | 1552524 | 976207 | 2108892 | 1222277 | 782726 | 1156917 | 527873 |
Mismatches | |||||||
# mismatches per 100 kbp | 3.32 | 2.91 | 3.06 | 7.08 | 6.4 | 10.33 | 4.13 |
# indels per 100 kbp | 2.98 | 1.38 | 1.01 | 13.15 | 5.2 | 5.54 | 2.69 |
# N's per 100 kbp | 0.38 | 0.12 | 0.22 | 0.4 | 0.37 | 0.4 | 0.23 |
Genome statistics | |||||||
Genome fraction (%) | 99.97 | 100 | 100 | 99.304 | 99.424 | 99.522 | 98.712 |
Duplication ratio | 1.074 | 1.086 | 1.090 | 1.043 | 1.047 | 1.04 | 1.033 |
# genes | 4489 + 7 part | 4490 + 7 part | 4495 + 2 part | 4461 + 25 part | 4451 + 31 part | 4459 + 28 part | 4412 + 32 part |
NGA50 | 357183 | 221098 | 279423 | 226118 | 179662 | 194634 | 191457 |
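For readers unfamiliar with the N50 row, here is a minimal sketch of how the statistic is usually defined: the contig length at which the cumulative sum of contigs, sorted longest first, reaches half of the total. Passing a reference length as `total` gives the NG50 variant. This is not QUAST's own code; QUAST applies a minimum contig length before counting, and NGA50 is computed on aligned blocks rather than whole contigs. The function name `nx50` is made up for this example.

```python
def nx50(lengths, total=None):
    """Return N50 of the contig lengths, or NG50 if a genome size is given as total.

    Returns None when the contigs never reach half of the given total.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths given")
    target = (sum(ordered) if total is None else total) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    return None


toy_lengths = [10, 8, 6, 4, 2]       # toy values, not taken from the table
print(nx50(toy_lengths))             # N50 against the assembly's own total length
print(nx50(toy_lengths, total=40))   # NG50 against an assumed genome size of 40
```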
We discarded contigs to which fewer than 100 reads aligned.
Statistics without reference | 071634_raw_asm.ctg | 192221_raw_asm.ctg | 210845_raw_asm.ctg | 071634_100X_asm.ctg | 071634_118X_asm.ctg | 192221_118X_asm.ctg | 210845_118X_asm.ctg |
# contigs | 19 | 24 | 21 | 28 | 38 | 29 | 31 |
Largest contig | 745120 | 664876 | 592203 | 663399 | 434084 | 345313 | 437164 |
Total length | 4669108 | 4675696 | 4700617 | 4636263 | 4644391 | 4603072 | 4578972 |
N50 | 356974 | 222559 | 399011 | 295449 | 180706 | 207976 | 191458 |
Misassemblies | |||||||
# misassemblies | 7 | 6 | 11 | 6 | 5 | 7 | 6 |
Misassembled contigs length | 1539749 | 936587 | 2058922 | 1200212 | 727024 | 1097466 | 478971 |
Mismatches | |||||||
# mismatches per 100 kbp | 2.75 | 2.75 | 3.04 | 7.08 | 5.85 | 8.82 | 3.69 |
# indels per 100 kbp | 2.23 | 1.1 | 1.17 | 13.46 | 5.83 | 2.49 | 2.37 |
# N's per 100 kbp | 0.19 | 0.02 | 0.04 | 0.26 | 0.15 | 0.07 | 0.02 |
Genome statistics | |||||||
Genome fraction (%) | 99.639 | 99.699 | 99.834 | 99.984 | 99.051 | 98.78 | 98.159 |
Duplication ratio | 1.011 | 1.011 | 1.017 | 1.01 | 1.015 | 1.005 | 1.006 |
# genes | 4473 + 15 part | 4465 + 18 part | 4480 + 10 part | 4435 + 29 part | 4431 + 36 part | 4413 + 36 part | 4380 + 34 part |
NGA50 | 357183 | 221098 | 279423 | 226118 | 179662 | 194634 | 191457 |
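The filter behind this second table (discarding contigs with fewer than 100 aligned reads) could be sketched as follows. How the read counts were obtained is not stated here; the sketch assumes, purely for illustration, that they come from `samtools idxstats` output, whose tab-separated columns are contig name, contig length, mapped reads, and unmapped reads. The contig names and counts in the example are invented.

```python
def contigs_to_keep(idxstats_text, min_reads=100):
    """Return names of contigs with at least min_reads mapped reads.

    idxstats_text is the text output of `samtools idxstats` (an assumption,
    see the lead-in): name, length, mapped, unmapped, separated by tabs.
    """
    keep = []
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":  # summary line for reads with no assigned contig
            continue
        if int(mapped) >= min_reads:
            keep.append(name)
    return keep


# Invented example input, not real data from these assemblies.
example = "contig_1\t745120\t5321\t0\ncontig_2\t1200\t42\t0\n*\t0\t0\t17\n"
print(contigs_to_keep(example))
```

A contig with exactly 100 aligned reads is kept, since only contigs with fewer than 100 are discarded.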