The self-correction approach (SCA) was proposed in "Reducing assembly complexity of microbial genomes with single-molecule sequencing" (Genome Biology, 2013).
We assembled all SMRT cells together, plus three random subsets of four SMRT cells and three random subsets of six SMRT cells, and evaluated each assembly with QUAST against the reference genome (NC_000913).
QUAST metric | All data | 4 SMRT cells, set 1 | 4 SMRT cells, set 2 | 4 SMRT cells, set 3 | 6 SMRT cells, set 1 | 6 SMRT cells, set 2 | 6 SMRT cells, set 3 |
---|---|---|---|---|---|---|---|
Statistics without reference | | | | | | | |
# contigs | 2 | 8 | 10 | 14 | 1 | 1 | 4 |
Largest contig | 4 278 957 | 2 277 010 | 1 213 670 | 984 459 | 4 641 350 | 4 640 250 | 3 162 440 |
Total length | 4 650 771 | 4 648 304 | 4 644 602 | 4 656 274 | 4 641 350 | 4 640 250 | 4 653 394 |
N50 | 4 278 957 | 2 043 590 | 2 044 147 | 2 135 225 | 4 641 350 | 4 640 250 | 3 162 440 |
Misassemblies | | | | | | | |
# misassemblies | 8 | 10 | 8 | 6 | 7 | 7 | 8 |
Misassembled contigs length | 4 278 957 | 2 809 129 | 2 085 482 | 1 947 163 | 4 641 350 | 4 640 250 | 3 209 090 |
Mismatches | | | | | | | |
# mismatches per 100 kbp | 0.37 | 2.49 | 1.88 | 5.38 | 0.69 | 0.67 | 0.86 |
# indels per 100 kbp | 3.64 | 56.81 | 47.62 | 77.31 | 10.67 | 12.87 | 11 |
# N's per 100 kbp | 0 | 0.04 | 0.02 | 0.09 | 0 | 0 | 0 |
Genome statistics | | | | | | | |
Genome fraction (%) | 99.93 | 99.733 | 99.67 | 99.693 | 99.972 | 99.946 | 99.968 |
Duplication ratio | 1.003 | 1.006 | 1.005 | 1.008 | 1.001 | 1.001 | 1.005 |
# genes (complete + partial) | 4492 + 5 part | 4475 + 10 part | 4467 + 12 part | 4469 + 13 part | 4492 + 4 part | 4491 + 4 part | 4492 + 4 part |
NGA50 | 1 207 233 | 531 351 | 721 189 | 565 251 | 2 499 057 | 2 499 697 | 1 267 262 |
Running time | 15 h 41 min | 7 h 32 min | 7 h 10 min | 5 h 42 min | 15 h 44 min | 16 h 2 min | 13 h 27 min |
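The random selection of SMRT cells can be sketched as follows. This is only an illustration, not the script used for these assemblies: the pool of eight SMRT cell names (`cell_01` to `cell_08`), the helper `draw_subsets`, and the per-size seeds are all hypothetical, added so the draws are reproducible.

```python
import random


def draw_subsets(cells, subset_size, n_draws, seed):
    """Draw n_draws random subsets of subset_size cells; no cell repeats within a subset."""
    rng = random.Random(seed)  # seeded generator, so the same draws can be regenerated
    return [sorted(rng.sample(cells, subset_size)) for _ in range(n_draws)]


# Hypothetical pool of SMRT cell identifiers (the real cell names are not given here).
all_cells = [f"cell_{i:02d}" for i in range(1, 9)]

# Three independent draws each of four and of six cells, mirroring the table columns.
plan = {size: draw_subsets(all_cells, size, n_draws=3, seed=size) for size in (4, 6)}

for size, subsets in plan.items():
    for i, subset in enumerate(subsets, start=1):
        print(f"{size} SMRT cells, set {i}: {', '.join(subset)}")
```

Because each draw is independent, two sets of the same size may share cells or even coincide; whether the original selection excluded repeated sets is not stated.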
We assembled all SMRT cells together and evaluated the assembly with QUAST against the reference genome (NC_013946).
QUAST metric | All data |
---|---|
Statistics without reference | |
# contigs | 2 |
Largest contig | 2 974 307 |
Total length | 3 100 289 |
N50 | 2 974 307 |
Misassemblies | |
# misassemblies | 3 |
Misassembled contigs length | 2 974 307 |
Mismatches | |
# mismatches per 100 kbp | 0.23 |
# indels per 100 kbp | 5.04 |
# N's per 100 kbp | 0.03 |
Genome statistics | |
Genome fraction (%) | 99.883 |
Duplication ratio | 1.002 |
# genes (complete + partial) | 3093 + 4 part |
NGA50 | 1 715 029 |
Running time | 8 h 7 min |
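N50 and NGA50 differ in two ways: NGA50 counts aligned blocks, so a contig containing a misassembly contributes several shorter pieces, and it measures against the reference length rather than the assembly length. This is why NGA50 can sit well below N50 in an assembly with misassemblies. The minimal Python sketch below illustrates both definitions; it is not QUAST's code, it assumes the aligned block lengths are already known (QUAST derives them from its own alignments), and the helper names and example lengths are hypothetical. Because the helper always returns one of its input lengths, N50 can never exceed the largest contig, and a one-contig assembly has N50 equal to that contig's length.

```python
from typing import Optional, Sequence


def x50(lengths: Sequence[int], target_total: int) -> Optional[int]:
    """Return the length at which pieces, sorted longest first, reach half of target_total.

    Returns None if they never do (the statistic is then undefined).
    """
    half = target_total / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return None


def n50(contig_lengths: Sequence[int]) -> Optional[int]:
    # N50: contig lengths measured against the total assembly length.
    return x50(contig_lengths, sum(contig_lengths))


def nga50(aligned_block_lengths: Sequence[int], reference_length: int) -> Optional[int]:
    # NGA50: aligned blocks (contigs split at misassembly breakpoints)
    # measured against the reference length instead of the assembly length.
    return x50(aligned_block_lengths, reference_length)


# A one-contig assembly: N50 is simply that contig's length.
print(n50([4_641_350]))  # 4641350

# Hypothetical example: a 3.0 Mb contig with one misassembly aligns as two blocks
# (1.8 Mb and 1.2 Mb); a second 0.1 Mb contig aligns cleanly to a 3.1 Mb reference.
contigs = [3_000_000, 100_000]
blocks = [1_800_000, 1_200_000, 100_000]
print(n50(contigs), nga50(blocks, 3_100_000))  # 3000000 1800000
```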
We assembled all SMRT cells together, plus three random subsets of four SMRT cells, and evaluated each assembly with QUAST against the reference genome (NC_013061).
QUAST metric | All data | 4 SMRT cells, set 1 | 4 SMRT cells, set 2 | 4 SMRT cells, set 3 |
---|---|---|---|---|
Statistics without reference | | | | |
# contigs | 1 | 3 | 3 | 3 |
Largest contig | 5 163 983 | 2 232 679 | 2 236 613 | 2 237 949 |
Total length | 5 163 983 | 5 161 276 | 5 165 518 | 5 166 563 |
N50 | 5 163 983 | 2 043 590 | 2 044 147 | 2 135 225 |
Misassemblies (see SCA_D8.pdf) | | | | |
# misassemblies | 1 | 0 | 0 | 0 |
Misassembled contigs length | 5 163 983 | 0 | 0 | 0 |
Mismatches | | | | |
# mismatches per 100 kbp | 8.41 | 9.96 | 8.27 | 10.29 |
# indels per 100 kbp | 2.19 | 21.34 | 13.29 | 14.78 |
# N's per 100 kbp | 0 | 0 | 0 | 0 |
Genome statistics | | | | |
Genome fraction (%) | 99.919 | 99.864 | 99.907 | 99.89 |
Duplication ratio | 1.001 | 1.001 | 1.002 | 1.002 |
# genes (complete + partial) | 4335 + 3 part | 4330 + 5 part | 4333 + 5 part | 4333 + 3 part |
NGA50 | 4 300 532 | 2 043 590 | 2 044 147 | 2 135 225 |
Running time | 21 h 36 min | 11 h 39 min | 12 h 26 min | 12 h 12 min |
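Because the 4-SMRT-cell columns are three random draws, summarizing them by mean and spread makes the comparison with the all-data assembly easier to read. The sketch below uses only values copied from the NC_013061 table; the choice of metrics and of the sample standard deviation is illustrative and not part of the original analysis.

```python
from statistics import mean, stdev

# (all-data value, [set 1, set 2, set 3]) copied from the NC_013061 table.
metrics = {
    "# mismatches per 100 kbp": (8.41, [9.96, 8.27, 10.29]),
    "# indels per 100 kbp": (2.19, [21.34, 13.29, 14.78]),
}

summary = {}
for name, (all_data, subsets) in metrics.items():
    m = mean(subsets)
    s = stdev(subsets)  # sample standard deviation over the three random subsets
    summary[name] = (m, s, m / all_data)  # mean, spread, and fold change vs. all data
    print(f"{name}: mean {m:.2f}, SD {s:.2f}, {m / all_data:.1f}x the all-data value")
```

With only three draws, the SD is a rough indication of run-to-run variability rather than a confidence interval.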