The self-correction approach (SCA) was proposed in "Reducing assembly complexity of microbial genomes with single-molecule sequencing" (Genome Biology, 2013).
We randomly selected four, six and eight SMRT cells, three times each, and assessed the correctness of the resulting assemblies with QUAST. The last column shows the assembly of all 17 SMRT cells.
Statistics without reference | 4 SMRT cells : 1st Set | 4 SMRT cells : 2nd Set | 4 SMRT cells : 3rd Set | 6 SMRT cells : 1st Set | 6 SMRT cells : 2nd Set | 6 SMRT cells : 3rd Set | 8 SMRT cells : 1st Set | 8 SMRT cells : 2nd Set | 8 SMRT cells : 3rd Set | 17 SMRT cells |
# contigs | 1 | 1 | 5 | 2 | 2 | 1 | 4 | 1 | 2 | 1 |
Largest contig | 4 647 117 | 4 648 057 | 3 447 068 | 3 749 516 | 2 770 859 | 4 649 699 | 1 679 082 | 4 649 323 | 4 189 785 | 4 651 604 |
Total length | 4 647 117 | 4 648 057 | 4 661 453 | 4 645 941 | 4 657 272 | 4 649 699 | 4 655 949 | 4 649 323 | 4 652 482 | 4 651 604 |
N50 | 4 647 117 | 4 648 057 | 3 447 068 | 3 749 516 | 2 770 859 | 4 649 699 | 1 159 845 | 4 649 323 | 4 189 785 | 4 651 604 |
Misassemblies | ||||||||||
# misassemblies | 7 | 8 | 7 | 7 | 10 | 10 | 6 | 10 | 8 | 9 |
Misassembled contigs length | 4 647 117 | 4 648 057 | 3 447 068 | 4 645 941 | 4 657 272 | 4 649 699 | 2 143 406 | 4 649 323 | 4 189 785 | 4 651 604 |
Mismatches | ||||||||||
# mismatches per 100kbp | 1.03 | 0.69 | 0.78 | 0.69 | 0.56 | 0.58 | 0.75 | 0.86 | 0.75 | 0.34 |
# indels per 100kbp | 7.65 | 5.78 | 5.78 | 1.88 | 2.89 | 1.75 | 1.62 | 2 | 2.65 | 0.65 |
# N's per 100kbp | 0.02 | 0 | 0 | 0 | 0 | 0 | 0 | 0.02 | 0 | 0 |
Genome Statistics | ||||||||||
Genome fraction(%) | 100 | 100 | 99.949 | 99.956 | 100 | 100 | 99.959 | 100 | 99.993 | 100 |
Duplication ratio | 1.002 | 1.002 | 1.005 | 1.002 | 1.005 | 1.002 | 1.004 | 1.002 | 1.003 | 1.003 |
# genes | 4495+2 part | 4495+2 part | 4493+3 part | 4494+3 part | 4495+2 part | 4495+2 part | 4495+2 part | 4494+3 part | 4495+2 part | |
NGA50 | 1 207 217 | 2 558 505 | 1 640 882 | 2 888 022 | 2 834 458 | 1 298 912 | 1 477 605 | 1 344 200 | 2 995 586 | |
Running Time | ?hr ?m | ?hr ?m | ?hr ?m | 21hr 05m | 19hr 32m | 21hr 01m | 26hr 46m | 27hr 52m | 26hr 13m | |
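The subsampling design described above (random subsets of four, six and eight SMRT cells, three replicates each) can be sketched as follows. This is only an illustration: the cell labels, the `draw_subsets` helper and the fixed seed are assumptions, not the procedure actually used to pick the sets.

```python
import random


def draw_subsets(cell_ids, sizes=(4, 6, 8), replicates=3, seed=0):
    """Return {size: [subset, ...]} with `replicates` random subsets per size.

    Each subset is drawn without replacement from `cell_ids`; different
    subsets are drawn independently, so they may share cells.
    """
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    return {
        size: [sorted(rng.sample(cell_ids, size)) for _ in range(replicates)]
        for size in sizes
    }


# 17 hypothetical SMRT cell labels (the real cell names are not given here).
cells = [f"cell_{i:02d}" for i in range(1, 18)]
subsets = draw_subsets(cells)

for size, sets in subsets.items():
    for n, chosen in enumerate(sets, start=1):
        print(f"{size} SMRT cells, set {n}: {', '.join(chosen)}")
```

Recording the seed and the chosen cells makes it possible to rerun exactly the same subsets later.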
We assembled all SMRT cells together, as well as four and six randomly selected SMRT cells (three random sets each), and evaluated the assemblies with QUAST against the reference genome (NC_000913) and Ec_gene_list.
Statistics without reference | All Data | 4 SMRT cells : 1st Set | 4 SMRT cells : 2nd Set | 4 SMRT cells : 3rd Set | 6 SMRT cells : 1st Set | 6 SMRT cells : 2nd Set | 6 SMRT cells : 3rd Set |
# contigs | 2 | 8 | 10 | 14 | 1 | 1 | 4 |
Largest contig | 4 278 957 | 2 277 010 | 1 213 670 | 984 459 | 4 641 350 | 4 640 250 | 3 162 440 |
Total length | 4 650 771 | 4 648 304 | 4 644 602 | 4 656 274 | 4 641 350 | 4 640 250 | 4 653 394 |
N50 | 4 278 957 | 2 043 590 | 2 044 147 | 2 135 225 | 4 641 350 | 4 640 250 | 3 162 440 |
Misassemblies | |||||||
# misassemblies | 8 | 10 | 8 | 6 | 7 | 7 | 8 |
Misassembled contigs length | 4 278 957 | 2 809 129 | 2 085 482 | 1 947 163 | 4 641 350 | 4 640 250 | 3 209 090 |
Mismatches | |||||||
# mismatches per 100kbp | 0.37 | 2.49 | 1.88 | 5.38 | 0.69 | 0.67 | 0.86 |
# indels per 100kbp | 3.64 | 56.81 | 47.62 | 77.31 | 10.67 | 12.87 | 11 |
# N's per 100kbp | 0 | 0.04 | 0.02 | 0.09 | 0 | 0 | 0 |
Genome Statistics | |||||||
Genome fraction(%) | 99.93 | 99.733 | 99.67 | 99.693 | 99.972 | 99.946 | 99.968 |
Duplication ratio | 1.003 | 1.006 | 1.005 | 1.008 | 1.001 | 1.001 | 1.005 |
# genes | 4492+5 part | 4475+10 part | 4467+12 part | 4469+13 part | 4492+4 part | 4491+4 part | 4492+4 part |
NGA50 | 1 207 233 | 531 351 | 721 189 | 565 251 | 2 499 057 | 2 499 697 | 1 267 262 |
Running Time | 15hr 41m | 7hr 32m | 7hr 10m | 5hr 42m | 15hr 44m | 16hr 02m | 13hr 27m |
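For readers unfamiliar with the N50 row, a minimal sketch of the usual definition may help: N50 is the length of the contig at which the cumulative length, counted from the largest contig down, first reaches half of the total assembly length. The contig lengths below are hypothetical and are not taken from the assemblies above.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running sum first covers half of the total.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must not be empty")


# A single-contig assembly: N50 equals the contig length.
print(n50([4_640_000]))

# Hypothetical fragmented assemblies with the same total length.
print(n50([3_000_000, 900_000, 500_000, 250_000]))
print(n50([2_200_000, 2_000_000, 450_000]))
```

Two consequences are useful when reading the tables: a one-contig assembly always has N50 equal to its total length, and N50 can never exceed the largest contig.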
We assembled the data from all SMRT cells and evaluated the assembly with QUAST against the reference genome (NC_013946).
Statistics without reference | All Data |
# contigs | 2 |
Largest contig | 2 974 307 |
Total length | 3 100 289 |
N50 | 2 974 307 |
Misassemblies | |
# misassemblies | 3 |
Misassembled contigs length | 2 974 307 |
Mismatches | |
# mismatches per 100kbp | 0.23 |
# indels per 100kbp | 5.04 |
# N's per 100kbp | 0.03 |
Genome Statistics | |
Genome fraction(%) | 99.883 |
Duplication ratio | 1.002 |
# genes | 3093+4 part |
NGA50 | 1 715 029 |
Running Time | 8hr 07m |
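A reference-based QUAST evaluation like the one above could be launched roughly as sketched below. The file names and the `quast_command` helper are hypothetical, and the flags (`-r` for the reference, `-g` for a gene/feature list, `-o` for the output directory) are assumptions based on recent QUAST releases; they may be spelled differently in the version used for these tables, so check `quast.py --help` first.

```python
import shlex


def quast_command(contigs, reference, out_dir, features=None):
    """Build an argument list for a reference-based QUAST run.

    Flag names are assumed from recent QUAST releases; verify them against
    the installed version before running.
    """
    cmd = ["quast.py", contigs, "-r", reference, "-o", out_dir]
    if features is not None:
        cmd += ["-g", features]  # optional gene/feature coordinates file
    return cmd


# Hypothetical file names; only the accession number comes from the text.
cmd = quast_command("assembly.fasta", "NC_013946.fasta", "quast_out")
print(shlex.join(cmd))

# To actually run it (requires QUAST on the PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.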
We assembled all SMRT cells together, as well as four randomly selected SMRT cells (three random sets), and evaluated the assemblies with QUAST against the reference genome (NC_013061).
Statistics without reference | All Data | 4 SMRT cells : 1st Set | 4 SMRT cells : 2nd Set | 4 SMRT cells : 3rd Set |
# contigs | 1 | 3 | 3 | 3 |
Largest contig | 5 163 983 | 2 232 679 | 2 236 613 | 2 237 949 |
Total length | 5 163 983 | 5 161 276 | 5 165 518 | 5 166 563 |
N50 | 5 163 983 | 2 043 590 | 2 044 147 | 2 135 225 |
Misassemblies | ||||
# misassemblies | 1 | 0 | 0 | 0 |
Misassembled contigs length | 5 163 983 | 0 | 0 | 0 |
Mismatches | ||||
# mismatches per 100kbp | 8.41 | 9.96 | 8.27 | 10.29 |
# indels per 100kbp | 2.19 | 21.34 | 13.29 | 14.78 |
# N's per 100kbp | 0 | 0 | 0 | 0 |
Genome Statistics | ||||
Genome fraction(%) | 99.919 | 99.864 | 99.907 | 99.89 |
Duplication ratio | 1.001 | 1.001 | 1.002 | 1.002 |
# genes | 4335+3 part | 4330+5 part | 4333+5 part | 4333+3 part |
NGA50 | 4 300 532 | 2 043 590 | 2 044 147 | 2 135 225 |
Running Time | 21hr 36m | 11hr 39m | 12hr 26m | 12hr 12m |
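The "per 100kbp" rows in these tables are rates, not raw counts. As a rough sketch, such a rate is an event count normalised to 100 000 bases (aligned bases for mismatches and indels, assembly bases for N's). The counts below are hypothetical and are not taken from the QUAST reports.

```python
def per_100kbp(event_count, base_count):
    """Normalise an event count to a rate per 100 000 bases."""
    if base_count <= 0:
        raise ValueError("base_count must be positive")
    return event_count * 100_000 / base_count


# Hypothetical example: 113 indels observed over 5 160 000 aligned bases.
rate = per_100kbp(113, 5_160_000)
print(f"{rate:.2f} indels per 100 kbp")

# The inverse view: the expected number of events over a genome of a given size.
expected_events = rate * 5_160_000 / 100_000
print(f"about {expected_events:.0f} indels over the aligned region")
```

Converting a rate back to an absolute count makes it easier to compare assemblies of genomes with different sizes.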