The self-correction approach (SCA) was proposed in "Reducing assembly complexity of microbial genomes with single-molecule sequencing" (Genome Biology, 2013).
We randomly selected four, six, and eight SMRT cells, three times each, and assessed the correctness of the resulting assemblies with QUAST.
Statistics without reference | 4 SMRT cells : 1st Set | 4 SMRT cells : 2nd Set | 4 SMRT cells : 3rd Set | 6 SMRT cells : 1st Set | 6 SMRT cells : 2nd Set | 6 SMRT cells : 3rd Set | 8 SMRT cells : 1st Set | 8 SMRT cells : 2nd Set | 8 SMRT cells : 3rd Set |
---|---|---|---|---|---|---|---|---|---|
# contigs | 5 | 10 | 4 | 11 | 7 | 8 | 6 | 10 | 5 |
Largest contig | 3 770 578 | 4 106 852 | 4 644 754 | 3 785 116 | 4 647 724 | 3 287 965 | 4 649 322 | 4 623 068 | 4 649 308 |
Total length | 4 684 069 | 4 723 363 | 4 671 153 | 4 736 342 | 4 711 060 | 4 708 831 | 4 706 433 | 4 731 334 | 4 691 736 |
N50 | 3 770 578 | 4 106 852 | 4 644 754 | 3 785 116 | 4 647 724 | 3 287 965 | 4 649 322 | 4 623 068 | 4 649 308 |
Misassemblies | | | | | | | | | |
# misassemblies | 10 | 13 | 13 | 15 | 12 | 11 | 11 | 16 | 12 |
Misassembled contigs length | 3 788 648 | 4 700 016 | 4 671 153 | 4 726 005 | 4 685 712 | 3 339 030 | 4 694 303 | 4 698 068 | 4 649 308 |
Mismatches | | | | | | | | | |
# mismatches per 100kbp | 0.47 | 0.56 | 0.37 | 0.19 | 0.11 | 0.15 | 0.13 | 0.43 | 0.17 |
# indels per 100kbp | 1.08 | 4.44 | 0.22 | 1.66 | 0.63 | 0.65 | 0.19 | 4.59 | 0.56 |
# N's per 100kbp | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Genome Statistics | | | | | | | | | |
Genome fraction(%) | 100 | 100 | 99.994 | 99.999 | 100 | 100 | 100 | 99.99 | 100 |
Duplication ratio | 1.01 | 1.018 | 1.007 | 1.021 | 1.031 | 1.015 | 1.012 | 1.02 | 1.011 |
# genes | 4495+2 part | 4495+2 part | 4493+3 part | 4494+3 part | 4495+2 part | 4495+2 part | 4495+2 part | 4494+3 part | 4495+2 part |
NGA50 | 1 207 217 | 2 558 505 | 1 640 882 | 2 888 022 | 2 834 458 | 1 298 912 | 1 477 605 | 1 344 200 | 2 995 586 |
Running Time | ?hr ?m | ?hr ?m | ?hr ?m | 21hr 05m | 19hr 32m | 21hr 01m | 26hr 46m | 27hr 52m | 26hr 13m |
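
The random selection of SMRT cells behind the columns above (three independent draws for each subset size) could be scripted roughly as follows. This is only a sketch: the cell identifiers, the total number of cells, the seed, and the function name `sample_smrt_cells` are all hypothetical, since the page does not say how the draws were actually made.

```python
import random


def sample_smrt_cells(cells, subset_sizes=(4, 6, 8), replicates=3, seed=0):
    """Draw `replicates` random subsets of SMRT cells for each subset size.

    Returns a dict keyed by (subset_size, replicate_number).
    Cells are drawn without replacement within each subset.
    """
    rng = random.Random(seed)  # fixed seed so the draws can be reproduced
    subsets = {}
    for size in subset_sizes:
        for rep in range(1, replicates + 1):
            subsets[(size, rep)] = sorted(rng.sample(cells, size))
    return subsets


# Hypothetical identifiers; the real run names are not given on this page.
cells = [f"cell_{i:02d}" for i in range(1, 18)]
subsets = sample_smrt_cells(cells)

for (size, rep), chosen in subsets.items():
    print(f"{size} SMRT cells, set {rep}: {', '.join(chosen)}")
```

Recording the seed alongside each subset makes it possible to rebuild exactly the same input for a later rerun of the assembly.
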
We assembled all SMRT cells and also randomly selected four and six SMRT cells, three times each, then evaluated the assemblies with QUAST against the reference genome (NC_000913) and Ec_gene_list.
Statistics without reference | All Data | 4 SMRT cells : 1st Set | 4 SMRT cells : 2nd Set | 4 SMRT cells : 3rd Set | 6 SMRT cells : 1st Set | 6 SMRT cells : 2nd Set | 6 SMRT cells : 3rd Set |
---|---|---|---|---|---|---|---|
# contigs | 2 | 8 | 10 | 14 | 1 | 1 | 4 |
Largest contig | 4 278 957 | 2 277 010 | 1 213 670 | 984 459 | 4 641 350 | 4 640 250 | 3 162 440 |
Total length | 4 650 771 | 4 648 304 | 4 644 602 | 4 656 274 | 4 641 350 | 4 640 250 | 4 653 394 |
N50 | 4 278 957 | 2 043 590 | 2 044 147 | 2 135 225 | 3 162 440 | 4 640 250 | 4 641 350 |
Misassemblies | | | | | | | |
# misassemblies | 8 | 10 | 8 | 6 | 7 | 7 | 8 |
Misassembled contigs length | 4 278 957 | 2 809 129 | 2 085 482 | 1 947 163 | 4 641 350 | 4 640 250 | 3 209 090 |
Mismatches | | | | | | | |
# mismatches per 100kbp | 0.37 | 2.49 | 1.88 | 5.38 | 0.69 | 0.67 | 0.86 |
# indels per 100kbp | 3.64 | 56.81 | 47.62 | 77.31 | 10.67 | 12.87 | 11 |
# N's per 100kbp | 0 | 0.04 | 0.02 | 0.09 | 0 | 0 | 0 |
Genome Statistics | | | | | | | |
Genome fraction(%) | 99.93 | 99.733 | 99.67 | 99.693 | 99.972 | 99.946 | 99.968 |
Duplication ratio | 1.003 | 1.006 | 1.005 | 1.008 | 1.001 | 1.001 | 1.005 |
# genes | 4492+5 part | 4475+10 part | 4467+12 part | 4469+13 part | 4492+4 part | 4491+4 part | 4492+4 part |
NGA50 | 1 207 233 | 531 351 | 721 189 | 565 251 | 2 499 057 | 2 499 697 | 1 267 262 |
Running Time | 15hr 41m | 7hr 32m | 7hr 10m | 5hr 42m | 15hr 44m | 16hr 02m | 13hr 27m |
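
For readers unfamiliar with the N50 row, the metric can be sketched in a few lines: sort the contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly length. The function name `n50` is our own, and the second contig length in the example is inferred as "Total length" minus "Largest contig" for the "All Data" column, on the assumption that the total covers exactly those two contigs. NGA50 additionally requires the aligned blocks against the reference, which are not listed here, so it is not reproduced.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    Contigs are visited from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half of the total length.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig")
    half_total = sum(contig_lengths) / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# "All Data" column: 2 contigs, largest 4 278 957, total 4 650 771.
# The second length is inferred from those two figures (an assumption).
all_data = [4_278_957, 4_650_771 - 4_278_957]
print(n50(all_data))  # prints 4278957
```

An assembly consisting of a single contig always has an N50 equal to that contig's length, which gives a quick sanity check when reading tables like the one above.
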
We assembled all SMRT cells and evaluated the assembly with QUAST against the reference genome (NC_013946).
Statistics without reference | All Data |
---|---|
# contigs | 2 |
Largest contig | 2 974 307 |
Total length | 3 100 289 |
N50 | 2 974 307 |
Misassemblies | |
# misassemblies | 3 |
Misassembled contigs length | 2 974 307 |
Mismatches | |
# mismatches per 100kbp | 0.23 |
# indels per 100kbp | 5.04 |
# N's per 100kbp | 0.03 |
Genome Statistics | |
Genome fraction(%) | 99.883 |
Duplication ratio | 1.002 |
# genes | 3093+4 part |
NGA50 | 1 715 029 |
Running Time | 8hr 7m |
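
The "per 100kbp" rows are rates, not counts. As a rough illustration, the sketch below converts between the two. It uses the reported total length as a stand-in for the number of aligned bases, which is an assumption: QUAST normalizes by aligned bases, and that figure is not given on this page, so the resulting count is only approximate. Both function names are ours.

```python
def per_100kbp(event_count, aligned_bases):
    """Express an absolute event count as a rate per 100 kbp."""
    return event_count * 100_000 / aligned_bases


def approx_event_count(rate_per_100kbp, aligned_bases):
    """Invert the rate to recover an approximate absolute count."""
    return rate_per_100kbp * aligned_bases / 100_000


# Total length from the table above, used here in place of the (unreported)
# number of aligned bases. This substitution is an assumption.
total_length = 3_100_289

approx_indels = approx_event_count(5.04, total_length)
print(f"approximately {round(approx_indels)} indels")
```

Under that assumption, 5.04 indels per 100 kbp over about 3.1 Mbp corresponds to roughly 156 indels in the whole assembly.
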
We assembled all SMRT cells and also randomly selected four SMRT cells three times, then evaluated the assemblies with QUAST against the reference genome (NC_013061).
Statistics without reference | All Data | 4 SMRT cells : 1st Set | 4 SMRT cells : 2nd Set | 4 SMRT cells : 3rd Set |
---|---|---|---|---|
# contigs | 1 | 3 | 3 | 3 |
Largest contig | 5 163 983 | 2 232 679 | 2 236 613 | 2 237 949 |
Total length | 5 163 983 | 5 161 276 | 5 165 518 | 5 166 563 |
N50 | 5 163 983 | 2 043 590 | 2 044 147 | 2 135 225 |
Misassemblies | | | | |
# misassemblies | 1 | 0 | 0 | 0 |
Misassembled contigs length | 5 163 983 | 0 | 0 | 0 |
Mismatches | | | | |
# mismatches per 100kbp | 8.41 | 9.96 | 8.27 | 10.29 |
# indels per 100kbp | 2.19 | 21.34 | 13.29 | 14.78 |
# N's per 100kbp | 0 | 0 | 0 | 0 |
Genome Statistics | | | | |
Genome fraction(%) | 99.919 | 99.864 | 99.907 | 99.89 |
Duplication ratio | 1.001 | 1.001 | 1.002 | 1.002 |
# genes | 4335+3 part | 4330+5 part | 4333+5 part | 4333+3 part |
NGA50 | 4 300 532 | 2 043 590 | 2 044 147 | 2 135 225 |
Running Time | 21hr 36m | 11hr 39m | 12hr 26m | 12hr 12m |
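
Because each subset size was sampled three times, it can help to collapse the replicates into a mean and a spread before comparing them with the full data set. The sketch below does this for the error-rate rows of the table above. The values are copied from that table; the dictionary layout and the helper name `summarize` are our own choices.

```python
from statistics import mean, stdev

# Rates per 100 kbp, copied from the table above.
indels = {"All Data": [2.19], "4 SMRT cells": [21.34, 13.29, 14.78]}
mismatches = {"All Data": [8.41], "4 SMRT cells": [9.96, 8.27, 10.29]}


def summarize(values):
    """Return (mean, sample standard deviation); spread is 0.0 for one value."""
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread


for label, table in (("indels", indels), ("mismatches", mismatches)):
    for dataset, values in table.items():
        avg, spread = summarize(values)
        print(f"{label:>10} per 100kbp, {dataset}: {avg:.2f} +/- {spread:.2f}")
```

With only three replicates the standard deviation is a coarse indicator, so it is best read alongside the individual values rather than in place of them.
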