The self-correction approach (SCA) was proposed in "Reducing assembly complexity of microbial genomes with single-molecule sequencing" (Genome Biology, 2013).
We assembled the data from all SMRT cells and from random subsets of four, six and eight SMRT cells (three independent draws per subset size), and evaluated each assembly with QUAST against the reference genome (NC_000913) and Ec_gene_list.
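The subsampling described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: the cell identifiers, the total number of cells, the seed and the function name `pick_smrt_cell_subsets` are all assumptions, since the text does not give them.

```python
import random

def pick_smrt_cell_subsets(cells, sizes=(4, 6, 8), replicates=3, seed=0):
    """Return {size: [subset, ...]} with `replicates` random draws per size.

    Each draw samples `size` distinct cells without replacement.
    """
    rng = random.Random(seed)  # fixed seed so the draws can be reproduced
    subsets = {}
    for size in sizes:
        subsets[size] = [sorted(rng.sample(cells, size)) for _ in range(replicates)]
    return subsets

# Hypothetical identifiers: the real SMRT cell names are not listed in the text.
cells = [f"cell_{i:02d}" for i in range(1, 13)]
subsets = pick_smrt_cell_subsets(cells)

for size, draws in subsets.items():
    for n, draw in enumerate(draws, start=1):
        print(f"{size} SMRT cells, set {n}: {', '.join(draw)}")
```

Each draw is independent, so the same cell may appear in more than one set of a given size.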
Statistics without reference | All Data | 4 SMRT cells : 1st Set | 4 SMRT cells : 2nd Set | 4 SMRT cells : 3rd Set | 6 SMRT cells : 1st Set | 6 SMRT cells : 2nd Set | 6 SMRT cells : 3rd Set | 8 SMRT cells : 1st Set | 8 SMRT cells : 2nd Set | 8 SMRT cells : 3rd Set |
---|---|---|---|---|---|---|---|---|---|---|
# contigs | 1 | 1 | 1 | 5 | 2 | 2 | 1 | 4 | 1 | 2 |
Largest contig | 4 651 604 | 4 647 117 | 4 648 057 | 3 447 068 | 3 749 516 | 2 770 859 | 4 649 699 | 1 679 082 | 4 649 323 | 4 189 785 |
Total length | 4 651 604 | 4 647 117 | 4 648 057 | 4 661 453 | 4 645 941 | 4 657 272 | 4 649 699 | 4 655 949 | 4 649 323 | 4 652 482 |
N50 | 4 651 604 | 4 647 117 | 4 648 057 | 3 447 068 | 3 749 516 | 2 770 859 | 4 649 699 | 1 159 845 | 4 649 323 | 4 189 785 |
Misassemblies | ||||||||||
# misassemblies | 9 | 7 | 8 | 7 | 7 | 10 | 10 | 6 | 10 | 8 |
Misassembled contigs length | 4 651 604 | 4 647 117 | 4 648 057 | 3 447 068 | 4 645 941 | 4 657 272 | 4 649 699 | 2 143 406 | 4 649 323 | 4 189 785 |
Mismatches | ||||||||||
# mismatches per 100kbp | 0.34 | 1.03 | 0.69 | 0.78 | 0.69 | 0.56 | 0.58 | 0.75 | 0.86 | 0.75 |
# indels per 100kbp | 0.65 | 7.65 | 5.78 | 5.78 | 1.88 | 2.89 | 1.75 | 1.62 | 2 | 2.65 |
# N's per 100kbp | 0 | 0.02 | 0 | 0 | 0 | 0 | 0 | 0 | 0.02 | 0 |
Genome Statistics | ||||||||||
Genome fraction(%) | 100 | 100 | 100 | 99.949 | 99.956 | 100 | 100 | 99.959 | 100 | 99.993 |
Duplication ratio | 1.003 | 1.002 | 1.002 | 1.005 | 1.002 | 1.005 | 1.002 | 1.004 | 1.002 | 1.003 |
# genes | 4494+3 part | 4494+3 part | 4494+3 part | 4490+5 part | 4491+2 part | 4495+2 part | 4494+3 part | 4489+6 part | 4494+3 part | 4493+4 part |
NGA50 | 1 207 212 | 1 428 636 | 1 432 247 | 983 650 | 1 552 642 | 873 232 | 2 995 552 | 1 062 313 | 2 956 338 | 1 207 192 |
Running Time | ||||||||||
pacBioToCA | 48hr 16m | 4hr 58m | 5hr 48m | 5hr 10m | 11hr 09m | 9hr 34m | 10hr 47m | 21hr 06m | 22hr 05m | 21hr 23m |
runCA | 15hr 48m | 15hr 22m | 13hr 50m | 11hr 20m | 12hr 38m | 11hr 44m | 13hr 48m | 11hr 37m | 14hr 36m | 13hr 40m |
Total | 64hr 04m | 20hr 20m | 19hr 38m | 16hr 30m | 23hr 47m | 21hr 18m | 24hr 35m | 32hr 43m | 36hr 41m | 35hr 03m |
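The Total row is the sum of the two stage times in each column. A small sketch for recomputing it from the `NNhr MMm` strings used in the table (the helper names are illustrative only):

```python
import re

def to_minutes(text):
    """Convert a duration such as '48hr 16m' to a number of minutes."""
    match = re.fullmatch(r"\s*(\d+)hr\s+(\d+)m\s*", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes = map(int, match.groups())
    return hours * 60 + minutes

def to_text(total_minutes):
    """Format minutes back into the table's 'NNhr MMm' style."""
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}hr {minutes:02d}m"

def total_time(*stages):
    """Sum any number of stage durations given as strings."""
    return to_text(sum(to_minutes(stage) for stage in stages))

# All Data column: correction stage plus assembly stage.
print(total_time("48hr 16m", "15hr 48m"))  # 64hr 04m
```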
We assembled the data from all SMRT cells and from random subsets of four and six SMRT cells (three independent draws per subset size), and evaluated each assembly with QUAST against the reference genome (NC_000913) and Ec_gene_list.
Statistics without reference | All Data | 4 SMRT cells : 1st Set | 4 SMRT cells : 2nd Set | 4 SMRT cells : 3rd Set | 6 SMRT cells : 1st Set | 6 SMRT cells : 2nd Set | 6 SMRT cells : 3rd Set |
---|---|---|---|---|---|---|---|
# contigs | 2 | 8 | 10 | 14 | 1 | 1 | 4 |
Largest contig | 4 278 957 | 2 277 010 | 1 213 670 | 984 459 | 4 641 350 | 4 640 250 | 3 162 440 |
Total length | 4 650 771 | 4 648 304 | 4 644 602 | 4 656 274 | 4 641 350 | 4 640 250 | 4 653 394 |
N50 | 4 278 957 | 2 043 590 | 2 044 147 | 2 135 225 | 4 641 350 | 4 640 250 | 3 162 440 |
Misassemblies | |||||||
# misassemblies | 8 | 10 | 8 | 6 | 7 | 7 | 8 |
Misassembled contigs length | 4 278 957 | 2 809 129 | 2 085 482 | 1 947 163 | 4 641 350 | 4 640 250 | 3 209 090 |
Mismatches | |||||||
# mismatches per 100kbp | 0.37 | 2.49 | 1.88 | 5.38 | 0.69 | 0.67 | 0.86 |
# indels per 100kbp | 3.64 | 56.81 | 47.62 | 77.31 | 10.67 | 12.87 | 11 |
# N's per 100kbp | 0 | 0.04 | 0.02 | 0.09 | 0 | 0 | 0 |
Genome Statistics | |||||||
Genome fraction(%) | 99.93 | 99.733 | 99.67 | 99.693 | 99.972 | 99.946 | 99.968 |
Duplication ratio | 1.003 | 1.006 | 1.005 | 1.008 | 1.001 | 1.001 | 1.005 |
# genes | 4492+5 part | 4475+10 part | 4467+12 part | 4469+13 part | 4492+4 part | 4491+4 part | 4492+4 part |
NGA50 | 1 207 233 | 531 351 | 721 189 | 565 251 | 2 499 057 | 2 499 697 | 1 267 262 |
Running Time | |||||||
pacBioToCA | 20hr 03m | 5hr 52m | 6hr 05m | 5hr 19m | 15hr 53m | 14hr 47m | 15hr 38m |
runCA | 15hr 41m | 7hr 32m | 7hr 10m | 5hr 42m | 15hr 44m | 16hr 02m | 13hr 27m |
Total | 35hr 44m | 13hr 24m | 13hr 15m | 11hr 01m | 31hr 37m | 30hr 49m | 29hr 05m |
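For readers comparing the N50 row with the contig counts and largest-contig sizes, the following sketch shows how N50 is conventionally computed from a list of contig lengths. The lengths in the example are made up for illustration; the per-contig lengths of these assemblies are not given here.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length of the shortest contig in the smallest set of
    longest contigs that together cover at least half of the assembly.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")

# Hypothetical assemblies (not taken from the tables):
print(n50([3_000_000, 1_000_000, 500_000]))  # largest contig covers > 50%
print(n50([2_000_000, 1_500_000, 1_000_000, 500_000]))
```

Two consequences follow directly from the definition: a single-contig assembly has an N50 equal to its total length, and N50 can never exceed the largest contig. NGA50 is the analogous statistic computed on blocks aligned to the reference, which is why it can be much smaller than N50 when misassemblies break a contig into several aligned blocks.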
Misassemblies for Adobe reader.
We assembled the data from all SMRT cells and evaluated the assembly with QUAST against the reference genome (NC_013946) and Mr_gene_list.
Statistics without reference | All Data |
---|---|
# contigs | 2 |
Largest contig | 2 974 307 |
Total length | 3 100 289 |
N50 | 2 974 307 |
Misassemblies | |
# misassemblies | 3 |
Misassembled contigs length | 2 974 307 |
Mismatches | |
# mismatches per 100kbp | 0.23 |
# indels per 100kbp | 5.04 |
# N's per 100kbp | 0.03 |
Genome Statistics | |
Genome fraction(%) | 99.883 |
Duplication ratio | 1.002 |
# genes | 3093+4 part |
NGA50 | 1 715 029 |
Running Time | |
pacBioToCA | 7hr 35m |
runCA | 8hr 07m |
Total | 15hr 42m |
We assembled the data from all SMRT cells and from random subsets of four SMRT cells (three independent draws), and evaluated each assembly with QUAST against the reference genome (NC_013061) and Ph_gene_list.
Statistics without reference | All Data | 4 SMRT cells : 1st Set | 4 SMRT cells : 2nd Set | 4 SMRT cells : 3rd Set |
---|---|---|---|---|
# contigs | 1 | 3 | 3 | 3 |
Largest contig | 5 163 983 | 2 232 679 | 2 236 613 | 2 237 949 |
Total length | 5 163 983 | 5 161 276 | 5 165 518 | 5 166 563 |
N50 | 5 163 983 | 2 043 590 | 2 044 147 | 2 135 225 |
Misassemblies | ||||
# misassemblies | 1 | 0 | 0 | 0 |
Misassembled contigs length | 5 163 983 | 0 | 0 | 0 |
Mismatches | ||||
# mismatches per 100kbp | 8.41 | 9.96 | 8.27 | 10.29 |
# indels per 100kbp | 2.19 | 21.34 | 13.29 | 14.78 |
# N's per 100kbp | 0 | 0 | 0 | 0 |
Genome Statistics | ||||
Genome fraction(%) | 99.919 | 99.864 | 99.907 | 99.89 |
Duplication ratio | 1.001 | 1.001 | 1.002 | 1.002 |
# genes | 4335+3 part | 4330+5 part | 4333+5 part | 4333+3 part |
NGA50 | 4 300 532 | 2 043 590 | 2 044 147 | 2 135 225 |
Running Time | ||||
pacBioToCA | 18hr 55m | 6hr 27m | 6hr 34m | 6hr 31m |
runCA | 21hr 36m | 11hr 39m | 12hr 26m | 12hr 12m |
Total | 40hr 31m | 18hr 06m | 19hr 00m | 18hr 43m |
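The "# genes" cells combine two counts: genes found complete in the assembly and genes found only in part. A small sketch for splitting such a cell when post-processing the tables (the function name is an assumption):

```python
import re

def parse_gene_cell(cell):
    """Split a cell such as '4335+3 part' into (complete, partial) counts."""
    match = re.fullmatch(r"\s*(\d+)\s*\+\s*(\d+)\s*part\s*", cell)
    if match is None:
        raise ValueError(f"unrecognised gene count: {cell!r}")
    complete, partial = map(int, match.groups())
    return complete, partial

# Values taken from the '# genes' row above.
for cell in ["4335+3 part", "4330+5 part", "4333+5 part", "4333+3 part"]:
    complete, partial = parse_gene_cell(cell)
    print(f"{cell}: {complete} complete, {partial} partial")
```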
Misassemblies for Adobe reader.
We used all SMRT cells and evaluated the assemblies by QUAST against the reference genome (NC_000913) and Ec_gene_list.
Statistics without reference | All Data |
---|---|
# contigs | 1 |
Largest contig | 4 656 257 |
Total length | 4 656 257 |
N50 | 4 656 257 |
Misassemblies | |
# misassemblies | 8 |
Misassembled contigs length | 4 656 257 |
Mismatches | |
# mismatches per 100kbp | 0.22 |
# indels per 100kbp | 13.23 |
# N's per 100kbp | 0 |
Genome Statistics | |
Genome fraction(%) | 100 |
Duplication ratio | 1.004 |
# genes | 4494+3 part |
NGA50 | 2 995 284 |
Running Time | |
pacBioToCA | 13hr 01m |
runCA | 17hr 58m |
Total | 30hr 59m |
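The mismatch and indel rows are normalised rates (events per 100 kbp), which makes assemblies of different sizes comparable but hides the absolute scale. The sketch below converts a rate back to an approximate event count. It assumes the total assembly length is a good stand-in for the aligned length that the rate was normalised by, which is reasonable here only because the genome fraction is close to 100%.

```python
def approx_event_count(rate_per_100kbp, length_bp):
    """Estimate an absolute event count from a per-100-kbp rate.

    Assumption: `length_bp` approximates the aligned length used to
    normalise the rate, so the result is an estimate, not an exact count.
    """
    return round(rate_per_100kbp * length_bp / 100_000)

# Values from the table above: 13.23 indels per 100 kbp over 4 656 257 bp.
print(approx_event_count(13.23, 4_656_257))  # 616
print(approx_event_count(0.22, 4_656_257))   # 10
```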