SCA

Revision as of 13 March 2014 03:23 by admin

The self-correction approach (SCA) was proposed in "Reducing assembly complexity of microbial genomes with single-molecule sequencing" (Genome Biology, 2013).

Dataset 6 (E. coli K-12 MG1655, 8 SMRT cells)

We assembled the data from all eight SMRT cells, and also from three randomly selected subsets of four SMRT cells and three randomly selected subsets of six SMRT cells. Each assembly was evaluated with QUAST against the reference genome (NC_000913).
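The random selection of SMRT cells can be sketched as follows. This is a minimal illustration only: the cell labels, the `draw_subsets` helper and the fixed seed are assumptions made for the example, not the procedure or names actually used to produce the results on this page.

```python
import random


def draw_subsets(cells, subset_size, n_draws, seed=None):
    """Draw n_draws random subsets of subset_size cells (no repeats within a subset)."""
    rng = random.Random(seed)  # a fixed seed makes the draws reproducible
    return [sorted(rng.sample(cells, subset_size)) for _ in range(n_draws)]


# Hypothetical labels for the 8 SMRT cells of Dataset 6.
cells = [f"cell_{i}" for i in range(1, 9)]

four_cell_sets = draw_subsets(cells, 4, 3, seed=0)  # three 4-cell subsets
six_cell_sets = draw_subsets(cells, 6, 3, seed=0)   # three 6-cell subsets

for name, subsets in (("4 SMRT cells", four_cell_sets), ("6 SMRT cells", six_cell_sets)):
    for i, subset in enumerate(subsets, start=1):
        print(f"{name}, set {i}: {', '.join(subset)}")
```

Each subset would then be assembled and evaluated independently, which gives a rough sense of how much the result varies with the particular cells chosen.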

Performance

| Statistics without reference | All Data | 4 SMRT cells: 1st Set | 4 SMRT cells: 2nd Set | 4 SMRT cells: 3rd Set | 6 SMRT cells: 1st Set | 6 SMRT cells: 2nd Set | 6 SMRT cells: 3rd Set |
| --- | --- | --- | --- | --- | --- | --- | --- |
| # contigs | 2 | 8 | 10 | 14 | 1 | 1 | 4 |
| Largest contig | 4 278 957 | 2 277 010 | 1 213 670 | 984 459 | 4 641 350 | 4 640 250 | 3 162 440 |
| Total length | 4 650 771 | 4 648 304 | 4 644 602 | 4 656 274 | 4 641 350 | 4 640 250 | 4 653 394 |
| N50 | 4 278 957 | 2 043 590 | 2 044 147 | 2 135 225 | 3 162 440 | 4 640 250 | 4 641 350 |
| Misassemblies | | | | | | | |
| # misassemblies | 8 | 10 | 8 | 6 | 7 | 7 | 8 |
| Misassembled contigs length | 4 278 957 | 2 809 129 | 2 085 482 | 1 947 163 | 4 641 350 | 4 640 250 | 3 209 090 |
| Mismatches | | | | | | | |
| # mismatches per 100kbp | 0.37 | 2.49 | 1.88 | 5.38 | 0.69 | 0.67 | 0.86 |
| # indels per 100kbp | 3.64 | 56.81 | 47.62 | 77.31 | 10.67 | 12.87 | 11 |
| # N's per 100kbp | 0 | 0.04 | 0.02 | 0.09 | 0 | 0 | 0 |
| Genome Statistics | | | | | | | |
| Genome fraction(%) | 99.93 | 99.733 | 99.67 | 99.693 | 99.972 | 99.946 | 99.968 |
| Duplication ratio | 1.003 | 1.006 | 1.005 | 1.008 | 1.001 | 1.001 | 1.005 |
| # genes | 4492+5 part | 4475+10 part | 4467+12 part | 4469+13 part | 4492+4 part | 4491+4 part | 4492+4 part |
| NGA50 | 1 207 233 | 531 351 | 721 189 | 565 251 | 2 499 057 | 2 499 697 | 1 267 262 |
| Running Time | 15hr 41m | 7hr 32m | 7hr 10m | 5hr 42m | 15hr 44m | 16hr 02m | 13hr 27m |
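For readers unfamiliar with the N50 metric reported above: it is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. The sketch below computes it for a list of contig lengths; the function name `n50` and the toy lengths are illustrative assumptions, not QUAST's own code or data from this page.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited longest first; N50 is the length of the contig
    at which the running total reaches at least half the assembly length.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer form of running >= total / 2
            return length


# Toy example (made-up lengths): total = 100, half = 50.
# 40 alone covers 40 < 50; 40 + 30 = 70 >= 50, so N50 is 30.
print(n50([40, 30, 20, 10]))

# An assembly consisting of a single contig has N50 equal to that contig's length.
print(n50([1000]))
```

NGA50 follows the same idea, but is computed over aligned blocks (contigs broken at misassemblies) and uses half of the reference genome length as the threshold.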

Dataset 7 (M. ruber DSM1279, 4 SMRT cells)

We assembled the data from all four SMRT cells and evaluated the assembly with QUAST against the reference genome (NC_013946).

Performance

| Statistics without reference | All Data |
| --- | --- |
| # contigs | 2 |
| Largest contig | 2 974 307 |
| Total length | 3 100 289 |
| N50 | 2 974 307 |
| Misassemblies | |
| # misassemblies | 3 |
| Misassembled contigs length | 2 974 307 |
| Mismatches | |
| # mismatches per 100kbp | 0.23 |
| # indels per 100kbp | 5.04 |
| # N's per 100kbp | 0.03 |
| Genome Statistics | |
| Genome fraction(%) | 99.883 |
| Duplication ratio | 1.002 |
| # genes | 3093+4 part |
| NGA50 | 1 715 029 |
| Running Time | 8hr 7m |

Dataset 8 (P. heparinus DSM 2366, 7 SMRT cells)

We assembled the data from all seven SMRT cells, and also from three randomly selected subsets of four SMRT cells. Each assembly was evaluated with QUAST against the reference genome (NC_013061).

Performance

| Statistics without reference | All Data | 4 SMRT cells: 1st Set | 4 SMRT cells: 2nd Set | 4 SMRT cells: 3rd Set |
| --- | --- | --- | --- | --- |
| # contigs | 1 | 3 | 3 | 3 |
| Largest contig | 5 163 983 | 2 232 679 | 2 236 613 | 2 237 949 |
| Total length | 5 163 983 | 5 161 276 | 5 165 518 | 5 166 563 |
| N50 | 5 163 983 | 2 043 590 | 2 044 147 | 2 135 225 |
| Misassemblies (see SCA_D8.pdf) | | | | |
| # misassemblies | 1 | 0 | 0 | 0 |
| Misassembled contigs length | 5 163 983 | 0 | 0 | 0 |
| Mismatches | | | | |
| # mismatches per 100kbp | 8.41 | 9.960 | 8.27 | 10.29 |
| # indels per 100kbp | 2.19 | 21.34 | 13.29 | 14.78 |
| # N's per 100kbp | 0 | 0 | 0 | 0 |
| Genome Statistics | | | | |
| Genome fraction(%) | 99.919 | 99.864 | 99.907 | 99.89 |
| Duplication ratio | 1.001 | 1.001 | 1.002 | 1.002 |
| # genes | 4335+3 part | 4330+5 part | 4333+5 part | 4333+3 part |
| NGA50 | 4 300 532 | 2 043 590 | 2 044 147 | 2 135 225 |
| Running Time | 21hr 36m | 11hr 39m | 12hr 26m | 12hr 12m |