The self-correction approach (SCA) was proposed in "Reducing assembly complexity of microbial genomes with single-molecule sequencing" (Genome Biology, 2013).
Dataset 6 (E. coli, 8 SMRT cells)
We assembled all eight SMRT cells, and also three random subsets of four SMRT cells and three random subsets of six SMRT cells. The correctness of each assembly was assessed with QUAST.
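The subsampling described above can be sketched as follows. This is only an illustration: the source does not say how the cells were drawn, so the cell labels, the `pick_subsets` helper, and the seeds are all hypothetical.

```python
import random


def pick_subsets(cells, size, repeats, seed=None):
    """Draw `repeats` independent random subsets of `size` cells.

    Cells are sampled without replacement within a subset, but the same
    cell may appear in several subsets.
    """
    rng = random.Random(seed)
    return [sorted(rng.sample(cells, size)) for _ in range(repeats)]


# Hypothetical labels for the eight SMRT cells of Dataset 6.
cells = [f"cell_{i}" for i in range(1, 9)]

# Three subsets of four cells and three subsets of six cells.
# The seeds are arbitrary; they only make the draw reproducible.
subsets = {size: pick_subsets(cells, size, repeats=3, seed=size) for size in (4, 6)}

for size, sets in subsets.items():
    for n, chosen in enumerate(sets, start=1):
        print(f"{size} SMRT cells, set {n}: {', '.join(chosen)}")
```

Fixing the seed is a convenience for rerunning the same subsets; it is not something the original experiment is known to have done.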
Performance
| Statistics without reference | All Data | 4 SMRT cells: 1st Set | 4 SMRT cells: 2nd Set | 4 SMRT cells: 3rd Set | 6 SMRT cells: 1st Set | 6 SMRT cells: 2nd Set | 6 SMRT cells: 3rd Set |
|---|---|---|---|---|---|---|---|
| # contigs | 2 | 8 | 10 | 14 | 1 | 1 | 4 |
| Largest contig | 4 278 957 | 2 277 010 | 1 213 670 | 984 459 | 4 641 350 | 4 640 250 | 3 162 440 |
| Total length | 4 650 771 | 4 648 304 | 4 644 602 | 4 656 274 | 4 641 350 | 4 640 250 | 4 653 394 |
| N50 | 4 278 957 | 2 043 590 | 2 044 147 | 2 135 225 | 3 162 440 | 4 640 250 | 4 641 350 |
| Misassemblies | | | | | | | |
| # misassemblies | 8 | 10 | 8 | 6 | 7 | 7 | 8 |
| Misassembled contigs length | 4 278 957 | 2 809 129 | 2 085 482 | 1 947 163 | 4 641 350 | 4 640 250 | 3 209 090 |
| Mismatches | | | | | | | |
| # mismatches per 100 kbp | 0.37 | 2.49 | 1.88 | 5.38 | 0.69 | 0.67 | 0.86 |
| # indels per 100 kbp | 3.64 | 56.81 | 47.62 | 77.31 | 10.67 | 12.87 | 11 |
| # N's per 100 kbp | 0 | 0.04 | 0.02 | 0.09 | 0 | 0 | 0 |
| Genome Statistics | | | | | | | |
| Genome fraction (%) | 99.93 | 99.733 | 99.67 | 99.693 | 99.972 | 99.946 | 99.968 |
| Duplication ratio | 1.003 | 1.006 | 1.005 | 1.008 | 1.001 | 1.001 | 1.005 |
| # genes | 4492+5 part | 4475+10 part | 4467+12 part | 4469+13 part | 4492+4 part | 4491+4 part | 4492+4 part |
| NGA50 | 1 207 233 | 531 351 | 721 189 | 565 251 | 2 499 057 | 2 499 697 | 1 267 262 |
| Running Time | 15hr 41m | 7hr 32m | 7hr 10m | 5hr 42m | 15hr 44m | 16hr 02m | 13hr 27m |
Dataset 2
We assembled the data from all SMRT cells and assessed the correctness of the assembly with QUAST.
Performance
| Statistics without reference | All Data |
|---|---|
| # contigs | 2 |
| Largest contig | 2 974 307 |
| Total length | 3 100 289 |
| N50 | 2 974 307 |
| Misassemblies | |
| # misassemblies | 3 |
| Misassembled contigs length | 2 974 307 |
| Mismatches | |
| # mismatches per 100 kbp | 0.23 |
| # indels per 100 kbp | 5.04 |
| # N's per 100 kbp | 0.03 |
| Genome Statistics | |
| Genome fraction (%) | 99.883 |
| Duplication ratio | 1.002 |
| # genes | 3093+4 part |
| NGA50 | 1 715 029 |
| Running Time | 8hr 7m |
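As a worked check of the N50 figure for this dataset, the sketch below recomputes it from the contig lengths. It uses the standard definition of N50 (the length of the shortest contig in the smallest set of longest contigs that covers at least half of the total assembly length). The assembly has two contigs, so the length of the second contig is inferred here as total length minus largest contig; that inference and the `n50` helper are assumptions of this example and are not taken from QUAST's code.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running sum first reaches half of the total.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Values reported for this dataset (all SMRT cells).
largest = 2_974_307
total_length = 3_100_289

# With exactly two contigs, the second one is the remainder (inferred).
contigs = [largest, total_length - largest]

print(contigs[1])    # 125982
print(n50(contigs))  # 2974307
```

The largest contig alone covers more than half of the assembly, which is why N50 equals the largest contig length here.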
Dataset 3
Performance
| Statistics without reference | All Data | 4 SMRT cells: 1st Set | 4 SMRT cells: 2nd Set | 4 SMRT cells: 3rd Set |
|---|---|---|---|---|
| # contigs | 1 | 3 | 3 | 3 |
| Largest contig | 5 163 983 | 2 232 679 | 2 236 613 | 2 237 949 |
| Total length | 5 163 983 | 5 161 276 | 5 165 518 | 5 166 563 |
| N50 | 5 163 983 | 2 043 590 | 2 044 147 | 2 135 225 |
| Misassemblies | | | | |
| # misassemblies | 1 | 0 | 0 | 0 |
| Misassembled contigs length | 5 163 983 | 0 | 0 | 0 |
| Mismatches | | | | |
| # mismatches per 100 kbp | 8.41 | 9.960 | 8.27 | 10.29 |
| # indels per 100 kbp | 2.19 | 21.34 | 13.29 | 14.78 |
| # N's per 100 kbp | 0 | 0 | 0 | 0 |
| Genome Statistics | | | | |
| Genome fraction (%) | 99.919 | 99.864 | 99.907 | 99.89 |
| Duplication ratio | 1.001 | 1.001 | 1.002 | 1.002 |
| # genes | 4335+3 part | 4330+5 part | 4333+5 part | 4333+3 part |
| NGA50 | 4 300 532 | 2 043 590 | 2 044 147 | 2 135 225 |
| Running Time | 21hr 36m | 11hr 39m | 12hr 26m | 12hr 12m |