The self-correction approach (SCA) was proposed in "Reducing assembly complexity of microbial genomes with single-molecule sequencing" (Genome Biology, 2013).
DataSet1
We assembled the data from all SMRT cells, and also from randomly selected subsets of four and of six SMRT cells (three independent draws for each subset size). Correctness of every assembly was assessed with QUAST.
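The subsampling scheme described above can be sketched as follows. This is a minimal illustration, not the script used for these runs: the cell identifiers, the total of eight cells, and the seeds are all assumptions, since the text does not state how many SMRT cells DataSet1 contains or how the random draws were made.

```python
import random


def draw_subsets(cells, subset_size, n_draws, seed=None):
    """Return n_draws random subsets, each drawn without replacement."""
    rng = random.Random(seed)
    return [sorted(rng.sample(cells, subset_size)) for _ in range(n_draws)]


# Hypothetical cell identifiers; the real cell count is not given in the text.
all_cells = [f"cell_{i:02d}" for i in range(1, 9)]

# Three independent draws for each subset size (4 and 6 cells).
subsets = {
    size: draw_subsets(all_cells, size, n_draws=3, seed=size)
    for size in (4, 6)
}

for size, draws in subsets.items():
    for k, draw in enumerate(draws, start=1):
        print(f"{size} SMRT cells, set {k}: {', '.join(draw)}")
```

Fixing the seed makes the draws reproducible, which helps when a particular subset (for example one that assembles into many contigs) needs to be rerun.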
Performance
| Statistics without reference | All Data | 4 SMRT cells : 1st Set | 4 SMRT cells : 2nd Set | 4 SMRT cells : 3rd Set | 6 SMRT cells : 1st Set | 6 SMRT cells : 2nd Set | 6 SMRT cells : 3rd Set |
| --- | --- | --- | --- | --- | --- | --- | --- |
| # contigs | 2 | 8 | 10 | 14 | 1 | 1 | 4 |
| Largest contig | 4 278 957 | 2 277 010 | 1 213 670 | 984 459 | 4 641 350 | 4 640 250 | 3 162 440 |
| Total length | 4 650 771 | 4 648 304 | 4 644 602 | 4 656 274 | 4 641 350 | 4 640 250 | 4 653 394 |
| N50 | 4 278 957 | 2 043 590 | 2 044 147 | 2 135 225 | 3 162 440 | 4 640 250 | 4 641 350 |
| Misassemblies | | | | | | | |
| # misassemblies | 8 | 10 | 8 | 6 | 7 | 7 | 8 |
| Misassembled contigs length | 4 278 957 | 2 809 129 | 2 085 482 | 1 947 163 | 4 641 350 | 4 640 250 | 3 209 090 |
| Mismatches | | | | | | | |
| # mismatches per 100 kbp | 0.37 | 2.49 | 1.88 | 5.38 | 0.69 | 0.67 | 0.86 |
| # indels per 100 kbp | 3.64 | 56.81 | 47.62 | 77.31 | 10.67 | 12.87 | 11 |
| # N's per 100 kbp | 0 | 0.04 | 0.02 | 0.09 | 0 | 0 | 0 |
| Genome Statistics | | | | | | | |
| Genome fraction (%) | 99.93 | 99.733 | 99.67 | 99.693 | 99.972 | 99.946 | 99.968 |
| Duplication ratio | 1.003 | 1.006 | 1.005 | 1.008 | 1.001 | 1.001 | 1.005 |
| # genes | 4492+5 part | 4475+10 part | 4467+12 part | 4469+13 part | 4492+4 part | 4491+4 part | 4492+4 part |
| NGA50 | 1 207 233 | 531 351 | 721 189 | 565 251 | 2 499 057 | 2 499 697 | 1 267 262 |
| Running Time | 15hr 41m | 7hr 32m | 7hr 10m | 5hr 42m | 15hr 44m | 16hr 02m | 13hr 27m |
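For readers unfamiliar with the N50 row, the sketch below computes it under the usual definition: sort the contigs from longest to shortest, accumulate their lengths, and report the length of the contig at which the running sum first reaches half of the total assembly length. The contig lengths in the example are made up for illustration and are not taken from these assemblies, and QUAST's own implementation may differ in detail.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Illustrative lengths only: total is 1000, and 400 + 300 first reaches 500.
example = [100, 400, 300, 200]
print(n50(example))  # prints 300
```

When an assembly consists of a single contig, N50 equals that contig's length, which is why a one-contig assembly reports the same value for largest contig, total length, and N50.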
DataSet2
We assembled the data from all SMRT cells and assessed the correctness of the assembly with QUAST.
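A reference-based QUAST evaluation of this kind might be launched as sketched below. The file names and output directory are placeholders, not the paths used for these runs, and the exact options used here are not recorded in the text; only QUAST's `-r` (reference) and `-o` (output directory) options are assumed.

```python
import subprocess


def quast_command(contigs, reference, out_dir):
    """Build a QUAST invocation comparing an assembly against a reference."""
    return ["quast.py", contigs, "-r", reference, "-o", out_dir]


def run_quast(contigs, reference, out_dir):
    """Run QUAST; requires quast.py on PATH and existing input files."""
    subprocess.run(quast_command(contigs, reference, out_dir), check=True)


# Placeholder paths, for illustration only; this just prints the command.
cmd = quast_command("assembly.fasta", "reference.fasta", "quast_out")
print(" ".join(cmd))
```

Building the command separately from running it makes it easy to log exactly what was executed for each dataset.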
Performance
| Statistics without reference | All Data |
| --- | --- |
| # contigs | 2 |
| Largest contig | 2 974 307 |
| Total length | 3 100 289 |
| N50 | 2 974 307 |
| Misassemblies | |
| # misassemblies | 3 |
| Misassembled contigs length | 2 974 307 |
| Mismatches | |
| # mismatches per 100 kbp | 0.23 |
| # indels per 100 kbp | 5.04 |
| # N's per 100 kbp | 0.03 |
| Genome Statistics | |
| Genome fraction (%) | 99.883 |
| Duplication ratio | 1.002 |
| # genes | 3093+4 part |
| NGA50 | 1 715 029 |
| Running Time | ?hr ?m |
DataSet3
Performance
| Statistics without reference | All Data | 4 SMRT cells : 1st Set | 4 SMRT cells : 2nd Set | 4 SMRT cells : 3rd Set |
| --- | --- | --- | --- | --- |
| # contigs | 3 | 3 | 3 | 6 |
| Largest contig | 2 934 267 | 2 927 454 | 2 929 942 | 2 226 051 |
| Total length | 5 178 932 | 5 176 592 | 5 176 771 | 5 182 410 |
| N50 | 2 934 267 | 2 927 454 | 2 929 942 | 2 133 457 |
| Misassemblies | | | | |
| # misassemblies | 0 | 1 | 0 | 1 |
| Misassembled contigs length | 0 | 2 240 169 | 0 | 13 124 |
| Mismatches | | | | |
| # mismatches per 100 kbp | 0 | 0.02 | 0.06 | 6.45 |
| # indels per 100 kbp | 1.05 | 0.54 | 0.6 | 1.88 |
| # N's per 100 kbp | 0 | 0 | 0 | 0 |
| Genome Statistics | | | | |
| Genome fraction (%) | 100 | 100 | 100 | 99.936 |
| Duplication ratio | 1.003 | 1.003 | 1.003 | 1.006 |
| # genes | 4338+1 part | 4338+1 part | 4338+1 part | 4335+4 part |
| NGA50 | 2 934 267 | 2 927 454 | 2 929 942 | 2 133 457 |
| Running Time | 24hr 56m | 17hr 41m | 18hr 14m | 17hr 04m |