SCA

Revision as of 20 January 2014 21:10 by admin

The self-correction approach (SCA) was proposed in "Reducing assembly complexity of microbial genomes with single-molecule sequencing" (Genome Biology, 2013).

Contents
1 DataSet1
1.1 Performance
2 DataSet2
2.1 Performance
3 DataSet3
3.1 Performance

DataSet1

We assembled the data from all SMRT cells, and also from random subsets of four and of six SMRT cells (three independent draws for each subset size), and assessed the correctness of each assembly with QUAST.
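The subset selection described above can be sketched as follows. This is only an illustration, not the script used for these runs: the cell names, the total of eight cells, and the seeds are hypothetical, since the page does not state how the draws were made.

```python
import random


def pick_subsets(cells, size, repeats, seed=None):
    """Draw `repeats` random subsets of `size` cells, without repeats inside a subset."""
    rng = random.Random(seed)
    return [sorted(rng.sample(cells, size)) for _ in range(repeats)]


# Hypothetical cell names and count; the real data set has its own identifiers.
cells = [f"cell_{i:02d}" for i in range(1, 9)]

# Three draws each of four and six cells, matching the "1st/2nd/3rd Set" columns.
subsets = {size: pick_subsets(cells, size, repeats=3, seed=size) for size in (4, 6)}

for size, draws in subsets.items():
    for n, draw in enumerate(draws, start=1):
        print(f"{size} SMRT cells, set {n}: {', '.join(draw)}")
```

Fixing the seed makes the draws repeatable, which helps when an assembly has to be rerun on the same subset.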

Performance

Statistics without reference	All Data	4 SMRT cells : 1st Set	4 SMRT cells : 2nd Set	4 SMRT cells : 3rd Set	6 SMRT cells : 1st Set	6 SMRT cells : 2nd Set	6 SMRT cells : 3rd Set
# contigs	2	8	10	14	1	1	4
Largest contig	4 278 957	2 277 010	1 213 670	984 459	4 641 350	4 640 250	3 162 440
Total length	4 650 771	4 648 304	4 644 602	4 656 274	4 641 350	4 640 250	4 653 394
N50	4 278 957	2 043 590	2 044 147	2 135 225	4 641 350	4 640 250	3 162 440
Misassemblies
# misassemblies	8	10	8	6	7	7	8
Misassembled contigs length	4 278 957	2 809 129	2 085 482	1 947 163	4 641 350	4 640 250	3 209 090
Mismatches
# mismatches per 100kbp	0.37	2.49	1.88	5.38	0.69	0.67	0.86
# indels per 100kbp	3.64	56.81	47.62	77.31	10.67	12.87	11
# N's per 100kbp	0	0.04	0.02	0.09	0	0	0
Genome Statistics
Genome fraction(%)	99.93	99.733	99.67	99.693	99.972	99.946	99.968
Duplication ratio	1.003	1.006	1.005	1.008	1.001	1.001	1.005
# genes	4492+5 part	4475+10 part	4467+12 part	4469+13 part	4492+4 part	4491+4 part	4492+4 part
NGA50	1 207 233	531 351	721 189	565 251	2 499 057	2 499 697	1 267 262
Running Time	15hr 41m	7hr 32m	7hr 10m	5hr 42m	15hr 44m	16hr 02m	13hr 27m

DataSet2

We assembled the data from all SMRT cells and assessed the correctness of the assembly with QUAST.

Performance

Statistics without reference	All Data
# contigs	2
Largest contig	2 974 307
Total length	3 100 289
N50	2 974 307
Misassemblies
# misassemblies	3
Misassembled contigs length	2 974 307
Mismatches
# mismatches per 100kbp	0.23
# indels per 100kbp	5.04
# N's per 100kbp	0.03
Genome Statistics
Genome fraction(%)	99.883
Duplication ratio	1.002
# genes	3093+4 part
NGA50	1 715 029
Running Time	8hr 7m
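
For readers unfamiliar with the N50 row, the sketch below computes it from a list of contig lengths using the usual definition: the length of the contig at which the running total, counted from the longest contig down, first reaches half of the assembly. The function name is illustrative, and the second contig length is an assumption inferred as total length minus largest contig; it is not reported in the table.

```python
def n50(contig_lengths):
    """Return the contig length at which the running total (longest first) reaches half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


# Largest contig from the table; second length inferred as 3 100 289 - 2 974 307.
contigs = [2_974_307, 3_100_289 - 2_974_307]
print(n50(contigs))  # prints 2974307
```

Because the largest contig alone covers more than half of the assembly, N50 equals the largest contig here. NGA50 is smaller because QUAST first splits contigs at misassembly breakpoints and measures aligned blocks against the reference genome length.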

DataSet3

Performance

Statistics without reference	All Data	4 SMRT cells : 1st Set	4 SMRT cells : 2nd Set	4 SMRT cells : 3rd Set
# contigs	1	3	3	3
Largest contig	5 163 983	2 232 679	2 236 613	2 237 949
Total length	5 163 983	5 161 276	5 165 518	5 166 563
N50	5 163 983	2 043 590	2 044 147	2 135 225
Misassemblies
# misassemblies	1	0	0	0
Misassembled contigs length	5 163 983	0	0	0
Mismatches
# mismatches per 100kbp	8.41	9.96	8.27	10.29
# indels per 100kbp	2.19	21.34	13.29	14.78
# N's per 100kbp	0	0	0	0
Genome Statistics
Genome fraction(%)	99.919	99.864	99.907	99.89
Duplication ratio	1.001	1.001	1.002	1.002
# genes	4335+3 part	4330+5 part	4333+5 part	4333+3 part
NGA50	4 300 532	2 043 590	2 044 147	2 135 225
Running Time	21hr 36m	11hr 39m	12hr 26m	12hr 12m