SSPACE

Revision as of 15 August 2014 01:46 by admin

We assembled the Dataset 4 raw data with SPAdes, then scaffolded the assembly with SSPACE-LongRead using subreads from Dataset 5 and Dataset 9 at different depths.

We arbitrarily chose one to four SMRT cells:
One single SMRT cell: m120208_071634
Two SMRT cells: m120228_210845 + m120208_122534
Three SMRT cells: m120228_115504 + m120228_152936 + m120228_100807
Four SMRT cells: m120228_171636 + m120228_223624 + m120228_100807 + m120228_190630

Evaluation

We evaluated the assemblies with QUAST 2.3 (reference genome NC_000913 and gene list Ec_gene_list).

Statistics without reference  Miseq_only      Miseq_1cell     Miseq_2cell     Miseq_3cell     Miseq_4cell     Miseq_17cell    Miseq_d9
# contigs                     86              15              18              16              15              17              14
Largest contig                285889          2497845         1260980         2501081         3194637         1954649         3392211
Total length                  4577132         4632009         4633058         4636174         4638657         4633857         4632677
N50                           139882          2497845         1238868         2501081         3194637         1238635         3392211
Misassemblies
# misassemblies               2               9               10              10              10              8               8
Misassembled contigs length   215581          3193893         3244631         2705788         3657574         3243566         4050696
Mismatches
# mismatches per 100 kbp      3.02            7.15            6.2             6.9             7.17            6.05            6.43
# indels per 100 kbp          0.46            1.06            1               1.32            1.27            0.95            1.06
# N's per 100 kbp             0               97.89           67.67           77.33           77.37           91.03           123.88
Genome statistics
Genome fraction (%)           98.451          99.498          99.483          99.664          99.748          99.432          99.587
Duplication ratio             1.001           1.002           1.002           1.002           1.002           1.003           1.001
# genes                       4399 + 32 part  4467 + 14 part  4465 + 13 part  4476 + 11 part  4477 + 11 part  4467 + 11 part  4470 + 13 part
NGA50                         133059          571664          425173          852639          1039467         1039472         1039654
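
For readers unfamiliar with the N50 row above: N50 is the length of the contig at which, going from the longest contig downwards, the running total first reaches half of the total assembly length. The following is a minimal sketch of that calculation, not QUAST's own code; the function name and the example lengths are made up for illustration.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig where we cover at least half the assembly.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths, not taken from the assemblies above.
example_lengths = [50, 40, 30, 20, 10]
print(n50(example_lengths))  # prints 40
```

This also explains why N50 equals the largest contig for Miseq_1cell: the largest contig (2497845 bp) alone covers more than half of the total length (4632009 bp). NGA50 is the analogous value that QUAST computes from aligned blocks relative to the reference genome length; the sketch does not cover it.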

We discarded contigs to which fewer than 100 reads aligned.
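
The read-count filter can be sketched as follows. This is an illustration under assumptions: the page does not say how the per-contig read counts were obtained, so the `read_counts` mapping, the contig names, and the function name are hypothetical. Only the threshold of 100 reads comes from the text.

```python
MIN_READS = 100  # threshold stated in the text


def keep_contigs(read_counts, min_reads=MIN_READS):
    """Return the sorted names of contigs with at least min_reads aligned reads.

    read_counts: dict mapping contig name -> number of aligned reads
                 (hypothetical input; how it was produced is not specified).
    """
    return sorted(name for name, count in read_counts.items()
                  if count >= min_reads)


# Hypothetical counts for illustration only.
counts = {"ctg_1": 5200, "ctg_2": 99, "ctg_3": 100}
print(keep_contigs(counts))  # prints ['ctg_1', 'ctg_3']
```

A contig with exactly 100 aligned reads is kept, since only contigs with fewer than 100 are discarded.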

Statistics without reference  071634_raw_asm.ctg   192221_raw_asm.ctg   210845_raw_asm.ctg   071634_100X_asm.ctg  071634_118X_asm.ctg  192221_118X_asm.ctg  210845_118X_asm.ctg
# contigs                     19                   24                   21                   28                   38                   29                   31
Largest contig                745120               664876               592203               663399               434084               345313               437164
Total length                  4669108              4675696              4700617              4636263              4644391              4603072              4578972
N50                           356974               222559               399011               295449               180706               207976               191458
Misassemblies
# misassemblies               7                    6                    11                   6                    5                    7                    6
Misassembled contigs length   1539749              936587               2058922              1200212              727024               1097466              478971
Mismatches
# mismatches per 100 kbp      2.75                 2.75                 3.04                 7.08                 5.85                 8.82                 3.69
# indels per 100 kbp          2.23                 1.1                  1.17                 13.46                5.83                 2.49                 2.37
# N's per 100 kbp             0.19                 0.02                 0.04                 0.26                 0.15                 0.07                 0.02
Genome statistics
Genome fraction (%)           99.639               99.699               99.834               99.984               99.051               98.78                98.159
Duplication ratio             1.011                1.011                1.017                1.01                 1.015                1.005                1.006
# genes                       4473 + 15 part       4465 + 18 part       4480 + 10 part       4435 + 29 part       4431 + 36 part       4413 + 36 part       4380 + 34 part
NGA50                         357183               221098               279423               226118               179662               194634               191457