SSPACE

Revision as of 15 August 2014 01:33 by admin

We assembled the Dataset 4 raw data with SPAdes, then scaffolded the resulting assembly with SSPACE-LongRead, using different subread depths from Dataset 5 and Dataset 9.
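The two steps above can be sketched as a pair of command lines. This is a minimal sketch, not the exact invocation used here: it assumes Dataset 4 is paired-end, all file and directory names are hypothetical, and only the basic options are shown (`-1`/`-2`/`-o` for SPAdes; `-c`/`-p`/`-b` for SSPACE-LongRead). Check the options against the tool versions you have installed.

```python
import os


def spades_command(reads_1, reads_2, out_dir):
    """Build a SPAdes command for one paired-end library (assumed layout)."""
    return ["spades.py", "-1", reads_1, "-2", reads_2, "-o", out_dir]


def sspace_longread_command(contigs, subreads, out_dir):
    """Build an SSPACE-LongRead command: contigs (-c), PacBio reads (-p), output folder (-b)."""
    return ["perl", "SSPACE-LongRead.pl", "-c", contigs, "-p", subreads, "-b", out_dir]


# Hypothetical file names, for illustration only.
assembly_dir = "spades_dataset4"
assemble = spades_command("dataset4_R1.fastq", "dataset4_R2.fastq", assembly_dir)

# SPAdes writes its contigs to <out_dir>/contigs.fasta.
scaffold = sspace_longread_command(
    os.path.join(assembly_dir, "contigs.fasta"),
    "subreads_1cell.fasta",
    "sspace_1cell",
)
```

Each list can be passed to `subprocess.run(..., check=True)` once the paths point at real files.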

We arbitrarily chose 1-4 SMRT cells:
One SMRT cell: m120208_071634
Two SMRT cells: m120228_210845 + m120208_122534
Three SMRT cells: m120228_115504 + m120228_152936 + m120228_100807
Four SMRT cells: m120228_171636 + m120228_223624 + m120228_100807 + m120228_190630
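One way to keep these selections reproducible is to record them as a mapping and pick the subread files by cell ID. This is a sketch under assumptions: the cell IDs are copied from the list above, but the file naming (each subread file name starting with its cell ID) and the helper `subread_files_for` are hypothetical.

```python
import os

# Cell IDs per scaffolding depth, as listed above.
CELL_SETS = {
    1: ["m120208_071634"],
    2: ["m120228_210845", "m120208_122534"],
    3: ["m120228_115504", "m120228_152936", "m120228_100807"],
    4: ["m120228_171636", "m120228_223624", "m120228_100807", "m120228_190630"],
}


def subread_files_for(n_cells, available_files):
    """Return the files whose base name starts with one of the chosen cell IDs."""
    cell_ids = CELL_SETS[n_cells]
    return [
        path
        for path in available_files
        if any(os.path.basename(path).startswith(cell_id) for cell_id in cell_ids)
    ]
```

The selected files can then be concatenated into a single FASTA file and passed to the scaffolder.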

Evaluation

We evaluated the assemblies with QUAST 2.3 (reference genome NC_000913 and Ec_gene_list).
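A QUAST run of this kind takes the assemblies plus a reference (`-R`) and a gene list (`-G`). The sketch below only builds the command. The file names are hypothetical stand-ins for the NC_000913 sequence, the Ec_gene_list file and the scaffold files, and the exact command used for this page is not recorded here.

```python
def quast_command(assemblies, reference, genes, out_dir, quast="quast.py"):
    """Build a QUAST command comparing several assemblies against one reference."""
    return [quast, "-R", reference, "-G", genes, "-o", out_dir, *assemblies]


# Hypothetical paths, for illustration only.
cmd = quast_command(
    ["Miseq_only.fasta", "Miseq_1cell.fasta"],
    reference="NC_000913.fasta",
    genes="Ec_gene_list.txt",
    out_dir="quast_results",
)
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would start the evaluation.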

Statistics without reference  Miseq_only      Miseq_1cell     Miseq_2cell     Miseq_3cell     Miseq_4cell     Miseq_17cell    Miseq_d9
# contigs                     80              93              83              61              69              68              66
Largest contig                745120          664876          562203          663399          434084          345313          437164
Total length                  4975695         5031560         5043217         4804004         4805579         4801310         4733683
N50                           356974          221472          324225          295449          179662          207976          186993
Misassemblies
# misassemblies               11              17              21              10              13              20              15
Misassembled contigs length   1552524         976207          2108892         1222277         782726          1156917         527873
Mismatches
# mismatches per 100 kbp      3.32            2.91            3.06            7.08            6.4             10.33           4.13
# indels per 100 kbp          2.98            1.38            1.01            13.15           5.2             5.54            2.69
# N's per 100 kbp             0.38            0.12            0.22            0.4             0.37            0.4             0.23
Genome statistics
Genome fraction (%)           99.97           100             100             99.304          99.424          99.522          98.712
Duplication ratio             1.074           1.086           1.090           1.043           1.047           1.04            1.033
# genes                       4489 + 7 part   4490 + 7 part   4495 + 2 part   4461 + 25 part  4451 + 31 part  4459 + 28 part  4412 + 32 part
NGA50                         357183          221098          279423          226118          179662          194634          191457
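For readers unfamiliar with the N50 row: it is the contig length at which the contigs of that length or longer cover at least half of the assembly. NGA50 follows the same idea but uses aligned blocks and half of the reference genome length. A minimal sketch of both calculations follows. The function names are my own, and the lengths in the usage lines are toy values, not taken from the table.

```python
def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


def ng50(lengths, genome_length):
    """Like n50, but the target is half of a known genome length.

    Returns None when the sequences never reach half of the genome.
    Feeding aligned block lengths instead of contig lengths gives NGA50.
    """
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= genome_length / 2:
            return length
    return None


toy = [10, 8, 6, 4, 2]       # total 30, half is 15
print(n50(toy))              # 10 + 8 = 18 >= 15, so N50 is 8
print(ng50(toy, 40))         # 10 + 8 + 6 = 24 >= 20, so NG50 is 6
```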

We discarded contigs to which fewer than 100 reads aligned.
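The filtering rule can be sketched as follows. This is an illustration under assumptions, not the script used here: it presumes per-contig aligned-read counts have already been extracted from the alignment (for example with a tool that reports reads per reference sequence), and the names and toy data are hypothetical.

```python
MIN_ALIGNED_READS = 100  # threshold stated above


def filter_contigs(contigs, read_counts, min_reads=MIN_ALIGNED_READS):
    """Keep contigs with at least `min_reads` aligned reads.

    contigs: dict mapping contig name to sequence
    read_counts: dict mapping contig name to number of aligned reads;
                 contigs missing from it are treated as having 0 reads
    """
    return {
        name: seq
        for name, seq in contigs.items()
        if read_counts.get(name, 0) >= min_reads
    }


# Toy data, for illustration only.
contigs = {"ctg1": "ACGT", "ctg2": "GGCC", "ctg3": "TTAA"}
read_counts = {"ctg1": 2500, "ctg2": 99}
kept = filter_contigs(contigs, read_counts)
print(sorted(kept))  # ['ctg1']
```

A contig with exactly 100 aligned reads is kept, since only those with fewer than 100 are discarded.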

Statistics without reference  071634_raw_asm.ctg   192221_raw_asm.ctg   210845_raw_asm.ctg   071634_100X_asm.ctg  071634_118X_asm.ctg  192221_118X_asm.ctg  210845_118X_asm.ctg
# contigs                     19                   24                   21                   28                   38                   29                   31
Largest contig                745120               664876               592203               663399               434084               345313               437164
Total length                  4669108              4675696              4700617              4636263              4644391              4603072              4578972
N50                           356974               222559               399011               295449               180706               207976               191458
Misassemblies
# misassemblies               7                    6                    11                   6                    5                    7                    6
Misassembled contigs length   1539749              936587               2058922              1200212              727024               1097466              478971
Mismatches
# mismatches per 100 kbp      2.75                 2.75                 3.04                 7.08                 5.85                 8.82                 3.69
# indels per 100 kbp          2.23                 1.1                  1.17                 13.46                5.83                 2.49                 2.37
# N's per 100 kbp             0.19                 0.02                 0.04                 0.26                 0.15                 0.07                 0.02
Genome statistics
Genome fraction (%)           99.639               99.699               99.834               99.984               99.051               98.78                98.159
Duplication ratio             1.011                1.011                1.017                1.01                 1.015                1.005                1.006
# genes                       4473 + 15 part       4465 + 18 part       4480 + 10 part       4435 + 29 part       4431 + 36 part       4413 + 36 part       4380 + 34 part
NGA50                         357183               221098               279423               226118               179662               194634               191457