SSPACE

Revision as of 14 August 2014 03:28 by admin

Evaluation

We evaluated the assemblies with QUAST 2.2, using NC_000913 as the reference genome and Ec_gene_list as the gene list.

Reads from a single SMRT cell were corrected with raw, 100X, and 118X short reads.

| Statistics without reference | 071634_raw_asm.ctg | 192221_raw_asm.ctg | 210845_raw_asm.ctg | 071634_100X_asm.ctg | 071634_118X_asm.ctg | 192221_118X_asm.ctg | 210845_118X_asm.ctg |
| --- | --- | --- | --- | --- | --- | --- | --- |
| # contigs | 80 | 93 | 83 | 61 | 69 | 68 | 66 |
| Largest contig | 745120 | 664876 | 562203 | 663399 | 434084 | 345313 | 437164 |
| Total length | 4975695 | 5031560 | 5043217 | 4804004 | 4805579 | 4801310 | 4733683 |
| N50 | 356974 | 221472 | 324225 | 295449 | 179662 | 207976 | 186993 |
| Misassemblies | | | | | | | |
| # misassemblies | 11 | 17 | 21 | 10 | 13 | 20 | 15 |
| Misassembled contigs length | 1552524 | 976207 | 2108892 | 1222277 | 782726 | 1156917 | 527873 |
| Mismatches | | | | | | | |
| # mismatches per 100 kbp | 3.32 | 2.91 | 3.06 | 7.08 | 6.4 | 10.33 | 4.13 |
| # indels per 100 kbp | 2.98 | 1.38 | 1.01 | 13.15 | 5.2 | 5.54 | 2.69 |
| # N's per 100 kbp | 0.38 | 0.12 | 0.22 | 0.4 | 0.37 | 0.4 | 0.23 |
| Genome statistics | | | | | | | |
| Genome fraction (%) | 99.97 | 100 | 100 | 99.304 | 99.424 | 99.522 | 98.712 |
| Duplication ratio | 1.074 | 1.086 | 1.090 | 1.043 | 1.047 | 1.04 | 1.033 |
| # genes | 4489 + 7 part | 4490 + 7 part | 4495 + 2 part | 4461 + 25 part | 4451 + 31 part | 4459 + 28 part | 4412 + 32 part |
| NGA50 | 357183 | 221098 | 279423 | 226118 | 179662 | 194634 | 191457 |
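
For readers unfamiliar with the N50 row, the following is a minimal sketch of how N50 is conventionally computed from a list of contig lengths. The function name `n50` and the toy lengths are illustrative assumptions, not part of QUAST or of the assemblies reported here. NGA50 is related but is computed by QUAST from aligned blocks against the reference genome length, which this sketch does not attempt.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Made-up lengths for illustration only; not taken from the assemblies above.
toy_lengths = [8, 8, 4, 3, 3, 2, 2, 2]
print(n50(toy_lengths))
```

Because the contigs are visited from longest to shortest, the first contig at which the running sum reaches half of the total length is the N50 contig.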

We discarded contigs to which fewer than 100 reads aligned.

| Statistics without reference | 071634_raw_asm.ctg | 192221_raw_asm.ctg | 210845_raw_asm.ctg | 071634_100X_asm.ctg | 071634_118X_asm.ctg | 192221_118X_asm.ctg | 210845_118X_asm.ctg |
| --- | --- | --- | --- | --- | --- | --- | --- |
| # contigs | 19 | 24 | 21 | 28 | 38 | 29 | 31 |
| Largest contig | 745120 | 664876 | 592203 | 663399 | 434084 | 345313 | 437164 |
| Total length | 4669108 | 4675696 | 4700617 | 4636263 | 4644391 | 4603072 | 4578972 |
| N50 | 356974 | 222559 | 399011 | 295449 | 180706 | 207976 | 191458 |
| Misassemblies | | | | | | | |
| # misassemblies | 7 | 6 | 11 | 6 | 5 | 7 | 6 |
| Misassembled contigs length | 1539749 | 936587 | 2058922 | 1200212 | 727024 | 1097466 | 478971 |
| Mismatches | | | | | | | |
| # mismatches per 100 kbp | 2.75 | 2.75 | 3.04 | 7.08 | 5.85 | 8.82 | 3.69 |
| # indels per 100 kbp | 2.23 | 1.1 | 1.17 | 13.46 | 5.83 | 2.49 | 2.37 |
| # N's per 100 kbp | 0.19 | 0.02 | 0.04 | 0.26 | 0.15 | 0.07 | 0.02 |
| Genome statistics | | | | | | | |
| Genome fraction (%) | 99.639 | 99.699 | 99.834 | 99.984 | 99.051 | 98.78 | 98.159 |
| Duplication ratio | 1.011 | 1.011 | 1.017 | 1.01 | 1.015 | 1.005 | 1.006 |
| # genes | 4473 + 15 part | 4465 + 18 part | 4480 + 10 part | 4435 + 29 part | 4431 + 36 part | 4413 + 36 part | 4380 + 34 part |
| NGA50 | 357183 | 221098 | 279423 | 226118 | 179662 | 194634 | 191457 |
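
The read-count filter applied before the table above can be sketched as follows. This is only an illustration under stated assumptions: the per-contig aligned read counts are assumed to be already available as a dictionary, and the names `filter_contigs` and `example_counts` are hypothetical. How the counts were actually obtained is not described on this page.

```python
MIN_ALIGNED_READS = 100  # threshold stated in the text


def filter_contigs(read_counts, min_reads=MIN_ALIGNED_READS):
    """Keep only contigs with at least `min_reads` aligned reads.

    read_counts maps contig name -> number of reads aligned to it.
    Contigs with fewer than `min_reads` aligned reads are discarded.
    """
    return {name: n for name, n in read_counts.items() if n >= min_reads}


# Hypothetical counts for illustration; not real data from these assemblies.
example_counts = {"ctg1": 5321, "ctg2": 99, "ctg3": 100, "ctg4": 12}
kept = filter_contigs(example_counts)
print(sorted(kept))
```

A contig with exactly 100 aligned reads is kept, since the text discards only contigs with fewer than 100.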

Reads from two SMRT cells were corrected with raw, 100X, and 118X short reads. The corrected reads (PBcR) were then either filtered to 25X before assembly or assembled directly with runCA.
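
The page does not say how the corrected reads were reduced to 25X. One common approach, shown here purely as an assumption, is to keep the longest reads first until the retained bases reach the target coverage times the genome size. The function name `select_reads_for_coverage` and the toy read lengths are hypothetical.

```python
def select_reads_for_coverage(read_lengths, genome_size, target_coverage=25):
    """Pick longest reads first until target_coverage x genome_size bases are kept.

    read_lengths maps read name -> read length in bp.
    Returns (list of kept read names, total bases kept).
    Assumption: longest-first selection; the original pipeline may differ.
    """
    target_bases = target_coverage * genome_size
    kept, total = [], 0
    by_length = sorted(read_lengths.items(), key=lambda kv: kv[1], reverse=True)
    for name, length in by_length:
        if total >= target_bases:
            break
        kept.append(name)
        total += length
    return kept, total


# Toy example: a 100 bp "genome" and a 20X target, i.e. 2000 bases.
toy_reads = {"r1": 900, "r2": 400, "r3": 700, "r4": 300, "r5": 200}
kept_reads, kept_bases = select_reads_for_coverage(
    toy_reads, genome_size=100, target_coverage=20
)
print(kept_reads, kept_bases)
```

If the reads together hold fewer bases than the target, all of them are returned and the caller can compare the returned total against the target.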

| Statistics without reference | 2_raw_asm.ctg | 2_raw_25X_asm.ctg | 2_100X_asm.ctg | 2_100X_25X_asm.ctg | 2_118X_asm.ctg | 2_118X_25X_asm.ctg |
| --- | --- | --- | --- | --- | --- | --- |
| # contigs | 106 | 80 | 81 | 71 | 81 | 54 |
| Largest contig | 762045 | 757702 | 767781 | 405645 | 520095 | 570062 |
| Total length | 5168000 | 5080370 | 4918962 | 4799524 | 4832961 | 4725680 |
| N50 | 419161 | 405539 | 331262 | 193986 | 186504 | 210927 |
| Misassemblies | | | | | | |
| # misassemblies | 13 | 18 | 15 | 9 | 16 | 16 |
| Misassembled contigs length | 1591856 | 1751860 | 1703983 | 165469 | 616747 | 1468075 |
| Mismatches | | | | | | |
| # mismatches per 100 kbp | 2.44 | 1.83 | 5.08 | 4.34 | 6.28 | 6.08 |
| # indels per 100 kbp | 0.88 | 0.82 | 6.34 | 2.14 | 5.61 | 2.93 |
| # N's per 100 kbp | 0.48 | 0.04 | 0.94 | 0.02 | 0.43 | 0.08 |
| Genome statistics | | | | | | |
| Genome fraction (%) | 100 | 100 | 99.652 | 98.76 | 99.567 | 99.194 |
| Duplication ratio | 1.116 | 1.098 | 1.065 | 1.048 | 1.047 | 1.028 |
| # genes | 4495 + 2 part | 4495 + 2 part | 4475 + 16 part | 4432 + 31 part | 4458 + 30 part | 4434 + 43 part |
| NGA50 | 418393 | 405538 | 235822 | 193833 | 194196 | 199657 |

We discarded contigs to which fewer than 100 reads aligned.

| Statistics without reference | 2_raw_asm.ctg | 2_raw_25X_asm.ctg | 2_100X_asm.ctg | 2_100X_25X_asm.ctg | 2_118X_asm.ctg | 2_118X_25X_asm.ctg |
| --- | --- | --- | --- | --- | --- | --- |
| # contigs | 16 | 17 | 22 | 33 | 35 | 32 |
| Largest contig | 762045 | 757702 | 767781 | 405645 | 520095 | 570062 |
| Total length | 4650035 | 4675233 | 4651814 | 4574523 | 4648591 | 4588060 |
| N50 | 514903 | 405539 | 331262 | 193986 | 194625 | 223426 |
| Misassemblies | | | | | | |
| # misassemblies | 4 | 6 | 10 | 4 | 5 | 8 |
| Misassembled contigs length | 1564680 | 1697372 | 1683620 | 141677 | 569893 | 1424613 |
| Mismatches | | | | | | |
| # mismatches per 100 kbp | 2.43 | 2.23 | 4.94 | 1.8 | 5.56 | 5.97 |
| # indels per 100 kbp | 1.76 | 0.84 | 6.19 | 1.8 | 4.48 | 2.86 |
| # N's per 100 kbp | 0.06 | 0 | 0.26 | 0 | 0.13 | 0.04 |
| Genome statistics | | | | | | |
| Genome fraction (%) | 99.371 | 99.633 | 99.536 | 98.364 | 99.163 | 98.603 |
| Duplication ratio | 1.006 | 1.013 | 1.009 | 1.003 | 1.011 | 1.004 |
| # genes | 4458 + 13 part | 4466 + 8 part | 4462 + 22 part | 4404 + 39 part | 4438 + 32 part | 4405 + 42 part |
| NGA50 | 418393 | 405538 | 235822 | 193833 | 194196 | 199657 |

Reads from three SMRT cells were corrected with raw, 100X, and 118X short reads. The corrected reads (PBcR) were then either filtered to 25X before assembly or assembled directly with runCA.

| Statistics without reference | 3_raw_asm.ctg | 3_raw_25X_asm.ctg | 3_100X_asm.ctg | 3_100X_25X_asm.ctg | 3_118X_asm.ctg | 3_118X_25X_asm.ctg |
| --- | --- | --- | --- | --- | --- | --- |
| # contigs | 219 | 74 | 98 | 32 | 86 | 39 |
| Largest contig | 771076 | 1426293 | 981874 | 822480 | 1091515 | 520962 |
| Total length | 5873961 | 5171438 | 5051244 | 4730819 | 4906749 | 4668968 |
| N50 | 247798 | 317846 | 413464 | 600008 | 286035 | 218547 |
| Misassemblies | | | | | | |
| # misassemblies | 25 | 10 | 22 | 8 | 25 | 11 |
| Misassembled contigs length | 1361077 | 1372143 | 2201123 | 1855654 | 1800186 | 1350243 |
| Mismatches | | | | | | |
| # mismatches per 100 kbp | 4.03 | 2.5 | 2.030 | 1.5 | 4.71 | 5.45 |
| # indels per 100 kbp | 1.68 | 0.97 | 5.64 | 3.3 | 4.13 | 3.86 |
| # N's per 100 kbp | 0.34 | 0.140 | 0.46 | 0.11 | 0.18 | 0.02 |
| Genome statistics | | | | | | |
| Genome fraction (%) | 100 | 100 | 99.733 | 99.197 | 99.69 | 98.93 |
| Duplication ratio | 1.268 | 1.116 | 1.092 | 1.028 | 1.063 | 1.018 |
| # genes | 4494 + 3 part | 4495 + 2 part | 4484 + 9 part | 4460 + 19 part | 4468 + 20 part | 4427 + 35 part |
| NGA50 | 286997 | 323732 | 348693 | 599239 | 286035 | 193471 |

We discarded contigs to which fewer than 100 reads aligned.

| Statistics without reference | 3_raw_asm.ctg | 3_raw_25X_asm.ctg | 3_100X_asm.ctg | 3_100X_25X_asm.ctg | 3_118X_asm.ctg | 3_118X_25X_asm.ctg |
| --- | --- | --- | --- | --- | --- | --- |
| # contigs | 29 | 15 | 20 | 18 | 27 | 29 |
| Largest contig | 771076 | 1426293 | 981876 | 822480 | 1091515 | 520962 |
| Total length | 4672216 | 4666874 | 4656670 | 4613610 | 4650236 | 4593856 |
| N50 | 316212 | 323732 | 413464 | 600008 | 286035 | 218547 |
| Misassemblies | | | | | | |
| # misassemblies | 6 | 4 | 10 | 6 | 7 | 7 |
| Misassembled contigs length | 1270144 | 1321332 | 2130162 | 1844194 | 1736227 | 1326538 |
| Mismatches | | | | | | |
| # mismatches per 100 kbp | 2.07 | 2.14 | 1.67 | 1.48 | 4.35 | 5.09 |
| # indels per 100 kbp | 0.98 | 0.65 | 5.54 | 3.31 | 2.88 | 3.76 |
| # N's per 100 kbp | 0.06 | 0 | 0.17 | 0.02 | 0.04 | 0 |
| Genome statistics | | | | | | |
| Genome fraction (%) | 99.037 | 99.636 | 99.615 | 99.087 | 99.478 | 98.614 |
| Duplication ratio | 1.017 | 1.01 | 1.008 | 1.004 | 1.009 | 1.005 |
| # genes | 4439 + 24 part | 4467 + 11 part | 4472 + 19 part | 4454 + 22 part | 4457 + 21 part | 4410 + 35 part |
| NGA50 | 286997 | 323732 | 348693 | 599239 | 286035 | 193471 |

Reads from four SMRT cells were corrected with raw, 100X, and 118X short reads. The corrected reads (PBcR) were then either filtered to 25X before assembly or assembled directly with runCA.

| Statistics without reference | 4_raw_asm.ctg | 4_raw_25X_asm.ctg | 4_100X_asm.ctg | 4_100X_25X_asm.ctg | 4_118X_asm.ctg | 4_118X_25X_asm.ctg |
| --- | --- | --- | --- | --- | --- | --- |
| # contigs | 286 | 51 | 123 | 23 | 71 | 40 |
| Largest contig | 532128 | 1812746 | 688723 | 1257198 | 983533 | 621920 |
| Total length | 6162978 | 5045811 | 5144868 | 4693193 | 4862387 | 4665855 |
| N50 | 147254 | 834736 | 398131 | 694380 | 412226 | 285200 |
| Misassemblies | | | | | | |
| # misassemblies | 24 | 13 | 26 | 8 | 31 | 13 |
| Misassembled contigs length | 800651 | 3633076 | 2341550 | 2708632 | 2412628 | 1302367 |
| Mismatches | | | | | | |
| # mismatches per 100 kbp | 3.41 | 2.240 | 5.060 | 1.93 | 4.45 | 7.3 |
| # indels per 100 kbp | 1.36 | 0.97 | 5.21 | 3.82 | 4.08 | 5.71 |
| # N's per 100 kbp | 1.2 | 0.16 | 0.39 | 0.04 | 0.41 | 0 |
| Genome statistics | | | | | | |
| Genome fraction (%) | 100 | 100 | 99.74 | 99.337 | 99.798 | 98.559 |
| Duplication ratio | 1.331 | 1.089 | 1.112 | 1.019 | 1.052 | 1.022 |
| # genes | 4495 + 2 part | 4494 + 3 part | 4482 + 10 part | 4470 + 9 part | 4481 + 12 part | 4413 + 36 part |
| NGA50 | 182232 | 476726 | 317322 | 694380 | 286770 | 214828 |

We discarded contigs to which fewer than 100 reads aligned.

| Statistics without reference | 4_raw_asm.ctg | 4_raw_25X_asm.ctg | 4_100X_asm.ctg | 4_100X_25X_asm.ctg | 4_118X_asm.ctg | 4_118X_25X_asm.ctg |
| --- | --- | --- | --- | --- | --- | --- |
| # contigs | 40 | 12 | 21 | 12 | 21 | 26 |
| Largest contig | 532128 | 1812746 | 688723 | 1257198 | 983533 | 621920 |
| Total length | 4726973 | 4659487 | 4656544 | 4602600 | 4654299 | 4508804 |
| N50 | 180844 | 834736 | 398131 | 1071366 | 412226 | 285200 |
| Misassemblies | | | | | | |
| # misassemblies | 7 | 8 | 8 | 6 | 15 | 9 |
| Misassembled contigs length | 736689 | 3595196 | 2252884 | 2698687 | 2362706 | 1274915 |
| Mismatches | | | | | | |
| # mismatches per 100 kbp | 2.35 | 2.28 | 2.53 | 2 | 4.41 | 6.26 |
| # indels per 100 kbp | 0.87 | 0.75 | 3.94 | 3.81 | 3.31 | 4.49 |
| # N's per 100 kbp | 0.17 | 0.13 | 0.02 | 0.04 | 0.04 | 0 |
| Genome statistics | | | | | | |
| Genome fraction (%) | 99.193 | 99.215 | 99.62 | 98.967 | 99.687 | 97.023 |
| Duplication ratio | 1.028 | 1.012 | 1.008 | 1.003 | 1.009 | 1.003 |
| # genes | 4443 + 26 part | 4456 + 12 part | 4474 + 17 part | 4452 + 11 part | 4474 + 15 part | 4344 + 37 part |
| NGA50 | 182232 | 476726 | 317322 | 694380 | 286770 | 247828 |