Escherichia coli K-12 MG1655. The E. coli MG1655 genome consists of a single circular chromosome 4,639,675 bp in length.
Software | Version | Parameters | Download |
ABySS | 1.3.0 | k=31 | Abyss |
Velvet | 1.1.04 | k=29 ins_length=215 cov_cutoff=12 exp_cov=24 min_contig_lgth=100 scaffolding=no | Velvet |
Edena | 3 | m=30 | Edena |
SOAPdenovo | 1.05 | K=29 M=3 | SOAPdenovo |
CLC | 4.7.2 | insert_size_range=194,236 minimum_contig_length=100 | CLC |
Merged File: Set1_Contig
The assemblers above, with the listed parameter settings, were selected for de novo assembly of E. coli. After assembly, we discarded contigs shorter than 100 bp and evaluated the accuracy of the assemblies with the Mauve Assembly Metrics (the results are shown below). This set therefore contains several assemblies of the same data, each generated by a different assembler. Because even a single assembler performs differently under different parameter settings, such as the k-mer size, we also tried several k values for Abyss and SOAPdenovo in the following sets.
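The length filter mentioned above (dropping contigs shorter than 100 bp) can be sketched in a few lines. This is only an illustration: the source does not say which tool performed the filtering, and the function names and toy records below are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_contigs(records: Iterable[Tuple[str, str]],
                   min_length: int = 100) -> List[Tuple[str, str]]:
    """Keep contigs whose length is at least min_length (assumed cutoff: 100 bp)."""
    return [(name, seq) for name, seq in records if len(seq) >= min_length]


# Toy input: c1 is 120 bp, c2 is 40 bp, c3 is exactly 100 bp.
example = [">c1", "ACGT" * 30, ">c2", "ACGT" * 10, ">c3", "A" * 100]
kept = filter_contigs(read_fasta(example))
kept_names = [name for name, _ in kept]
print(kept_names)
```

With a real assembly, the same functions could be applied to an open file handle instead of the toy list, since `read_fasta` accepts any iterable of lines.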
Abyss parameter | Download |
k=29 | Abyss_k29 |
k=31 | Abyss_k31 |
k=33 | Abyss_k33 |
Merged File: Set2_Contig
SOAPdenovo parameter | Download |
k=29 | SOAP_k29 |
k=31 | SOAP_k31 |
k=33 | SOAP_k33 |
Merged File: Set3_Contig
Input | Download |
Set1 | CISA_Set1 |
Set2 | CISA_Set2 |
Set3 | CISA_Set3 |
Set2+Set3 | CISA_Set2_3 |
Set1+2+3+2_3 | CISA_Set1+2+3+2_3 |
Input | Download |
Set1 | minimus2_Set1 |
Set1
Name | NumContigs | NumAssemblyBases | NumMisCalled | NumUnCalled | NumGapsRef | NumGapsAssembly | TotalBasesMissed | PercBasesMissed | ExtraBases | PercExtraBases | BrokenCDS | IntactCDS | ContigN50 | ContigN90 | MaxContigLength |
Abyss | 133 | 4626205 | 334 | 69 | 123 | 119 | 57847 | 1.2468 | 29424 | 0.636 | 57 | 4263 | 96157 | 26096 | 222425 |
CLC | 379 | 4546926 | 100 | 0 | 288 | 287 | 130550 | 2.8138 | 3405 | 0.0749 | 62 | 4258 | 29767 | 8447 | 107342 |
Edena | 211 | 4569446 | 17 | 0 | 129 | 125 | 86780 | 1.8704 | 2078 | 0.0455 | 66 | 4254 | 54405 | 13642 | 186686 |
SOAPdenovo | 553 | 4547211 | 36 | 0 | 461 | 412 | 124407 | 2.6814 | 6972 | 0.1533 | 100 | 4220 | 17902 | 5384 | 103369 |
Velvet | 283 | 4550675 | 138 | 0 | 208 | 203 | 116542 | 2.5119 | 2783 | 0.0612 | 74 | 4246 | 52474 | 12537 | 166094 |
CISA_Set1 | 77 | 4625581 | 288 | 73 | 93 | 96 | 52449 | 1.1304 | 32037 | 0.6926 | 44 | 4276 | 115197 | 32288 | 310695 |
minimus2 | 74 | 4608653 | 285 | 0 | 97 | 78 | 76881 | 1.657 | 35464 | 0.7695 | 50 | 4270 | 126075 | 34542 | 417704 |
We visually inspected the assemblies against the reference genome (NC_000913) using graphical representations such as dot plots. This inspection showed that the largest contig generated by minimus2 was misassembled.
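The two percentage columns in the Set1 table appear to follow from the raw counts: PercBasesMissed matches TotalBasesMissed divided by the reference length (4,639,675 bp), and PercExtraBases matches ExtraBases divided by NumAssemblyBases. This relationship is inferred from the reported numbers, not taken from the Mauve documentation, so the sketch below should be read as a consistency check on three rows of the table.

```python
REFERENCE_LENGTH = 4_639_675  # bp, E. coli MG1655 chromosome (from the text)

# Raw counts copied from the Set1 table:
# name -> (NumAssemblyBases, TotalBasesMissed, ExtraBases)
rows = {
    "Abyss": (4_626_205, 57_847, 29_424),
    "CLC": (4_546_926, 130_550, 3_405),
    "minimus2": (4_608_653, 76_881, 35_464),
}


def percentages(assembly_bases: int, missed: int, extra: int):
    """Return (PercBasesMissed, PercExtraBases) under the assumed definitions."""
    perc_missed = 100.0 * missed / REFERENCE_LENGTH
    perc_extra = 100.0 * extra / assembly_bases
    return perc_missed, perc_extra


computed = {name: percentages(*values) for name, values in rows.items()}
for name, (perc_missed, perc_extra) in computed.items():
    print(f"{name}: missed={perc_missed:.4f}% extra={perc_extra:.4f}%")
```

For these rows the computed values agree with the table to four decimal places (for example, about 1.2468 and 0.6360 for Abyss).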
Set2-Set3
Name | NumContigs | NumAssemblyBases | NumMisCalled | NumUnCalled | NumGapsRef | NumGapsAssembly | TotalBasesMissed | PercBasesMissed | ExtraBases | PercExtraBases | BrokenCDS | IntactCDS | ContigN50 | ContigN90 | MaxContigLength |
Abyss_k29 | 130 | 4634010 | 322 | 30 | 118 | 115 | 61835 | 1.3327 | 40405 | 0.8719 | 54 | 4266 | 95691 | 26567 | 268182 |
Abyss_k31 | 133 | 4626205 | 334 | 69 | 123 | 119 | 57847 | 1.2468 | 29424 | 0.636 | 57 | 4263 | 96157 | 26096 | 222425 |
Abyss_k33 | 135 | 4644184 | 354 | 338 | 139 | 119 | 66355 | 1.4302 | 44937 | 0.9676 | 78 | 4242 | 89001 | 24907 | 268398 |
CISA_Set2 | 105 | 4635199 | 332 | 130 | 117 | 103 | 55567 | 1.1976 | 39517 | 0.8525 | 63 | 4257 | 113377 | 27272 | 222663 |
SOAP_k29 | 1373 | 4582756 | 48 | 0 | 466 | 415 | 124372 | 2.6806 | 7247 | 0.1581 | 100 | 4220 | 17892 | 5276 | 103369 |
SOAP_k31 | 1295 | 4583165 | 56 | 0 | 510 | 466 | 121606 | 2.621 | 9201 | 0.2008 | 121 | 4199 | 17003 | 4286 | 77302 |
SOAP_k33 | 2170 | 4608265 | 105 | 0 | 1470 | 1380 | 126273 | 2.7216 | 41165 | 0.8933 | 507 | 3813 | 5391 | 1449 | 22953 |
CISA_Set3 | 465 | 4546819 | 117 | 0 | 402 | 366 | 133247 | 2.8719 | 19266 | 0.4237 | 95 | 4225 | 21543 | 6065 | 103369 |
CISA_Set2&3 | 105 | 4636783 | 351 | 160 | 118 | 104 | 54999 | 1.1854 | 39905 | 0.8606 | 60 | 4260 | 113377 | 27272 | 222663 |
CISA_Set_1_2_3_2&3 | 72 | 4637107 | 529 | 53 | 109 | 97 | 43390 | 0.9352 | 37158 | 0.8013 | 44 | 4276 | 115185 | 35678 | 310556 |
Name | NumContigs | NumAssemblyBases | NumMisCalled | NumUnCalled | NumGapsRef | NumGapsAssembly | TotalBasesMissed | PercBasesMissed | ExtraBases | PercExtraBases | BrokenCDS | IntactCDS | ContigN50 | MaxContigLength |
CISA+SSPACE | 69 | 4625880 | 362 | 157 | 93 | 98 | 52261 | 1.1264 | 34643 | 0.7489 | 43 | 4277 | 126254 | 316040 |
Abyss+SSPACE | 101 | 4627104 | 393 | 735 | 114 | 119 | 54956 | 1.1845 | 33747 | 0.7293 | 57 | 4263 | 107040 | 268750 |
minimus2+SSPACE | 64 | 4608774 | 337 | 54 | 93 | 76 | 75502 | 1.6273 | 36021 | 0.7816 | 49 | 4271 | 150458 | 420117 |
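For readers unfamiliar with the ContigN50 and ContigN90 columns in the tables above, the sketch below uses the common definition: the length of the contig at which the running total, taken from the longest contig down, first reaches 50% (or 90%) of all assembled bases. Whether Mauve Assembly Metrics uses exactly this convention is an assumption here, and the contig lengths are made-up toy values.

```python
from typing import Iterable


def n_stat(lengths: Iterable[int], percent: int) -> int:
    """Return the Nx contig length (e.g. percent=50 for N50).

    Contigs are visited from longest to shortest; the result is the length
    of the contig at which the cumulative sum first reaches `percent`
    percent of the total assembly size. Integer arithmetic avoids
    floating-point edge cases at the threshold.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running * 100 >= percent * total:
            return length
    return 0  # empty input


# Toy contig lengths (hypothetical), total = 300 bp.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
n50 = n_stat(toy_lengths, 50)
n90 = n_stat(toy_lengths, 90)
print(n50, n90)
```

In the toy example the two longest contigs (80 + 70 = 150 bp) already cover half of the 300 bp total, so N50 is 70; reaching 90% requires five contigs, so N90 is 30.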