Escherichia coli K12 MG1655. The E. coli MG1655 consists of a circular chromosome of 4,639,675 bp in length.
Software | Version | Parameters | Download |
ABySS | 1.3.0 | k=31 | Abyss |
Velvet | 1.1.04 | k=29 ins_length=215 cov_cutoff=12 exp_cov=24 min_contig_lgth=100 scaffolding=no | Velvet |
Edena | 3 | m=30 | Edena |
SOAPdenovo | 1.05 | K=29 M=3 | SOAPdenovo |
CLC | 4.7.2 | insert_size_range=194,236 minimum_contig_length=100 | CLC |
Merged File: Set1_Contig
Abyss parameter | Download |
k=29 | Abyss_k29 |
k=31 | Abyss_k31 |
k=33 | Abyss_k33 |
Merged File: Set2_Contig
SOAPdenovo parameter | Download |
k=29 | SOAP_k29 |
k=31 | SOAP_k31 |
k=33 | SOAP_k33 |
Merged File: Set3_Contig
Input | Download |
Set1 | CISA_Set1 |
Set2 | CISA_Set2 |
Set3 | CISA_Set3 |
Set2+Set3 | CISA_Set2_3 |
Set1+2+3+2_3 | CISA_Set1+2+3+2_3 |
The input file for CISA is an assembled set of contigs, e.g., the set1 contains the contigs obtained from Abyss, CLC, Edena, SOAPdenovo, and Velvet.
The integrated contigs generated by CISA can be directly downloaded via the Download link.
maia('./assembly_list.txt','./NC_000913.fna')
assembly_list.txt:
Edena ./Edena_contigs.fa 2 SOAP ./SOAP_contigs.fa 2 Velvet ./Velvet.fa 2 CLC ./CLC.fa 2 Abyss ./Abyss_contigs.fa 3
A file named maia_assembly.fa was generated, containing a draft genome of 4640055 bp with 126761 uncalled bases (Ns). We split the genome into contigs (at >10Ns) and got the set of contigs maia.fa.
In using MAIA for other examples, we found that Matlab crashed while the recursion reached its limitation. We therefore split the reference genome into several parts, and then performed MAIA.
maia('./assembly_list.txt','./NC_000913_1.fna') maia('./assembly_list.txt','./NC_000913_2.fna') maia('./assembly_list.txt','./NC_000913_3.fna')
The split references and the integrated results can be downloaded maia_ecoli_36.
All_M2_GAA
Beacuase minimus2 and GAA merge two assemblies at a time, we iteratively integrate the five assemblies in random order.
minimus2: C_S_V_A_E, A_C_V_S_E, S_V_E_C_A, A_E_C_V_S, A_E_S_C_V, E_V_S_C_A, E_V_C_A_S, V_E_S_A_C, V_S_E_A_C, A_E_V_C_S
GAA: E_S_C_V_A, V_S_E_C_A, V_S_C_E_A, S_V_C_E_A, V_E_A_S_C, E_C_V_S_A, V_A_S_C_E, V_C_E_S_A, A_S_C_V_E, S_C_V_E_A
Set1
Name | NumContigs | NumAssemblyBases | DCJ_Distance | NumMisCalled | NumUnCalled | NumGapsRef | NumGapsAssembly | TotalBasesMissed | %Missed | ExtraBases | %Extra | BrokenCDS | IntactCDS | ContigN50^ | ContigN90 | MaxContigLength | N50^ | Blast_IntactCDS | Units(>200) | N50^ | cor.Units | cor.N50^ | Errors,(Indel>=5,Inv,Rel) |
Abyss | 133 | 4626205 | 107 | 334 | 69 | 123 | 119 | 57847 | 1.2468 | 29424 | 0.636 | 57 | 4263 | 96157 | 26096 | 222425 | 96511 | 4249 | 108 | 96511 | 116 | 92933 | 8,(6,0,2) |
CLC | 379 | 4546926 | 304 | 100 | 0 | 288 | 287 | 130550 | 2.8138 | 3405 | 0.0749 | 62 | 4258 | 29767 | 8447 | 107342 | 29905 | 4228 | 288 | 28450 | 290 | 28036 | 2,(0,1,1) |
Edena | 211 | 4569446 | 154 | 17 | 0 | 129 | 125 | 86780 | 1.8704 | 2078 | 0.0455 | 66 | 4254 | 54405 | 13642 | 186686 | 57790 | 4191 | 182 | 54405 | 186 | 52796 | 4,(2,1,1) |
SOAPdenovo | 553 | 4547211 | 475 | 36 | 0 | 461 | 412 | 124407 | 2.6814 | 6972 | 0.1533 | 100 | 4220 | 17902 | 5384 | 103369 | 17944 | 4131 | 450 | 17892 | 451 | 17892 | 1,(0,0,1) |
Velvet | 283 | 4550675 | 207 | 138 | 0 | 208 | 203 | 116542 | 2.5119 | 2783 | 0.0612 | 74 | 4246 | 52474 | 12537 | 166094 | 54359 | 4194 | 217 | 52474 | 224 | 49022 | 8,(5,0,3) |
CISA_Set1 | 72 | 4627549 | 70 | 241 | 50 | 91 | 92 | 49487 | 1.0666 | 32028 | 0.6921 | 44 | 4276 | 119107 | 32288 | 312018 | 126254 | 4276 | 72 | 126254 | 83 | 113511 | 11,(8,0,3) |
GAA# | 314 | 4578451 | 245 | 148 | 7 | 240 | 227 | 97081 | 2.0924 | 10516 | 0.2284 | 72 | 4248 | 49430 | 13068 | 157184 | 51218 | 4205 | 249 | 50305 | 251 | 48138 | 4,(2,0,2) |
GAA* | 311 | 4602917 | 224 | 156 | 3 | 225 | 216 | 93476 | 2.0147 | 11942 | 0.2591 | 76 | 4244 | 49990 | 12208 | 163308 | 51085 | 4208 | 245 | 51075 | 238 | 47954 | 5,(3,0,2) |
MAIA | 110 | 4513348 | 96 | 82 | 54 | 100 | 95 | 129936 | 2.8005 | 1090 | 0.0242 | 48 | 4272 | 112717 | 30950 | 312145 | 126075 | 4212 | 95 | 126075 | 97 | 107674 | 5,(2,0,3) |
minimus2# | 155 | 4598769 | 133 | 206 | 0 | 143 | 138 | 90163 | 1.9433 | 32604 | 0.7058 | 58 | 4262 | 82947 | 21666 | 202745 | 86241 | 4243 | 137 | 85880 | 144 | 80493 | 8,(5,1,2) |
minimus2* | 73 | 4597392 | 67 | 323 | 0 | 96 | 80 | 155862 | 3.3593 | 102792 | 2.2503 | 52 | 4268 | 121942 | 35207 | 296685 | 129557 | 4199 | 72 | 127420 | 83 | 113511 | 11,(7,1,3) |
GAA (Abyss,Edena) | 133 | 4637982 | 102 | 328 | 93 | 118 | 112 | 54835 | 1.1819 | 28888 | 0.6229 | 57 | 4263 | 96157 | 26096 | 222425 | 96511 | 4258 | 108 | 96511 | 115 | 92933 | 8,(6,0,2) |
GAA (A,C,E,S,V) | 138 | 4639673 | 103 | 305 | 93 | 119 | 113 | 54254 | 1.1693 | 29292 | 0.6311 | 57 | 4263 | 96157 | 26096 | 222425 | 96511 | 4258 | 108 | 96511 | 115 | 92933 | 8,(6,0,2) |
MAIA (split3) | 3 | 4915920 | 3 | 68 | 9573 | 89 | 90 | 213328 | 4.5979 | 464447 | 9.4478 | 55 | 4265 | 1422748 | 1422748 | 1927448 | 1565724 | 4217 | - | - | - | - | - |
MAIA (split3&n) | 145 | 4800506 | 112 | 68 | 66 | 116 | 110 | 159557 | 3.439 | 124964 | 2.6031 | 50 | 4270 | 126075 | 30950 | 318482 | 132974 | 4212 | 121 | 145106 | 103 | 107674 | 5,(3,0,2) |
minimus2(A,C,E,S,V) | 74 | 4608653 | 68 | 285 | 0 | 97 | 78 | 76881 | 1.657 | 35464 | 0.7695 | 50 | 4270 | 126075 | 34542 | 417704 | 134584 | 4262 | 73 | 134584 | 83 | 113511 | 10,(7,1,2) |
minimus2(S,C,V,E,A) | 69 | 4215087 | 69 | 214 | 249 | 90 | 78 | 548181 | 11.8151 | 113137 | 2.6841 | 51 | 4269 | 119108 | 35441 | 312145 | 126233 | 3855 | 69 | 115198 | 79 | 105796 | 10,(5,2,3) |
[^] Please note that the ContigN50 calculated by Mauve Assembly Metrics is incorrect (off-by-one error). We have followed the definition of N50 (A contig N50 is calculated by first ordering every contig by length from longest to shortest. Next, starting from the longest contig, the lengths of each contig are summed, until this running sum equals one-half of the total length of all contigs in the assembly. The contig N50 of the assembly is the length of the shortest contig in this list. ref) to calculate N50s. As stated in GAGE, GAGE's N50 was calculated using the total reference genome length rather than the sum total of contig lengths. The GAGE's cor.N50 values were computed after correcting contigs by breaking them at each error.
[#] Please note that GAA and minimus2 were designed to merge two assemblies at a time, we thus performed all runs and took the average scores.
[*] Please note that the scores of minimus2 and GAA were taken from the average of ten random combinations (details).
Set2-Set3
Name | NumContigs | NumAssemblyBases | DCJ_Distance | NumMisCalled | NumUnCalled | NumGapsRef | NumGapsAssembly | TotalBasesMissed | %Missed | ExtraBases | %Extra | BrokenCDS | IntactCDS | ContigN50 | ContigN90 | MaxContigLength | N50 | Blast_IntactCDS | Units(>200) | N50 | cor.Units | cor.N50 | Errors,(Indel>=5,Inv,Rel) |
Abyss_k29 | 130 | 4634010 | 108 | 322 | 30 | 118 | 115 | 61835 | 1.3327 | 40405 | 0.8719 | 54 | 4266 | 95691 | 26567 | 268182 | 96157 | 4259 | 105 | 96157 | 111 | 89001 | 6,(4,0,2) |
Abyss_k31 | 133 | 4626205 | 107 | 334 | 69 | 123 | 119 | 57847 | 1.2468 | 29424 | 0.636 | 57 | 4263 | 96157 | 26096 | 222425 | 96511 | 4249 | 108 | 96511 | 116 | 92933 | 8,(6,0,2) |
Abyss_k33 | 135 | 4644184 | 106 | 354 | 338 | 139 | 119 | 66355 | 1.4302 | 44937 | 0.9676 | 78 | 4242 | 89001 | 24907 | 268398 | 96157 | 4255 | 112 | 96157 | 119 | 89001 | 8,(5,0,3) |
CISA_Set2 | 106 | 4635648 | 93 | 321 | 131 | 117 | 104 | 55343 | 1.1928 | 39776 | 0.858 | 64 | 4256 | 113377 | 27272 | 222663 | 113512 | 4260 | 94 | 113512 | 101 | 105936 | 7,(5,0,2) |
SOAP_k29 | 1373 | 4582756 | 479 | 48 | 0 | 466 | 415 | 124372 | 2.6806 | 7247 | 0.1581 | 100 | 4220 | 17892 | 5276 | 103369 | 17902 | 4131 | 450 | 17892 | 451 | 17892 | 1,(0,0,1) |
SOAP_k31 | 1295 | 4583165 | 519 | 56 | 0 | 510 | 466 | 121606 | 2.621 | 9201 | 0.2008 | 121 | 4199 | 17003 | 4286 | 77302 | 17052 | 4073 | 502 | 16924 | 503 | 16924 | 1,(0,0,1) |
SOAP_k33 | 2170 | 4608265 | 1519 | 105 | 0 | 1470 | 1380 | 126273 | 2.7216 | 41165 | 0.8933 | 507 | 3813 | 5391 | 1449 | 22953 | 5394 | 3314 | 1459 | 5368 | 1459 | 5365 | 1,(0,0,1) |
CISA_Set3 | 440 | 4532470 | 383 | 35 | 0 | 379 | 338 | 133920 | 2.8864 | 5809 | 0.1282 | 88 | 4232 | 23332 | 6264 | 103369 | 23344 | 4142 | 385 | 23328 | 385 | 23328 | 2,(1,0,1) |
CISA_Set2&3 | 105 | 4636952 | 91 | 344 | 159 | 117 | 103 | 57192 | 1.2327 | 40304 | 0.8692 | 60 | 4260 | 113377 | 27272 | 222663 | 113512 | 4261 | 96 | 113512 | 102 | 105936 | 6,(5,0,1) |
CISA_Set_1_2_3_2&3 | 72 | 4637760 | 72 | 554 | 53 | 112 | 100 | 43060 | 0.9281 | 37481 | 0.8082 | 44 | 4276 | 115185 | 35678 | 310691 | 119686 | 4277 | 71 | 119686 | 80 | 105929 | 9,(7,0,2) |