Escherichia coli K12 MG1655. The E. coli MG1655 genome consists of a single circular chromosome 4,639,675 bp in length.
Read source
- The Illumina read data for E. coli (paired-end sequencing library with 200 bp inserts) were downloaded from the Sequence Read Archive (SRA). The data set contains more than 20.8 M reads.
Sequence assembly
- Set1 (Different Assemblers)
Software | Version | Parameters | Download
ABySS | 1.3.0 | k=31 | Abyss
Velvet | 1.1.04 | k=29 ins_length=215 cov_cutoff=12 exp_cov=24 min_contig_lgth=100 scaffolding=no | Velvet
Edena | 3 | m=30 | Edena
SOAPdenovo | 1.05 | K=29 M=3 | SOAPdenovo
CLC | 4.7.2 | insert_size_range=194,236 minimum_contig_length=100 | CLC
Merged File: Set1_Contig
- Set2 (Different parameters for Abyss, the assembler that produced the fewest contigs in Set1)
Merged File: Set2_Contig
- Set3 (Different parameters for SOAPdenovo, the assembler that produced the most contigs in Set1)
Merged File: Set3_Contig
Contig integrator
Evaluation
- Escherichia coli K12 MG1655
- Evaluate by Mauve Assembly Metrics
- How to score genome assemblies using the Mauve system
- Score with Mauve metrics:
Set1
Name | NumContigs | NumAssemblyBases | NumMisCalled | NumUnCalled | NumGapsRef | NumGapsAssembly | TotalBasesMissed | PercBasesMissed | ExtraBases | PercExtraBases | BrokenCDS | IntactCDS | ContigN50 | ContigN90 | MaxContigLength
Abyss | 133 | 4626205 | 334 | 69 | 123 | 119 | 57847 | 1.2468 | 29424 | 0.636 | 57 | 4263 | 96157 | 26096 | 222425
CLC | 379 | 4546926 | 100 | 0 | 288 | 287 | 130550 | 2.8138 | 3405 | 0.0749 | 62 | 4258 | 29767 | 8447 | 107342
Edena | 211 | 4569446 | 17 | 0 | 129 | 125 | 86780 | 1.8704 | 2078 | 0.0455 | 66 | 4254 | 54405 | 13642 | 186686
SOAPdenovo | 553 | 4547211 | 36 | 0 | 461 | 412 | 124407 | 2.6814 | 6972 | 0.1533 | 100 | 4220 | 17902 | 5384 | 103369
Velvet | 283 | 4550675 | 138 | 0 | 208 | 203 | 116542 | 2.5119 | 2783 | 0.0612 | 74 | 4246 | 52474 | 12537 | 166094
CISA_Set1 | 77 | 4625581 | 288 | 73 | 93 | 96 | 52449 | 1.1304 | 32037 | 0.6926 | 44 | 4276 | 115197 | 32288 | 310695
minimus2 | 74 | 4608653 | 285 | 0 | 97 | 78 | 76881 | 1.657 | 35464 | 0.7695 | 50 | 4270 | 126075 | 34542 | 417704
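The ContigN50 and ContigN90 columns summarize contig length distributions. As a minimal sketch, assuming the usual definition (the length of the contig at which the running sum of lengths, taken from longest to shortest, first reaches 50% or 90% of the assembled bases), they could be computed as follows. The function name `nxx` and the toy lengths are hypothetical, and Mauve's own implementation may differ in detail.

```python
def nxx(lengths, percent):
    """Return the Nxx contig length for an integer percent (e.g. 50 or 90).

    Contigs are visited from longest to shortest; the result is the length
    of the contig at which the running total first covers `percent` percent
    of all assembled bases.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the boundary.
        if running * 100 >= percent * total:
            return length


# Hypothetical toy contig lengths (not taken from any assembly above).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]

n50 = nxx(toy_lengths, 50)
n90 = nxx(toy_lengths, 90)
print("N50:", n50, "N90:", n90)
```

In practice the lengths would come from the contig FASTA file of each assembly rather than from a hand-written list.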
Set2-Set3
Name | NumContigs | NumAssemblyBases | NumMisCalled | NumUnCalled | NumGapsRef | NumGapsAssembly | TotalBasesMissed | PercBasesMissed | ExtraBases | PercExtraBases | BrokenCDS | IntactCDS | ContigN50 | ContigN90 | MaxContigLength
Abyss_k29 | 130 | 4634010 | 322 | 30 | 118 | 115 | 61835 | 1.3327 | 40405 | 0.8719 | 54 | 4266 | 95691 | 26567 | 268182
Abyss_k31 | 133 | 4626205 | 334 | 69 | 123 | 119 | 57847 | 1.2468 | 29424 | 0.636 | 57 | 4263 | 96157 | 26096 | 222425
Abyss_k33 | 135 | 4644184 | 354 | 338 | 139 | 119 | 66355 | 1.4302 | 44937 | 0.9676 | 78 | 4242 | 89001 | 24907 | 268398
CISA_Set2 | 105 | 4635199 | 332 | 130 | 117 | 103 | 55567 | 1.1976 | 39517 | 0.8525 | 63 | 4257 | 113377 | 27272 | 222663
SOAP_k29 | 1373 | 4582756 | 48 | 0 | 466 | 415 | 124372 | 2.6806 | 7247 | 0.1581 | 100 | 4220 | 17892 | 5276 | 103369
SOAP_k31 | 1295 | 4583165 | 56 | 0 | 510 | 466 | 121606 | 2.621 | 9201 | 0.2008 | 121 | 4199 | 17003 | 4286 | 77302
SOAP_k33 | 2170 | 4608265 | 105 | 0 | 1470 | 1380 | 126273 | 2.7216 | 41165 | 0.8933 | 507 | 3813 | 5391 | 1449 | 22953
CISA_Set3 | 465 | 4546819 | 117 | 0 | 402 | 366 | 133247 | 2.8719 | 19266 | 0.4237 | 95 | 4225 | 21543 | 6065 | 103369
CISA_Set2&3 | 105 | 4636783 | 351 | 160 | 118 | 104 | 54999 | 1.1854 | 39905 | 0.8606 | 60 | 4260 | 113377 | 27272 | 222663
CISA_Set_1_2_3_2&3 | 72 | 4637107 | 529 | 53 | 109 | 97 | 43390 | 0.9352 | 37158 | 0.8013 | 44 | 4276 | 115185 | 35678 | 310556
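The percentage columns can be cross-checked against the raw counts. The sketch below assumes, based only on the reported numbers and not on the Mauve documentation, that PercBasesMissed is TotalBasesMissed divided by the reference length (4,639,675 bp) and that PercExtraBases is ExtraBases divided by NumAssemblyBases. The function names are hypothetical.

```python
REFERENCE_LENGTH = 4_639_675  # E. coli K12 MG1655 chromosome length in bp


def perc_bases_missed(total_bases_missed, reference_length=REFERENCE_LENGTH):
    """Percentage of reference bases absent from the assembly (assumed formula)."""
    return 100.0 * total_bases_missed / reference_length


def perc_extra_bases(extra_bases, num_assembly_bases):
    """Percentage of assembly bases absent from the reference (assumed formula)."""
    return 100.0 * extra_bases / num_assembly_bases


# Values copied from the Set1 table:
# (NumAssemblyBases, TotalBasesMissed, PercBasesMissed, ExtraBases, PercExtraBases)
set1_rows = {
    "Abyss": (4626205, 57847, 1.2468, 29424, 0.636),
    "CLC": (4546926, 130550, 2.8138, 3405, 0.0749),
}

for name, (assembly, missed, rep_missed, extra, rep_extra) in set1_rows.items():
    calc_missed = perc_bases_missed(missed)
    calc_extra = perc_extra_bases(extra, assembly)
    print(f"{name}: missed {calc_missed:.4f} (reported {rep_missed}), "
          f"extra {calc_extra:.4f} (reported {rep_extra})")
```

For these two rows the recomputed values agree with the reported ones to four decimal places, which supports the assumed denominators but does not confirm how Mauve defines the metrics.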
- Scaffold the contigs using SSPACE
- Since paired-end reads of E. coli are available, it is possible to assess the order, distance and orientation of contigs and combine them into scaffolds. We therefore used SSPACE to scaffold the contigs and evaluated the scaffolds with the Mauve assembly metrics.
Name | NumContigs | NumAssemblyBases | NumMisCalled | NumUnCalled | NumGapsRef | NumGapsAssembly | TotalBasesMissed | PercBasesMissed | ExtraBases | PercExtraBases | BrokenCDS | IntactCDS | ContigN50 | MaxContigLength
CISA+SSPACE | 69 | 4625880 | 362 | 157 | 93 | 98 | 52261 | 1.1264 | 34643 | 0.7489 | 43 | 4277 | 126254 | 316040
Abyss+SSPACE | 101 | 4627104 | 393 | 735 | 114 | 119 | 54956 | 1.1845 | 33747 | 0.7293 | 57 | 4263 | 107040 | 268750
minimus2+SSPACE | 64 | 4608774 | 337 | 54 | 93 | 76 | 75502 | 1.6273 | 36021 | 0.7816 | 49 | 4271 | 150458 | 420117
The results show three things. (1) The integrated contigs output by CISA can be scaffolded further by SSPACE only to a limited extent (77 contigs before scaffolding, 69 after), which suggests that CISA does integrate the sequence information from the different assemblies. (2) Introducing the paired-end reads to the pre-assembled Abyss contigs with SSPACE (Abyss+SSPACE) reduces the number of contigs only to 101, a smaller improvement than that obtained with CISA, which suggests that integrating contigs before scaffolding can further improve the result. (3) SSPACE does not resolve the misassembled contigs generated by minimus2, which suggests that contigs should be integrated with caution.
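The contig counts behind points (1) and (2) can be tabulated with a short script. This is only an illustrative sketch: the counts are copied from the NumContigs columns of the Mauve tables above, while the function name `scaffolding_effect` and the percentage summary are additions made here for illustration.

```python
# NumContigs before and after SSPACE scaffolding, copied from the tables above.
contig_counts = {
    "CISA": (77, 69),
    "Abyss": (133, 101),
    "minimus2": (74, 64),
}


def scaffolding_effect(before, after):
    """Return (contigs removed, percent reduction) for one assembly."""
    removed = before - after
    return removed, 100.0 * removed / before


for name, (before, after) in contig_counts.items():
    removed, percent = scaffolding_effect(before, after)
    print(f"{name}: {before} -> {after} contigs "
          f"({removed} fewer, {percent:.1f}% reduction)")
```

Contig count alone says nothing about correctness, so it should be read together with the error columns (NumMisCalled, BrokenCDS, and so on) when comparing integrators.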