E. coli 100 bp

Revision as of 21 August 2013 00:52 by admin

Escherichia coli K12 MG1655. The E. coli MG1655 genome consists of a single circular chromosome 4,639,675 bp in length.

Read source

The paired-end Illumina read data for E. coli were downloaded from Illumina (Illumina) and have a median insert size of 214 bp. More than 28.4 M reads

Sequence assembly

Software    Version  Parameters                      Download
ABySS       1.3.0    k=75                            Abyss
Velvet      1.1.04   VelvetOptimiser --s 59 --e 97   Velvet
Edena       3        m=75                            Edena
SOAPdenovo  1.05     k=75 M=3 avg_ins=215            SOAPdenovo

Merged File: E100_Contigs

Contig integrator

Integrator  Download
CISA        CISA
MAIA        maia_ecoli_100bp
minimus2    minimus2(AEVS), minimus2(ASEV), minimus2(ASVE), minimus2(ESAV), minimus2(ESVA), minimus2(SEAV), minimus2(SEVA), minimus2(SVAE), minimus2(VASE), minimus2(VEAS)
GAA         GAA(AESV), GAA(AEVS), GAA(ASEV), GAA(EASV), GAA(EAVS), GAA(ESAV), GAA(EVAS), GAA(EVSA), GAA(VAES), GAA(VASE)

Because minimus2 and GAA merge only two assemblies at a time, we iteratively integrated the four assemblies in random order.
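
The iterative integration can be sketched as a left fold over an ordering of the four assemblies. This is only an illustration, not the script used here: `merge_pair` is a hypothetical stand-in for one minimus2 or GAA run (it just records the merge order), and reading the four-letter labels as A = ABySS, E = Edena, S = SOAPdenovo, V = Velvet is an assumption based on the tables above.

```python
import itertools
import random
from functools import reduce

# Assumed meaning of the one-letter labels used in names such as "AEVS".
ASSEMBLIES = {"A": "ABySS", "E": "Edena", "S": "SOAPdenovo", "V": "Velvet"}


def merge_pair(left, right):
    """Hypothetical placeholder for one pairwise merge (one minimus2 or GAA run)."""
    return f"({left}+{right})"


def integrate(order):
    """Merge the assemblies two at a time, following the given order."""
    return reduce(merge_pair, order)


def random_orders(n, seed=0):
    """Draw n distinct orderings out of the 4! = 24 possible ones."""
    all_orders = ["".join(p) for p in itertools.permutations(sorted(ASSEMBLIES))]
    return random.Random(seed).sample(all_orders, n)


if __name__ == "__main__":
    for order in random_orders(10):
        print(order, integrate(order))
```

Each ordering gives three pairwise merges: the first two assemblies are merged, and the result is then merged with the third and the fourth in turn.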

Evaluation

  • Benchmark genome
Escherichia coli K12 MG1655
  • Evaluated with Mauve Assembly Metrics, which supplies the values in the columns to the left of "N50, Blast_IntactCDS"
How to score genome assemblies using the Mauve system (mauve_linux_snapshot_2011-08-31)
  • Evaluated by Blast with Features
  • Evaluated with GAGE, which supplies the values in the columns to the right of "Blast_IntactCDS"
GAGE
  • Score with Mauve Assembly Metrics, N50, Blast and GAGE:
Name NumContigs NumAssemblyBases DCJ_Distance NumMisCalled NumUnCalled NumGapsRef NumGapsAssembly TotalBasesMissed %Missed ExtraBases %Extra BrokenCDS IntactCDS ContigN50^ ContigN90 MaxContigLength N50^ Blast_IntactCDS Units(>200) N50^ cor.Units cor.N50^ Errors,(Indel>=5,Inv,Rel)
Abyss 190 4642915 119 125 75 147 128 50280 1.0837 23618 0.5087 76 4244 66424 20680 222412 68544 4243 135 68544 134 68545 3,(0,0,3)
Edena 421 4584984 331 22 0 281 271 86700 1.8687 6753 0.1473 145 4175 24375 6689 104739 24632 4048 377 24375 373 24375 3,(1,1,1)
SOAPdenovo 560 4596003 272 100 0 245 256 112014 2.4143 6408 0.1394 111 4209 31788 7901 105615 31837 4105 356 31837 358 31837 3,(1,0,2)
Velvet 264 4569720 191 127 33 204 210 102966 2.2193 3936 0.0861 92 4228 46975 11116 161713 47550 4159 228 46975 241 44024 16,(4,7,5)
CISA 110 4641820 106 156 64 153 120 44382 0.9566 39138 0.8432 69 4251 77896 25720 222524 79212 4248 106 79212 118 70571 12,(3,4,5)
GAA# 354 4616496 223 116 31 212 206 75683 1.6312 13859 0.2995 102 4218 45106 12495 153242 46435 4152 268 46295 264 44611 6,(2,1,3)
GAA* 311 4636486 222 105 39 216 197 59416 1.2806 22776 0.4906 107 4213 48108 13772 162326 50871 4152 261 50871 248 47458 6,(3,1,2)
MAIA (split3) 3 4732065 3 109 10520 152 150 412160 8.8834 507027 10.7147 94 4226 1420571 1420571 1861168 1450326 3886 - - - - -
MAIA (split3&n) 263 4351338 239 129 12 242 236 422260 9.1011 20759 0.4771 94 4226 47744 11114 161713 47928 3882 232 45905 206 44024 1,(0,0,1)
minimus2# 164 4593120 141 180 0 150 144 115622 2.4920 34933 0.7593 79 4241 62206 18516 184547 63799 4177 159 63375 167 55985 10,(4,3,3)
minimus2* 94 4588207 88 233 0 129 103 322633 6.9538 209679 4.5169 76 4244 86379 27964 225809 88309 4077 94 88183 107 68430 13,(6,3,4)
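
The two percentage columns appear to be simple ratios, and the sketch below checks that reading against two rows of the table. It assumes (inferred from the numbers, not stated on this page) that %Missed is TotalBasesMissed divided by the reference length of 4,639,675 bp and that %Extra is ExtraBases divided by NumAssemblyBases. The names `percent` and `ROWS` are made up for illustration.

```python
REFERENCE_LENGTH = 4_639_675  # E. coli K12 MG1655 chromosome length in bp

# Values copied from the Abyss and CISA rows of the table.
ROWS = {
    "Abyss": {"assembly_bases": 4_642_915, "missed": 50_280, "extra": 23_618,
              "pct_missed": 1.0837, "pct_extra": 0.5087},
    "CISA": {"assembly_bases": 4_641_820, "missed": 44_382, "extra": 39_138,
             "pct_missed": 0.9566, "pct_extra": 0.8432},
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to four decimals."""
    return round(100.0 * part / whole, 4)


if __name__ == "__main__":
    for name, row in ROWS.items():
        missed = percent(row["missed"], REFERENCE_LENGTH)
        extra = percent(row["extra"], row["assembly_bases"])
        print(name, "%Missed", missed, "reported", row["pct_missed"])
        print(name, "%Extra", extra, "reported", row["pct_extra"])
```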

[^] Please note that the ContigN50 calculated by Mauve Assembly Metrics is incorrect (off-by-one error). To calculate the N50s, we followed the definition of N50 (A contig N50 is calculated by first ordering every contig by length from longest to shortest. Next, starting from the longest contig, the lengths of each contig are summed, until this running sum equals one-half of the total length of all contigs in the assembly. The contig N50 of the assembly is the length of the shortest contig in this list. ref). As stated in GAGE, GAGE's N50 was calculated using the total reference genome length rather than the sum of the contig lengths. GAGE's cor.N50 values were computed after correcting contigs by breaking them at each error.
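
The N50 definition quoted in this note can be written as a short function. This is a minimal sketch, not the script used for the table: the name `n50` and the toy contig lengths are made up for illustration, and the running sum stops once it reaches or exceeds half of the total, which is one common reading of "equals". Passing `reference_length` mimics the GAGE convention of using the reference genome length as the total.

```python
def n50(contig_lengths, reference_length=None):
    """Return the N50 of a list of contig lengths.

    If reference_length is given, it replaces the sum of contig lengths
    as the total (the GAGE convention described in the note).
    """
    total = reference_length if reference_length is not None else sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Stop at the first contig that brings the sum to half of the total.
        if 2 * running >= total:
            return length
    return 0  # the contigs cover less than half of the total


if __name__ == "__main__":
    toy = [80, 70, 50, 40, 30, 20]  # hypothetical contig lengths
    print(n50(toy))                        # total taken from the contigs
    print(n50(toy, reference_length=400))  # total taken from a reference
```

With a reference longer than the assembly, more contigs are needed to reach the halfway point, so the reference-based value can only be equal to or smaller than the assembly-based one.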

[#] Please note that because GAA and minimus2 were designed to merge two assemblies at a time, we performed all runs (runs) and took the average scores.

[*] Please note that the minimus2 and GAA scores are averages over ten random combinations (details).