Haloferax volcanii DS2
Three assemblies are available from "How to score genome assemblies using the Mauve system".
Assembly | Description |
volc454 | It was sequenced using 454 pyrosequencing by Roche on a GS FLX Titanium instrument. 25x read coverage was obtained. Reads were assembled into contigs with Newbler by Roche. |
volcV | It was sequenced to 25x coverage using Illumina 100 nt read pairs with 500 nt inserts, and 15x coverage of 50 nt Illumina mate-pairs with 6.5 kbp insert. Both data types were generated by BGI. The assembly was constructed with Velvet using the insert size estimates given above and default parameters. No read error correction or quality trimming steps were performed. |
volcIDBA | It was sequenced with 80x coverage 76 nt read pairs with 300 nt inserts on an Illumina GAIIx instrument at UC Davis Genome Center, and 2x coverage of 50 nt mate-pairs with 6.5 kbp insert sequences at BGI. The reads were error corrected with REPTILE using default parameters, contigs were assembled with IDBA using the custom parameters --mink 33 --maxk 78 and everything else default, and scaffolded with SSPACE using the custom parameter -a 0.5 and everything else default. |
Name | NumContigs | NumAssemblyBases | NumMisCalled | NumUnCalled | NumGapsRef | NumGapsAssembly | TotalBasesMissed | %Missed | ExtraBases | %Extra | BrokenCDS | IntactCDS | ContigN50 | ContigN90 | MaxContigLength | N50 |
volc454 | 157 | 3920004 | 90 | 0 | 141 | 128 | 119928 | 2.9886 | 1818 | 0.0464 | 30 | 3908 | 123582 | 11735 | 217295 | 127504 |
volcV | 1394 | 4394403 | 925 | 30431 | 1848 | 1739 | 161388 | 4.0217 | 503214 | 11.4512 | 505 | 3433 | 843300 | 57 | 1354539 | 1110042 |
volcIDBA | 367 | 3880100 | 1209 | 5884 | 999 | 991 | 155857 | 3.8839 | 22465 | 0.579 | 442 | 3496 | 19349 | 5537 | 99636 | 19372 |
Since minimus2 can only merge two assemblies at a time, we applied it iteratively to integrate more assemblies. Because only three assemblies were available for H. volcanii, we tested all merge orders for minimus2.
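As a rough illustration of the iterative pairwise merging described above, the sketch below enumerates every order in which three assemblies can be fed to a two-input merger and folds them left to right. The function names (`merge_orders`, `iterative_merge`) and the `merge_pair` callback are hypothetical stand-ins; the callback here only records labels and does not call minimus2. Which assembly corresponds to which index (1, 2, 3) in the table below is not stated in this page, so the labels used are placeholders.

```python
from itertools import permutations


def merge_orders(assemblies):
    """Return every ordering in which the assemblies could be merged."""
    return list(permutations(assemblies))


def iterative_merge(order, merge_pair):
    """Fold a two-input merger over an ordered list of assemblies.

    merge_pair is a hypothetical callback standing in for one run of a
    pairwise merger; it takes the current merged result and the next
    assembly and returns the new merged result.
    """
    merged = order[0]
    for next_assembly in order[1:]:
        merged = merge_pair(merged, next_assembly)
    return merged


if __name__ == "__main__":
    # Placeholder labels; a real run would pass file paths to the merger.
    for order in merge_orders(["1", "2", "3"]):
        print(order, iterative_merge(order, lambda a, b: f"({a}+{b})"))
```

With three inputs this yields six orderings, which matches the six minimus2(i,j,k) rows reported in the results table.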
Files whose names contain 'rawctg.fa' hold the raw contigs from Mauve.
Files whose names contain '.ctg.fa' hold the contigs obtained by splitting the raw contigs at runs of contiguous 'N'.
The split references for MAIA and the integrated results can be downloaded from hvolc_maia.
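The splitting of contigs at contiguous 'N' characters mentioned above can be sketched as follows. This is a minimal example and not the script used to produce the '.ctg.fa' files; the function name `split_at_n_runs` and the `min_run` parameter are assumptions introduced for illustration, and the minimum run length actually used is not given in this page.

```python
import re


def split_at_n_runs(sequence, min_run=1):
    """Split a sequence at runs of at least min_run consecutive N characters.

    Empty pieces (for example from leading or trailing N runs) are dropped.
    min_run is an illustrative parameter, not a documented setting.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_run)
    return [piece for piece in pattern.split(sequence) if piece]


if __name__ == "__main__":
    scaffold = "ACGTNNNNGGCCNAT"
    print(split_at_n_runs(scaffold))             # split at every N run
    print(split_at_n_runs(scaffold, min_run=2))  # keep isolated single N bases
```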
Name | NumContigs | NumAssemblyBases | DCJ_Distance | NumMisCalled | NumUnCalled | NumGapsRef | NumGapsAssembly | TotalBasesMissed | %Missed | ExtraBases | %Extra | BrokenCDS | IntactCDS | ContigN50^ | MaxContigLength | N50^ | Blast_IntactCDS | Units(>200) | N50^ | cor.Units | cor.N50^ | Errors,(Indel>=5,Inv,Rel) |
Hvolc.454 | 157 | 3920004 | 117 | 56 | 0 | 139 | 124 | 118089 | 2.9427 | 1365 | 0.0348 | 34 | 3981 | 123582 | 217295 | 127504 | 3953 | 145 | 123582 | 137 | 121280 | 8,(3,0,5) |
Hvolc.V | 1555 | 3855484 | 1674 | 748 | 1525 | 1540 | 1539 | 197737 | 4.9275 | 16581 | 0.4301 | 458 | 3557 | 9037 | 55518 | 9092 | 3144 | 997 | 8440 | 1302 | 5773 | 201,(154,1,46) |
Hvolc.IDBA | 580 | 3871717 | 602 | 963 | 548 | 1078 | 986 | 162423 | 4.0475 | 19753 | 0.5102 | 440 | 3575 | 12787 | 53121 | 12830 | 3411 | 580 | 12333 | 1100 | 6229 | 499,(479,9,11) |
CISA | 72 | 4041406 | 75 | 182 | 26 | 144 | 126 | 196790 | 4.9039 | 124329 | 3.0764 | 55 | 3960 | 107315 | 222325 | 109517 | 3910 | 72 | 109517 | 111 | 83934 | 38,(31,3,4) |
GAA# | 693 | 3934772 | 688 | 615 | 685 | 836 | 784 | 158220 | 3.942783333 | 54375 | 1.37315 | 285 | 3730 | 52216 | 122155 | 54582 | 3593 | 495 | 53558 | 762 | 46217 | 237,(213,3,21) |
MAIA (split6) | 6 | 4344441 | 8 | 383 | 105197 | 550 | 554 | 859884 | 21.428 | 1186092 | 27.3014 | 392 | 3623 | 672888 | 1667164 | 1460314 | 3024 | 6 | 1460314 | 547 | 9097 | 646,(610,4,32) |
MAIA (split6&n) | 893 | 3619301 | 875 | 482 | 391 | 875 | 817 | 970819 | 24.1925 | 251606 | 6.9518 | 344 | 3671 | 16556 | 265643 | 16602 | 2946 | 649 | 14108 | 691 | 7337 | 59,(56,0,3) |
minimus2# | 179 | 4168210 | 192 | 641 | 910 | 545 | 328 | 214842 | 5.3538 | 354126 | 8.029783333 | 261 | 3754 | 103468 | 224742 | 113003 | 3855 | 179 | 113978 | 413 | 43800 | 212,(200,4,8) |
minimus2(1,2,3) | 65 | 4087988 | 70 | 545 | 1037 | 497 | 155 | 95143 | 2.3709 | 156534 | 3.8291 | 256 | 3759 | 171050 | 342018 | 182445 | 3977 | 65 | 182445 | 284 | 29652 | 213,(205,2,6) |
minimus2(1,3,2) | 71 | 4178001 | 80 | 488 | 962 | 529 | 168 | 196043 | 4.8853 | 339761 | 8.1321 | 259 | 3756 | 169924 | 341963 | 171745 | 3956 | 71 | 171745 | 293 | 29632 | 216,(205,6,5) |
minimus2(2,1,3) | 65 | 4089672 | 72 | 558 | 1067 | 514 | 165 | 97106 | 2.4198 | 172563 | 4.2195 | 259 | 3756 | 171050 | 342018 | 182445 | 3974 | 65 | 182445 | 290 | 28788 | 218,(210,2,6) |
minimus2(2,3,1) | 75 | 4296848 | 78 | 451 | 485 | 376 | 139 | 245319 | 6.1133 | 480011 | 11.1712 | 138 | 3877 | 146572 | 312727 | 150030 | 3915 | 75 | 150030 | 204 | 47330 | 122,(114,4,4) |
minimus2(3,1,2) | 71 | 4178049 | 79 | 510 | 925 | 526 | 172 | 195217 | 4.8647 | 339838 | 8.1339 | 263 | 3752 | 169924 | 342039 | 171745 | 3955 | 71 | 171745 | 294 | 29652 | 217,(206,6,5) |
minimus2(3,2,1) | 78 | 4341081 | 84 | 682 | 476 | 389 | 165 | 245907 | 6.1279 | 509238 | 11.7307 | 141 | 3874 | 137147 | 312741 | 146572 | 3929 | 78 | 150030 | 216 | 48253 | 131,(122,4,5) |
[^] Please note that the ContigN50 calculated by Mauve Assembly Metrics is incorrect (off-by-one error). We have followed the definition of N50 (A contig N50 is calculated by first ordering every contig by length from longest to shortest. Next, starting from the longest contig, the lengths of each contig are summed, until this running sum reaches or exceeds one-half of the total length of all contigs in the assembly. The contig N50 of the assembly is the length of the shortest contig in this list. ref) to calculate N50s. GAGE's N50 was calculated using the total reference genome length rather than the sum of the contig lengths. GAGE's cor.N50 values were computed after correcting contigs by breaking them at each error.
[#] Please note that GAA and minimus2 were designed to merge two assemblies at a time; we therefore performed all runs and report the average scores.
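The N50 definition quoted in note [^] can be written as a short function. This is a sketch under the stated definition, not the code used to produce the tables; the function name `n50` and the optional `total` argument are assumptions for illustration. Passing a reference genome length as `total` gives the reference-based variant described for GAGE's N50, while leaving it unset uses the sum of the contig lengths.

```python
def n50(lengths, total=None):
    """Return the N50 of a list of contig lengths.

    Contigs are sorted from longest to shortest and summed until the
    running sum reaches or exceeds half of `total`; the length of the
    contig that crosses this threshold is returned. If `total` is None,
    the sum of all contig lengths is used. Returns None when the contigs
    together cover less than half of `total`.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    target = (sum(ordered) if total is None else total) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    return None


if __name__ == "__main__":
    contigs = [2, 3, 4, 5, 6]
    print(n50(contigs))            # based on the sum of contig lengths
    print(n50(contigs, total=30))  # based on an assumed reference length
```

For the toy input above, the contig-based value is 5 (6 + 5 = 11 reaches half of 20), and with an assumed reference length of 30 the value drops to 4 (6 + 5 + 4 = 15).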