H. volc

Revision as of 26 February 2013 19:46 by admin

Haloferax volcanii DS2

Contigs source

Three assemblies are available from the page "How to score genome assemblies using the Mauve system".

Sequence assembly

Assembly Description
volc454 It was sequenced using 454 pyrosequencing by Roche on a GS FLX Titanium instrument. Reads were obtained at 25x coverage and assembled into contigs with Newbler by Roche.
volcV It was sequenced to 25x coverage using Illumina 100 nt read pairs with 500 nt inserts, and to 15x coverage using 50 nt Illumina mate-pairs with 6.5 kbp inserts. Both data types were generated by BGI. The assembly was constructed with Velvet using the insert size estimates given above and default parameters. No read error correction or quality trimming steps were performed.
volcIDBA It was sequenced with 80x coverage of 76 nt read pairs with 300 nt inserts on an Illumina GAIIx instrument at the UC Davis Genome Center, and 2x coverage of 50 nt mate-pairs with 6.5 kbp inserts at BGI. The reads were error corrected with REPTILE using default parameters, contigs were assembled with IDBA using the custom parameters --mink 33 --maxk 78 and everything else default, and scaffolds were built with SSPACE using the custom parameter -a 0.5 and everything else default.
  • Scored with Mauve Assembly Metrics:
Name NumContigs NumAssemblyBases NumMisCalled NumUnCalled NumGapsRef NumGapsAssembly TotalBasesMissed %Missed ExtraBases %Extra BrokenCDS IntactCDS ContigN50 ContigN90 MaxContigLength N50
volc454 157 3920004 90 0 141 128 119928 2.9886 1818 0.0464 30 3908 123582 11735 217295 127504
volcV 1394 4394403 925 30431 1848 1739 161388 4.0217 503214 11.4512 505 3433 843300 57 1354539 1110042
volcIDBA 367 3880100 1209 5884 999 991 155857 3.8839 22465 0.579 442 3496 19349 5537 99636 19372

Contig integrator

All Contigs

Since minimus2 can only merge two assemblies at a time, we applied it iteratively to integrate more assemblies. For H. volc we tested every merge order for minimus2, which was feasible because only three assemblies were available.
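The iterative scheme can be sketched in Python as a fold of a two-input merger over an ordered list of assemblies, repeated for every ordering. This is only an illustration: `merge_in_order`, `all_orderings` and `toy_merge` are hypothetical names, and `toy_merge` is a stand-in that records the merge order rather than calling minimus2.

```python
from functools import reduce
from itertools import permutations


def all_orderings(names):
    """Return every ordering of the given assembly names."""
    return list(permutations(names))


def merge_in_order(assemblies, merge_pair):
    """Fold a two-input merger over the assemblies, left to right.

    The result of merging the first two assemblies is merged with the
    third, and so on.
    """
    return reduce(merge_pair, assemblies)


def toy_merge(first, second):
    # Hypothetical placeholder for a real pairwise merger: it only
    # records which inputs were combined and in what order.
    return f"({first}+{second})"


orderings = all_orderings(["1", "2", "3"])
results = {order: merge_in_order(order, toy_merge) for order in orderings}

for order, merged in results.items():
    print(",".join(order), "->", merged)
```

Three assemblies give six orderings, which matches the six minimus2(i,j,k) rows in the evaluation table. In practice `toy_merge` would be replaced by a function that runs the merger on two contig files and returns the path of the merged output.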

Files whose names contain 'rawctg.fa' hold the raw contigs from Mauve.
Files whose names contain '.ctg.fa' hold the contigs obtained by splitting the raw contigs at runs of contiguous 'N'.
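As a rough illustration of the splitting step, the Python sketch below cuts one sequence at runs of 'N' and drops empty pieces. The function name `split_at_n_runs` and the `min_run` threshold are assumptions made for this example; the page does not say which tool was used or whether a minimum run length applied.

```python
import re


def split_at_n_runs(sequence, min_run=1):
    """Split a sequence at runs of at least `min_run` consecutive N bases.

    `min_run` is an assumed parameter: with the default of 1, every N
    acts as a break point.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_run)
    # re.split can yield empty strings at the ends; drop them.
    return [piece for piece in pattern.split(sequence) if piece]


print(split_at_n_runs("ACGTNNNNGGCCNTT"))
print(split_at_n_runs("ACGTNNNNGGCCNTT", min_run=2))
```

With `min_run=2` the single N inside the second piece is kept, so a scaffold is only broken at longer gaps.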

The split references for MAIA and the integrated results can be downloaded from hvolc_maia.

Evaluation

  • Benchmark genome
Haloferax_volcanii_DS2.gbk
or
NCBI
  • Evaluated by Mauve Assembly Metrics to calculate the values for the left columns of "N50, Blast_IntactCDS"
How to score genome assemblies using the Mauve system (mauve_linux_snapshot_2011-08-31)
  • Evaluated by Blast with Features
  • Evaluated by GAGE to calculate the values for the right columns of "Blast_IntactCDS"
Gage
  • Score with Mauve Assembly Metrics, N50, Blast and GAGE:
Name NumContigs NumAssemblyBases DCJ_Distance NumMisCalled NumUnCalled NumGapsRef NumGapsAssembly TotalBasesMissed %Missed ExtraBases %Extra BrokenCDS IntactCDS ContigN50^ MaxContigLength N50^ Blast_IntactCDS Units(>200) N50^ cor.Units cor.N50^ Errors,(Indel>=5,Inv,Rel)
Hvolc.454 157 3920004 117 56 0 139 124 118089 2.9427 1365 0.0348 34 3981 123582 217295 127504 3953 145 123582 137 121280 8,(3,0,5)
Hvolc.V 1555 3855484 1674 748 1525 1540 1539 197737 4.9275 16581 0.4301 458 3557 9037 55518 9092 3144 997 8440 1302 5773 201,(154,1,46)
Hvolc.IDBA 580 3871717 602 963 548 1078 986 162423 4.0475 19753 0.5102 440 3575 12787 53121 12830 3411 580 12333 1100 6229 499,(479,9,11)
CISA 72 4041406 75 182 26 144 126 196790 4.9039 124329 3.0764 55 3960 107315 222325 109517 3910 72 109517 111 83934 38,(31,3,4)
GAA# 693 3934772 688 615 685 836 784 158220 3.942783333 54375 1.37315 285 3730 52216 122155 54582 3593 495 53558 762 46217 237,(213,3,21)
MAIA (split6) 6 4344441 8 383 105197 550 554 859884 21.428 1186092 27.3014 392 3623 672888 1667164 1460314 3024 6 1460314 547 9097 646,(610,4,32)
MAIA (split6&n) 893 3619301 875 482 391 875 817 970819 24.1925 251606 6.9518 344 3671 16556 265643 16602 2946 649 14108 691 7337 59,(56,0,3)
minimus2# 179 4168210 192 641 910 545 328 214842 5.3538 354126 8.029783333 261 3754 103468 224742 113003 3855 179 113978 413 43800 212,(200,4,8)
minimus2(1,2,3) 65 4087988 70 545 1037 497 155 95143 2.3709 156534 3.8291 256 3759 171050 342018 182445 3977 65 182445 284 29652 213,(205,2,6)
minimus2(1,3,2) 71 4178001 80 488 962 529 168 196043 4.8853 339761 8.1321 259 3756 169924 341963 171745 3956 71 171745 293 29632 216,(205,6,5)
minimus2(2,1,3) 65 4089672 72 558 1067 514 165 97106 2.4198 172563 4.2195 259 3756 171050 342018 182445 3974 65 182445 290 28788 218,(210,2,6)
minimus2(2,3,1) 75 4296848 78 451 485 376 139 245319 6.1133 480011 11.1712 138 3877 146572 312727 150030 3915 75 150030 204 47330 122,(114,4,4)
minimus2(3,1,2) 71 4178049 79 510 925 526 172 195217 4.8647 339838 8.1339 263 3752 169924 342039 171745 3955 71 171745 294 29652 217,(206,6,5)
minimus2(3,2,1) 78 4341081 84 682 476 389 165 245907 6.1279 509238 11.7307 141 3874 137147 312741 146572 3929 78 150030 216 48253 131,(122,4,5)

[^] Please note that the ContigN50 calculated by Mauve Assembly Metrics is incorrect (off-by-one error). To calculate N50s, we followed this definition of N50: "A contig N50 is calculated by first ordering every contig by length from longest to shortest. Next, starting from the longest contig, the lengths of each contig are summed, until this running sum equals one-half of the total length of all contigs in the assembly. The contig N50 of the assembly is the length of the shortest contig in this list." (ref) GAGE's N50 was calculated using the total reference genome length rather than the sum of the contig lengths. GAGE's cor.N50 values were computed after correcting contigs by breaking them at each error.
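The N50 variants described in this note can be sketched in Python as follows. This is a minimal illustration, not the code used for the table. It assumes that "equals one-half" is read as "reaches or exceeds one-half", since the running sum rarely hits the target exactly. The names `n50` and `break_at`, and the way breakpoints are passed in, are hypothetical.

```python
def n50(lengths, total=None):
    """Return the N50 of a list of contig lengths.

    If `total` is None, the target is half the sum of the contig
    lengths (contig N50). If `total` is given, for example the
    reference genome length, the target is half of that value instead.
    Returns 0 when the contigs never reach the target.
    """
    ordered = sorted(lengths, reverse=True)
    target = (sum(ordered) if total is None else total) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    return 0


def break_at(length, breakpoints):
    """Split one contig of the given length at error positions.

    Assumed model of a corrected N50: each error position becomes a
    cut, and the resulting piece lengths are returned.
    """
    cuts = [0] + sorted(breakpoints) + [length]
    return [end - start for start, end in zip(cuts, cuts[1:]) if end > start]


contigs = [80, 70, 50, 40, 30, 20]
print(n50(contigs))             # based on the sum of contig lengths
print(n50(contigs, total=400))  # based on an assumed reference length
print(n50(break_at(100, [30, 70]) + [50]))
```

Using the reference length as the denominator makes values comparable across assemblies of different total size, which is why the two N50 columns in the table can differ for the same assembly.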

[#] Please note that GAA and minimus2 were designed to merge two assemblies at a time, so we performed all runs and took the average scores.