H. volc

Revision as of 14 September 2012 21:24 by admin

Haloferax volcanii DS2

Contigs source

Three assemblies are available from How to score genome assemblies using the Mauve system.

Sequence assembly

Assembly Description
volc454 Sequenced by Roche using 454 pyrosequencing on a GS FLX Titanium instrument, to 25x read coverage. Reads were assembled into contigs with Newbler by Roche.
volcV Sequenced to 25x coverage using Illumina 100 nt read pairs with 500 nt inserts, plus 15x coverage of 50 nt Illumina mate-pairs with 6.5 kbp inserts. Both data types were generated by BGI. The assembly was constructed with Velvet using the insert size estimates given above and default parameters. No read error correction or quality trimming was performed.
volcIDBA Sequenced to 80x coverage using 76 nt read pairs with 300 nt inserts on an Illumina GAIIx instrument at the UC Davis Genome Center, plus 2x coverage of 50 nt mate-pairs with 6.5 kbp inserts generated at BGI. Reads were error-corrected with REPTILE using default parameters, contigs were assembled with IDBA using the custom parameters --mink 33 --maxk 78 (everything else default), and scaffolds were built with SSPACE using the custom parameter -a 0.5 (everything else default).
  • Scored with Mauve metrics:
Name NumContigs NumAssemblyBases NumMisCalled NumUnCalled NumGapsRef NumGapsAssembly TotalBasesMissed %Missed ExtraBases %Extra BrokenCDS IntactCDS ContigN50 ContigN90 MaxContigLength
volc454 157 3920004 90 0 141 128 119928 2.9886 1818 0.0464 30 3908 123582 11735 217295
volcV 1394 4394403 925 30431 1848 1739 161388 4.0217 503214 11.4512 505 3433 843300 57 1354539
volcIDBA 367 3880100 1209 5884 999 991 155857 3.8839 22465 0.579 442 3496 19349 5537 99636
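
The rows above are whitespace-separated, so they can be loaded with a few lines of code. The sketch below parses them and recomputes %Extra as 100 * ExtraBases / NumAssemblyBases. That formula is an assumption inferred from the numbers in the table, not a definition taken from the Mauve documentation; the function names are illustrative only.

```python
# Header and rows copied from the table above (whitespace-separated).
HEADER = (
    "Name NumContigs NumAssemblyBases NumMisCalled NumUnCalled NumGapsRef "
    "NumGapsAssembly TotalBasesMissed %Missed ExtraBases %Extra BrokenCDS "
    "IntactCDS ContigN50 ContigN90 MaxContigLength"
).split()

ROWS = """\
volc454 157 3920004 90 0 141 128 119928 2.9886 1818 0.0464 30 3908 123582 11735 217295
volcV 1394 4394403 925 30431 1848 1739 161388 4.0217 503214 11.4512 505 3433 843300 57 1354539
volcIDBA 367 3880100 1209 5884 999 991 155857 3.8839 22465 0.579 442 3496 19349 5537 99636
"""


def parse_table(header, text):
    """Turn each non-empty row into a dict keyed by column name."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        record = {"Name": fields[0]}
        for key, value in zip(header[1:], fields[1:]):
            record[key] = float(value)
        records.append(record)
    return records


def percent_extra(record):
    # Assumed relationship (inferred from the table, not from the source text).
    return 100.0 * record["ExtraBases"] / record["NumAssemblyBases"]


records = parse_table(HEADER, ROWS)
for record in records:
    print(record["Name"], round(percent_extra(record), 4), record["%Extra"])
```

If the assumption holds, the recomputed value and the reported %Extra column agree to within rounding for all three assemblies.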

Contig integrator

All Contigs

Since minimus2 can only merge two assemblies at a time, we applied it iteratively to integrate more than two assemblies. Because only three assemblies were available for H. volcanii, we tested every merge order for minimus2.
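
The iterative scheme can be sketched as a left-to-right fold of a pairwise merger over each ordering of the assemblies. In the sketch below, `describe_pair` is a hypothetical stand-in that only records which merge would be run; it does not call minimus2, and the indices 1, 2, 3 are treated as opaque labels because this page does not state which assembly each number refers to.

```python
from functools import reduce
from itertools import permutations


def merge_orders(assemblies):
    """Return every ordering in which the assemblies could be merged."""
    return list(permutations(assemblies))


def iterative_merge(order, merge_pair):
    """Merge the first two items, then merge each later item into the result."""
    return reduce(merge_pair, order)


def describe_pair(left, right):
    # Hypothetical placeholder for one pairwise merge run.
    return f"merge({left},{right})"


orders = merge_orders([1, 2, 3])
for order in orders:
    label = "minimus2(" + ",".join(str(i) for i in order) + ")"
    print(label, "->", iterative_merge(order, describe_pair))
```

Three assemblies give 3! = 6 orderings, which matches the six minimus2(i,j,k) rows in the evaluation table.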

Files whose names contain 'rawctg.fa' hold the raw contigs from Mauve.
Files whose names end in '.ctg.fa' hold the contigs obtained by splitting the raw contigs at runs of contiguous 'N'.
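
A minimal sketch of the splitting step, assuming that any run of one or more 'N' characters separates two contigs. The `min_run` threshold and the `_1`, `_2` naming suffix are assumptions for illustration; the page does not say which threshold or naming scheme was actually used.

```python
import re


def split_on_n_runs(sequence, min_run=1):
    """Split a sequence at runs of at least `min_run` consecutive N characters."""
    pattern = re.compile(r"[Nn]{%d,}" % min_run)
    # Drop empty pieces that appear when a sequence starts or ends with N.
    return [piece for piece in pattern.split(sequence) if piece]


def split_records(records, min_run=1):
    """Take (name, sequence) pairs and yield (new_name, piece) pairs."""
    for name, sequence in records:
        pieces = split_on_n_runs(sequence, min_run)
        for index, piece in enumerate(pieces, start=1):
            yield f"{name}_{index}", piece


for name, piece in split_records([("scaffold1", "ACGTNNNNGGCCNTT")]):
    print(name, piece)
```

Raising `min_run` keeps short runs of N inside a contig and splits only at longer gaps.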

The split references for MAIA and the integrated results can be downloaded from hvolc_maia.

Evaluation

  • Benchmark genome
Haloferax_volcanii_DS2.gbk
or
NCBI
  • Evaluated by Mauve Assembly Metrics
How to score genome assemblies using the Mauve system
  • Evaluated by Blast with Features
  • Evaluated by Gage
Gage
  • Scored with Mauve metrics:
Name NumContigs NumAssemblyBases DCJ_Distance NumMisCalled NumUnCalled NumGapsRef NumGapsAssembly TotalBasesMissed %Missed ExtraBases %Extra BrokenCDS IntactCDS ContigN50 ContigN90 MaxContigLength Blast_IntactCDS Units(>200) N50 cor.Units cor.N50 Errors,(Indel>=5,Inv,Rel)
Hvolc.454 157 3920004 121 90 0 141 128 119928 2.9886 1818 0.0464 30 3908 123582 11735 217295 3953 145 123582 137 121280 8,(3,0,5)
Hvolc.V 1555 3855484 1680 762 1527 1541 1542 197907 4.9318 16765 0.4348 450 3488 9037 1862 55518 3144 997 8440 1302 5773 201,(154,1,46)
Hvolc.IDBA 580 3871717 611 988 548 1080 989 162341 4.0455 20132 0.52 438 3500 12787 3473 53121 3411 580 12333 1100 6229 499,(479,9,11)
CISA 72 4041410 76 187 39 140 129 199738 4.9774 119648 2.9606 42 3896 107315 23789 222317 3908 69 127585 107 85026 37,(30,3,4)
GAA# 179 4168210 196 606 907 544 328 216189 5.387333333 355158 8.052566667 259 3680 103468 14726 224742 3855 179 113978 413 43800 213,(200,4,8)
MAIA (split6) 6 4344441 9 377 105513 550 554 863495 21.518 1145809 26.3741 384 3554 672888 7694 1667164 3024 6 1460314 547 9097 646,(610,4,32)
MAIA (split6&n) 893 3619301 877 468 391 872 815 975697 24.314 523891 14.4749 340 3598 16556 2617 265643 2946 649 14108 691 7337 59,(56,0,3)
minimus2# 800 3931644 809 710 819 976 916 166957 4.16052 60722 1.533 331 3607 37942 5045 102493 3593 495 53558 762 46217 237,(213,3,21)
minimus2(1,2,3) 65 4087988 73 530 1041 496 153 95469 2.3791 155563 3.8054 253 3685 171050 22079 342018 3977 65 182445 284 29652 213,(205,2,6)
minimus2(1,3,2) 71 4178001 80 551 964 531 170 199565 4.9731 337543 8.0791 258 3680 169924 21429 341963 3956 71 171745 293 29632 216,(205,6,5)
minimus2(2,1,3) 65 4089672 74 455 1061 505 157 97620 2.4327 171785 4.2005 259 3679 171050 21825 342018 3974 65 182445 290 28788 218,(210,2,6)
minimus2(2,3,1) 75 4296848 77 387 483 370 137 250571 6.2441 468713 10.9083 142 3796 146572 27186 312727 3915 75 150030 204 47330 122,(114,4,4)
minimus2(3,1,2) 71 4178049 82 574 927 533 175 198653 4.9504 337599 8.0803 262 3676 169924 21429 342039 3955 71 171745 294 29652 217,(206,6,5)
minimus2(3,2,1) 78 4341081 84 762 493 390 165 244751 6.0991 495987 11.4254 145 3793 137147 24300 312741 3929 78 150030 216 48253 131,(122,4,5)

[#] GAA and minimus2 were designed to merge two assemblies at a time; for these rows we therefore performed all runs and report the average scores.
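
The averaging described in the note can be sketched as a per-metric mean over the individual runs. The per-run numbers below are placeholders for illustration only and are not results from this page; the fractional values in the GAA# row (for example 5.387333333) suggest, but do not confirm, an average over three runs.

```python
from statistics import mean


def average_scores(runs):
    """Average every metric across runs; all runs must share the same keys."""
    keys = runs[0].keys()
    return {key: mean(run[key] for run in runs) for key in keys}


# Placeholder values, NOT measurements from the table above.
runs = [
    {"NumContigs": 100, "%Missed": 4.0},
    {"NumContigs": 200, "%Missed": 5.0},
    {"NumContigs": 300, "%Missed": 6.0},
]

print(average_scores(runs))
```

Composite columns such as Errors,(Indel>=5,Inv,Rel) would need to be split into their numeric parts before averaging.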