Revision as of 29 April 2014 21:32 by admin


Completing microbial genome assemblies: strategy and performance comparisons

Determining the genomic sequences of microorganisms is the basis and prerequisite for understanding their biology and for functional characterization. The advent of low-cost, extremely high-throughput second-generation sequencing technologies and the parallel development of assembly algorithms have made genome assembly rapid and cost-effective, but the resulting assemblies are often unfinished, fragmented draft genomes because read lengths are short and long repeats are present in multiple copies. Several methods, such as ALLPATHS-LG and hybrid and non-hybrid approaches, have been proposed to use third-generation sequencing long reads, which can span many thousands of bases, to complete microbial genome assemblies. However, standardized procedures for comparing these strategies and evaluating their assemblies are lacking.

In this article, we provide a comprehensive review of the above-mentioned methods and collect datasets for a comparative assessment of the two non-hybrid approaches: the hierarchical genome-assembly process (HGAP) and the self-correction approach (SCA). In addition to offering explicit and useful recommendations to practitioners, the review serves as a guide for designing a project to finish a microbial genome assembly. Following the special recipe proposed for ALLPATHS-LG, in which it is supplied with three prepared libraries (fragment, jump and long reads), ALLPATHS-LG is able to complete microbial genomes when the sequencing coverage is controlled at 100X. Although the hybrid approach could improve continuity over the assembly produced from next-generation sequencing reads alone, we did not succeed in assembling a complete genome with it. Both non-hybrid approaches, HGAP and SCA, are able to produce complete genomes as long as the third-generation sequencing reads are adequately long and available in sufficient quantity.