SPAdes - hybrid

Revision as of 15 August 2014 02:40 by admin

We first assembled the Dataset 4 raw data with SPAdes, and then used different subread depths from Dataset 5 and Dataset 9 to scaffold with SSPACE-longread.

We arbitrarily chose sets of 1 to 4 SMRT cells:
One single SMRT cell: m120208_071634
Two SMRT cells: m120228_210845 + m120208_122534
Three SMRT cells: m120228_115504 + m120228_152936 + m120228_100807
Four SMRT cells: m120228_171636 + m120228_223624 + m120228_100807 + m120228_190630

spades.py -1 reads_1.fastq -2 reads_2.fastq --pacbio filtered_subreads.fasta -o output
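
The command above takes a single subreads file, so the multi-cell sets presumably need their per-cell subreads pooled first. The following is a minimal sketch of that step, not the procedure actually used here: the function name pool_subreads, the per-cell file names, and the output names are all assumptions made for illustration.

```shell
# Sketch only: the function name and all file names below are hypothetical.
# Concatenate the per-cell subread FASTA files (arguments 2..n) into the
# single output file named by argument 1.
pool_subreads() {
    out="$1"
    shift
    cat "$@" > "$out"
}

# Example for the two-cell set (assumed per-cell file names):
# pool_subreads pooled_2cell.fasta \
#     m120228_210845.filtered_subreads.fasta \
#     m120208_122534.filtered_subreads.fasta
# spades.py -1 reads_1.fastq -2 reads_2.fastq \
#     --pacbio pooled_2cell.fasta -o output_2cell
```

Plain concatenation is enough for FASTA input as long as each per-cell file ends with a newline.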

Evaluation

We evaluated the assemblies with QUAST 2.3 (reference genome NC_000913 and gene list Ec_gene_list).

Statistics without reference  Miseq_only      Miseq_1cell     Miseq_2cell     Miseq_3cell     Miseq_4cell     Miseq_17cell    Miseq_d9
# contigs                     86              15              14              12              11              6               6
Largest contig                285889          2497845         1260980         2501081         3194637         1954649         3392211
Total length                  4577132         4632009         4633058         4636174         4638657         4633857         4632677
N50                           139882          2497845         1238868         2501081         3194637         1238635         3392211

Misassemblies
# misassemblies               2               9               10              10              10              8               8
Misassembled contigs length   215581          3193893         3244631         2705788         3657574         3243566         4050696

Mismatches
# mismatches per 100 kbp      3.02            7.15            6.2             6.9             7.17            6.05            6.43
# indels per 100 kbp          0.46            1.06            1               1.32            1.27            0.95            1.06
# N's per 100 kbp             0               97.89           67.67           77.33           77.37           91.03           123.88

Genome statistics
Genome fraction (%)           98.451          99.498          99.483          99.664          99.748          99.432          99.587
Duplication ratio             1.001           1.002           1.002           1.002           1.002           1.003           1.001
# genes                       4399 + 32 part  4467 + 14 part  4465 + 13 part  4476 + 11 part  4477 + 11 part  4467 + 11 part  4470 + 13 part
NGA50                         133059          571664          425173          852639          1039467         1039472         1039654
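
For readers unfamiliar with the N50 row: N50 is the contig length L such that contigs of length L or longer together cover at least half of the total assembly length. The sketch below shows that calculation on a plain list of contig lengths. It is an illustration only, not QUAST's own code, and the function name n50 is an assumption. NGA50 is a related metric that QUAST computes from aligned blocks relative to the reference genome length, which this sketch does not attempt.

```shell
# Sketch only: read contig lengths (one integer per line) on stdin and
# print the N50, i.e. the length at which the running sum of lengths,
# taken from longest to shortest, first reaches half of the total.
n50() {
    sort -nr | awk '
        { len[NR] = $1; total += $1 }
        END {
            running = 0
            for (i = 1; i <= NR; i++) {
                running += len[i]
                if (2 * running >= total) { print len[i]; exit }
            }
        }'
}
```

For example, `printf '%s\n' 2 3 4 5 6 | n50` prints 5: the total is 20, and the two longest contigs (6 + 5 = 11) are the first to cover at least half of it.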