SPAdes - hybrid

Revision as of 15 August 2014 02:47 by admin

We first assembled the Dataset 4 raw data with SPAdes, and then used subreads from Dataset 5 and Dataset 9 at different depths to scaffold with SSPACE-longread.

We arbitrarily chose subsets of one to four SMRT cells:
One single SMRT cell: m120208_071634
Two SMRT cells: m120228_210845 + m120208_122534
Three SMRT cells: m120228_115504 + m120228_152936 + m120228_100807
Four SMRT cells: m120228_171636 + m120228_223624 + m120228_100807 + m120228_190630

spades.py -1 short_1.fastq -2 short_2.fastq --pacbio filtered_subreads.fasta -o output
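
The command above takes a single subreads file, so for the multi-cell subsets the subreads of the chosen SMRT cells presumably have to be pooled first. The following is only a minimal sketch of one way to do that, not the procedure actually used here. The function name pool_subreads, the per-cell file naming (<cell_id>.filtered_subreads.fasta) and the output names are all hypothetical.

```shell
# Pool the subreads of several SMRT cells into one FASTA file.
# Assumption (not from the original page): each cell has its own file
# named <cell_id>.filtered_subreads.fasta inside the directory given as $1.
pool_subreads() {
  dir=$1   # directory holding the per-cell FASTA files
  out=$2   # pooled FASTA file to write
  shift 2  # remaining arguments are SMRT cell IDs
  : > "$out"   # start from an empty output file
  for cell in "$@"; do
    # Stop with a non-zero status if a cell file is missing or unreadable.
    cat "$dir/$cell.filtered_subreads.fasta" >> "$out" || return 1
  done
}

# Example for the two-cell subset (paths are hypothetical):
# pool_subreads subreads pooled_2cell.fasta m120228_210845 m120208_122534
# spades.py -1 short_1.fastq -2 short_2.fastq --pacbio pooled_2cell.fasta -o output_2cell
```

Plain concatenation is enough here because FASTA records are self-delimiting, so the pooled file can be passed to --pacbio exactly like a single-cell file.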

Evaluation

We evaluated the assemblies with QUAST 2.3 (reference genome NC_000913 and Ec_gene_list).

| Statistics without reference | Miseq_only | Miseq_1cell | Miseq_2cell | Miseq_3cell | Miseq_4cell | Miseq_17cell | Miseq_d9 |
|---|---|---|---|---|---|---|---|
| # contigs | 86 | 15 | 14 | 12 | 11 | 6 | 6 |
| Largest contig | 285889 | 2498709 | 3196491 | 1522255 | 2216997 | 4644452 | 4644455 |
| Total length | 4577132 | 4626443 | 4639784 | 4651072 | 4649795 | 4652420 | 4652423 |
| N50 | 139882 | 2498709 | 3196491 | 1241619 | 1750947 | 4644452 | 4644455 |
| Misassemblies | | | | | | | |
| # misassemblies | 2 | 6 | 6 | 7 | 7 | 7 | 7 |
| Misassembled contigs length | 215581 | 2498709 | 3196491 | 2501706 | 3967944 | 4644452 | 4644455 |
| Mismatches | | | | | | | |
| # mismatches per 100 kbp | 3.02 | 5.16 | 6.16 | 6.79 | 6.81 | 6.79 | 6.79 |
| # indels per 100 kbp | 0.46 | 0.54 | 0.54 | 0.58 | 0.58 | 0.67 | 0.58 |
| # N's per 100 kbp | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Genome statistics | | | | | | | |
| Genome fraction (%) | 98.451 | 99.411 | 99.756 | 100 | 99.976 | 100 | 100 |
| Duplication ratio | 1.001 | 1.001 | 1.001 | 1.001 | 1.001 | 1.001 | 1.001 |
| # genes | 4399 + 32 part | 4479 + 3 part | 4485 + 4 part | 4492 + 5 part | 4492 + 5 part | 4495 + 2 part | 4495 + 2 part |
| NGA50 | 133059 | 692561 | 1041186 | 889325 | 1278976 | 3026419 | 3026422 |
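
For readers unfamiliar with the N50 row: N50 is the contig length such that contigs of that length or longer cover at least half of the total assembly length. The sketch below shows how such a value could be computed from a contigs FASTA file. It is an illustration only, not QUAST's own code, and the helper names fasta_lengths and n50 are hypothetical. NGA50 is not covered, since it additionally needs alignments against the reference.

```shell
# Print the length of every sequence in a FASTA file, one per line.
fasta_lengths() {
  awk '/^>/ { if (seen) print n; n = 0; seen = 1; next }
       { n += length($0) }
       END { if (seen) print n }' "$1"
}

# Read contig lengths (one per line) from stdin and print the N50:
# walk the lengths from longest to shortest and report the first length
# at which the running sum reaches half of the total.
n50() {
  sort -rn | awk '
    { len[NR] = $1; total += $1 }
    END {
      for (i = 1; i <= NR; i++) {
        run += len[i]
        if (2 * run >= total) { print len[i]; exit }
      }
    }'
}

# Example (SPAdes writes contigs.fasta into its output directory):
# fasta_lengths output/contigs.fasta | n50
```

QUAST applies a minimum contig length threshold before computing its statistics, so a value obtained this way may differ slightly from the one in the table.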