SPAdes - D1 and D5

Revision as of 16 August 2014 02:14 by admin

We assembled Dataset 1 (Escherichia coli K-12 MG1655) both on its own and together with the long reads from a single SMRT cell of Dataset 5 (m120208_071634).

With PacBio long reads (Dataset 1 plus Dataset 5):
spades.py --pe1-1 frag_1.fastq --pe1-2 frag_2.fastq --mp1-1 jump1_1.fastq --mp1-2 jump1_2.fastq --mp2-1 jump2_1.fastq --mp2-2 jump2_2.fastq --pacbio long.fasta -o output

Without PacBio long reads (Dataset 1 only):
spades.py --pe1-1 frag_1.fastq --pe1-2 frag_2.fastq --mp1-1 jump1_1.fastq --mp1-2 jump1_2.fastq --mp2-1 jump2_1.fastq --mp2-2 jump2_2.fastq -o output

Evaluation

  • Benchmark genome
E. coli MG1655
  • Evaluated by QUAST
QUAST v2.3
Running QUAST requires the gene list (Ec_gene_list) and the reference sequence (NC_000913.fna). The gene list contains 4497 genes in total.
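
The QUAST run described above could be assembled roughly as follows. This is a minimal sketch, not the exact command used for this page: it assumes the assembly being scored is the scaffolds.fasta file in the SPAdes output directory and that results go to a hypothetical quast_output directory. Only the two input file names, Ec_gene_list and NC_000913.fna, come from the text; -R (reference), -G (genes) and -o (output directory) are standard QUAST options.

```python
def quast_command(assembly,
                  reference="NC_000913.fna",
                  genes="Ec_gene_list",
                  out_dir="quast_output"):
    """Build a QUAST command line as an argument list.

    `assembly` and `out_dir` are assumptions for illustration;
    `reference` and `genes` are the two files named on this page.
    """
    return ["quast.py", assembly, "-R", reference, "-G", genes, "-o", out_dir]


# Assumed assembly path: SPAdes writes its results under the -o directory.
cmd = quast_command("output/scaffolds.fasta")
print(" ".join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to subprocess.run(cmd, check=True) on a machine where quast.py is installed.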
  • Score with QUAST: With PacBio Long Reads

                                  Website data      Raw Data          Raw Data with D5 single SMRT cell
Basic statistics
# contigs                         16                28                23
Largest contig                    1090659           1749262           3014973
Total length                      4611695           4630525           4634373
N50                               692096            1092719           3014973
Misassemblies
# misassemblies                   5                 3                 1
Misassembled contigs length       1844546           2841981           3014973
Mismatches
# mismatches per 100kbp           9.36              8.62              8.7
# indels per 100kbp               0.76              0.85              0.63
# N's per 100kbp                  92.46             31.31             7.94
Genome statistics
Genome fraction (%)               98.951            99.459            99.605
Duplication ratio                 1.004             1.001             1.001
# genes                           4459 + 13 part    4470 + 13 part    4479 + 3 part
NGA50                             657193            694449            2497974
Running Time                      27m               1hr 39m           1hr 20m

Misassemblies report (PDF, for Adobe Reader).
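
For readers unfamiliar with the N50 row, the statistic can be sketched as below. The contig lengths in the example are made up for illustration, because the page reports only summary values. Passing a reference genome length as `total` gives NG50; QUAST's NGA50 applies the same rule to aligned blocks (contigs broken at misassemblies and unaligned regions removed), which this sketch does not reproduce.

```python
def n50(lengths, total=None):
    """Return the length L such that pieces of length >= L together
    cover at least half of `total` (default: the sum of all lengths)."""
    ordered = sorted(lengths, reverse=True)
    target = (total if total is not None else sum(ordered)) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    return 0  # the pieces never reach half of `total` (can happen for NG50)


# Hypothetical contig lengths, not taken from the assemblies on this page.
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))             # N50: half of 290 is reached at the 70 contig
print(n50(toy_contigs, total=400))  # NG50 for an assumed genome length of 400
```

In the table above, the run with the D5 SMRT cell has N50 equal to its largest contig (3014973), which means that one contig alone covers more than half of the 4634373-base assembly.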


  • Score with QUAST: Without PacBio Long Reads

                                  Website data      Raw Data
Basic statistics
# contigs                         31                40
Largest contig                    1087924           1190496
Total length                      4604959           4612084
N50                               555967            693826
Misassemblies
# misassemblies                   5                 3
Misassembled contigs length       1294499           1190496
Mismatches
# mismatches per 100kbp           8.26              5.27
# indels per 100kbp               0.74              0.5
# N's per 100kbp                  161.91            136.01
Genome statistics
Genome fraction (%)               98.951            98.939
Duplication ratio                 1.004             1.002
# genes                           4427 + 25 part    4430 + 21 part
NGA50                             368287            651327
Running Time                      27m               1hr 33m

Misassemblies report (PDF, for Adobe Reader).
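
The "# N's per 100kbp" rows are rates, so they can be turned back into approximate absolute counts by scaling with the total assembly length. The sketch below does this for the two columns of the table without PacBio long reads. It assumes the rate is defined as uncalled bases per 100000 assembly bases; because the reported rates are rounded to two decimals, the results are estimates and not exact counts.

```python
def estimated_n_count(ns_per_100kbp, total_length):
    """Estimate the number of uncalled bases (N's) in an assembly
    from its per-100-kbp rate and its total length in bases."""
    return ns_per_100kbp * total_length / 100_000


# Values copied from the table without PacBio long reads.
website = estimated_n_count(161.91, 4604959)
raw = estimated_n_count(136.01, 4612084)
print(round(website), round(raw))
```

The mismatch and indel rows are not converted here, since those rates are normalized by aligned bases, which the page does not report.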