We first assembled the raw Dataset 4 reads with SPAdes, and then scaffolded the resulting contigs with SSPACE-LongRead, using subreads from Dataset 5 and Dataset 9 at different depths.
We arbitrarily chose one to four SMRT cells:
One single SMRT cell: m120208_071634
Two SMRT cells: m120228_210845 + m120208_122534
Three SMRT cells: m120228_115504 + m120228_152936 + m120228_100807
Four SMRT cells: m120228_171636 + m120228_223624 + m120228_100807 + m120228_190630
spades.py -1 reads_1.fastq -2 reads_2.fastq -o output
SSPACE-LongRead.pl -c contig.fasta -p filter_subreads.fasta -b output
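The two steps above (SPAdes assembly, then SSPACE-LongRead scaffolding) could be wrapped in a small driver script so that each subread depth is run the same way. The following is only a minimal sketch: the helper names `build_commands` and `run_pipeline` are hypothetical, the flags are exactly those shown in the commands above, and the script defaults to a dry run that prints the commands without executing them.

```python
import shlex
import subprocess


def build_commands(reads_1, reads_2, contigs, subreads, out_dir):
    """Return the SPAdes and SSPACE-LongRead command lines as argument lists."""
    spades = ["spades.py", "-1", reads_1, "-2", reads_2, "-o", out_dir]
    sspace = ["SSPACE-LongRead.pl", "-c", contigs, "-p", subreads, "-b", out_dir]
    return [spades, sspace]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the pipeline if a step exits with an error.
            subprocess.run(cmd, check=True)


# File names are the placeholders used in the commands above.
commands = build_commands(
    "reads_1.fastq", "reads_2.fastq", "contig.fasta", "filter_subreads.fasta", "output"
)
run_pipeline(commands, dry_run=True)
```

To scaffold with a different number of SMRT cells, only the `subreads` argument would change; the SPAdes step does not depend on it.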
We evaluated the assemblies with QUAST 2.3, using NC_000913 as the reference genome and Ec_gene_list as the gene list.
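As a rough illustration of the evaluation step, a QUAST call with a reference genome and a gene list might be assembled as below. This is a sketch under assumptions: the exact command used for this evaluation is not given here, the file names (`NC_000913.fasta`, `Ec_gene_list.txt`, the scaffold FASTA names, and `quast_output`) are hypothetical, and `build_quast_command` is an illustrative helper. The `-R`, `-G`, and `-o` options are QUAST's reference, gene-list, and output-directory options.

```python
import shlex


def build_quast_command(assemblies, reference, genes, out_dir):
    """Assemble a QUAST command line with a reference genome and a gene list."""
    return ["quast.py", "-R", reference, "-G", genes, "-o", out_dir, *assemblies]


# Hypothetical file names; replace with the actual reference, gene list, and scaffolds.
cmd = build_quast_command(
    assemblies=["Miseq_only_sspace.fasta", "Miseq_1cell_sspace.fasta"],
    reference="NC_000913.fasta",
    genes="Ec_gene_list.txt",
    out_dir="quast_output",
)
print(shlex.join(cmd))
```

Passing all assemblies to a single QUAST run is what produces one report column per assembly.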
Statistics without reference | Miseq_only_sspace | Miseq_1cell_sspace | Miseq_2cell_sspace | Miseq_3cell_sspace | Miseq_4cell_sspace | Miseq_17cell_sspace | Miseq_d9_sspace |
# contigs | 86 | 15 | 18 | 16 | 15 | 17 | 14 |
Largest contig | 285889 | 2497845 | 1260980 | 2501081 | 3194637 | 1954649 | 3392211 |
Total length | 4577132 | 4632009 | 4633058 | 4636174 | 4638657 | 4633857 | 4632677 |
N50 | 139882 | 2497845 | 1238868 | 2501081 | 3194637 | 1238635 | 3392211 |
Misassemblies | |||||||
# misassemblies | 2 | 9 | 10 | 10 | 10 | 8 | 8 |
Misassembled contigs length | 215581 | 3193893 | 3244631 | 2705788 | 3657574 | 3243566 | 4050696 |
Mismatches | |||||||
# mismatches per 100 kbp | 3.02 | 7.15 | 6.2 | 6.9 | 7.17 | 6.05 | 6.43 |
# indels per 100 kbp | 0.46 | 1.06 | 1 | 1.32 | 1.27 | 0.95 | 1.06 |
# N's per 100 kbp | 0 | 97.89 | 67.67 | 77.33 | 77.37 | 91.03 | 123.88 |
Genome statistics | |||||||
Genome fraction (%) | 98.451 | 99.498 | 99.483 | 99.664 | 99.748 | 99.432 | 99.587 |
Duplication ratio | 1.001 | 1.002 | 1.002 | 1.002 | 1.002 | 1.003 | 1.001 |
# genes | 4399 + 32 part | 4467 + 14 part | 4465 + 13 part | 4476 + 11 part | 4477 + 11 part | 4467 + 11 part | 4470 + 13 part |
NGA50 | 133059 | 571664 | 425173 | 852639 | 1039467 | 1039472 | 1039654 |
Running Time | 2hr 28m | 2hr 32m | 2hr 37m | 2hr 43m | 2hr 47m | 3hr 43m | 3hr 11m |
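For readers unfamiliar with the N50 row: N50 is the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly length. This explains why N50 equals the largest contig in several columns above: in those assemblies the largest contig alone covers at least half of the total length. NGA50 is a related metric computed from blocks aligned to the reference and the reference length, so it is not reproduced here. The sketch below uses a hypothetical helper `n50` and made-up contig lengths, not values from these assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the total is the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Hypothetical contig lengths, not taken from the assemblies above.
example = [6, 5, 4, 3, 2]
print(n50(example))  # prints 5
```

With lengths `[10, 3, 2, 1]` the function returns 10, the largest contig, which mirrors the columns where N50 and "Largest contig" coincide.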
Misassemblies for Adobe reader.