SSPACE

We first assembled the Dataset 4 raw reads with SPAdes. We then scaffolded the resulting contigs with SSPACE-LongRead, using PacBio subreads from Dataset 5 and Dataset 9 at different depths.

We arbitrarily chose sets of 1 to 4 SMRT cells:
One single SMRT cell: m120208_071634
Two SMRT cells: m120228_210845 + m120208_122534
Three SMRT cells: m120228_115504 + m120228_152936 + m120228_100807
Four SMRT cells: m120228_171636 + m120228_223624 + m120228_100807 + m120228_190630
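The subread depth of each cell set can be estimated as the total number of subread bases divided by the genome length. A minimal Python sketch, assuming each cell's filtered subreads are stored in a FASTA file named "<cell_id>.subreads.fasta" (a hypothetical naming scheme) and using the length of NC_000913.3 as the genome size:

```python
from pathlib import Path
from typing import Iterable


def fasta_total_bases(path: Path) -> int:
    """Sum the sequence lengths in a FASTA file, skipping '>' header lines."""
    total = 0
    with open(path) as handle:
        for line in handle:
            if not line.startswith(">"):
                total += len(line.strip())
    return total


def subread_depth(fasta_files: Iterable[Path], genome_size: int) -> float:
    """Approximate fold coverage: total subread bases / genome size."""
    return sum(fasta_total_bases(p) for p in fasta_files) / genome_size


# Cell sets copied from the list above.
CELL_SETS = {
    1: ["m120208_071634"],
    2: ["m120228_210845", "m120208_122534"],
    3: ["m120228_115504", "m120228_152936", "m120228_100807"],
    4: ["m120228_171636", "m120228_223624", "m120228_100807", "m120228_190630"],
}

GENOME_SIZE = 4_641_652  # length of NC_000913.3 in bp

if __name__ == "__main__":
    for n_cells, cells in CELL_SETS.items():
        # Hypothetical file naming: one filtered-subread FASTA per SMRT cell.
        files = [Path(f"{cell}.subreads.fasta") for cell in cells]
        if all(f.exists() for f in files):
            depth = subread_depth(files, GENOME_SIZE)
            print(f"{n_cells} cell(s): {depth:.1f}x")
        else:
            print(f"{n_cells} cell(s): subread files not found, skipped")
```

Note that m120228_100807 appears in both the three-cell and four-cell sets, so those two depths are not independent samples.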

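# Assemble the Dataset 4 paired-end reads; SPAdes writes contigs.fasta into the -o directory
spades.py -1 reads_1.fastq -2 reads_2.fastq -o spades_output
# Scaffold the SPAdes contigs with the filtered PacBio subreads of the chosen SMRT cells
SSPACE-LongRead.pl -c spades_output/contigs.fasta -p filter_subreads.fasta -b sspace_output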
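SPAdes runs once, and SSPACE-LongRead then runs once per SMRT-cell set on the same contigs. The Python sketch below only builds the argument lists, using the flags shown above (spades.py -1/-2/-o, SSPACE-LongRead.pl -c/-p/-b). The output directory names and the pooled per-set subread files (filter_subreads_<n>cell.fasta) are hypothetical. It also assumes SPAdes writes its contigs to contigs.fasta inside its -o directory.

```python
import subprocess
from typing import List, Sequence


def spades_cmd(r1: str, r2: str, outdir: str) -> List[str]:
    # Paired-end short-read assembly.
    return ["spades.py", "-1", r1, "-2", r2, "-o", outdir]


def sspace_cmd(contigs: str, subreads: str, outdir: str) -> List[str]:
    # Scaffold existing contigs with long PacBio subreads.
    return ["SSPACE-LongRead.pl", "-c", contigs, "-p", subreads, "-b", outdir]


def plan(cell_counts: Sequence[int] = (1, 2, 3, 4)) -> List[List[str]]:
    # One SPAdes run whose contigs are reused by every scaffolding run.
    cmds = [spades_cmd("reads_1.fastq", "reads_2.fastq", "spades_out")]
    for n in cell_counts:
        cmds.append(sspace_cmd("spades_out/contigs.fasta",
                               f"filter_subreads_{n}cell.fasta",  # hypothetical
                               f"sspace_{n}cell"))                # hypothetical
    return cmds


def run_all(cmds: List[List[str]]) -> None:
    for cmd in cmds:
        subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    for cmd in plan():
        print(" ".join(cmd))  # dry run; call run_all(plan()) to execute
```

Keeping a separate -b directory per cell set stops the scaffolding runs from overwriting each other's output.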

Evaluation

We evaluated the assemblies with QUAST 2.3, using the E. coli K-12 MG1655 reference genome (NC_000913) and the gene list Ec_gene_list.
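One way to evaluate all scaffold sets in a single QUAST run is sketched below. It is a sketch only. It assumes -R (reference genome) and -G (gene coordinates file) are the reference-based options of QUAST 2.x, so check quast.py --help for the installed version. The reference FASTA name and the scaffold paths are hypothetical.

```python
from typing import List, Sequence


def quast_cmd(assemblies: Sequence[str], reference: str, genes: str,
              outdir: str) -> List[str]:
    if not assemblies:
        raise ValueError("QUAST needs at least one assembly FASTA")
    # All assemblies in one run, so they appear side by side in one report.
    return ["quast.py", "-R", reference, "-G", genes, "-o", outdir, *assemblies]


if __name__ == "__main__":
    # Hypothetical scaffold locations, one SSPACE-LongRead output per cell set.
    scaffolds = [f"sspace_{n}cell/scaffolds.fasta" for n in (1, 2, 3, 4)]
    cmd = quast_cmd(scaffolds, "NC_000913.fasta", "Ec_gene_list", "quast_out")
    print(" ".join(cmd))
```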

Metric                        Miseq_only_sspace    Miseq_1cell_sspace   Miseq_2cell_sspace   Miseq_3cell_sspace   Miseq_4cell_sspace   Miseq_17cell_sspace  Miseq_d9_sspace
Statistics without reference
# contigs                     86                   15                   18                   16                   15                   17                   14
Largest contig                285889               2497845              1260980              2501081              3194637              1954649              3392211
Total length                  4577132              4632009              4633058              4636174              4638657              4633857              4632677
N50                           139882               2497845              1238868              2501081              3194637              1238635              3392211
Misassemblies
# misassemblies               2                    9                    10                   10                   10                   8                    8
Misassembled contigs length   215581               3193893              3244631              2705788              3657574              3243566              4050696
Mismatches
# mismatches per 100 kbp      3.02                 7.15                 6.2                  6.9                  7.17                 6.05                 6.43
# indels per 100 kbp          0.46                 1.06                 1                    1.32                 1.27                 0.95                 1.06
# N's per 100 kbp             0                    97.89                67.67                77.33                77.37                91.03                123.88
Genome statistics
Genome fraction (%)           98.451               99.498               99.483               99.664               99.748               99.432               99.587
Duplication ratio             1.001                1.002                1.002                1.002                1.002                1.003                1.001
# genes                       4399 + 32 part       4467 + 14 part       4465 + 13 part       4476 + 11 part       4477 + 11 part       4467 + 11 part       4470 + 13 part
NGA50                         133059               571664               425173               852639               1039467              1039472              1039654
Running Time                  2hr 28m              2hr 32m              2hr 37m              2hr 43m              2hr 47m              3hr 43m              3hr 11m
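The length metrics in the table follow QUAST's definitions. N50 uses the assembly's own total length as its threshold, and NG50 uses the reference length. NGA50 applies the NG50 rule to aligned blocks, meaning contigs broken at misassembly breakpoints. The per-100-kbp rows are event counts normalised to 100,000 bases. The Python sketch below shows the arithmetic. Producing aligned blocks requires an alignment to the reference, so the sketch simply passes them in as a list of lengths (an assumption of this sketch, not QUAST's code).

```python
from typing import Optional, Sequence


def nx(lengths: Sequence[int], total: int, fraction: float = 0.5) -> Optional[int]:
    """Return the length L such that pieces of length >= L cover at least
    fraction * total bases; None if all pieces together fall short."""
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered >= fraction * total:
            return length
    return None  # undefined: the pieces never reach the threshold


def n50(contigs: Sequence[int]) -> Optional[int]:
    # Threshold is half of the assembly's own total length.
    return nx(contigs, sum(contigs))


def ng50(contigs: Sequence[int], genome_size: int) -> Optional[int]:
    # Threshold is half of the reference genome length.
    return nx(contigs, genome_size)


def nga50(aligned_blocks: Sequence[int], genome_size: int) -> Optional[int]:
    # Same rule as NG50, applied to aligned blocks instead of whole contigs.
    return nx(aligned_blocks, genome_size)


def per_100kbp(count: int, bases: int) -> float:
    """Normalise an event count (mismatches, indels, N's) to 100,000 bases."""
    return count * 100_000 / bases


if __name__ == "__main__":
    contigs = [10, 8, 5, 3, 1]
    print(n50(contigs), ng50(contigs, 30), per_100kbp(3, 200_000))
```

This explains why N50 and NGA50 diverge sharply for the scaffolded assemblies. For Miseq_4cell_sspace, a single 3194637 bp scaffold covers more than half of the assembly, so N50 equals the largest contig. Once the scaffolds are split at misassembly breakpoints, NGA50 falls to 1039467.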
