We assembled different numbers of SMRT cells from Dataset 5 and Dataset 9 together with the Dataset 4 short reads using SPAdes. Each assembly was run twice: once with all short reads and once with a randomly selected subset at 76X coverage.
We arbitrarily chose one to four SMRT cells:
One single SMRT cell: m120208_071634
Two SMRT cells: m120228_210845 + m120208_122534
Three SMRT cells: m120228_115504 + m120228_152936 + m120228_100807
Four SMRT cells: m120228_171636 + m120228_223624 + m120228_100807 + m120228_190630
spades.py -1 reads_1.fastq -2 reads_2.fastq --pacbio filtered_subreads.fasta -o output
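The command above takes a single `filtered_subreads.fasta`, so runs with several SMRT cells need their subreads combined first. The sketch below shows one way to do that, assuming the per-cell subreads are simply concatenated into one FASTA before calling SPAdes. The page does not say how the cells were merged, so this is an assumption. The helper names `merge_fasta` and `build_spades_command` are hypothetical; only the `-1`, `-2`, `--pacbio`, and `-o` options are taken from the command above.

```python
def merge_fasta(cell_fastas, merged_path):
    """Concatenate per-cell FASTA files into one file (assumed merge strategy)."""
    with open(merged_path, "wb") as out:
        for fasta in cell_fastas:
            last = b"\n"
            with open(fasta, "rb") as src:
                # Copy in 1 MiB chunks so large subread files are not loaded at once.
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    last = chunk[-1:]
            # Guard against a file that does not end with a newline,
            # which would glue its last sequence to the next header.
            if last != b"\n":
                out.write(b"\n")
    return merged_path


def build_spades_command(reads_1, reads_2, pacbio_fasta, outdir):
    """Return the argument list for the hybrid SPAdes run shown above."""
    return [
        "spades.py",
        "-1", str(reads_1),
        "-2", str(reads_2),
        "--pacbio", str(pacbio_fasta),
        "-o", str(outdir),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where `spades.py` is on the `PATH`.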
All short reads
We evaluated the assemblies with QUAST 2.3 (reference genome NC_000913 and Ec_gene_list). (more detail)
Statistics without reference | Miseq_only | Miseq_1cell | Miseq_2cell | Miseq_3cell | Miseq_4cell | Miseq_17cell | Miseq_d9 |
# contigs | 86 | 15 | 14 | 12 | 11 | 6 | 6 |
Largest contig | 285889 | 2498709 | 3196491 | 1522255 | 2216997 | 4644452 | 4644455 |
Total length | 4577132 | 4626443 | 4639784 | 4651072 | 4649795 | 4652420 | 4652423 |
N50 | 139882 | 2498709 | 3196491 | 1241619 | 1750947 | 4644452 | 4644455 |
Misassemblies | |||||||
# misassemblies | 2 | 6 | 6 | 7 | 7 | 7 | 7 |
Misassembled contigs length | 215581 | 2498709 | 3196491 | 2501706 | 3967944 | 4644452 | 4644455 |
Mismatches | |||||||
# mismatches per 100 kbp | 3.02 | 5.16 | 6.16 | 6.79 | 6.81 | 6.79 | 6.79 |
# indels per 100 kbp | 0.46 | 0.54 | 0.54 | 0.58 | 0.58 | 0.67 | 0.58 |
# N's per 100 kbp | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Genome statistics | |||||||
Genome fraction (%) | 98.451 | 99.411 | 99.756 | 100 | 99.976 | 100 | 100 |
Duplication ratio | 1.001 | 1.001 | 1.001 | 1.001 | 1.001 | 1.001 | 1.001 |
# genes | 4399 + 32 part | 4479 + 3 part | 4485 + 4 part | 4492 + 5 part | 4492 + 5 part | 4495 + 2 part | 4495 + 2 part |
NGA50 | 133059 | 692561 | 1041186 | 889325 | 1278976 | 3026419 | 3026422 |
Running Time | 2hr 28m | 2hr 34m | 2hr 36m | 2hr 40m | 2hr 35m | 2hr 46m | 2hr 33m |
Misassemblies (PDF for Adobe Reader).
Randomly selected 76X coverage short reads
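The page does not describe how the 76X subset was drawn. A minimal sketch of one common approach is to keep each read pair with probability equal to target coverage divided by current coverage, using the same random decision for both mate files so pairs stay in sync. The function names, the seed, and the uncompressed four-line FASTQ layout are all assumptions made for illustration.

```python
import random
from itertools import islice


def sampling_fraction(total_bases, genome_size, target_coverage):
    """Fraction of read pairs to keep to reach the target coverage."""
    current_coverage = total_bases / genome_size
    return min(1.0, target_coverage / current_coverage)


def read_records(handle):
    """Yield FASTQ records as lists of four lines (assumes unwrapped FASTQ)."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return
        yield record


def subsample_pairs(fq1, fq2, out1, out2, fraction, seed=0):
    """Keep each read pair with probability `fraction`; return pairs kept."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    kept = 0
    with open(fq1) as in1, open(fq2) as in2:
        with open(out1, "w") as o1, open(out2, "w") as o2:
            for rec1, rec2 in zip(read_records(in1), read_records(in2)):
                # One random draw per pair keeps both mates or drops both.
                if rng.random() < fraction:
                    o1.writelines(rec1)
                    o2.writelines(rec2)
                    kept += 1
    return kept
```

Because each pair is kept independently, the resulting coverage is close to the target rather than exactly equal to it.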
We evaluated the assemblies with QUAST 2.3 (reference genome NC_000913 and Ec_gene_list). (more detail)
Statistics without reference | Miseq_only | Miseq_1cell | Miseq_2cell | Miseq_3cell | Miseq_4cell | Miseq_17cell | Miseq_d9 |
# contigs | 86 | 23 | 18 | 19 | 15 | 15 | 15 |
Largest contig | 285889 | 2498544 | 3195523 | 2501177 | 3893601 | 3893600 | 3893600 |
Total length | 4577132 | 4624001 | 4630238 | 4637837 | 4635269 | 4635268 | 4635268 |
N50 | 139882 | 2498544 | 3195523 | 2501177 | 3893601 | 3893600 | 3893600 |
Misassemblies | |||||||
# misassemblies | 2 | 6 | 6 | 6 | 6 | 6 | 6 |
Misassembled contigs length | 215581 | 2498544 | 3195523 | 2501177 | 3893601 | 3893600 | 3893600 |
Mismatches | |||||||
# mismatches per 100 kbp | 3.02 | 8.83 | 9.25 | 9.62 | 9.45 | 9.45 | 9.47 |
# indels per 100 kbp | 0.46 | 0.48 | 0.48 | 0.43 | 0.48 | 0.5 | 0.5 |
# N's per 100 kbp | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Genome statistics | |||||||
Genome fraction (%) | 98.451 | 99.382 | 99.531 | 99.696 | 99.645 | 99.645 | 99.645 |
Duplication ratio | 1.001 | 1.001 | 1.001 | 1.001 | 1.001 | 1.001 | 1.001 |
# genes | 4399 + 32 part | 4479 + 2 part | 4474 + 3 part | 4475 + 2 part | 4482 + 2 part | 4479 + 2 part | 4479 + 2 part |
NGA50 | 133059 | 691983 | 1040999 | 693363 | 1040814 | 1040814 | 1040814 |
Running Time | 2hr 28m | 52m 30s | 49m 54s | 54m 55s | 57m 49s | 1hr 20m | 1hr 4m |
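For readers comparing the N50 rows in the tables, the sketch below follows the usual definition: the length of the contig at which the contigs of that length or longer first cover at least half of the total assembly length. The function name `n50` and its optional `total` argument are hypothetical. Passing a reference genome length as `total` gives an NG50-style value. NGA50 as reported by QUAST is computed on aligned blocks after breaking contigs at misassemblies, which this sketch does not attempt.

```python
def n50(lengths, total=None):
    """Return N50 for a list of contig lengths.

    If `total` is given (for example a reference genome length), half of
    that value is used as the threshold instead of half the assembly length.
    """
    if total is None:
        total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig where the cumulative length reaches half.
        if 2 * running >= total:
            return length
    return 0  # contigs never reach half of `total`
```

This explains why N50 equals the largest contig in several columns: when a single contig already spans more than half of the assembly, that contig's length is the N50.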