SPAdes - hybrid

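We assembled the Dataset 4 short reads with SPAdes in hybrid mode, adding PacBio reads from different numbers of SMRT cells taken from Dataset 5 and Dataset 9. Each configuration was run twice: once with all short reads and once with a randomly selected subset at 76X coverage.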
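
The 76X subset corresponds to a target number of read pairs. A minimal Python sketch of that arithmetic, followed by a seeded random selection that keeps mates together, is given below. The genome size (about 4.64 Mbp) and the 150 bp read length are assumptions for illustration, not values stated on this page, and all function names are hypothetical.

```python
import random


def pairs_for_coverage(target_cov, genome_size, read_len):
    """Read pairs needed so that 2 * pairs * read_len / genome_size ~ target_cov."""
    return round(target_cov * genome_size / (2 * read_len))


def read_fastq(handle):
    """Yield FASTQ records as 4-line tuples (header, sequence, '+', qualities)."""
    while True:
        record = tuple(handle.readline() for _ in range(4))
        if not record[0]:  # empty string means end of file
            return
        yield record


def sample_pairs(handle_1, handle_2, n_pairs, seed=0):
    """Reservoir-sample n_pairs mate pairs in one pass, keeping mates together."""
    rng = random.Random(seed)
    kept = []
    for i, pair in enumerate(zip(read_fastq(handle_1), read_fastq(handle_2))):
        if i < n_pairs:
            kept.append(pair)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n_pairs:
                kept[j] = pair
    return kept


# Assumed values: ~4.64 Mbp genome and 2 x 150 bp reads.
n_pairs_76x = pairs_for_coverage(76, 4_640_000, 150)
```

Fixing the seed makes the subset reproducible, so the same short reads can be reused across all SMRT cell configurations.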

We arbitrarily chose one to four SMRT cells:
One single SMRT cell: m120208_071634
Two SMRT cells: m120228_210845 + m120208_122534
Three SMRT cells: m120228_115504 + m120228_152936 + m120228_100807
Four SMRT cells: m120228_171636 + m120228_223624 + m120228_100807 + m120228_190630

spades.py -1 reads_1.fastq -2 reads_2.fastq --pacbio filtered_subreads.fasta -o output
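
To repeat this run for each SMRT cell selection, the command can be generated per configuration. The Python sketch below only builds the argument lists and uses the same four options as the command above. It assumes the filtered subreads of the chosen cells have already been merged into one FASTA file per configuration; those file names and the output directory names are hypothetical.

```python
import subprocess

# SMRT cell selections listed above.
CELL_SETS = {
    "1cell": ["m120208_071634"],
    "2cell": ["m120228_210845", "m120208_122534"],
    "3cell": ["m120228_115504", "m120228_152936", "m120228_100807"],
    "4cell": ["m120228_171636", "m120228_223624", "m120228_100807", "m120228_190630"],
}


def spades_command(reads_1, reads_2, pacbio_fasta, out_dir):
    """Argument list mirroring the command line shown above."""
    return ["spades.py", "-1", reads_1, "-2", reads_2,
            "--pacbio", pacbio_fasta, "-o", out_dir]


def build_runs(reads_1="reads_1.fastq", reads_2="reads_2.fastq"):
    """One command per cell set; per-set file names are hypothetical."""
    runs = {}
    for label in CELL_SETS:
        pacbio = f"filtered_subreads_{label}.fasta"  # hypothetical merged subreads
        runs[label] = spades_command(reads_1, reads_2, pacbio, f"output_{label}")
    return runs


def run_all(runs):
    """Execute each assembly in turn; stops at the first failing run."""
    for cmd in runs.values():
        subprocess.run(cmd, check=True)
```

Calling `build_runs()` and inspecting the result before `run_all()` is a cheap way to check the paths without starting an assembly.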

Evaluation

We evaluated the assemblies built from all short reads with QUAST 2.3 (reference genome NC_000913 and Ec_gene_list).

Statistics without reference	Miseq_only	Miseq_1cell	Miseq_2cell	Miseq_3cell	Miseq_4cell	Miseq_17cell	Miseq_d9
# contigs	86	15	14	12	11	6	6
Largest contig	285889	2498709	3196491	1522255	2216997	4644452	4644455
Total length	4577132	4626443	4639784	4651072	4649795	4652420	4652423
N50	139882	2498709	3196491	1241619	1750947	4644452	4644455
Misassemblies
# misassemblies	2	6	6	7	7	7	7
Misassembled contigs length	215581	2498709	3196491	2501706	3967944	4644452	4644455
Mismatches
# mismatches per 100 kbp	3.02	5.16	6.16	6.79	6.81	6.79	6.79
# indels per 100 kbp	0.46	0.54	0.54	0.58	0.58	0.67	0.58
# N's per 100 kbp	0	0	0	0	0	0	0
Genome statistics
Genome fraction (%)	98.451	99.411	99.756	100	99.976	100	100
Duplication ratio	1.001	1.001	1.001	1.001	1.001	1.001	1.001
# genes	4399 + 32 part	4479 + 3 part	4485 + 4 part	4492 + 5 part	4492 + 5 part	4495 + 2 part	4495 + 2 part
NGA50	133059	692561	1041186	889325	1278976	3026419	3026422
Running Time	2hr 28m	2hr 34m	2hr 36m	2hr 40m	2hr 35m	2hr 46m	2hr 33m

Misassemblies for Adobe Reader.

Randomly selected 76X coverage short reads

We evaluated the assemblies built from the 76X subset with QUAST 2.3 (reference genome NC_000913 and Ec_gene_list).

Statistics without reference	Miseq_only	Miseq_1cell	Miseq_2cell	Miseq_3cell	Miseq_4cell	Miseq_17cell	Miseq_d9
# contigs	86	23	18	19	15	15	15
Largest contig	285889	2498544	3195523	2501177	3893601	3893600	3893600
Total length	4577132	4624001	4630238	4637837	4635269	4635268	4635268
N50	139882	2498544	3195523	2501177	3893601	3893600	3893600
Misassemblies
# misassemblies	2	6	6	6	6	6	6
Misassembled contigs length	215581	2498544	3195523	2501177	3893601	3893600	3893600
Mismatches
# mismatches per 100 kbp	3.02	8.83	9.25	9.62	9.45	9.45	9.47
# indels per 100 kbp	0.46	0.48	0.48	0.43	0.48	0.5	0.5
# N's per 100 kbp	0	0	0	0	0	0	0
Genome statistics
Genome fraction (%)	98.451	99.382	99.531	99.696	99.645	99.645	99.645
Duplication ratio	1.001	1.001	1.001	1.001	1.001	1.001	1.001
# genes	4399 + 32 part	4479 + 2 part	4474 + 3 part	4475 + 2 part	4482 + 2 part	4479 + 2 part	4479 + 2 part
NGA50	133059	691983	1040999	693363	1040814	1040814	1040814
Running Time	2hr 28m	52m 30s	49m 54s	54m 55s	57m 49s	1hr 20m	1hr 4m
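
For reading the N50 rows: N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. A minimal Python sketch of that definition follows. The function name and the optional `total` argument are illustrative. Passing a reference genome length as `total` gives an NG50-style value. QUAST's NGA50 additionally works on aligned blocks, which this sketch does not reproduce.

```python
def n50(lengths, total=None):
    """Length L such that contigs of length >= L cover at least half of `total`.

    `total` defaults to the summed contig length (plain N50); pass a
    reference genome length instead for an NG50-style value.
    """
    if total is None:
        total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # the contigs cover less than half of `total`
```

This is why N50 equals the largest contig whenever a single contig exceeds half the total length, for example 2498709 of 4626443 bp in the Miseq_1cell column of the first table.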