Read Depths

Revision as of 13 August 2013 20:50 by admin

As described in the hybrid error correction paper, second-generation (short-read) data can be used to correct PacBio reads, and the resulting PacBio corrected reads (PBcR) can then be assembled de novo. Here we discuss how read depth affects (1) hybrid error correction and (2) assembly.

Contents
1 Hybrid error correction
  1.1 Short read depth
  1.2 Long read depth
  1.3 Performance
2 Assembly
3 Evaluation

Hybrid error correction

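pacBioToCA -length 500 -partitions 200 -l PacBio_Illumia -s pacbio.spec -fastq <pacbio_subreads.fastq> <short_reads.frg>

Here <pacbio_subreads.fastq> and <short_reads.frg> are placeholders for the PacBio subreads and the short-read FRG file; PacBio_Illumia is the library name used as the output prefix.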

The pacbio.spec file was obtained from the PacBioToCA page and modified for this run (download the modified pacbio.spec).

Short read depth

We used three short-read depths (raw, 118X and 100X) to correct the long reads.
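The page does not say how the 118X and 100X short-read sets were derived from the raw data. A minimal sketch, assuming uniform random subsampling of read pairs, keeps each pair with probability target depth / raw depth. The function names, the seed, the 150X raw depth in the example and the sampling method itself are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical helpers: the page does not describe how 118X/100X were produced.

def keep_fraction(target_depth: float, total_bases: int, genome_size: int) -> float:
    """Fraction of reads to keep so the expected depth drops to target_depth."""
    current_depth = total_bases / genome_size
    if target_depth >= current_depth:
        return 1.0  # subsampling cannot raise depth
    return target_depth / current_depth

def subsample_pairs(pairs: Sequence[Tuple[str, str]], fraction: float,
                    seed: int = 1) -> List[Tuple[str, str]]:
    """Keep each (read1, read2) pair with probability `fraction`, so mates stay together."""
    rng = random.Random(seed)
    return [pair for pair in pairs if rng.random() < fraction]

if __name__ == "__main__":
    genome_size = 4_650_000         # assumption: implied by the Performance table
    raw_bases = 150 * genome_size   # hypothetical raw short-read total (150X)
    f = keep_fraction(100, raw_bases, genome_size)
    print(f"keep {f:.3f} of pairs for 100X")
```

Sampling whole pairs rather than single reads keeps the paired-end structure that short-read libraries rely on.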

Long read depth

To vary the long-read depth, we used the subreads from one to four SMRT cells, chosen arbitrarily:
One SMRT cell (three separate sets): m120208_071634, m120228_192221, m120228_210845
Two SMRT cells: m120228_210845 + m120208_122534
Three SMRT cells: m120228_115504 + m120228_152936 + m120228_100807
Four SMRT cells: m120228_171636 + m120228_223624 + m120228_100807 + m120228_190630
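Building the one- to four-cell sets amounts to pooling subreads by movie. A minimal sketch, assuming the subreads are in FASTA and each read name begins with its movie ID (PacBio's usual movie/hole/start_end naming), filters records by the movie prefixes listed above. The helper names and dictionary keys are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Movie prefixes copied from the list above; the keys are illustrative.
CELL_SETS: Dict[str, List[str]] = {
    "two": ["m120228_210845", "m120208_122534"],
    "three": ["m120228_115504", "m120228_152936", "m120228_100807"],
    "four": ["m120228_171636", "m120228_223624", "m120228_100807", "m120228_190630"],
}

def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read name, sequence) pairs; multi-line sequences are joined."""
    name, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def select_cells(records: Iterable[Tuple[str, str]],
                 movie_prefixes: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Keep subreads whose name starts with one of the selected movie IDs."""
    prefixes = tuple(movie_prefixes)
    for name, seq in records:
        if name.startswith(prefixes):
            yield name, seq
```

For example, `select_cells(read_fasta(open(path)), CELL_SETS["two"])` would stream the two-cell subset from a hypothetical combined subread file at `path`.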

Performance

SMRT cell(s)	m120208_071634	m120228_192221	m120228_210845	Two SMRT cells	Three SMRT cells	Four SMRT cells
Uncorrected subreads
# reads	37077	38542	44794	77117	113284	136333
Avg. read length (bp)	2023.338161	2322.679985	2334.414140	2184.208709	2333.977711	2386.664674
Total (Mb)	75.02	89.52	104.57	168.44	264.40	325.38
Depth	16.13X	19.25X	22.49X	36.22X	56.86X	69.97X
Corrected by raw short reads
# reads	26492	34981	40666	63760	98165	118901
Avg. read length (bp)	2352.489884	2133.783826	2124.597010	2199.845561	2286.482249	2320.548322
Total (Mb)	62.32	74.64	86.40	140.26	224.45	275.92
Depth	13.40X	16.05X	18.58X	30.16X	48.27X	59.34X
Corrected by 118X short reads
# reads	26666	-	-	64201	99285	120296
Avg. read length (bp)	2309.110290	-	-	2150.165184	2221.782394	2252.656963
Total (Mb)	61.57	-	-	138.04	220.59	270.99
Depth	13.24X	-	-	29.69X	47.44X	58.28X
Corrected by 100X short reads
# reads	25618	-	-	61415	95240	115080
Avg. read length (bp)	2315.355024	-	-	2165.060164	2247.193879	2283.976060
Total (Mb)	59.31	-	-	132.97	214.02	262.84
Depth	12.76X	-	-	28.60X	46.03X	56.52X
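Each cell of the table reports the number of reads, the mean read length, the total in Mb (10^6 bases) and the depth, where depth = total bases / genome size. The genome size is not stated on this page; the depths imply about 4.65 Mb (for example 75.02 Mb / 16.13X ≈ 4.65 Mb), which this sketch assumes. It also computes the fraction of bases kept after correction. The helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence

# Assumption: genome size implied by the table (75.02 Mb / 16.13X ~= 4.65 Mb).
GENOME_SIZE = 4_650_000

@dataclass
class ReadStats:
    n_reads: int
    avg_len: float   # mean read length in bp
    total_mb: float  # total bases / 1e6
    depth: float     # total bases / genome size

def read_stats(lengths: Sequence[int], genome_size: int = GENOME_SIZE) -> ReadStats:
    """Summarize a read set the way the Performance table does."""
    total = sum(lengths)
    n = len(lengths)
    return ReadStats(n, total / n if n else 0.0, total / 1e6, total / genome_size)

def retained(before_mb: float, after_mb: float) -> float:
    """Fraction of bases that survive correction."""
    return after_mb / before_mb

if __name__ == "__main__":
    # m120208_071634: uncorrected vs. corrected by raw short reads (values from the table)
    print(f"{retained(75.02, 62.32):.3f}")
```

The example prints 0.831, i.e. about 83% of the m120208_071634 bases survive correction with the raw short reads.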

Assembly

After correction, the PBcR can be assembled de novo with runCA (Celera Assembler) or MIRA 3.

We assembled the genome with runCA from both the full PBcR set and a filtered PBcR subset (25X, selected with gatekeeper).
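The page does not say how gatekeeper was used to pick the 25X subset. One plausible strategy, assumed here and not confirmed by the source, is to keep the longest corrected reads until their total reaches 25 × genome size. The sketch works on read lengths only; the function name is hypothetical.

```python
from typing import List, Sequence

def longest_to_depth(lengths: Sequence[int], target_depth: float,
                     genome_size: int) -> List[int]:
    """Indices of the longest reads whose summed length first reaches
    target_depth * genome_size (all reads if the set is too small)."""
    budget = target_depth * genome_size
    # Longest first; sorted() is stable, so ties keep their input order.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    chosen: List[int] = []
    total = 0
    for i in order:
        if total >= budget:
            break
        chosen.append(i)
        total += lengths[i]
    return chosen

if __name__ == "__main__":
    lengths = [100, 500, 300, 200]            # toy read lengths
    print(longest_to_depth(lengths, 2, 300))  # budget 600 bp
```

With a 600 bp budget the sketch keeps reads 1 and 2 (500 + 300 bp) and stops, because the budget is already met. For the real data the call would be `longest_to_depth(lengths, 25, 4_650_000)`, using the genome size implied by the Performance table.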

runCA -p asm -d asm -s asm.spec PacBio_Illumia.frg > asm.out 2>&1

Evaluation

We evaluated the assemblies with QUAST 2.2 using a reference genome and gene annotations (download).

Assemblies of single-SMRT-cell reads corrected with raw, 100X and 118X short reads:

Statistics without reference	071634_raw_asm.ctg	192221_raw_asm.ctg	210845_raw_asm.ctg	071634_100X_asm.ctg	071634_118X_asm.ctg
# contigs	80	93	83	61	69
Largest contig	745120	664876	562203	663399	434084
Total length	4975695	5031560	5043217	4804004	4805579
N50	356974	221472	324225	295449	179662
Misassemblies
# misassemblies	11	17	21	10	13
Misassembled contigs length	1552524	976207	2108892	1222277	782726
Mismatches
# mismatches per 100 kbp	3.32	2.91	3.06	7.08	6.4
# indels per 100 kbp	2.98	1.38	1.01	13.15	5.2
# N's per 100 kbp	0.38	0.12	0.22	0.4	0.37
Genome statistics
Genome fraction (%)	99.97	100	100	99.304	99.424
Duplication ratio	1.074	1.086	1.090	1.043	1.047
# genes	4489 + 7 part	4490 + 7 part	4495 + 2 part	4461 + 25 part	4451 + 31 part
NGA50	357183	221098	279423	226118	179662
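QUAST defines N50 as the contig length L such that contigs of length ≥ L together contain at least half of the assembly's bases. NG50 uses half of the reference length instead, and NGA50 applies the same rule to aligned blocks after contigs are broken at misassemblies. A minimal sketch of N50 and NG50 follows; NGA50 needs the reference alignments and is not reproduced. The function name `nx` is hypothetical.

```python
from typing import Iterable, Optional

def nx(lengths: Iterable[int], fraction: float = 0.5,
       total: Optional[int] = None) -> Optional[int]:
    """Return the length L such that contigs >= L cover `fraction` of `total`.

    With total=None, `total` is the assembly length (N50 for fraction=0.5);
    passing the reference genome length gives NG50. Returns None when the
    contigs cannot reach the target (NG50 undefined).
    """
    contigs = sorted(lengths, reverse=True)
    if total is None:
        total = sum(contigs)
    target = fraction * total
    running = 0
    for length in contigs:
        running += length
        if running >= target:
            return length
    return None

if __name__ == "__main__":
    contigs = [10, 20, 30, 40]
    print(nx(contigs))             # N50
    print(nx(contigs, total=200))  # NG50 against a 200 bp reference
```

The example prints 30 and then 10: against the 100 bp assembly the 40 and 30 bp contigs already pass the 50 bp mark, while a 200 bp reference needs all four contigs to reach 100 bp.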