As described in the paper Hybrid error correction, second-generation sequencing data can be used to correct PacBio reads, and the PacBio corrected reads (PBcR) can then be used for de novo assembly. Here, we discuss the effect of sequencing depth on (1) hybrid error correction and (2) assembly.
pacBioToCA -length 500 -partitions 200 -l PacBio_Illumia -s pacbio.spec
The pacbio.spec file was downloaded from PacBioToCA and modified to XXX.
We used short reads at three depths (Raw, 118X and 100X) to correct the long reads.
To obtain different long-read depths, we used subreads from 1-4 SMRT cells.
We arbitrarily chose the following SMRT cells:
Three single SMRT cells: m120208_071634, m120228_192221, m120228_210845
Two SMRT cells: m120228_210845 + m120208_122534
Three SMRT cells: m120228_115504 + m120228_152936 + m120228_100807
Four SMRT cells: m120228_171636 + m120228_223624 + m120228_100807 + m120228_190630
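The per-set statistics reported in the table (number of sequences, average length, total bases) can be computed by pooling the subread FASTA files of the chosen cells. The following is a minimal sketch, not the script used for this page: the function names `fasta_lengths` and `summarize` are hypothetical, and the toy records stand in for real subread files.

```python
from io import StringIO


def fasta_lengths(handle):
    """Yield the length of each sequence in a FASTA stream."""
    length = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def summarize(handles):
    """Pool several FASTA streams and return count, mean length and total bases."""
    lengths = [n for handle in handles for n in fasta_lengths(handle)]
    total = sum(lengths)
    avg = total / len(lengths) if lengths else 0.0
    return {"seqs": len(lengths), "avg_len": avg, "total_bases": total}


# Toy data standing in for the subreads of two SMRT cells.
cell_a = StringIO(">read1\nACGTACGT\n>read2\nACGT\n")
cell_b = StringIO(">read3\nACGTAC\nGT\n")
stats = summarize([cell_a, cell_b])
print(stats)
```

With real data, the `StringIO` objects would be replaced by open file handles, one per SMRT cell.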
Reads | Metric | m120208_071634 | m120228_192221 | m120228_210845 | Two SMRT cells | Three SMRT cells | Four SMRT cells |
---|---|---|---|---|---|---|---|
Subreads before correction | seqs amount | 37077 | 38542 | 44794 | 77117 | 113284 | 136333 |
 | seq avg len | 2023.338161 | 2322.679985 | 2334.414140 | 2184.208709 | 2333.977711 | 2386.664674 |
 | total (Mb) | 75.02 | 89.52 | 104.57 | 168.44 | 264.40 | 325.38 |
 | depth (X) | 16.13 | 19.25 | 22.49 | 36.22 | 56.86 | 69.97 |
Corrected by raw data | seqs amount | 26492 | 34981 | 40666 | 63760 | 98165 | 118901 |
 | seq avg len | 2352.489884 | 2133.783826 | 2124.597010 | 2199.845561 | 2286.482249 | 2320.548322 |
 | total (Mb) | 62.32 | 74.64 | 86.40 | 140.26 | 224.45 | 275.92 |
 | depth (X) | 13.40 | 16.05 | 18.58 | 30.16 | 48.27 | 59.34 |
Corrected by 118X | seqs amount | 26666 | | | 64201 | 99285 | 120296 |
 | seq avg len | 2309.110290 | | | 2150.165184 | 2221.782394 | 2252.656963 |
 | total (Mb) | 61.57 | | | 138.04 | 220.59 | 270.99 |
 | depth (X) | 13.24 | | | 29.69 | 47.44 | 58.28 |
Corrected by 100X | seqs amount | 25618 | | | 61415 | 95240 | 115080 |
 | seq avg len | 2315.355024 | | | 2165.060164 | 2247.193879 | 2283.976060 |
 | total (Mb) | 59.31 | | | 132.97 | 214.02 | 262.84 |
 | depth (X) | 12.76 | | | 28.60 | 46.03 | 56.52 |
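The depth values in the table follow from dividing the total bases by the genome size, and the loss during correction can be read off as the ratio of corrected to uncorrected bases. The sketch below illustrates this arithmetic for two columns. The genome size of 4.65 Mb is an assumption: it is not stated on this page and was back-calculated from the table (75.02 Mb / 16.13X). The function names are hypothetical.

```python
# Assumption: genome size back-calculated from the table (75.02 Mb / 16.13X).
GENOME_SIZE_MB = 4.65


def depth(total_mb, genome_mb=GENOME_SIZE_MB):
    """Sequencing depth = total bases / genome size."""
    return total_mb / genome_mb


def retained_fraction(before_mb, after_mb):
    """Fraction of bases that remain after error correction."""
    return after_mb / before_mb


# Totals (Mb) copied from the table: subreads before correction,
# and the same reads after correction by the raw short-read data.
totals = {
    "m120208_071634": (75.02, 62.32),
    "Four SMRT cells": (325.38, 275.92),
}

for name, (before_mb, after_mb) in totals.items():
    print(
        name,
        round(depth(before_mb), 2),
        round(depth(after_mb), 2),
        round(retained_fraction(before_mb, after_mb), 3),
    )
```

Under this assumed genome size, the computed depths agree with the tabulated ones, which suggests the table was derived the same way.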
After read correction, the PBcR can be assembled de novo using runCA or Mira3.
We assembled the genome with runCA, once from all PBcR and once from the filtered PBcR (25X, selected using gatekeeper), and evaluated the assemblies with QUAST 2.2.
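This page does not state how the 25X subset was selected. One plausible rule, shown here only as an assumption, is to keep the longest corrected reads until their total length reaches the target depth. The sketch below illustrates that rule on toy data; `keep_longest_to_depth` is a hypothetical helper and does not reproduce gatekeeper's actual options or behavior.

```python
def keep_longest_to_depth(read_lengths, genome_size, target_depth):
    """Keep the longest reads until their total length reaches target_depth x genome_size.

    read_lengths maps read name -> read length in bases.
    Returns the kept read names (longest first) and their total length.
    """
    budget = genome_size * target_depth
    kept, total = [], 0
    # Visit reads from longest to shortest.
    for name, length in sorted(read_lengths.items(), key=lambda kv: kv[1], reverse=True):
        if total >= budget:
            break
        kept.append(name)
        total += length
    return kept, total


# Toy example: a 100-base "genome" filtered to 2X.
toy_reads = {"r1": 90, "r2": 80, "r3": 50, "r4": 30, "r5": 10}
kept, total = keep_longest_to_depth(toy_reads, genome_size=100, target_depth=2)
print(kept, total)
```

Because the last read added may overshoot the budget, the resulting depth is slightly above the target rather than exactly equal to it.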