Data

The sequence data of E. coli K-12 MG1655 are available at http://www.cbcb.umd.edu/software/PBcR/closure/index.html. For details, please refer to "Reducing assembly complexity of microbial genomes with single-molecule sequencing", Genome Biology 2013.

Contents

Short Reads (Dataset 4, E. coli K-12 MG1655)

Raw

The paired-end Illumina MiSeq reads are available as two files (Mate1, Mate2).

Read length: 151 bp

Number of reads: 5,729,470 x 2

Insert size: ~300 bp

Coverage: ~373X
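
The coverage figure can be checked from the read statistics above. The following is a minimal sketch, assuming a reference genome size of 4,641,652 bp for E. coli K-12 MG1655 (this value is not stated on this page); the function name is only illustrative.

```python
# Assumption: genome size of the E. coli K-12 MG1655 reference, in bp.
# This value is not given on this page.
GENOME_SIZE_BP = 4_641_652


def fold_coverage(read_pairs, read_length_bp, genome_size_bp=GENOME_SIZE_BP):
    """Return the fold coverage: total sequenced bases divided by genome size."""
    total_bases = read_pairs * 2 * read_length_bp  # two mates per pair
    return total_bases / genome_size_bp


raw_coverage = fold_coverage(5_729_470, 151)
print(f"Raw coverage: ~{raw_coverage:.0f}X")  # prints "Raw coverage: ~373X"
```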

100X

The 100X data set, in frg format, was downloaded from Miseq100X.

118X (HQ)

We trimmed the reads so that their error probability is below 0.05. A read pair was discarded if either read was shorter than 149 bp after trimming.

This yielded 1,839,935 high-quality paired-end reads (~118X) for further analysis.
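
The trimming tool and its exact algorithm are not named here, so the following is only a sketch of the filtering logic under stated assumptions: bases are trimmed from the 3' end while their Phred-derived error probability is 0.05 or higher, and a pair is kept only if both mates are still at least 149 bp long. All function names and the (sequence, qualities) record layout are hypothetical.

```python
MAX_ERROR_PROB = 0.05   # threshold quoted in the text
MIN_LENGTH_BP = 149     # minimum read length quoted in the text


def phred_to_error_prob(q):
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def trim_3prime(seq, quals, max_error=MAX_ERROR_PROB):
    """Assumed trimming rule: drop trailing bases with error prob >= max_error."""
    end = len(seq)
    while end > 0 and phred_to_error_prob(quals[end - 1]) >= max_error:
        end -= 1
    return seq[:end], quals[:end]


def keep_pair(mate1, mate2, min_length=MIN_LENGTH_BP):
    """Keep a pair only if both trimmed mates reach min_length.

    Each mate is a (sequence, list_of_phred_scores) tuple.
    """
    trimmed1, _ = trim_3prime(*mate1)
    trimmed2, _ = trim_3prime(*mate2)
    return len(trimmed1) >= min_length and len(trimmed2) >= min_length


# Usage: a clean 151 bp mate paired with one whose last 11 bases are poor.
good = ("A" * 151, [30] * 151)
poor_tail = ("A" * 151, [30] * 140 + [2] * 11)
print(keep_pair(good, good))       # True
print(keep_pair(good, poor_tail))  # False: second mate trims to 140 bp
```

A real quality trimmer may use a different rule (for example a cumulative or windowed error estimate), so the read counts produced by this sketch would not necessarily match the numbers reported above.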


Long Reads (Dataset 5, E. coli K-12 MG1655)

Raw

Although the PacBio reads are available at SRA, we could not handle the adapters correctly using fastq-dump. We therefore requested the h5 files from the NCBI help desk. The files are listed below:

m120208_071634_42139_c100288480630000001523009507231245_s1_p0.bas.h5 (1.2GB)
m120208_122534_42139_c100290260310000001523009507231262_s1_p0.bas.h5 (1010MB)
m120208_160812_42139_c100290260310000001523009507231264_s1_p0.bas.h5 (733MB)
m120228_082105_42139_c100301722550000001523012308061200_s1_p0.bas.h5 (1.2GB)
m120228_100807_42139_c100301722550000001523012308061201_s1_p0.bas.h5 (1.0GB)
m120228_115504_42139_c100301722550000001523012308061202_s1_p0.bas.h5 (1.0GB)
m120228_134222_42139_c100301722550000001523012308061203_s1_p0.bas.h5 (985MB)
m120228_152936_42139_c100301722550000001523012308061204_s1_p0.bas.h5 (1.1GB)
m120228_171636_42139_c100301722550000001523012308061205_s1_p0.bas.h5 (1.1GB)
m120228_190630_42139_c100301722550000001523012308061206_s1_p0.bas.h5 (984MB)
m120228_192221_42129_c100298890010000001523009207231260_s1_p0.bas.h5 (1.1GB)
m120228_205404_42139_c100301722550000001523012308061207_s1_p0.bas.h5 (879MB)
m120228_210845_42129_c000304152550000001500000112311370_s1_p0.bas.h5 (1.2GB)
m120228_223624_richard_c001202352550000001500000112311330_s1_p0.bas.h5 (833MB)
m120229_004752_42129_c000304192550000001500000112311350_s1_p0.bas.h5 (936MB)
m120229_012852_42139_c000301732550000001500000112311360_s1_p0.bas.h5 (1.0GB)
m120229_193409_42129_c000304212550000001500000112311380_s1_p0.bas.h5 (1000MB)

Subreads

We ran smrtpipe.py (SMRT Analysis) with the following params.xml settings to obtain filtered subreads of the continuous long reads (CLR).

 <param name="minLength">
   <value>50</value>
 </param>
 <param name="readScore">
   <value>0.75</value>
 </param>
 <param name="minSubReadLength">
   <value>50</value>
 </param>
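
To illustrate what these three thresholds mean, here is a minimal sketch that applies them to in-memory records. It is not how smrtpipe.py is implemented; the Subread record and its field names are hypothetical, and the interpretation (minimum read length, minimum read score, minimum subread length) is assumed from the parameter names.

```python
from dataclasses import dataclass

# Thresholds copied from the params.xml fragment above.
MIN_LENGTH = 50
READ_SCORE = 0.75
MIN_SUBREAD_LENGTH = 50


@dataclass
class Subread:
    """Hypothetical record for one subread and its parent read."""
    name: str
    read_length: int      # length of the parent read, in bp
    read_score: float     # quality score of the parent read (0 to 1)
    subread_length: int   # length of this subread, in bp


def passes_filter(s):
    """Return True if the subread satisfies all three thresholds."""
    return (
        s.read_length >= MIN_LENGTH
        and s.read_score >= READ_SCORE
        and s.subread_length >= MIN_SUBREAD_LENGTH
    )


# Usage with made-up records.
records = [
    Subread("a/0_1200", 3000, 0.85, 1200),
    Subread("b/0_40", 3000, 0.85, 40),      # subread too short
    Subread("c/0_900", 2500, 0.60, 900),    # read score too low
]
kept = [s.name for s in records if passes_filter(s)]
print(kept)  # ['a/0_1200']
```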

200X

We downloaded the 200X filtered FASTQ sequences from PB200X.