Dataset2

Revision as of 21 August 2013 20:31 by admin

Data

Short reads

Paired-end reads from an Illumina MiSeq run are available (Mate1, Mate2).

Read length: 151 bp

Number of reads: 5,729,470 pairs (11,458,940 reads)

Insert size: ~300 bp

We combined MiSeq_Ecoli_MG1655_110721_PF_R1.fastq and MiSeq_Ecoli_MG1655_110721_PF_R2.fastq into MiSeq_PE.fastq.
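The page does not say how the two mate files were combined. One plausible reading is that the mates were interleaved record by record, and the Python sketch below does exactly that under this assumption (plain concatenation of the two files is equally possible). It also assumes standard four-line FASTQ records; the script name interleave_fastq.py is hypothetical.

```python
import sys
from itertools import islice, zip_longest
from typing import Iterable, Iterator, List


def read_fastq(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield FASTQ records as [header, sequence, '+', qualities] (newlines stripped).

    Assumes four-line records, as written by the MiSeq software.
    """
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return
        if len(record) < 4 or not record[0].startswith("@"):
            raise ValueError(f"truncated or malformed FASTQ record: {record!r}")
        yield record


def interleave(r1: Iterable[str], r2: Iterable[str]) -> Iterator[str]:
    """Yield mate 1 followed by mate 2 for every pair, as FASTQ text."""
    for rec1, rec2 in zip_longest(read_fastq(r1), read_fastq(r2)):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        yield "\n".join(rec1) + "\n"
        yield "\n".join(rec2) + "\n"


if __name__ == "__main__":
    # Hypothetical usage:
    #   python interleave_fastq.py MiSeq_Ecoli_MG1655_110721_PF_R1.fastq \
    #       MiSeq_Ecoli_MG1655_110721_PF_R2.fastq > MiSeq_PE.fastq
    with open(sys.argv[1]) as r1, open(sys.argv[2]) as r2:
        sys.stdout.writelines(interleave(r1, r2))
```

Interleaving keeps each pair adjacent, which makes it easy to pair mates again downstream; if the files were instead concatenated, mates would have to be matched by read name.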

java convertFastqToFastaAndQual MiSeq_PE.fastq MiSeq_PE.fna MiSeq_PE.qual
convert-fasta-to-v2.pl -mean 297 -stddev 35 -m Mate_info -l Illumia_Ecoli -s MiSeq_PE.fna -q MiSeq_PE.qual > MiSeq_PE.frg
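convertFastqToFastaAndQual is a Java program whose source is not shown on this page. As a rough, hypothetical equivalent, the Python sketch below writes each FASTQ record to a FASTA file and its Phred scores to a space-separated QUAL file, which is the pair of inputs that convert-fasta-to-v2.pl takes via -s and -q. It assumes Phred+33 quality encoding and four-line records; the function and script names are illustrative only.

```python
import sys
from itertools import islice
from typing import Iterable, Iterator, Tuple


def fastq_to_fasta_qual(lines: Iterable[str], offset: int = 33) -> Iterator[Tuple[str, str]]:
    """Yield (FASTA entry, QUAL entry) text for each four-line FASTQ record."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return
        if len(record) < 4 or not record[0].startswith("@"):
            raise ValueError(f"truncated or malformed FASTQ record: {record!r}")
        header, seq, _plus, qual = record
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        name = header[1:]  # drop the leading '@'
        # Decode each quality character to an integer Phred score (Phred+33 assumed).
        scores = " ".join(str(ord(ch) - offset) for ch in qual)
        yield f">{name}\n{seq}\n", f">{name}\n{scores}\n"


if __name__ == "__main__":
    # Hypothetical usage:
    #   python fastq_to_fasta_qual.py MiSeq_PE.fastq MiSeq_PE.fna MiSeq_PE.qual
    src, fasta_path, qual_path = sys.argv[1:4]
    with open(src) as fq, open(fasta_path, "w") as fna, open(qual_path, "w") as qf:
        for fasta_entry, qual_entry in fastq_to_fasta_qual(fq):
            fna.write(fasta_entry)
            qf.write(qual_entry)
```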

The data in FRG format were downloaded from Miseq100X.frg.

We trimmed the reads to an error probability below 0.05 and discarded a read pair if either read was shorter than 150 bp after trimming.
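The page does not name the trimming tool. One common way to trim to an error-probability limit is the modified Mott algorithm, sketched below in Python as an assumption rather than as the method actually used here: each base scores (limit - p_error), with p_error = 10^(-Q/10), and the read is cut to the stretch with the largest running sum. The pair filter then applies the 150 bp cutoff from the text. Phred+33 encoding is assumed, and the function names are hypothetical.

```python
from typing import Tuple


def phred_to_error(ch: str, offset: int = 33) -> float:
    """Convert one quality character to its error probability, p = 10^(-Q/10)."""
    return 10 ** (-(ord(ch) - offset) / 10)


def mott_trim(qual: str, limit: float = 0.05, offset: int = 33) -> Tuple[int, int]:
    """Return (start, end) of the region kept by modified-Mott trimming.

    Each base contributes (limit - p_error); the kept region is the stretch
    with the largest running sum, found with a maximum-subarray scan.
    """
    best_sum, best_start, best_end = 0.0, 0, 0
    running, start = 0.0, 0
    for i, ch in enumerate(qual):
        running += limit - phred_to_error(ch, offset)
        if running <= 0:
            # Low-quality stretch: restart the candidate region after this base.
            running, start = 0.0, i + 1
        elif running > best_sum:
            best_sum, best_start, best_end = running, start, i + 1
    return best_start, best_end


def keep_pair(qual1: str, qual2: str, min_len: int = 150) -> bool:
    """Keep a read pair only if both mates are at least min_len bp after trimming."""
    s1, e1 = mott_trim(qual1)
    s2, e2 = mott_trim(qual2)
    return e1 - s1 >= min_len and e2 - s2 >= min_len
```

For example, mott_trim("##IIIII##") returns (2, 7): the two Q2 bases at each end (error probability about 0.63) are removed and the five Q40 bases are kept.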

This left 1,839,935 high-quality paired-end reads (~118X coverage; tMiSeq_PE.frg) for further analysis.
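As a sanity check on the ~118X figure, coverage is the total number of sequenced bases divided by the genome size. The Python sketch below assumes the 4,641,652 bp E. coli K-12 MG1655 reference genome (NC_000913.3) and a mean trimmed read length of 150 bp (every retained read is at least that long). It treats 1,839,935 as the number of read pairs; counted as single reads, it would give only about 59X.

```python
GENOME_SIZE_BP = 4_641_652  # E. coli K-12 MG1655 reference genome (NC_000913.3)


def coverage(num_reads: int, mean_read_length: float, genome_size: int = GENOME_SIZE_BP) -> float:
    """Average sequencing depth: total bases divided by genome length."""
    return num_reads * mean_read_length / genome_size


raw = coverage(2 * 5_729_470, 151)      # all MiSeq reads before trimming
trimmed = coverage(2 * 1_839_935, 150)  # retained pairs, assuming ~150 bp per read
print(f"raw: {raw:.1f}X, trimmed: {trimmed:.1f}X")  # prints: raw: 372.8X, trimmed: 118.9X
```

The trimmed estimate of about 119X agrees with the ~118X quoted above, and shows that trimming kept roughly a third of the raw sequence.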

Long reads

Data from 17 SMRT Cells for E. coli MG1655 were downloaded (details in Data and Read Depths).

Assemblies with short reads only

ABySS, Edena, SPAdes, SOAPdenovo2, Velvet, CISA (notes at: 20130718_MiSeq_with_verious_Assemblers), MaSuRCA (notes at: MaSuRCA for MiSeq Data)

PacBio corrected reads (PBcR)

To correct the PacBio continuous long reads (CLR) with the raw short reads:

pacBioToCA -length 1000 -partitions 200 -l PacBio_Illumia -s pacbio.spec -fastq PacBio_10kb_CLR.fastq Ecoli_MG1655_PE.frg 1>PacBio_Illumia_Ecoli.out 2>error.out

To correct the PacBio CLR with 100X high-quality reads (error probability < 0.05; paired-end reads >= 100 bp):

pacBioToCA -length 500 -partitions 200 -l PacBio_Illumia -s pacbio.spec -fastq PacBio_10kb_CLR.fastq 100X_Ecoli_PE.frg

Assemblies with hybrid methods

Assembly of corrected long reads

runCA (20130807_MG1655_s1s2_with_verious_Assemblers for 100X, 20120628_20120628_PacBio_With_CA(Celera Assembler)_Wgs), MIRA3

Hybrid assembly from pre-assembled contigs and long reads

AHA, PBJelly, Cerulean, Patch