= Data =
== Short reads ==
Paired-end reads are available from [http://www.illumina.com/systems/miseq/scientific_data.ilmn Illumina MiSeq] ([ftp://webdata:webdata@ussd-ftp.illumina.com/Data/SequencingRuns/MG1655/MiSeq_Ecoli_MG1655_110721_PF_R1.fastq.gz Mate1], [ftp://webdata:webdata@ussd-ftp.illumina.com/Data/SequencingRuns/MG1655/MiSeq_Ecoli_MG1655_110721_PF_R2.fastq.gz Mate2]).
Read length: 151 bp

Number of reads: 5,729,470 × 2 (paired-end)

Insert size: ~300 bp
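As a rough sanity check, the figures above can be turned into an expected depth of coverage. The sketch below is only an estimate: the genome size is an assumed approximate value for E. coli K12 MG1655 and is not stated on this page.

```python
# Rough depth-of-coverage estimate for the MiSeq data set described above.
# GENOME_SIZE_BP is an assumption (approximate E. coli K12 MG1655 genome length).
GENOME_SIZE_BP = 4_640_000


def estimate_coverage(read_pairs: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Return the expected average depth: total sequenced bases / genome size."""
    total_bases = read_pairs * 2 * read_length_bp  # two mates per pair
    return total_bases / genome_size_bp


coverage = estimate_coverage(5_729_470, 151, GENOME_SIZE_BP)
print(f"Estimated coverage: {coverage:.0f}X")
```

Under that assumption the full data set is far deeper than the 100X subset used later for correction, which is why subsampling is an option at all.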
We converted the BAM files to FASTQ, the required format, using Picard tools ([http://picard.sourceforge.net/command-line-overview.shtml ref]).

 java -jar SamToFastq.jar INPUT=Ecoli_MG1655_s_6_1_bfast.bam FASTQ=Ecoli_MG1655_s1.fastq
 java -jar SamToFastq.jar INPUT=Ecoli_MG1655_s_6_2_bfast.bam FASTQ=Ecoli_MG1655_s2.fastq
We combined MiSeq_Ecoli_MG1655_110721_PF_R1.fastq and MiSeq_Ecoli_MG1655_110721_PF_R2.fastq into a single file, MiSeq_PE.fastq.
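The page does not say how the two mate files were combined. One common approach is to interleave them so that each mate 1 record is followed directly by its mate 2 record; the sketch below assumes that layout and plain 4-line FASTQ records. The function names are hypothetical and not part of any tool mentioned here.

```python
import io
import itertools
from typing import Iterator, List, TextIO


def fastq_records(handle: TextIO) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of four newline-terminated lines."""
    while True:
        record = [line.rstrip("\n") + "\n" for line in itertools.islice(handle, 4)]
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave(r1: TextIO, r2: TextIO, out: TextIO) -> int:
    """Write mate 1 and mate 2 records alternately; return the number of pairs."""
    pairs = 0
    for rec1, rec2 in itertools.zip_longest(fastq_records(r1), fastq_records(r2)):
        if rec1 is None or rec2 is None:
            raise ValueError("mate files contain different numbers of reads")
        out.writelines(rec1)
        out.writelines(rec2)
        pairs += 1
    return pairs


# Small in-memory demonstration with one read pair.
demo_out = io.StringIO()
demo_pairs = interleave(
    io.StringIO("@read1/1\nACGT\n+\nIIII\n"),
    io.StringIO("@read1/2\nTTGA\n+\nIIII\n"),
    demo_out,
)
print(demo_pairs, "pair(s) written")
```

For the real data, the same function could be called with file handles opened on the two decompressed MiSeq files and on MiSeq_PE.fastq opened for writing.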
 java convertFastqToFastaAndQual MiSeq_PE.fastq MiSeq_PE.fna MiSeq_PE.qual
 convert-fasta-to-v2.pl -mean 214 -stddev 21 -m Mate_info -l Illumia_Ecoli -s Ecoli_MG1655_PE.fna -q Ecoli_MG1655_PE.qual > Ecoli_MG1655_PE.frg
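To illustrate what the FASTQ to FASTA/QUAL split involves, here is a minimal Python sketch. It is not the convertFastqToFastaAndQual program used above; it assumes 4-line FASTQ records and a Phred+33 quality encoding, and writes quality values as space-separated integers.

```python
import io
from typing import TextIO


def fastq_to_fasta_and_qual(fastq: TextIO, fasta: TextIO, qual: TextIO, offset: int = 33) -> int:
    """Split a FASTQ stream into FASTA and QUAL streams; return the read count.

    offset=33 assumes Phred+33 encoded qualities.
    """
    count = 0
    while True:
        header = fastq.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = fastq.readline().rstrip("\n")
        fastq.readline()  # '+' separator line, ignored
        quality = fastq.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(quality):
            raise ValueError(f"malformed FASTQ record near read {count + 1}")
        name = header[1:].rstrip("\n")
        scores = " ".join(str(ord(char) - offset) for char in quality)
        fasta.write(f">{name}\n{seq}\n")
        qual.write(f">{name}\n{scores}\n")
        count += 1
    return count


# In-memory demonstration with a single read.
fasta_out, qual_out = io.StringIO(), io.StringIO()
n_reads = fastq_to_fasta_and_qual(io.StringIO("@r1\nACGT\n+\nII5!\n"), fasta_out, qual_out)
print(fasta_out.getvalue(), qual_out.getvalue(), sep="")
```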
== Long reads ==
One SMRT Cell of 10 kbp continuous long reads (CLR) for Escherichia coli K12 MG1655 was downloaded from [https://github.com/PacificBiosciences/DevNet/wiki/E%20coli%20K12%20MG1655%20Hybrid%20Assembly this link].

The file PacBio_10kb_CLR.fastq contains ~21X coverage of E. coli CLR reads from a 10 kb library, filtered using standard PacBio filtering thresholds (minimum read quality RQ=0.75, minimum read length RL=50 bp) ([http://files.pacb.com/datasets/secondary-analysis/e-coli-k12-de-novo/1.3.0/README.txt ref]).
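The stated coverage can be checked from the read lengths in the FASTQ file. The sketch below sums the lengths of reads that pass a minimum-length threshold and divides by the genome size. It assumes 4-line FASTQ records, and it applies only the length threshold; the RQ threshold is not applied here because the sketch looks only at sequence lines.

```python
import io
from typing import Iterable, Iterator, TextIO


def read_lengths(fastq: TextIO) -> Iterator[int]:
    """Yield the sequence length of each record in a 4-line FASTQ stream."""
    for index, line in enumerate(fastq):
        if index % 4 == 1:  # second line of each record holds the sequence
            yield len(line.strip())


def coverage_after_length_filter(
    lengths: Iterable[int], genome_size_bp: int, min_length: int = 50
) -> float:
    """Return total bases in reads of at least min_length, divided by genome size."""
    kept_bases = sum(n for n in lengths if n >= min_length)
    return kept_bases / genome_size_bp


# In-memory demonstration: reads of 60, 40 and 100 bp on a toy 80 bp genome.
demo_fastq = io.StringIO(
    "".join(f"@r{i}\n{'A' * n}\n+\n{'!' * n}\n" for i, n in enumerate((60, 40, 100)))
)
demo_coverage = coverage_after_length_filter(read_lengths(demo_fastq), genome_size_bp=80)
print(f"{demo_coverage:.1f}X")
```

Raising min_length shows how much coverage would remain under a stricter cut-off such as the -length option passed to pacBioToCA.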
= Assemblies with short reads only =
ABySS, Edena, SPAdes, SOAPdenovo2, Velvet, CISA (notes at: 20130807_MG1655_s1s2_with_verious_Assemblers)

MaSuRCA (notes at: MaSuRCA assembler)
= PacBio corrected reads (PBcR) =
To correct the PacBio CLR reads with the raw short reads:

 pacBioToCA -length 1000 -partitions 200 -l PacBio_Illumia -s pacbio.spec -fastq PacBio_10kb_CLR.fastq Ecoli_MG1655_PE.frg 1>PacBio_Illumia_Ecoli.out 2>error.out
To correct the PacBio CLR reads with 100X coverage of high-quality short reads (p < 0.05, paired-end read length >= 100 bp):

 pacBioToCA -length 500 -partitions 200 -l PacBio_Illumia -s pacbio.spec -fastq PacBio_10kb_CLR.fastq 100X_Ecoli_PE.frg
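The page does not describe how the 100X high-quality subset was selected. One possible reading is sketched below: keep a read pair only if both mates are at least 100 bp long and have a mean per-base error probability below 0.05, and stop once the kept bases reach the target coverage. The interpretation of "p" as mean error probability, the Phred+33 encoding, and all function names are assumptions for illustration.

```python
from typing import Iterable, Iterator, Tuple

Read = Tuple[str, str]  # (sequence, quality string)


def mean_error_probability(quality: str, offset: int = 33) -> float:
    """Mean per-base error probability, assuming Phred+offset encoding."""
    if not quality:
        return 1.0
    probabilities = [10 ** (-(ord(char) - offset) / 10) for char in quality]
    return sum(probabilities) / len(probabilities)


def read_passes(read: Read, min_length: int = 100, max_error: float = 0.05) -> bool:
    """True if the read is long enough and its mean error probability is low enough."""
    sequence, quality = read
    return len(sequence) >= min_length and mean_error_probability(quality) < max_error


def select_pairs(
    pairs: Iterable[Tuple[Read, Read]], genome_size_bp: int, target_coverage: float = 100.0
) -> Iterator[Tuple[Read, Read]]:
    """Yield passing pairs until their total bases reach the target coverage."""
    target_bases = genome_size_bp * target_coverage
    kept_bases = 0
    for mate1, mate2 in pairs:
        if kept_bases >= target_bases:
            break
        if read_passes(mate1) and read_passes(mate2):
            kept_bases += len(mate1[0]) + len(mate2[0])
            yield mate1, mate2


# Toy demonstration: 1X coverage of a 1000 bp "genome" needs five 2 x 100 bp pairs.
good = ("A" * 100, "I" * 100)      # Q40 bases, passes
low_q = ("A" * 100, "#" * 100)     # Q2 bases, fails the error threshold
short = ("A" * 40, "I" * 40)       # fails the length threshold
demo_pairs = [(good, good), (good, low_q), (short, good)] + [(good, good)] * 6
selected = list(select_pairs(demo_pairs, genome_size_bp=1000, target_coverage=1.0))
print(len(selected), "pairs selected")
```

Taking pairs in file order, as here, is the simplest choice; a random subsample would avoid any bias from read order on the flow cell.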
= Assemblies with hybrid methods =
== Assembly of corrected long reads ==
runCA (notes at: 20130807_MG1655_s1s2_with_verious_Assemblers for 100X, 20120628_20120628_PacBio_With_CA(Celera Assembler)_Wgs), MIRA3
== Hybrid assembly from pre-assembled contigs and long reads ==
AHA, PBJelly, Cerulean, Patch