Escherichia coli K-12 MG1655
The E. coli MG1655 genome consists of a circular chromosome 4,639,675 bp in length. The Illumina sequencing data are available from the ALLPATHS-LG website; for details, see "Finished bacterial genomes from shotgun sequence data", Genome Research, 2012.
The Illumina and PacBio data were downloaded from the ALLPATHS-LG website (ecoli_data_alt.tar.gz):
Fragment library
Read length: 101 bp
Number of reads: 1,186,190 × 2 (paired)
Insert size: 180 bp
Coverage: 46.02X
Jumping library 1
Read length: 93 bp
Number of reads: 1,615,702 × 2 (paired)
Insert size: 3000 bp
Jumping library 2
Read length: 93 bp
Number of reads: 362,199 × 2 (paired)
Insert size: 3000 bp
PacBio reads
Average read length: 1,514.24 bp
Number of reads: 409,304
Coverage: 133.58X
Raw data underlying the website data, from the Sequence Read Archive (SRA):
Fragment library
Accession: SRX131033
Read length: 101 bp
Number of reads: 13,457,571 × 2 (paired)
Insert size: 180 bp
Coverage: 522.1X
Jumping library 1
Accession: SRX117481
Jumping library 2
Accession: SRR492488
PacBio reads
Accessions: SRX109917, SRX109901 (runs SRR386913, SRR387092, SRR386907, SRR387035), SRX109936
To reproduce the website data, we used prepare.sh to randomly sample the same fraction of the raw fragment library as the website data contains:
    PrepareAllPathsInputs.pl \
        DATA_DIR=$PWD/test.genome/data \
        PLOIDY=1 \
        FRAG_FRAC=0.088 \
        IN_GROUPS_CSV=in_groups.csv \
        IN_LIBS_CSV=in_libs.csv \
        OVERWRITE=True \
        | tee prepare.out
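The FRAG_FRAC value of 0.088 follows from the read counts listed earlier: the website fragment library holds 1,186,190 of the 13,457,571 raw read pairs. The short Python check below assumes the website fragment reads are a random subset of SRX131033, as the data description implies, and cross-checks the ratio against the two quoted coverages.

```python
# Read-pair counts quoted above (each library is paired, hence "x 2").
WEBSITE_FRAG_PAIRS = 1_186_190  # fragment library in ecoli_data_alt.tar.gz
RAW_FRAG_PAIRS = 13_457_571     # SRX131033

frag_frac = WEBSITE_FRAG_PAIRS / RAW_FRAG_PAIRS
print(f"FRAG_FRAC = {frag_frac:.4f}")  # FRAG_FRAC = 0.0881

# Cross-check: the ratio of the quoted coverages should match the read ratio.
coverage_ratio = 46.02 / 522.1
print(f"coverage ratio = {coverage_ratio:.4f}")  # coverage ratio = 0.0881
```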
We also used prepare.sh to randomly sample 100X coverage from the raw fragment library, keeping the fraction:
Fraction = 100 / 522.1 ≈ 0.192
    PrepareAllPathsInputs.pl \
        DATA_DIR=$PWD/test.genome/data \
        PLOIDY=1 \
        FRAG_FRAC=0.192 \
        IN_GROUPS_CSV=in_groups.csv \
        IN_LIBS_CSV=in_libs.csv \
        OVERWRITE=True \
        | tee prepare.out
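Randomly subsampling a paired library has to keep both mates of each pair together. The sketch below shows one common approach, independent (Bernoulli) sampling of whole pairs with a fixed seed, together with the target-coverage arithmetic. It is an illustration under that assumption, not a description of how PrepareAllPathsInputs.pl implements FRAG_FRAC internally; the `ReadPair` type and both function names are hypothetical.

```python
import random
from typing import Iterable, Iterator, Tuple

ReadPair = Tuple[str, str]  # hypothetical minimal representation: (mate 1, mate 2)


def fraction_for_coverage(target_x: float, full_x: float) -> float:
    """Fraction of read pairs to keep to go from full_x to target_x coverage."""
    return target_x / full_x


def subsample_pairs(pairs: Iterable[ReadPair], frac: float,
                    seed: int = 0) -> Iterator[ReadPair]:
    """Keep each read pair independently with probability `frac`.

    Both mates are kept or dropped together, so pairing is preserved.
    """
    rng = random.Random(seed)  # fixed seed gives a reproducible subset
    for pair in pairs:
        if rng.random() < frac:
            yield pair


frac = fraction_for_coverage(100, 522.1)
print(f"{frac:.3f}")  # 0.192
```

With a fraction of about 0.192, roughly 2.58 million of the 13,457,571 SRX131033 read pairs would be expected in the subset.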
| Basic statistics | Raw Data | Website Data | Self-fraction Data | 100X Coverage | |
|---|---|---|---|---|---|
| # contigs | 14 | 1 | 1 | 1 | |
| Largest contig (bp) | 4625005 | 4638970 | 4638970 | 4638970 | |
| Total length (bp) | 4652215 | 4638970 | 4638970 | 4638970 | |
| N50 (bp) | 4625005 | 4638970 | 4638970 | 4638970 | |
| Misassemblies | | | | | |
| # misassemblies | 5 | 1 | 1 | 1 | |
| Misassembled contigs length (bp) | 4625005 | 4638970 | 4638970 | 4638970 | |
| Mismatches | | | | | |
| # mismatches per 100 kbp | 1.06 | 0.11 | 0.06 | 0.06 | |
| # indels per 100 kbp | 0.61 | 0.09 | 0.09 | 0.11 | |
| # N's per 100 kbp | 282.94 | 0 | 0 | 0 | |
| Genome statistics | | | | | |
| Genome fraction (%) | 99.418 | 99.983 | 99.983 | 99.983 | |
| Duplication ratio | 1.013 | 1 | 1 | 1 | |
| # genes | 4471 + 2 part | 4494 + 1 part | 4494 + 1 part | 3348 + 592 part | 4495 + 0 part |
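N50 in the table is the contig length L such that contigs of length L or longer together contain at least half of the total assembly length. The minimal Python sketch below implements that definition; it is a hypothetical helper, not the code of whichever evaluation tool produced the table. It shows why the single-contig assemblies report an N50 equal to their largest contig and total length, and why the raw-data assembly's N50 equals its largest contig: 4,625,005 bp alone is more than half of 4,652,215 bp.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs")


print(n50([4_638_970]))      # 4638970: a single contig is its own N50
print(n50([5, 4, 3, 2, 1]))  # 4
```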