Escherichia coli K-12 MG1655
The E. coli K-12 MG1655 genome consists of a single circular chromosome, 4,639,675 bp in length. The Illumina sequencing data are available from the ALLPATHS-LG website; see "Finished bacterial genomes from shotgun sequence data" (Genome Research, 2012) for details.
The Illumina and PacBio data were downloaded from the ALLPATHS-LG website: ecoli_data_alt.tar.gz
Fragment library
Read length: 101 bp
Number of reads: 1,186,190 × 2 (paired-end)
Insert size: 180 bp
Coverage: 46.02X
Jumping library 1
Read length: 93 bp
Number of reads: 1,615,702 × 2 (paired-end)
Insert size: 3,000 bp
Jumping library 2
Read length: 93 bp
Number of reads: 362,199 × 2 (paired-end)
Insert size: 3,000 bp
PacBio reads
Average read length: 1,514.24 bp
Number of reads: 409,304
Coverage: 133.58X
Raw data underlying the website data set, from the Sequence Read Archive (SRA)
Fragment library
Accession: SRX131033
Read length: 101 bp
Number of reads: 13,457,571 × 2 (paired-end)
Insert size: 180 bp
Coverage: 522.1X
Jumping library 1
Accession: SRX117481
Jumping library 2
Accession: SRR492488
PacBio reads
Accession: SRX109917, SRX109901 (SRR386913, SRR387092, SRR386907, SRR387035), SRX109936
We used prepare.sh to randomly sample the raw fragment library down to the same fraction as the website data:
PrepareAllPathsInputs.pl\
 DATA_DIR=$PWD/test.genome/data\
 PLOIDY=1\
 FRAG_FRAC=0.088\
 IN_GROUPS_CSV=in_groups.csv\
 IN_LIBS_CSV=in_libs.csv\
 OVERWRITE=True\
 | tee prepare.out
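The FRAG_FRAC value of 0.088 matches the ratio of fragment read pairs in the website data (1,186,190) to those in SRX131033 (13,457,571). The Python sketch below computes that ratio and shows one simple way to subsample paired reads at a given fraction while keeping both mates of a pair together. The `subsample_pairs` helper and its fixed seed are hypothetical illustrations; this is not how PrepareAllPathsInputs.pl implements FRAG_FRAC.

```python
import random

# Read-pair counts from the lists above (website data vs. SRA raw data).
WEBSITE_FRAG_PAIRS = 1_186_190
RAW_FRAG_PAIRS = 13_457_571

frac = WEBSITE_FRAG_PAIRS / RAW_FRAG_PAIRS
print(f"FRAG_FRAC ~ {frac:.3f}")  # FRAG_FRAC ~ 0.088


def subsample_pairs(pairs, fraction, seed=0):
    """Keep each (read1, read2) pair independently with probability `fraction`.

    Hypothetical helper: both mates are kept or dropped together, so pairing
    is preserved. A fixed seed makes the subsample reproducible.
    """
    rng = random.Random(seed)
    return [pair for pair in pairs if rng.random() < fraction]


# Toy input: 10,000 named read pairs instead of real FASTQ records.
toy_pairs = [(f"r{i}/1", f"r{i}/2") for i in range(10_000)]
kept = subsample_pairs(toy_pairs, frac)
print(len(kept), "of", len(toy_pairs), "pairs kept")
```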
We used prepare.sh to randomly sample 50X coverage from the fragment and jumping libraries:
PrepareAllPathsInputs.pl\
 DATA_DIR=$PWD/test.genome/data\
 PLOIDY=1\
 GENOME_SIZE=4640000\
 FRAG_COVERAGE=50\
 JUMP_COVERAGE=50\
 IN_GROUPS_CSV=in_groups.csv\
 IN_LIBS_CSV=in_libs.csv\
 OVERWRITE=True\
 | tee prepare.out
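FRAG_COVERAGE and JUMP_COVERAGE are coverage targets relative to GENOME_SIZE. Assuming they mean base coverage (bases sequenced divided by genome size), the number of read pairs needed can be estimated as in the Python sketch below. This is a back-of-the-envelope calculation rather than the tool's internal logic, and ALLPATHS-LG may count jumping-library coverage differently; `pairs_for_coverage` is a hypothetical helper.

```python
import math

GENOME_SIZE = 4_640_000  # GENOME_SIZE passed to PrepareAllPathsInputs.pl


def pairs_for_coverage(target_cov: float, read_len: int, genome_size: int = GENOME_SIZE) -> int:
    """Smallest number of pairs with 2 * read_len * pairs >= target_cov * genome_size.

    Assumption: coverage counts the bases of both mates of each pair.
    """
    return math.ceil(target_cov * genome_size / (2 * read_len))


print(pairs_for_coverage(50, 101))  # fragment library, 101 bp reads -> 1148515
print(pairs_for_coverage(50, 93))   # jumping libraries, 93 bp reads -> 1247312
```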
We used prepare.sh to randomly sample 100X coverage from the fragment and jumping libraries:
PrepareAllPathsInputs.pl\
 DATA_DIR=$PWD/test.genome/data\
 PLOIDY=1\
 GENOME_SIZE=4640000\
 FRAG_COVERAGE=100\
 JUMP_COVERAGE=100\
 IN_GROUPS_CSV=in_groups.csv\
 IN_LIBS_CSV=in_libs.csv\
 OVERWRITE=True\
 | tee prepare.out
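Under the same base-coverage assumption, a coverage target can be turned into the fraction of the raw fragment library that has to be kept. The Python sketch below does this for the 101 bp reads of SRX131033 (13,457,571 pairs) and raises an error when a library is too small for the requested coverage. `sampling_fraction` is a hypothetical helper, not part of ALLPATHS-LG.

```python
import math

GENOME_SIZE = 4_640_000      # GENOME_SIZE from the command above
READ_LEN = 101               # fragment library read length (bp)
RAW_FRAG_PAIRS = 13_457_571  # read pairs in SRX131033


def sampling_fraction(target_cov, read_len, available_pairs, genome_size=GENOME_SIZE):
    """Fraction of available pairs to keep to reach `target_cov` base coverage.

    Assumption: coverage counts both mates of each pair.
    Raises ValueError if the library cannot reach the target.
    """
    needed = math.ceil(target_cov * genome_size / (2 * read_len))
    if needed > available_pairs:
        raise ValueError(f"need {needed} pairs, only {available_pairs} available")
    return needed / available_pairs


for cov in (50, 100):
    print(cov, round(sampling_fraction(cov, READ_LEN, RAW_FRAG_PAIRS), 3))
# 50 0.085
# 100 0.171
```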
| Basic statistics | Published Data | Raw Data | Fractional Data | 50X coverage | 100X coverage |
|---|---|---|---|---|---|
| # contigs | 1 | 14 | 1 | 1 | |
| Largest contig (bp) | 4638970 | 4625005 | 4638970 | 4638970 | |
| Total length (bp) | 4638970 | 4652215 | 4638970 | 4638970 | |
| N50 (bp) | 4638970 | 4625005 | 4638970 | 4638970 | |
| Misassemblies | | | | | |
| # misassemblies | 1 | 5 | 1 | 1 | |
| Misassembled contigs length (bp) | 4638970 | 4625005 | 4638970 | 4638970 | |
| Mismatches | | | | | |
| # mismatches per 100 kbp | 0.11 | 1.06 | 0.06 | 0.06 | |
| # indels per 100 kbp | 0.09 | 0.61 | 0.09 | 0.11 | |
| # N's per 100 kbp | 0 | 282.94 | 0 | 0 | |
| Genome statistics | | | | | |
| Genome fraction (%) | 99.983 | 99.418 | 99.983 | 99.983 | |
| Duplication ratio | 1 | 1.013 | 1 | 1 | |
| # genes | 4494 + 1 part | 4471 + 2 part | 4494 + 1 part | 4495 + 0 part | |
| NGA50 (bp) | 4638970 | 2714032 | 3763133 | 4032768 | |
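The metric names in these tables (genome fraction, duplication ratio, NGA50, genes counted as complete + partial) appear to follow QUAST's report format. As a reminder of what N50 means, the Python sketch below computes Nx from a list of contig lengths, or NGx when a reference genome length is supplied instead of the assembly total. NGA50 additionally requires aligning contigs to the reference and breaking them at misassemblies, which this sketch does not attempt; the function `nx` is illustrative.

```python
def nx(lengths, x=50, reference_length=None):
    """Return Nx of contig lengths, or NGx if reference_length is given.

    Sort lengths in decreasing order and return the first length at which the
    cumulative sum reaches x% of the assembly total (Nx) or of the reference
    length (NGx). Returns None if that threshold is never reached.
    """
    total = reference_length if reference_length is not None else sum(lengths)
    threshold = total * x / 100
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    return None


print(nx([5, 4, 3, 2, 1]))                       # 4
print(nx([4_638_970]))                           # 4638970 (single-contig assembly)
print(nx([5, 4, 3, 2, 1], reference_length=40))  # None: contigs cover < 50% of reference
```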
| Basic statistics | Published Data | Raw Data | Fractional Data | 50X coverage | 100X coverage |
|---|---|---|---|---|---|
| # contigs | 2 | 1 | 5 | 3 | |
| Largest contig (bp) | 4631220 | 4633080 | 4575759 | 4560636 | |
| Total length (bp) | 4633146 | 4633080 | 4698903 | 4713335 | |
| N50 (bp) | 4631220 | 4633080 | 4575759 | 4590636 | |
| Misassemblies | | | | | |
| # misassemblies | 3 | 7 | 8 | 8 | |
| Misassembled contigs length (bp) | 4631220 | 4633080 | 4577746 | 4711603 | |
| Mismatches | | | | | |
| # mismatches per 100 kbp | 1.19 | 1.42 | 2.84 | 1.89 | |
| # indels per 100 kbp | 1.13 | 0.83 | 3.26 | 0.61 | |
| # N's per 100 kbp | 533.22 | 1545.02 | 698.87 | 801.7 | |
| Genome statistics | | | | | |
| Genome fraction (%) | 99.345 | 98.343 | 99.265 | 99.284 | |
| Duplication ratio | 1.012 | 1.016 | 1.021 | 1.028 | |
| # genes | 4465 + 11 part | 4395 + 31 part | 4460 + 14 part | 4459 + 9 part | |
| NGA50 (bp) | 3180483 | 687701 | 654008 | 1295677 | |
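The "# N's per 100 kbp" rows normalize the number of ambiguous bases by assembly size. A minimal Python sketch, assuming the count is normalized by the total length of all contigs pooled together; `ns_per_100kbp` is a hypothetical helper.

```python
def ns_per_100kbp(contigs):
    """Number of N/n bases per 100,000 assembled bases across all contigs.

    Assumption: normalization uses the total length of all contigs.
    Returns 0.0 for an empty assembly.
    """
    total = sum(len(seq) for seq in contigs)
    n_count = sum(seq.upper().count("N") for seq in contigs)
    return n_count * 100_000 / total if total else 0.0


print(ns_per_100kbp(["ACGTNNACGT"]))     # 20000.0 (2 N's in 10 bp)
print(ns_per_100kbp(["ACGT" * 25_000]))  # 0.0 (100 kbp, no N's)
```

Under that assumption, 282.94 N's per 100 kbp in the 4,652,215 bp raw-data assembly of the first table corresponds to roughly 13,000 N bases.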