Escherichia coli K12 MG1655

The E. coli MG1655 genome consists of a circular chromosome of 4,639,675 bp. The Illumina sequencing data are available from the ALLPATHS-LG website; see "Finished bacterial genomes from shotgun sequence data" (Genome Research, 2012) for details.
The Illumina and PacBio data were downloaded from the ALLPATHS-LG website (ecoli_data_alt.tar.gz).
Fragment library
Read length: 101 bp
Number of read pairs: 1,186,190
Insert size: 180 bp
Coverage: 46.02X
Jumping library 1
Read length: 93 bp
Number of read pairs: 1,615,702
Insert size: 3000 bp
Jumping library 2
Read length: 93 bp
Number of read pairs: 362,199
Insert size: 3000 bp
PacBio reads
Average read length: 1,514.24 bp
Number of reads: 409,304
Coverage: 133.58X
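As a sanity check, the PacBio coverage above follows from the usual definition, coverage = (number of reads × mean read length) / genome length: 409,304 × 1,514.24 bp / 4,639,675 bp ≈ 133.58X. A minimal bash/awk sketch of that arithmetic (the function name `coverage` is hypothetical, not part of ALLPATHS-LG):

```bash
#!/usr/bin/env bash
# Hypothetical helper: sequencing coverage =
#   (number of reads * mean read length) / genome size,
# printed with two decimals.
coverage() {
  local n_reads=$1 mean_len=$2 genome_size=$3
  awk -v n="$n_reads" -v len="$mean_len" -v g="$genome_size" \
    'BEGIN { printf "%.2f\n", n * len / g }'
}

# PacBio reads from ecoli_data_alt.tar.gz against the 4,639,675 bp chromosome
coverage 409304 1514.24 4639675    # prints 133.58
```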
The raw data behind the website data set are available from the Sequence Read Archive (SRA):
Fragment library
Accession: SRX131033
Read length: 101 bp
Number of read pairs: 13,457,571
Insert size: 180 bp
Coverage: 522.1X
Jumping library 1
Accession: SRX117481
Jumping library 2
Accession: SRR492488
PacBio reads
Accessions: SRX109917, SRX109901 (SRR386913, SRR387092, SRR386907, SRR387035), SRX109936
Fractional Data: prepare.sh randomly samples, from the raw fragment library, the same fraction of reads that the website data set contains (FRAG_FRAC=0.088):
```bash
PrepareAllPathsInputs.pl \
  DATA_DIR=$PWD/test.genome/data \
  PLOIDY=1 \
  FRAG_FRAC=0.088 \
  IN_GROUPS_CSV=in_groups.csv \
  IN_LIBS_CSV=in_libs.csv \
  OVERWRITE=True \
  | tee prepare.out
```
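The value FRAG_FRAC=0.088 is consistent with the ratio of fragment read pairs in the website data to those in the raw SRA data (1,186,190 / 13,457,571 ≈ 0.0881; the two coverage figures, 46.02X / 522.1X, give the same ratio). A small sketch of that calculation, where `retained_fraction` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: fraction of raw read pairs kept in a subset,
# printed with three decimals (the precision used for FRAG_FRAC above).
retained_fraction() {
  local kept=$1 total=$2
  awk -v k="$kept" -v t="$total" 'BEGIN { printf "%.3f\n", k / t }'
}

# Fragment library: 1,186,190 pairs (website) vs 13,457,571 pairs (SRX131033)
retained_fraction 1186190 13457571    # prints 0.088
```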
50X coverage: prepare.sh randomly samples 50X coverage from the fragment library and 50X coverage from the jumping library, given a genome size of 4,640,000 bp:
```bash
PrepareAllPathsInputs.pl \
  DATA_DIR=$PWD/test.genome/data \
  PLOIDY=1 \
  GENOME_SIZE=4640000 \
  FRAG_COVERAGE=50 \
  JUMP_COVERAGE=50 \
  IN_GROUPS_CSV=in_groups.csv \
  IN_LIBS_CSV=in_libs.csv \
  OVERWRITE=True \
  | tee prepare.out
```
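For the coverage-based runs, a rough estimate of how many read pairs a target coverage corresponds to, assuming coverage = read pairs × 2 × read length / GENOME_SIZE. This definition is an assumption for illustration; PrepareAllPathsInputs.pl may compute the sample size differently, and `pairs_for_coverage` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read pairs needed for a target coverage, assuming
#   coverage = pairs * 2 * read_length / genome_size
# The result is rounded up to a whole read pair.
pairs_for_coverage() {
  local target=$1 read_len=$2 genome_size=$3
  awk -v c="$target" -v len="$read_len" -v g="$genome_size" 'BEGIN {
    x = c * g / (2 * len)
    n = int(x)
    if (n < x) n++          # round up to a whole read pair
    print n
  }'
}

pairs_for_coverage 50 101 4640000   # 101 bp fragment reads: prints 1148515
pairs_for_coverage 50 93 4640000    # 93 bp jumping reads: prints 1247312
```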
100X coverage: prepare.sh randomly samples 100X coverage from the fragment library and 100X coverage from the jumping library, given a genome size of 4,640,000 bp:
```bash
PrepareAllPathsInputs.pl \
  DATA_DIR=$PWD/test.genome/data \
  PLOIDY=1 \
  GENOME_SIZE=4640000 \
  FRAG_COVERAGE=100 \
  JUMP_COVERAGE=100 \
  IN_GROUPS_CSV=in_groups.csv \
  IN_LIBS_CSV=in_libs.csv \
  OVERWRITE=True \
  | tee prepare.out
```
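In the runs above the random selection is made by PrepareAllPathsInputs.pl through FRAG_FRAC, FRAG_COVERAGE and JUMP_COVERAGE. For a comparable subsample outside ALLPATHS-LG, here is a hypothetical stand-alone sketch (bash with POSIX paste and awk) that keeps a random fraction of read pairs from two mate FASTQ files, using one random draw per pair so mates stay together. It is not the ALLPATHS-LG sampling code:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ALLPATHS-LG or prepare.sh): keep a random
# fraction of read pairs from two synchronized FASTQ files.
# usage: subsample_pairs in_1.fq in_2.fq out_1.fq out_2.fq fraction seed
subsample_pairs() {
  local in1=$1 in2=$2 out1=$3 out2=$4 frac=$5 seed=$6
  : > "$out1"; : > "$out2"   # create/truncate outputs even if nothing is kept
  # "paste - - - -" puts each 4-line FASTQ record on one tab-separated line;
  # the outer paste places mate 1 and mate 2 side by side (8 fields per pair).
  paste <(paste - - - - < "$in1") <(paste - - - - < "$in2") |
    awk -F '\t' -v f="$frac" -v seed="$seed" -v o1="$out1" -v o2="$out2" '
      BEGIN { srand(seed) }
      rand() < f {
        # one random draw per pair, so both mates are kept or dropped together
        printf "%s\n%s\n%s\n%s\n", $1, $2, $3, $4 > o1
        printf "%s\n%s\n%s\n%s\n", $5, $6, $7, $8 > o2
      }'
}
```

For example, `subsample_pairs frag_1.fastq frag_2.fastq sub_1.fastq sub_2.fastq 0.088 11` (placeholder file names) would keep about 8.8% of the pairs; the fixed seed makes the selection reproducible with the same awk implementation.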

| Basic statistics | Published Data | Raw Data | Fractional Data | 50X coverage | 100X coverage |
|---|---|---|---|---|---|
| # contigs | 1 | 14 | 1 | 1 | 1 |
| Largest contig | 4638970 | 4625005 | 4638970 | 4638970 | 4638970 |
| Total length | 4638970 | 4652215 | 4638970 | 4638970 | 4638970 |
| N50 | 4638970 | 4625005 | 4638970 | 4638970 | 4638970 |
| Misassemblies | | | | | |
| # misassemblies | 1 | 5 | 1 | 1 | 1 |
| Misassembled contigs length | 4638970 | 4625005 | 4638970 | 4638970 | 4638970 |
| Mismatches | | | | | |
| # mismatches per 100kbp | 0.11 | 1.06 | 0.06 | 0.09 | 0.06 |
| # indels per 100kbp | 0.09 | 0.61 | 0.09 | 0.09 | 0.09 |
| # N's per 100kbp | 0 | 282.94 | 0 | 0.04 | 0 |
| Genome statistics | | | | | |
| Genome fraction (%) | 99.983 | 99.418 | 99.983 | 99.983 | 99.983 |
| Duplication ratio | 1 | 1.013 | 1 | 1 | 1 |
| # genes | 4494 + 1 part | 4471 + 2 part | 4494 + 1 part | 4494 + 1 part | 4494 + 1 part |
| NGA50 | 4638970 | 2714032 | 3763133 | 4209920 | 3762305 |
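The metric names in these tables match QUAST's report. N50 is the contig length L such that contigs of length ≥ L together cover at least half of the total assembly length; NG50 uses half of the reference length instead, and NGA50 applies the same idea to aligned blocks obtained by breaking contigs at misassemblies, which needs an alignment and is not reproduced here. A minimal sketch of N50/NG50 over a list of contig lengths (the `nx50` helper is hypothetical):

```bash
#!/usr/bin/env bash
# Hypothetical helper: N50 (or NG50 if a reference length is given) of the
# contig lengths read from stdin, one length per line.
# Prints 0 if the contigs never reach half of the reference length.
nx50() {
  local ref_len=${1:-0}    # 0 = use assembly total (N50); otherwise NG50
  sort -nr | awk -v ref="$ref_len" '
    { len[NR] = $1; total += $1 }
    END {
      half = (ref > 0 ? ref : total) / 2
      for (i = 1; i <= NR; i++) {
        cum += len[i]                       # cumulative length, longest first
        if (cum >= half) { print len[i]; exit }
      }
      print 0
    }'
}

printf '%s\n' 2 5 1 4 3 | nx50       # N50 of a toy assembly: prints 4
printf '%s\n' 2 5 1 4 3 | nx50 20    # NG50 against a 20 bp reference: prints 3
```

For a single-contig assembly, as in most columns of the first table, N50 simply equals the contig length (4,638,970 bp).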

| Basic statistics | Published Data | Raw Data | Fractional Data | 50X coverage | 100X coverage |
|---|---|---|---|---|---|
| # contigs | 2 | 1 | 5 | 3 | 2 |
| Largest contig | 4631220 | 4633080 | 4575759 | 4629108 | 4638312 |
| Total length | 4633146 | 4633080 | 4698903 | 4633082 | 4640072 |
| N50 | 4631220 | 4633080 | 4575759 | 4629108 | 4638312 |
| Misassemblies | | | | | |
| # misassemblies | 3 | 7 | 8 | 8 | 5 |
| Misassembled contigs length | 4631220 | 4633080 | 4577746 | 4631095 | 4638312 |
| Mismatches | | | | | |
| # mismatches per 100kbp | 1.19 | 1.42 | 2.84 | 2.52 | 2.26 |
| # indels per 100kbp | 1.13 | 0.83 | 3.26 | 1.24 | 1.85 |
| # N's per 100kbp | 533.22 | 1545.02 | 698.87 | 703.96 | 760.38 |
| Genome statistics | | | | | |
| Genome fraction (%) | 99.345 | 98.343 | 99.265 | 99.136 | 99.272 |
| Duplication ratio | 1.012 | 1.016 | 1.021 | 1.008 | 1.008 |
| # genes | 4465 + 11 part | 4395 + 31 part | 4460 + 14 part | 4451 + 13 part | 4455 + 14 part |
| NGA50 | 3180483 | 687701 | 654008 | 2675325 | 694154 |
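The "per 100kbp" rows are normalized counts: in QUAST's definitions, N's are counted per 100,000 assembly bases, while mismatches and indels are counted per 100,000 aligned bases. Multiplying back by the assembly length recovers an approximate absolute number of N's; for example, 533.22 N's per 100 kbp over the 4,633,146 bp Published Data assembly in the second table is about 24,705 N's. A small sketch of both directions (helper names are hypothetical):

```bash
#!/usr/bin/env bash
# Hypothetical helpers for QUAST-style "per 100 kbp" rates.
# Rate: count / length * 100,000, printed with two decimals.
per_100kbp() {
  awk -v n="$1" -v len="$2" 'BEGIN { printf "%.2f\n", n / len * 100000 }'
}
# Inverse: approximate absolute count from a reported rate and its base length.
count_from_rate() {
  awk -v r="$1" -v len="$2" 'BEGIN { printf "%.0f\n", r * len / 100000 }'
}

count_from_rate 533.22 4633146    # Published Data, second table: prints 24705
```

Because the reported rates are rounded to two decimals, the recovered count is approximate; for mismatches and indels the aligned length would be needed, which the tables do not list.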