Escherichia coli K12 MG1655. The E. coli MG1655 genome consists of a single circular chromosome 4,639,675 bp in length. The Illumina sequencing data are available from the ALLPATHS-LG website; for details, please refer to "Finished bacterial genomes from shotgun sequence data" (Genome Research, 2012).
The Illumina and PacBio data were downloaded from the ALLPATHS-LG website (ecoli_data_alt.tar.gz).
Fragment library
Read length: 101 bp
Number of reads: 1,186,190 × 2 (paired)
Insert size: 180 bp
Coverage: 46.02X
Jumping library 1
Read length: 93 bp
Number of reads: 1,615,702 × 2 (paired)
Insert size: 3,000 bp
Jumping library 2
Read length: 93 bp
Number of reads: 362,199 × 2 (paired)
Insert size: 3,000 bp
PacBio reads
Average read length: 1,514.24 bp
Number of reads: 409,304
Coverage: 133.58X
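A quick check of the coverage figures, as a Python sketch. Fold coverage is total sequenced bases divided by the genome length (4,639,675 bp), which reproduces the PacBio value. For the Illumina fragment library, the listed 46.02X (and 522.1X for the raw data below) is reproduced only if each read pair is counted as 180 bp, the insert size; counting 2 × 101 bp per pair would give about 51.6X instead. That convention is inferred from the numbers and is not stated in the source.

```python
GENOME_SIZE = 4_639_675  # E. coli MG1655 chromosome length (bp)


def coverage(total_bases: float, genome_size: int = GENOME_SIZE) -> float:
    """Fold coverage = sequenced bases / genome length."""
    return total_bases / genome_size


# PacBio: number of reads x average read length
pacbio_cov = coverage(409_304 * 1514.24)

# Illumina fragment library: assumption, each pair counted as 180 bp
# (the insert size); 2 x 101 bp per pair would give ~51.6X instead.
frag_cov = coverage(1_186_190 * 180)

print(f"PacBio:   {pacbio_cov:.2f}X")
print(f"Fragment: {frag_cov:.2f}X")
```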
The raw data behind the website data are available from the Sequence Read Archive (SRA):
Fragment library
Accession: SRX131033
Read length: 101 bp
Number of reads: 13,457,571 × 2 (paired)
Insert size: 180 bp
Coverage: 522.1X
Jumping library 1
Accession: SRX117481
Jumping library 2
Accession: SRR492488
PacBio reads
Accession: SRX109917, SRX109901 (SRR386913, SRR387092, SRR386907, SRR387035), SRX109936
We randomly subsampled the fragment library of the raw data to the same fraction as the website data using prepare.sh:

    PrepareAllPathsInputs.pl \
        DATA_DIR=$PWD/test.genome/data \
        PLOIDY=1 \
        FRAG_FRAC=0.088 \
        IN_GROUPS_CSV=in_groups.csv \
        IN_LIBS_CSV=in_libs.csv \
        OVERWRITE=True \
        | tee prepare.out
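The value 0.088 is not derived in the text. A short Python check, using only the read counts and coverages listed above, shows that it is the website-to-raw ratio of the fragment library:

```python
# Read-pair counts from the data description above
RAW_PAIRS = 13_457_571      # SRX131033, raw fragment library
WEBSITE_PAIRS = 1_186_190   # fragment library in ecoli_data_alt.tar.gz

# Fraction of raw pairs that reproduces the website data set
frac_from_reads = WEBSITE_PAIRS / RAW_PAIRS
# The same ratio via the listed coverages (46.02X vs 522.1X)
frac_from_cov = 46.02 / 522.1

# Expected number of pairs kept with FRAG_FRAC=0.088
expected_pairs = RAW_PAIRS * 0.088

print(f"fraction (reads):        {frac_from_reads:.4f}")
print(f"fraction (coverage):     {frac_from_cov:.4f}")
print(f"expected pairs at 0.088: {expected_pairs:,.0f}")
```

Both ratios come out at about 0.0881, and 0.088 × 13,457,571 ≈ 1,184,266 pairs, within 0.2% of the website's 1,186,190.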
We also randomly subsampled the fragment library of the raw data to 100X coverage using prepare.sh. The fraction is 100 / 522.1 ≈ 0.192:

    PrepareAllPathsInputs.pl \
        DATA_DIR=$PWD/test.genome/data \
        PLOIDY=1 \
        FRAG_FRAC=0.192 \
        IN_GROUPS_CSV=in_groups.csv \
        IN_LIBS_CSV=in_libs.csv \
        OVERWRITE=True \
        | tee prepare.out
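The subsampling itself happens inside PrepareAllPathsInputs.pl, and how it picks reads is not described here. As a hypothetical illustration of what "randomly selecting a fraction" of a paired library means, the Python sketch below keeps each read pair with probability equal to the fraction, keeping or dropping both mates together, with a fixed seed so the subset is reproducible. `subsample_pairs` and the toy tuples standing in for FASTQ records are assumptions, not part of ALLPATHS-LG.

```python
import random
from typing import Iterable, Iterator, Tuple

Pair = Tuple[str, str]  # (read 1, read 2); stand-in for real FASTQ records


def subsample_pairs(pairs: Iterable[Pair], fraction: float, seed: int = 1) -> Iterator[Pair]:
    """Hypothetical helper: keep each pair independently with probability `fraction`.

    Both mates are kept or dropped together, so the output stays properly paired.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = random.Random(seed)  # fixed seed -> reproducible subset
    return (pair for pair in pairs if rng.random() < fraction)


TARGET_COV, RAW_COV = 100, 522.1
fraction = round(TARGET_COV / RAW_COV, 3)  # 0.192, the FRAG_FRAC used above

# Toy input standing in for the 13,457,571 raw pairs
toy = [(f"r{i}/1", f"r{i}/2") for i in range(100_000)]
kept = list(subsample_pairs(toy, fraction))
print(fraction, len(kept))  # kept count is random, close to 19,200
```

With per-pair Bernoulli sampling the kept count is only approximately fraction × N; an exact count would need sampling without replacement, for example `random.sample` over pair indices.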
Basic statistics | Raw Data | Website Data | Self-fraction Data | 100X Coverage | ||||
# contigs | 14 | 1 | 1 | 1 | ||||
Largest contig | 71578 | 177098 | 241348 | 44874 | 204500 | 739647 | 236829 | 740054 |
Total length | 4503182 | 3953489 | 4247061 | 4091078 | 4549335 | 4781613 | 4526809 | 4950301 |
NG50 | 21441 | 40287 | 144812 | 7971 | 45133 | 518052 | 85272 | 523557 |
Misassemblies | ||||||||
# misassemblies | 2 | 6 | 5 | 0 | 3 | 9 | 17 | 24 |
Misassembled contigs length | 24651 | 135726 | 40523 | 0 | 24631 | 1180527 | 564277 | 1850338 |
Genome statistics | ||||||||
Genome fraction (%) | 97.529 | 85.608 | 91.915 | 87.77 | 98.068 | 99.37 | 97.437 | 99.474 |
# genes | 4030 + 275 part | 3685 + 128 part | 4036 + 38 part | 3348 + 592 part | 4046 + 307 part | 4347 + 23 part | 4068 + 265 part | 4358 + 19 part |
# mismatches per 100 kbp | 4.83 | 10.51 | 12.48 | 3.22 | 5.98 | 5.82 | 13.11 | 6.42 |
# indels per 100 kbp | 3.43 | 7.97 | 5.84 | 3.34 | 3.9 | 3.63 | 8.34 | 3.69 |
# N's per 100 kbp | 0 | 20.77 | 4.83 | 0 | 139.23 | 0 | 739.09 | 3.68 |
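The row labels in the table (NG50, # misassemblies, Genome fraction (%), # N's per 100 kbp, and so on) appear to follow the QUAST report format. NG50 differs from N50 in that it is measured against the reference length rather than the assembly length, which keeps the columns comparable even though total assembly length here ranges from about 3.95 Mb to 4.95 Mb. A minimal Python sketch of the definition, assuming plain contig lengths as input (no alignment-based breaking, so this is NG50 rather than NGA50); the contig list in the usage line is hypothetical:

```python
from typing import List

GENOME_SIZE = 4_639_675  # E. coli MG1655 reference length (bp)


def ng50(contig_lengths: List[int], genome_size: int = GENOME_SIZE) -> int:
    """Length L such that contigs of length >= L cover at least half of the
    reference genome. Returns 0 if the assembly never reaches 50%."""
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered * 2 >= genome_size:  # integer test for covered >= genome/2
            return length
    return 0


# Hypothetical assembly: 2.0 Mb alone is < 50% of 4.64 Mb, 3.5 Mb is not
print(ng50([2_000_000, 1_500_000, 800_000, 300_000]))  # 1500000
```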