The Illumina sequencing data are available from the ALLPATHS-LG website. For details, see "Finished bacterial genomes from shotgun sequence data", Genome Research, 2012.
The Illumina and PacBio data were downloaded from the ALLPATHS-LG website: ecoli_data_alt.tar.gz
Fragment library:
Read length : 101bp
Number of reads : 1186190 x 2
Insert size : 180bp
Coverage : 46.02X
Jumping library 1:
Read length : 93bp
Number of reads : 1615702 x 2
Insert size : 3000bp
Jumping library 2:
Read length : 93bp
Number of reads : 362199 x 2
Insert size : 3000bp
PacBio reads:
Average read length : 1514.24bp
Number of reads : 409304
Coverage : 133.58X
Raw data corresponding to the website data, from the Sequence Read Archive (SRA):
Accession : SRX131033
Read length : 101bp
Number of reads : 13457571 x 2
Insert size : 180bp
Coverage : 522.1X
Accession : SRX117481
Accession : SRR492488
Accession : SRX109917, SRX109901 (SRR386913, SRR387092, SRR386907, SRR387035), SRX109936
Using prepare.sh, we randomly selected from the raw fragment library the same fraction of reads as in the website data (1186190 / 13457571 ≈ 0.088).
PrepareAllPathsInputs.pl \
  DATA_DIR=$PWD/test.genome/data \
  PLOIDY=1 \
  FRAG_FRAC=0.088 \
  IN_GROUPS_CSV=in_groups.csv \
  IN_LIBS_CSV=in_libs.csv \
  OVERWRITE=True \
  | tee prepare.out
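The FRAG_FRAC value used here can be reproduced from the read counts listed for the two fragment libraries. The following is a minimal sketch, not part of prepare.sh; the helper name `subsample_fraction` is hypothetical, and the numbers are copied from the listings above.

```python
def subsample_fraction(subset_pairs: int, raw_pairs: int) -> float:
    """Fraction of raw read pairs needed to match the size of a subset."""
    if raw_pairs <= 0 or not 0 <= subset_pairs <= raw_pairs:
        raise ValueError("subset must be between 0 and the raw read-pair count")
    return subset_pairs / raw_pairs


# Read-pair counts of the E. coli fragment library (website subset vs. raw SRA data).
website_pairs = 1186190
raw_pairs = 13457571

frag_frac = subsample_fraction(website_pairs, raw_pairs)
print(f"FRAG_FRAC={frag_frac:.3f}")  # prints FRAG_FRAC=0.088

# Cross-check with the reported coverages (46.02X vs. 522.1X).
print(f"{46.02 / 522.1:.3f}")  # prints 0.088
```

Both ratios agree to three decimals, which suggests the website fragment library is a plain random subset of the raw reads.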
Using prepare.sh, we randomly selected 100X coverage from the raw fragment library.
Fraction = 100 / 522.1 = 0.192
PrepareAllPathsInputs.pl \
  DATA_DIR=$PWD/test.genome/data \
  PLOIDY=1 \
  FRAG_FRAC=0.192 \
  IN_GROUPS_CSV=in_groups.csv \
  IN_LIBS_CSV=in_libs.csv \
  OVERWRITE=True \
  | tee prepare.out
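The fraction for a target coverage is simply the target divided by the raw coverage. A small sketch of that arithmetic follows; the function name `fraction_for_coverage` and the three-decimal rounding are assumptions made for illustration, not something prepare.sh is known to do.

```python
def fraction_for_coverage(target_cov: float, raw_cov: float, ndigits: int = 3) -> float:
    """Fraction of reads to keep so that raw_cov is reduced to target_cov."""
    if not 0 < target_cov <= raw_cov:
        raise ValueError("target coverage must be positive and not exceed raw coverage")
    return round(target_cov / raw_cov, ndigits)


# E. coli raw fragment library: 522.1X, downsampled to 100X.
print(fraction_for_coverage(100, 522.1))  # prints 0.192
```

The returned value is what gets passed as FRAG_FRAC in the command above.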
Rhodobacter sphaeroides strain 2.4.1. The R. sphaeroides 2.4.1 genome consists of two circular chromosomes of 3,188,609 bp and 943,016 bp, and five plasmids of 114,045 bp, 114,178 bp, 105,284 bp, 100,828 bp and 37,100 bp.
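Because this genome is split across seven replicons, the total length (the denominator of any genome-wide coverage estimate) has to be summed. A minimal sketch; the replicon labels are placeholders chosen here, and only the lengths come from the text.

```python
# Replicon lengths in bp, as listed above; the dictionary keys are placeholder labels.
replicons_bp = {
    "chromosome 1": 3_188_609,
    "chromosome 2": 943_016,
    "plasmid 1": 114_045,
    "plasmid 2": 114_178,
    "plasmid 3": 105_284,
    "plasmid 4": 100_828,
    "plasmid 5": 37_100,
}

total_bp = sum(replicons_bp.values())
print(f"{total_bp:,} bp")  # prints 4,603,060 bp
```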
The Illumina and PacBio data were downloaded from the ALLPATHS-LG website: rhody_data.tar.gz
Fragment library:
Read length : 101bp
Number of reads : 4354215 x 2
Insert size : 180bp
Coverage : 170.16X
Jumping library:
Read length : 101bp
Number of reads : 1974031 x 2
Insert size : 3000bp
PacBio reads:
Average read length : 1031.19bp
Number of reads : 1994107
Coverage : 446.44X
Raw data corresponding to the website data, from the Sequence Read Archive (SRA):
Accession : SRX000946
Read length : 101bp
Number of reads : 11339101 x 2
Insert size : 180bp
Coverage : 443.12X
Accession : SRX111018
Accession : SRX109847 (SRR386702), SRX109812, SRX109830, SRX109818 (SRR386746), SRX111329
Using prepare.sh, we randomly selected from the raw fragment library the same fraction of reads as in the website data (4354215 / 11339101 ≈ 0.384).
PrepareAllPathsInputs.pl \
  DATA_DIR=$PWD/test.genome/data \
  PLOIDY=1 \
  FRAG_FRAC=0.384 \
  IN_GROUPS_CSV=in_groups.csv \
  IN_LIBS_CSV=in_libs.csv \
  OVERWRITE=True \
  | tee prepare.out
Using prepare.sh, we randomly selected 100X coverage from the raw fragment library.
Fraction = 100 / 443.12 = 0.226
PrepareAllPathsInputs.pl \
  DATA_DIR=$PWD/test.genome/data \
  PLOIDY=1 \
  FRAG_FRAC=0.226 \
  IN_GROUPS_CSV=in_groups.csv \
  IN_LIBS_CSV=in_libs.csv \
  OVERWRITE=True \
  | tee prepare.out
Streptococcus pneumoniae TIGR4. The S. pneumoniae TIGR4 genome consists of a circular chromosome of 2,160,842 bp.
The Illumina and PacBio data were downloaded from the ALLPATHS-LG website: strep_data.tar.gz
Fragment library:
Read length : 101bp
Number of reads : 1067060 x 2
Insert size : 180bp
Coverage : 88.89X
Jumping library:
Read length : 93bp
Number of reads : 1161883 x 2
Insert size : 3000bp
PacBio reads:
Average read length : 1159.12bp
Number of reads : 403745
Coverage : 216.58X
Raw data corresponding to the website data, from the Sequence Read Archive (SRA):
Accession : SRX110128
Read length : 101bp
Number of reads : 5706200 x 2
Insert size : 180bp
Coverage : 475.33X
Accession : SRX105406
Accession : SRX109959, SRX109958
Using prepare.sh, we randomly selected from the raw fragment and jumping libraries the same fractions of reads as in the website data (fragment library: 1067060 / 5706200 ≈ 0.187).
PrepareAllPathsInputs.pl \
  DATA_DIR=$PWD/test.genome/data \
  PLOIDY=1 \
  FRAG_FRAC=0.187 \
  JUMP_FRAC=0.558 \
  IN_GROUPS_CSV=in_groups.csv \
  IN_LIBS_CSV=in_libs.csv \
  OVERWRITE=True \
  | tee prepare.out
Using prepare.sh, we randomly selected 100X coverage from the raw fragment library; the jumping library fraction was left at 0.558.
Fragment library fraction = 100 / 475.33 = 0.21
PrepareAllPathsInputs.pl \
  DATA_DIR=$PWD/test.genome/data \
  PLOIDY=1 \
  FRAG_FRAC=0.21 \
  JUMP_FRAC=0.558 \
  IN_GROUPS_CSV=in_groups.csv \
  IN_LIBS_CSV=in_libs.csv \
  OVERWRITE=True \
  | tee prepare.out
We also used a second setting that keeps all jumping library reads (JUMP_FRAC omitted).
PrepareAllPathsInputs.pl \
  DATA_DIR=$PWD/test.genome/data \
  PLOIDY=1 \
  FRAG_FRAC=0.21 \
  IN_GROUPS_CSV=in_groups.csv \
  IN_LIBS_CSV=in_libs.csv \
  OVERWRITE=True \
  | tee prepare.out
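The S. pneumoniae invocations differ only in FRAG_FRAC and in whether JUMP_FRAC is passed. The sketch below assembles the argument list for each setting, using only the options that appear in the commands above. The helper `build_prepare_cmd` and the `/path/to` data directory are hypothetical; the script prints the command lines instead of running them and leaves out the `| tee prepare.out` redirection.

```python
import shlex


def build_prepare_cmd(data_dir, frag_frac, jump_frac=None):
    """Return the PrepareAllPathsInputs.pl argument list for one setting.

    JUMP_FRAC is added only when jump_frac is given, mirroring the
    setting above that keeps all jumping library reads.
    """
    args = [
        "PrepareAllPathsInputs.pl",
        f"DATA_DIR={data_dir}",
        "PLOIDY=1",
        f"FRAG_FRAC={frag_frac}",
    ]
    if jump_frac is not None:
        args.append(f"JUMP_FRAC={jump_frac}")
    args += [
        "IN_GROUPS_CSV=in_groups.csv",
        "IN_LIBS_CSV=in_libs.csv",
        "OVERWRITE=True",
    ]
    return args


# The three S. pneumoniae settings described above.
settings = [
    ("same fraction as website data", 0.187, 0.558),
    ("100X fragment coverage", 0.21, 0.558),
    ("100X fragment coverage, all jumping reads", 0.21, None),
]

for label, frag, jump in settings:
    cmd = build_prepare_cmd("/path/to/test.genome/data", frag, jump)
    print(f"# {label}")
    print(shlex.join(cmd))
```

Keeping the arguments in a list makes it easy to hand the same command to `subprocess.run` once the data directory points at a real ALLPATHS-LG layout.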