Streptococcus pneumoniae TIGR4. The S. pneumoniae TIGR4 genome consists of a single circular chromosome of 2,160,842 bp. The Illumina sequencing data are available from the ALLPATHS-LG website. For details, see "Finished bacterial genomes from shotgun sequence data", Genome Research, 2012.
The Illumina and PacBio data were downloaded from the ALLPATHS-LG website: strep_data.tar.gz
Fragment library
Read length: 101 bp
Number of reads: 1067060 × 2
Insert size: 180 bp
Coverage: 88.89X
Jumping library
Read length: 93 bp
Number of reads: 1161883 × 2
Insert size: 3000 bp
PacBio reads
Average read length: 1159.12 bp
Number of reads: 403745
Coverage: 216.58X
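The coverage figures listed above follow the usual definition: total sequenced bases divided by genome size. The sketch below is a minimal illustration, assuming the 2,160,842 bp chromosome length is used as the genome size; the function and constant names are ours and do not come from the ALLPATHS-LG tooling.

```python
def coverage(num_reads: int, mean_read_length: float, genome_size: int) -> float:
    """Return sequencing depth: total sequenced bases divided by genome size."""
    return num_reads * mean_read_length / genome_size


# Chromosome length in bp, as stated at the top of this page.
TIGR4_GENOME_SIZE = 2_160_842

# PacBio reads: 403745 reads with an average length of 1159.12 bp.
pacbio = coverage(403_745, 1159.12, TIGR4_GENOME_SIZE)
print(f"PacBio coverage: {pacbio:.2f}X")  # prints "PacBio coverage: 216.58X"

# Fragment library: 1067060 read pairs, i.e. 2 * 1067060 reads.
# The listed 88.89X is reproduced with a read length of 90 bp, not 101 bp.
fragment_90bp = coverage(2 * 1_067_060, 90, TIGR4_GENOME_SIZE)
print(f"Fragment coverage at 90 bp: {fragment_90bp:.2f}X")  # prints 88.89X
```

With the listed read length of 101 bp the same formula gives roughly 99.75X for the fragment library, so the published 88.89X appears to rest on an effective read length of 90 bp. The source does not state why.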
The raw data from which the website data were derived are available from the Sequence Read Archive (SRA):
Fragment library
Accession: SRX110128
Read length: 101 bp
Number of reads: 5706200 × 2
Insert size: 180 bp
Coverage: 475.33X
Jumping library
Accession: SRX105406
PacBio reads
Accession: SRX109959, SRX109958
From the raw fragment and jumping libraries, we randomly selected the same fraction of reads as in the website data, using the following command in prepare.sh:
PrepareAllPathsInputs.pl \
  DATA_DIR=$PWD/test.genome/data \
  PLOIDY=1 \
  FRAG_FRAC=0.187 \
  JUMP_FRAC=0.558 \
  IN_GROUPS_CSV=in_groups.csv \
  IN_LIBS_CSV=in_libs.csv \
  OVERWRITE=True \
  | tee prepare.out
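The FRAG_FRAC value in the command above is consistent with the ratio of website read pairs to raw read pairs in the fragment library. The following sketch shows that arithmetic; the function name is hypothetical, and the same check cannot be made for JUMP_FRAC because the raw jumping-library read count is not listed on this page.

```python
def subsample_fraction(target_pairs: int, raw_pairs: int) -> float:
    """Fraction of the raw library needed to obtain target_pairs read pairs."""
    if raw_pairs <= 0:
        raise ValueError("raw_pairs must be positive")
    return target_pairs / raw_pairs


# Fragment library: 1067060 pairs in the website data, 5706200 pairs in the raw data.
frag_frac = subsample_fraction(1_067_060, 5_706_200)
print(f"FRAG_FRAC ~ {frag_frac:.3f}")  # prints "FRAG_FRAC ~ 0.187"
```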
We randomly selected 50X coverage from the fragment library and 50X coverage from the jumping library, using the following command in prepare.sh:
PrepareAllPathsInputs.pl \
  DATA_DIR=$PWD/test.genome/data \
  PLOIDY=1 \
  GENOME_SIZE=2165000 \
  FRAG_COVERAGE=50 \
  JUMP_COVERAGE=50 \
  IN_GROUPS_CSV=in_groups.csv \
  IN_LIBS_CSV=in_libs.csv \
  OVERWRITE=True \
  | tee prepare.out
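As a rough cross-check of what a 50X fragment target implies, the sketch below estimates how many read pairs are needed for a given coverage. It is a back-of-the-envelope calculation and not a description of how PrepareAllPathsInputs.pl computes its selection internally; it assumes the listed 101 bp read length and the GENOME_SIZE value from the command above, and the helper name is ours.

```python
def pairs_for_coverage(target_coverage: float, genome_size: int, read_length: int) -> float:
    """Read pairs needed so that 2 * pairs * read_length / genome_size equals the target."""
    return target_coverage * genome_size / (2 * read_length)


GENOME_SIZE = 2_165_000         # value passed as GENOME_SIZE in the command above
RAW_FRAGMENT_PAIRS = 5_706_200  # raw fragment library (SRX110128)
READ_LENGTH = 101               # listed fragment read length in bp

pairs_50x = pairs_for_coverage(50, GENOME_SIZE, READ_LENGTH)
fraction_50x = pairs_50x / RAW_FRAGMENT_PAIRS
print(f"~{pairs_50x:,.0f} pairs, about {fraction_50x:.1%} of the raw fragment library")
```

Under these assumptions, 50X corresponds to a little under a tenth of the raw fragment library, roughly half the fraction used for the website data.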
We also ran a variant that keeps the 50X fragment coverage but uses all jumping-library reads (JUMP_FRAC=1):
PrepareAllPathsInputs.pl \
  DATA_DIR=$PWD/test.genome/data \
  PLOIDY=1 \
  JUMP_FRAC=1 \
  GENOME_SIZE=2165000 \
  FRAG_COVERAGE=50 \
  IN_GROUPS_CSV=in_groups.csv \
  IN_LIBS_CSV=in_libs.csv \
  OVERWRITE=True \
  | tee prepare.out
We randomly selected 100X coverage from the fragment library and 100X coverage from the jumping library, using the following command in prepare.sh:
PrepareAllPathsInputs.pl \
  DATA_DIR=$PWD/test.genome/data \
  PLOIDY=1 \
  GENOME_SIZE=2165000 \
  FRAG_COVERAGE=100 \
  JUMP_COVERAGE=100 \
  IN_GROUPS_CSV=in_groups.csv \
  IN_LIBS_CSV=in_libs.csv \
  OVERWRITE=True \
  | tee prepare.out
We also ran a variant that keeps the 100X fragment coverage but uses all jumping-library reads (no jumping-library subsampling option is set in this command):
PrepareAllPathsInputs.pl \
  DATA_DIR=$PWD/test.genome/data \
  PLOIDY=1 \
  IN_GROUPS_CSV=in_groups.csv \
  GENOME_SIZE=2165000 \
  FRAG_COVERAGE=100 \
  IN_LIBS_CSV=in_libs.csv \
  OVERWRITE=True \
  | tee prepare.out
Basic statistics | Published Data | Raw Data | Fractional Data | 50X Coverage Data | 50X fragment with all jumping | 100X Coverage Data | 100X fragment with all jumping |
# contigs | 1 | 5 | 1 | 2 | 1 | 1 | 1 |
Largest contig | 2162245 | 1340620 | 2151421 | 1189234 | 2149064 | 2150940 | 2153124 |
Total length | 2162245 | 2140045 | 2151421 | 2146017 | 2149064 | 2150940 | 2153124 |
N50 | 2162245 | 1340620 | 2151421 | 1189234 | 2149064 | 2150940 | 2153124 |
Misassemblies | |||||||
# misassemblies | 4 | 1 | 3 | 0 | 1 | 3 | 1 |
Misassembled contigs length | 2162245 | 1340620 | 2151421 | 0 | 2149064 | 2150940 | 2153124 |
Mismatches | |||||||
# mismatches per 100kbp | 5.7 | 2.15 | 2.05 | 2.05 | 2.1 | 2.05 | 2.05 |
# indels per 100kbp | 3.52 | 1.08 | 0.93 | 3.08 | 0.93 | 0.98 | 1.07 |
# N's per 100kbp | 0.05 | 0.14 | 0.09 | 0.14 | 171.89 | 0.09 | 172.96 |
Genome statistics | |||||||
Genome fraction (%) | 99.946 | 99.967 | 99.45 | 99.239 | 99.239 | 99.43 | 99.423 |
Duplication ratio | 1.011 | 1.005 | 1.016 | 1.015 | 1.017 | 1.016 | 1.012 |
# genes | 2297 + 4 part | 2299 + 2 part | 2297 + 4 part | 2294 + 4 part | 2294 + 4 part | 2297 + 4 part | 2298 + 3 part |
NGA50 | 1198037 | 1338442 | 1189098 | 1188680 | 1188680 | 1189098 | 1590348 |
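The N50 row in the table above can be reproduced from contig lengths: sort the contigs from longest to shortest and report the length at which the running total first reaches half of the assembly. The sketch below applies this to the 50X Coverage Data column. The second contig length is not listed in the table; it is inferred here as total length minus largest contig, which assumes the two reported contigs account for the whole total.

```python
def n50(contig_lengths):
    """Return the N50: the largest length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    half_total = sum(contig_lengths) / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# 50X Coverage Data column: 2 contigs, largest 1189234 bp, total 2146017 bp.
largest, total = 1_189_234, 2_146_017
contigs_50x = [largest, total - largest]  # second contig inferred, see note above
print(n50(contigs_50x))  # prints 1189234, matching the N50 row
```

For the single-contig assemblies the N50 is simply the contig length, which is why the Largest contig, Total length, and N50 rows coincide in those columns.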
Basic statistics | Raw Data | Website Data | Self-fraction Data | 100X Coverage |
# contigs | 6 | 4 | 4 | 4 |
Largest contig | 2135901 | 1663585 | 1671738 | 1664345 |
Total length | 2144412 | 2161502 | 2160013 | 2156958 |
N50 | 2135901 | 1663585 | 1671738 | 1664345 |
Misassemblies | ||||
# misassemblies | 19 | 5 | 11 | 9 |
Misassembled contigs length | 2138844 | 1663585 | 1949937 | 2154315 |
Mismatches | ||||
# mismatches per 100kbp | 4.9 | 2.62 | 2.59 | 3.33 |
# indels per 100kbp | 17.33 | 1.5 | 11.78 | 14.39 |
# N's per 100kbp | 1714.74 | 759.8 | 1505.69 | 994.27 |
Genome statistics | ||||
Genome fraction (%) | 97.204 | 98.798 | 98.624 | 98.72 |
Duplication ratio | 1.019 | 1.022 | 1.047 | 1.021 |
# genes | 2239 + 33 part | 2271 + 15 part | 2275 + 15 part | 2266 + 17 part |
NGA50 | 224016 | 403409 | 397648 | 1533035 |