Streptococcus pneumoniae TIGR4. The S. pneumoniae TIGR4 genome consists of a single circular chromosome 2,160,842 bp in length. The Illumina sequencing data are available from the ALLPATHS-LG website. For details, see "Finished bacterial genomes from shotgun sequence data", Genome Research, 2012.
The Illumina and PacBio data were downloaded from the ALLPATHS-LG website: strep_data.tar.gz
Fragment library
Read length : 101 bp
Number of reads : 1067060 x 2
Insert size : 180 bp
Coverage : 88.89X
Jumping library
Read length : 93 bp
Number of reads : 1161883 x 2
Insert size : 3000 bp
PacBio reads
Average read length : 1159.12 bp
Number of reads : 403745
Coverage : 216.58X
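The PacBio coverage listed above can be reproduced as number of reads times average read length divided by genome size. The following is a minimal sketch; the helper name `coverage` is hypothetical and not part of any script mentioned on this page, and the numbers are taken from the list above and the genome size given in the introduction.

```shell
#!/bin/sh
# coverage READS MEAN_READ_LEN GENOME_SIZE
# Prints the sequencing depth (reads * mean length / genome size), two decimals.
# "coverage" is an illustrative helper, not part of ALLPATHS-LG.
coverage() {
    awk -v n="$1" -v len="$2" -v g="$3" 'BEGIN { printf "%.2f\n", n * len / g }'
}

# PacBio reads: 403745 reads, 1159.12 bp average, 2160842 bp genome
coverage 403745 1159.12 2160842   # prints 216.58
```

For paired-end libraries the read count would have to be doubled before it is passed in, since each pair contributes two reads.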
The raw data from which the website data were derived are available from the Sequence Read Archive (SRA):
Fragment library
Accession : SRX110128
Read length : 101 bp
Number of reads : 5706200 x 2
Insert size : 180 bp
Coverage : 475.33X
Jumping library
Accession : SRX105406
PacBio reads
Accession : SRX109959, SRX109958
Using prepare.sh, we randomly subsampled the fragment and jumping libraries of the raw data to the same fractions as the website data.
    PrepareAllPathsInputs.pl \
        DATA_DIR=$PWD/test.genome/data \
        PLOIDY=1 \
        FRAG_FRAC=0.187 \
        JUMP_FRAC=0.558 \
        IN_GROUPS_CSV=in_groups.csv \
        IN_LIBS_CSV=in_libs.csv \
        OVERWRITE=True \
        | tee prepare.out
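The FRAG_FRAC value used here is consistent with the ratio of website read pairs to raw read pairs in the fragment library. The sketch below checks that ratio; the helper name `fraction` is hypothetical. The raw jumping-library read count is not listed on this page, so JUMP_FRAC=0.558 cannot be checked the same way.

```shell
#!/bin/sh
# fraction SUBSET_PAIRS TOTAL_PAIRS
# Prints the subsampling fraction with three decimals.
# "fraction" is an illustrative helper, not part of ALLPATHS-LG.
fraction() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.3f\n", a / b }'
}

# Fragment library: 1067060 website pairs out of 5706200 raw pairs
fraction 1067060 5706200   # prints 0.187
```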
Using prepare.sh, we also randomly subsampled the fragment library of the raw data down to 100X coverage.
Fragment library fraction = 100/475.33 = 0.21
    PrepareAllPathsInputs.pl \
        DATA_DIR=$PWD/test.genome/data \
        PLOIDY=1 \
        FRAG_FRAC=0.21 \
        JUMP_FRAC=0.558 \
        IN_GROUPS_CSV=in_groups.csv \
        IN_LIBS_CSV=in_libs.csv \
        OVERWRITE=True \
        | tee prepare.out
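The fraction for a target coverage, and the approximate number of read pairs it retains, can be worked out as below. This is only a sketch: `frag_frac` and `pairs_kept` are hypothetical helper names, and because the selection is random the actual number of retained pairs may differ slightly from the estimate.

```shell
#!/bin/sh
# frag_frac TARGET_COVERAGE RAW_COVERAGE
# Prints the fraction of reads to keep, two decimals.
frag_frac() {
    awk -v t="$1" -v c="$2" 'BEGIN { printf "%.2f\n", t / c }'
}

# pairs_kept RAW_PAIRS FRACTION
# Prints the expected number of retained read pairs, rounded to the nearest integer.
pairs_kept() {
    awk -v n="$1" -v f="$2" 'BEGIN { printf "%d\n", n * f + 0.5 }'
}

# Both helpers are illustrative and not part of ALLPATHS-LG.
frag_frac 100 475.33       # prints 0.21
pairs_kept 5706200 0.21    # prints 1198302
```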
We also used another setting that keeps all jumping library reads (JUMP_FRAC is omitted).
    PrepareAllPathsInputs.pl \
        DATA_DIR=$PWD/test.genome/data \
        PLOIDY=1 \
        FRAG_FRAC=0.21 \
        IN_GROUPS_CSV=in_groups.csv \
        IN_LIBS_CSV=in_libs.csv \
        OVERWRITE=True \
        | tee prepare.out
| Basic statistics | Raw Data | Website Data | Self-fraction Data | 100X Coverage |
|---|---|---|---|---|
| # contigs | 13 | 11 | 10 | 11 |
| Largest contig | 3188540 | 3188818 | 3188847 | 3188802 |
| Total length | 4588701 | 4601792 | 4609235 | 4601762 |
| N50 | 3188540 | 3188818 | 3188847 | 3188802 |
| Misassemblies | | | | |
| # misassemblies | 12 | 16 | 20 | 19 |
| Misassembled contigs length | 4361060 | 4370092 | 4557570 | 4484253 |
| Mismatches | | | | |
| # mismatches per 100kbp | 3.77 | 3.48 | 4.8 | 6.43 |
| # indels per 100kbp | 5.13 | 3.52 | 4.87 | 5.61 |
| # N's per 100kbp | 0.09 | 0 | 0.13 | 0.07 |
| Genome statistics | | | | |
| Genome fraction (%) | 99.683 | 99.932 | 99.948 | 99.945 |
| Duplication ratio | 1.005 | 1.011 | 1.009 | 1.007 |
| # genes | 4369 + 10 part | 4381 + 6 part | 4380 + 7 part | 4378 + 8 part |
| NGA50 | 2938269 | 904505 | 2715665 | 3170709 |

| Basic statistics | Raw Data | Website Data | Self-fraction Data | 100X Coverage |
|---|---|---|---|---|
| # contigs | 57 | 31 | 32 | 26 |
| Largest contig | 3186675 | 3188995 | 1674993 | 3190277 |
| Total length | 4583750 | 4592561 | 4620837 | 4607723 |
| N50 | 3186675 | 3188995 | 1492665 | 3190277 |
| Misassemblies | | | | |
| # misassemblies | 6 | 9 | 17 | 27 |
| Misassembled contigs length | 4147900 | 4205887 | 2637662 | 4422750 |
| Mismatches | | | | |
| # mismatches per 100kbp | 4.23 | 5.81 | 7.49 | 10.76 |
| # indels per 100kbp | 3.57 | 5.64 | 4.72 | 8.94 |
| # N's per 100kbp | 149.31 | 120.74 | 197.84 | 812.74 |
| Genome statistics | | | | |
| Genome fraction (%) | 98.789 | 99.45 | 99.468 | 98.896 |
| Duplication ratio | 1.022 | 1.018 | 1.015 | 1.02 |
| # genes | 4313 + 47 part | 4348 + 27 part | 4343 + 31 part | 4266 + 101 part |
| NGA50 | 3180491 | 3182258 | 1487141 | 546353 |