Dataset 2: Rhodobacter sphaeroides strain 2.4.1. This dataset includes three libraries: fragment, jumping, and long reads (PacBio). The R. sphaeroides 2.4.1 genome consists of two circular chromosomes of 3,188,609 bp and 943,016 bp and five plasmids of 114,045 bp, 114,178 bp, 105,284 bp, 100,828 bp, and 37,100 bp. For details, see "Finished bacterial genomes from shotgun sequence data," Genome Research, 2012.
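As a quick sanity check, the replicon lengths listed above can be summed to get the total genome size. This is a minimal sketch; the variable and function names are illustrative and not part of the original pipeline.

```python
# Replicon lengths in bp, copied from the dataset description above.
CHROMOSOMES = [3_188_609, 943_016]
PLASMIDS = [114_045, 114_178, 105_284, 100_828, 37_100]


def total_genome_size(chromosomes, plasmids):
    """Return the combined length of all replicons in bp."""
    return sum(chromosomes) + sum(plasmids)


genome_size = total_genome_size(CHROMOSOMES, PLASMIDS)
print(f"Total genome size: {genome_size:,} bp")
```

The total comes out at roughly 4.6 Mbp, which is the order of magnitude passed as GENOME_SIZE in the subsampling commands on this page.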
The Illumina and PacBio data were downloaded from the ALLPATHS-LG website: rhody_data.tar.gz
Fragment library
Read length: 101 bp
Number of reads: 4,354,215 × 2
Insert size: 180 bp
Coverage: 191.0X
Jumping library
Read length: 101 bp
Number of reads: 1,974,031 × 2
Insert size: 3,000 bp
PacBio reads
Average read length: 1,031.19 bp
Number of reads: 1,994,107
Coverage: 446.4X
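The coverage figures above follow from the usual estimate: coverage ≈ (number of reads × read length) / genome size. The sketch below recomputes them. It assumes a genome size of 4,603,060 bp (the sum of the seven replicon lengths given in the dataset description); the page does not state which value was actually used, so small differences from the reported numbers are expected. The jumping-library value is computed here and is not reported on the page.

```python
# Assumption: genome size taken as the sum of the seven replicon lengths.
GENOME_SIZE = 4_603_060  # bp


def coverage(num_reads, mean_read_length, genome_size=GENOME_SIZE):
    """Estimate sequencing depth as total sequenced bases / genome size."""
    return num_reads * mean_read_length / genome_size


# Paired-end libraries contribute two reads per pair.
fragment_cov = coverage(4_354_215 * 2, 101)
jumping_cov = coverage(1_974_031 * 2, 101)
pacbio_cov = coverage(1_994_107, 1031.19)

print(f"Fragment library: {fragment_cov:.1f}X")
print(f"Jumping library:  {jumping_cov:.1f}X")
print(f"PacBio reads:     {pacbio_cov:.1f}X")
```

Under this assumption the fragment and PacBio estimates land within a fraction of a percent of the reported 191.0X and 446.4X.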
The raw Illumina data were obtained from the Sequence Read Archive (SRA).
Fragment library
Accession: SRR125492
Read length: 101 bp
Number of reads: 11,339,101 × 2
Insert size: 180 bp
Coverage: 497.3X
Jumping library
Accession: SRR388672
Using prepare.sh, we randomly subsampled the raw fragment library to the same fraction of reads as the website data:
    PrepareAllPathsInputs.pl \
      DATA_DIR=$PWD/test.genome/data \
      PLOIDY=1 \
      FRAG_FRAC=0.384 \
      IN_GROUPS_CSV=in_groups.csv \
      IN_LIBS_CSV=in_libs.csv \
      OVERWRITE=True \
      | tee prepare.out
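The value FRAG_FRAC=0.384 is consistent with the ratio of fragment read pairs in the website data to those in the raw SRA data. A minimal sketch of that arithmetic, using the read counts listed on this page (the variable names are illustrative):

```python
# Fragment-library read pairs, copied from the library descriptions.
WEBSITE_FRAGMENT_PAIRS = 4_354_215
RAW_FRAGMENT_PAIRS = 11_339_101

# Fraction of the raw library needed to match the website data.
frag_frac = WEBSITE_FRAGMENT_PAIRS / RAW_FRAGMENT_PAIRS
print(f"FRAG_FRAC ~ {frag_frac:.3f}")
```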
Using prepare.sh, we randomly subsampled the fragment library and the jumping library to 50X coverage each:
    PrepareAllPathsInputs.pl \
      DATA_DIR=$PWD/test.genome/data \
      PLOIDY=1 \
      GENOME_SIZE=4610000 \
      FRAG_COVERAGE=50 \
      JUMP_COVERAGE=50 \
      IN_GROUPS_CSV=in_groups.csv \
      IN_LIBS_CSV=in_libs.csv \
      OVERWRITE=True \
      | tee prepare.out
Using prepare.sh, we randomly subsampled the fragment library and the jumping library to 100X coverage each:
    PrepareAllPathsInputs.pl \
      DATA_DIR=$PWD/test.genome/data \
      PLOIDY=1 \
      GENOME_SIZE=4600000 \
      FRAG_COVERAGE=100 \
      JUMP_COVERAGE=100 \
      IN_GROUPS_CSV=in_groups.csv \
      IN_LIBS_CSV=in_libs.csv \
      OVERWRITE=True \
      | tee prepare.out
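To get a feel for what a coverage target means in read counts, the sketch below estimates how many 101 bp read pairs are needed to reach a given depth for a given GENOME_SIZE. This is only the arithmetic implied by the definition of coverage; it is not a description of how PrepareAllPathsInputs.pl selects reads internally, and the function name is hypothetical. The two genome sizes are the ones passed in the commands on this page.

```python
import math


def pairs_for_coverage(target_coverage, genome_size, read_length=101):
    """Estimate the read pairs needed so that 2 * read_length * pairs
    covers the genome target_coverage times (rounded up)."""
    bases_needed = target_coverage * genome_size
    return math.ceil(bases_needed / (2 * read_length))


# GENOME_SIZE values as given in the 50X and 100X commands.
pairs_50x = pairs_for_coverage(50, 4_610_000)
pairs_100x = pairs_for_coverage(100, 4_600_000)

print(f"50X:  about {pairs_50x:,} read pairs per library")
print(f"100X: about {pairs_100x:,} read pairs per library")
```

A library that holds fewer pairs than the estimate cannot reach the requested depth, so it is worth comparing these numbers against the library sizes before subsampling.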
| Basic statistics | Website Data | Raw Data | Fractional Data | 50X Coverage | 100X Coverage |
|---|---|---|---|---|---|
| # contigs | 11 | 13 | 10 | NA | 12 |
| Largest contig | 3188818 | 3188540 | 3188847 | NA | 3188773 |
| Total length | 4601792 | 4588701 | 4609235 | NA | 4602493 |
| N50 | 3188818 | 3188540 | 3188847 | NA | 3188773 |
| Misassemblies | | | | | |
| # misassemblies | 3 | 3 | 4 | NA | 4 |
| Misassembled contigs length | 133056 | 1067239 | 206335 | NA | 196336 |
| Mismatches | | | | | |
| # mismatches per 100kbp | 3.04 | 3.69 | 4.15 | NA | 6.35 |
| # indels per 100kbp | 2.91 | 3.93 | 3.65 | NA | 5.48 |
| # N's per 100kbp | 0 | 0.09 | 0.13 | NA | 45.04 |
| Genome statistics | | | | | |
| Genome fraction (%) | 99.932 | 99.583 | 99.927 | NA | 99.834 |
| Duplication ratio | 1.001 | 1.001 | 1.002 | NA | 1.002 |
| # genes | 4381 + 6 part | 4365 + 11 part | 4379 + 7 part | NA | 4372 + 14 part |
| NGA50 | 3188814 | 3188540 | 3188795 | NA | 3188333 |
| Running Time | 2hr 12m | 5hr 09m | 1hr 49m | NA | 3hr 33m |
Misassemblies (for Adobe Reader).
| Basic statistics | Website Data | Raw Data | Fractional Data | 50X Coverage | 100X Coverage |
|---|---|---|---|---|---|
| # contigs | 31 | 57 | 32 | 79 | 29 |
| Largest contig | 3188995 | 3186675 | 1674993 | 263863 | 2634704 |
| Total length | 4592561 | 4583750 | 4620837 | 4096056 | 4628027 |
| N50 | 3188995 | 3186675 | 1492665 | 99916 | 2634704 |
| Misassemblies | | | | | |
| # misassemblies | 6 | 4 | 10 | 249 | 16 |
| Misassembled contigs length | 4163443 | 4147900 | 2637662 | 3829308 | 3815869 |
| Mismatches | | | | | |
| # mismatches per 100kbp | 5.81 | 4.23 | 7.41 | 21.37 | 7.02 |
| # indels per 100kbp | 3.65 | 3.57 | 4.5 | 4.88 | 4.99 |
| # N's per 100kbp | 120.74 | 149.31 | 197.84 | 227311 | 1572.6 |
| Genome statistics | | | | | |
| Genome fraction (%) | 99.417 | 98.669 | 99.437 | 64.402 | 98.348 |
| Duplication ratio | 1.004 | 1.009 | 1.01 | 1.38 | 1.022 |
| # genes | 4345 + 30 part | 4308 + 47 part | 4341 + 31 part | 2176 + 1205 part | 4183 + 185 part |
| NGA50 | 3182258 | 3180491 | 1486855 | 9975 | 511933 |
| Running Time | 41m | 1hr 01m | 36m | 19m | 33m |
Misassemblies (for Adobe Reader).