R. sphaeroides

Dataset 2: Rhodobacter sphaeroides strain 2.4.1. This dataset includes three libraries: a fragment library, a jumping library and PacBio long reads. The R. sphaeroides 2.4.1 genome consists of two circular chromosomes of 3,188,609 bp and 943,016 bp and five plasmids of 114,045 bp, 114,178 bp, 105,284 bp, 100,828 bp and 37,100 bp. For details, see "Finished bacterial genomes from shotgun sequence data", Genome Research, 2012.
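
As a quick arithmetic check, the seven replicon lengths listed above add up to the total genome size. A minimal bash sketch (the function name genome_size is our own):

```shell
#!/usr/bin/env bash
# Replicon lengths of R. sphaeroides 2.4.1 as given on this page, in bp:
# two chromosomes followed by five plasmids.
replicons=(3188609 943016 114045 114178 105284 100828 37100)

# Sum all replicon lengths with integer arithmetic.
genome_size() {
  local total=0 len
  for len in "${replicons[@]}"; do
    total=$(( total + len ))
  done
  echo "$total"
}

genome_size   # prints 4603060
```

The total, 4,603,060 bp (about 4.6 Mb), is close to the GENOME_SIZE values of 4,610,000 and 4,600,000 passed to PrepareAllPathsInputs.pl in the coverage-sampling sections.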


Website data

The Illumina and PacBio data were downloaded from the ALLPATHS-LG website: rhody_data.tar.gz

Fragment library
Read length: 101 bp
Read pairs: 4,354,215
Insert size: 180 bp
Coverage: 191.0X

Jumping library
Read length: 101 bp
Read pairs: 1,974,031
Insert size: 3,000 bp

PacBio reads
Average read length: 1,031.19 bp
Number of reads: 1,994,107
Coverage: 446.4X
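
The coverage figures follow from reads × read length / genome size, with each pair in a paired library contributing two reads. The sketch below is a hedged reconstruction: the genome size behind these figures is not stated, and the value 4,606,000 bp used here is our assumption, chosen because it reproduces the listed 191.0X and 446.4X (and 497.3X for the raw fragment library) to one decimal. Using the 4,603,060 bp replicon sum instead gives slightly higher values.

```shell
#!/usr/bin/env bash
# Estimate sequencing coverage (X) from read counts.
# ASSUMPTION: GENOME_SIZE=4606000 is our guess at the value behind the
# coverages on this page; the source does not state it.
GENOME_SIZE=4606000

# Paired-end library: pairs * 2 reads * read length / genome size
coverage_pe() {
  awk -v p="$1" -v len="$2" -v g="$3" 'BEGIN { printf "%.1f\n", p * 2 * len / g }'
}

# Unpaired long reads: reads * average read length / genome size
coverage_se() {
  awk -v n="$1" -v len="$2" -v g="$3" 'BEGIN { printf "%.1f\n", n * len / g }'
}

coverage_pe 4354215 101 "$GENOME_SIZE"      # website fragment library
coverage_se 1994107 1031.19 "$GENOME_SIZE"  # website PacBio reads
```

awk is used because bash arithmetic is integer-only.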

Raw data

The raw Illumina data were obtained from the Sequence Read Archive (SRA).

Fragment library
Accession : SRR125492
Read length: 101 bp
Read pairs: 11,339,101
Insert size: 180 bp
Coverage: 497.3X

Jumping library
Accession : SRR388672

Fractional data

Using the following prepare.sh, we randomly sampled the raw fragment library down to the same fraction of read pairs that the website data contain: FRAG_FRAC=0.384, since 4,354,215 / 11,339,101 ≈ 0.384.

# Keep a random 38.4% of the raw fragment read pairs
PrepareAllPathsInputs.pl \
    DATA_DIR=$PWD/test.genome/data \
    PLOIDY=1 \
    FRAG_FRAC=0.384 \
    IN_GROUPS_CSV=in_groups.csv \
    IN_LIBS_CSV=in_libs.csv \
    OVERWRITE=True \
    | tee prepare.out
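
The FRAG_FRAC value is the ratio of read pairs in the website fragment library to those in the raw SRR125492 run. A small bash/awk check (the function name frag_frac is our own):

```shell
#!/usr/bin/env bash
# Fraction of raw fragment pairs needed to match the website data set.
website_pairs=4354215   # fragment library, website data
raw_pairs=11339101      # fragment library, SRR125492

frag_frac() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.3f\n", a / b }'
}

frag_frac "$website_pairs" "$raw_pairs"   # prints 0.384
```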

50X coverage data

Using prepare.sh, we randomly sampled 50X coverage from the fragment library and 50X coverage from the jumping library.

# Downsample the fragment and jumping libraries to ~50X each
PrepareAllPathsInputs.pl \
    DATA_DIR=$PWD/test.genome/data \
    PLOIDY=1 \
    GENOME_SIZE=4610000 \
    FRAG_COVERAGE=50 \
    JUMP_COVERAGE=50 \
    IN_GROUPS_CSV=in_groups.csv \
    IN_LIBS_CSV=in_libs.csv \
    OVERWRITE=True \
    | tee prepare.out
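
To get a feel for how much data a coverage target keeps, the coverage formula can be inverted: read pairs ≈ coverage × genome size / (2 × read length). This is our back-of-the-envelope estimate, not a description of how PrepareAllPathsInputs.pl selects reads internally; the function name pairs_for_coverage is hypothetical. The website data list 101 bp reads for both Illumina libraries.

```shell
#!/usr/bin/env bash
# Approximate number of read pairs needed for a target coverage.
# Rough estimate only; PrepareAllPathsInputs.pl's own sampling may differ.
pairs_for_coverage() {
  local cov=$1 genome=$2 read_len=$3
  # Add 0.5 before %d truncation to round to the nearest integer.
  awk -v c="$cov" -v g="$genome" -v l="$read_len" \
    'BEGIN { printf "%d\n", c * g / (2 * l) + 0.5 }'
}

pairs_for_coverage 50 4610000 101    # 50X run, GENOME_SIZE=4610000
pairs_for_coverage 100 4600000 101   # 100X run, GENOME_SIZE=4600000
```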

100X coverage data

Using prepare.sh, we randomly sampled 100X coverage from the fragment library and 100X coverage from the jumping library.

# Downsample the fragment and jumping libraries to ~100X each
PrepareAllPathsInputs.pl \
    DATA_DIR=$PWD/test.genome/data \
    PLOIDY=1 \
    GENOME_SIZE=4600000 \
    FRAG_COVERAGE=100 \
    JUMP_COVERAGE=100 \
    IN_GROUPS_CSV=in_groups.csv \
    IN_LIBS_CSV=in_libs.csv \
    OVERWRITE=True \
    | tee prepare.out
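
The three PrepareAllPathsInputs.pl calls in this section share every argument except the sampling ones. A hedged bash sketch that builds them from one function follows; prepare_inputs and the DRY_RUN switch are our own hypothetical additions, not part of ALLPATHS-LG, and by default the function only prints the command.

```shell
#!/usr/bin/env bash
# Build a PrepareAllPathsInputs.pl call from the shared arguments plus
# the sampling arguments passed to the function.
# prepare_inputs and DRY_RUN are hypothetical helpers.
DRY_RUN=${DRY_RUN:-1}

prepare_inputs() {
  # "$@" holds the sampling arguments, e.g. FRAG_COVERAGE=100
  local cmd=(PrepareAllPathsInputs.pl
    "DATA_DIR=$PWD/test.genome/data"
    PLOIDY=1
    "$@"
    IN_GROUPS_CSV=in_groups.csv
    IN_LIBS_CSV=in_libs.csv
    OVERWRITE=True)
  if [ "$DRY_RUN" = 1 ]; then
    echo "${cmd[*]}"                  # print only
  else
    "${cmd[@]}" | tee prepare.out     # actually run it
  fi
}

prepare_inputs FRAG_FRAC=0.384
prepare_inputs GENOME_SIZE=4610000 FRAG_COVERAGE=50 JUMP_COVERAGE=50
prepare_inputs GENOME_SIZE=4600000 FRAG_COVERAGE=100 JUMP_COVERAGE=100
```

Building the command as an array keeps each argument a separate word, and printing it first makes a misspelled argument name easy to spot before a long run.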

Evaluation

Benchmark genome: R. sphaeroides 2.4.1

Evaluated with QUAST v2.3

QUAST was run with the R. sphaeroides 2.4.1 reference genome and its gene list (4,388 genes in total); the "# genes" rows below are computed against this gene list.
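
For reference, a QUAST 2.x run of this kind can be sketched as follows, using the -R (reference genome), -G (gene list) and -o (output directory) options documented for QUAST 2.x. The file names are hypothetical placeholders, not the files used for this benchmark, and the wrapper run_quast is our own: it prints the command instead of running it when quast.py is not on PATH.

```shell
#!/usr/bin/env bash
# Sketch of a QUAST 2.x evaluation against the reference and gene list.
# final.assembly.fasta, reference.fasta and genes.txt are placeholders.
run_quast() {
  local contigs=$1 ref=$2 genes=$3 outdir=$4
  local cmd=(quast.py "$contigs" -R "$ref" -G "$genes" -o "$outdir")
  if command -v quast.py >/dev/null 2>&1; then
    "${cmd[@]}"
  else
    # quast.py not found: show the command instead of running it
    echo "${cmd[*]}"
  fi
}

run_quast final.assembly.fasta reference.fasta genes.txt quast_out
```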

QUAST scores: with PacBio long reads

Basic statistics	Website Data	Raw Data	Fractional Data	50X Coverage	100X Coverage
# contigs	11	13	10	NA	12
Largest contig	3188818	3188540	3188847	NA	3188773
Total length	4601792	4588701	4609235	NA	4602493
N50	3188818	3188540	3188847	NA	3188773
Misassemblies
# misassemblies	3	3	4	NA	4
Misassembled contigs length	133056	1067239	206335	NA	196336
Mismatches
# mismatches per 100kbp	3.04	3.69	4.15	NA	6.35
# indels per 100kbp	2.91	3.93	3.65	NA	5.48
# N's per 100kbp	0	0.09	0.13	NA	45.04
Genome statistics
Genome fraction (%)	99.932	99.583	99.927	NA	99.834
Duplication ratio	1.001	1.001	1.002	NA	1.002
# genes	4381 + 6 part	4365 + 11 part	4379 + 7 part	NA	4372 + 14 part
NGA50	3188814	3188540	3188795	NA	3188333
Running Time	2hr 12m	5hr 09m	1hr 49m	NA	3hr 33m
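
The "# genes" row reads as complete genes + partially covered genes ("part"), out of the 4,388 reference genes. A short sketch converting the complete-gene counts from this table to percentages; the counts are copied from the table and the function name gene_pct is our own.

```shell
#!/usr/bin/env bash
# Percentage of the 4,388 reference genes that QUAST reports as complete.
TOTAL_GENES=4388

gene_pct() {
  awk -v n="$1" -v t="$TOTAL_GENES" 'BEGIN { printf "%.2f\n", 100 * n / t }'
}

# Complete-gene counts from the "with PacBio long reads" table
for label_count in Website:4381 Raw:4365 Fractional:4379 100X:4372; do
  printf '%s\t%s%%\n' "${label_count%%:*}" "$(gene_pct "${label_count##*:}")"
done
```

The partially covered genes are left out here; adding them to the count would give the share of genes touched by the assembly at all.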


QUAST scores: without PacBio long reads

Basic statistics	Website Data	Raw Data	Fractional Data	50X Coverage	100X Coverage
# contigs	31	57	32	79	29
Largest contig	3188995	3186675	1674993	263863	2634704
Total length	4592561	4583750	4620837	4096056	4628027
N50	3188995	3186675	1492665	99916	2634704
Misassemblies
# misassemblies	6	4	10	249	16
Misassembled contigs length	4163443	4147900	2637662	3829308	3815869
Mismatches
# mismatches per 100kbp	5.81	4.23	7.41	21.37	7.02
# indels per 100kbp	3.65	3.57	4.5	4.88	4.99
# N's per 100kbp	120.74	149.31	197.84	227311	1572.6
Genome statistics
Genome fraction (%)	99.417	98.669	99.437	64.402	98.348
Duplication ratio	1.004	1.009	1.01	1.38	1.022
# genes	4345 + 30 part	4308 + 47 part	4341 + 31 part	2176 + 1205 part	4183 + 185 part
NGA50	3182258	3180491	1486855	9975	511933
Running Time	41m	1hr 01m	36m	19m	33m
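
The running times in the two tables mix "XhrYm" and "Ym" forms, so comparing runs with and without PacBio reads is easier in minutes. A small bash helper, assuming only those two forms (with or without a space) appear; the function name to_minutes is our own.

```shell
#!/usr/bin/env bash
# Convert running times such as "2hr12m", "5hr 09m" or "41m" to minutes.
to_minutes() {
  local t=${1// /}   # drop spaces: "5hr 09m" -> "5hr09m"
  if [[ $t =~ ^([0-9]+)hr([0-9]+)m$ ]]; then
    echo $(( 10#${BASH_REMATCH[1]} * 60 + 10#${BASH_REMATCH[2]} ))
  elif [[ $t =~ ^([0-9]+)m$ ]]; then
    echo $(( 10#${BASH_REMATCH[1]} ))
  else
    echo "unrecognised time: $1" >&2
    return 1
  fi
}

# Website data: with vs. without PacBio long reads
with=$(to_minutes "2hr 12m")
without=$(to_minutes "41m")
echo "with PacBio: ${with} min, without: ${without} min"
```

The 10# prefix stops bash from reading zero-padded values such as 09 as (invalid) octal numbers.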
