R. sphaeroides

Revision as of 21 January 2014 21:15 by admin


The genome of Rhodobacter sphaeroides strain 2.4.1 consists of two circular chromosomes of 3,188,609 bp and 943,016 bp and five plasmids of 114,045 bp, 114,178 bp, 105,284 bp, 100,828 bp and 37,100 bp. The Illumina sequencing data are available from the ALLPATHS-LG website. For details, see "Finished bacterial genomes from shotgun sequence data", Genome Research, 2012.
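As a quick sanity check, the replicon lengths listed above can be summed to get the total genome size, which is close to the GENOME_SIZE=4600000 value passed to PrepareAllPathsInputs.pl in the subsampling sections. The sketch below is only illustrative; the variable and function names are hypothetical and not part of any tool used on this page.

```python
# Replicon lengths in bp, as listed above for R. sphaeroides 2.4.1.
CHROMOSOMES = [3_188_609, 943_016]
PLASMIDS = [114_045, 114_178, 105_284, 100_828, 37_100]


def genome_size(chromosomes, plasmids):
    """Return the total genome length in bp across all replicons."""
    return sum(chromosomes) + sum(plasmids)


total_bp = genome_size(CHROMOSOMES, PLASMIDS)
print(f"Total genome size: {total_bp:,} bp")
```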

Contents
1 Published data
2 Raw data
3 Fractional data
4 50X fragment reads
5 100X fragment reads
6 Evaluation

Published data

The Illumina and PacBio data were downloaded from the ALLPATHS-LG website: rhody_data.tar.gz

Fragment library
Read length: 101 bp
Number of reads: 4354215 x 2
Insert size: 180 bp
Coverage: 170.16X
Jumping library
Read length: 101 bp
Number of reads: 1974031 x 2
Insert size: 3000 bp
PacBio reads
Average read length: 1031.19 bp
Number of reads: 1994107
Coverage: 446.44X

Raw data

The raw data underlying the website data were obtained from the Sequence Read Archive (SRA).

Fragment library
Accession: SRX000946
Read length: 101 bp
Number of reads: 11339101 x 2
Insert size: 180 bp
Coverage: 433.12X
Jumping library
Accession: SRX111018
PacBio reads
Accessions: SRX109847 (SRR386702), SRX109812, SRX109830, SRX109818 (SRR386746), SRX111329

Fractional data

Using prepare.sh, we randomly subsampled the fragment library of the raw data to the same fraction as the website data.

PrepareAllPathsInputs.pl \
DATA_DIR=$PWD/test.genome/data \
PLOIDY=1 \
FRAG_FRAC=0.384 \
IN_GROUPS_CSV=in_groups.csv \
IN_LIBS_CSV=in_libs.csv \
OVERWRITE=True \
| tee prepare.out
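The FRAG_FRAC value can be cross-checked against the read counts reported in the Published data and Raw data sections: dividing the number of published fragment read pairs by the number of raw fragment read pairs gives roughly 0.384. The following is a minimal sketch of that arithmetic; the function name is hypothetical, and it does not reproduce how PrepareAllPathsInputs.pl samples reads internally.

```python
# Fragment-library read pairs, taken from the sections above.
PUBLISHED_FRAGMENT_PAIRS = 4_354_215   # Published data
RAW_FRAGMENT_PAIRS = 11_339_101        # Raw data (SRX000946)


def subsample_fraction(target_pairs, available_pairs):
    """Fraction of the available read pairs needed to reach the target count."""
    if available_pairs <= 0:
        raise ValueError("available_pairs must be positive")
    return target_pairs / available_pairs


frac = subsample_fraction(PUBLISHED_FRAGMENT_PAIRS, RAW_FRAGMENT_PAIRS)
print(f"FRAG_FRAC is approximately {frac:.3f}")
```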

50X fragment reads

Using prepare.sh, we randomly subsampled the fragment library and the jumping library to 50X coverage each.

PrepareAllPathsInputs.pl \
DATA_DIR=$PWD/test.genome/data \
PLOIDY=1 \
GENOME_SIZE=4600000 \
FRAG_COVERAGE=50 \
JUMP_COVERAGE=50 \
IN_GROUPS_CSV=in_groups.csv \
IN_LIBS_CSV=in_libs.csv \
OVERWRITE=True \
| tee prepare.out
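To give a feel for what a coverage target means in terms of reads, the sketch below estimates how many 101 bp read pairs are needed to reach a given coverage of a 4.6 Mbp genome. It assumes the simple relation coverage = total bases / genome size; the helper name is hypothetical, and PrepareAllPathsInputs.pl may compute its sampling fraction differently.

```python
import math

READ_LENGTH_BP = 101        # read length of the Illumina libraries above
GENOME_SIZE_BP = 4_600_000  # same value as GENOME_SIZE in the command above


def pairs_for_coverage(coverage, genome_size_bp, read_length_bp):
    """Smallest number of read pairs whose total bases reach the target coverage."""
    bases_needed = coverage * genome_size_bp
    bases_per_pair = 2 * read_length_bp
    return math.ceil(bases_needed / bases_per_pair)


for cov in (50, 100):
    pairs = pairs_for_coverage(cov, GENOME_SIZE_BP, READ_LENGTH_BP)
    print(f"{cov}X coverage needs about {pairs:,} read pairs")
```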

100X fragment reads

Using prepare.sh, we randomly subsampled the fragment library and the jumping library to 100X coverage each.

PrepareAllPathsInputs.pl \
DATA_DIR=$PWD/test.genome/data \
PLOIDY=1 \
GENOME_SIZE=4600000 \
FRAG_COVERAGE=100 \
JUMP_COVERAGE=100 \
IN_GROUPS_CSV=in_groups.csv \
IN_LIBS_CSV=in_libs.csv \
OVERWRITE=True \
| tee prepare.out

Evaluation

Benchmark genome

R. sphaeroides 2.4.1

Evaluated by QUAST

QUAST (QUAST v2.2)

QUAST requires the reference sequence and its gene annotation. The reference contains 4438 genes in total.

Score with QUAST: With PacBio Long Reads

Basic statistics	Published Data	Raw Data	Fractional Data	50X Coverage	100X Coverage
# contigs	11	13	10	NA	12
Largest contig	3188818	3188540	3188847	NA	3188773
Total length	4601792	4588701	4609235	NA	4602493
N50	3188818	3188540	3188847	NA	3188773
Misassemblies
# misassemblies	16	12	20	NA	19
Misassembled contigs length	4370092	4361060	4557570	NA	4484870
Mismatches
# mismatches per 100kbp	3.48	3.77	4.8	NA	6.79
# indels per 100kbp	3.52	5.13	4.87	NA	6.22
# N's per 100kbp	0	0.09	0.13	NA	45.04
Genome statistics
Genome fraction (%)	99.932	99.683	99.948	NA	99.85
Duplication ratio	1.011	1.005	1.009	NA	1.004
# genes	4381 + 6 part	4369 + 10 part	4380 + 7 part	NA	4373 + 13 part
NGA50	904505	2938269	2715665	NA	1405713
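The "# genes" row is easier to compare when expressed against the 4438 reference genes. The sketch below converts the complete-gene counts from the with-PacBio table into percentages; the dictionary and function names are hypothetical, and the 50X column is omitted because it is reported as NA.

```python
TOTAL_GENES = 4438  # genes in the R. sphaeroides 2.4.1 reference

# (complete genes, partial genes) per dataset, from the with-PacBio table.
GENES_WITH_PACBIO = {
    "Published Data": (4381, 6),
    "Raw Data": (4369, 10),
    "Fractional Data": (4380, 7),
    "100X Coverage": (4373, 13),
}


def percent_complete(complete, total=TOTAL_GENES):
    """Percentage of reference genes recovered completely."""
    return 100.0 * complete / total


for name, (complete, partial) in GENES_WITH_PACBIO.items():
    pct = percent_complete(complete)
    print(f"{name}: {pct:.2f}% complete, {partial} partial")
```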

Score with QUAST: Without PacBio Long Reads

Basic statistics	Published Data	Raw Data	Fractional Data	50X Coverage	100X Coverage
# contigs	31	57	32	26
Largest contig	3188995	3186675	1674993	3190277
Total length	4592561	4583750	4620837	4607723
N50	3188995	3186675	1492665	3190277
Misassemblies
# misassemblies	9	6	17	27
Misassembled contigs length	4205887	4147900	2637662	4422750
Mismatches
# mismatches per 100kbp	5.81	4.23	7.49	10.76
# indels per 100kbp	5.64	3.57	4.72	8.94
# N's per 100kbp	120.74	149.31	197.84	812.74
Genome statistics
Genome fraction (%)	99.45	98.789	99.468	98.896
Duplication ratio	1.018	1.022	1.015	1.02
# genes	4348 + 27 part	4313 + 47 part	4343 + 31 part	4266 + 101 part
NGA50	3182258	3180491	1487141	546353