R. sphaeroides

Revision as of 19 October 2013 01:56 by admin

This benchmark uses Rhodobacter sphaeroides strain 2.4.1. The R. sphaeroides 2.4.1 genome consists of two circular chromosomes of 3,188,609 bp and 943,016 bp and five plasmids of 114,045 bp, 114,178 bp, 105,284 bp, 100,828 bp and 37,100 bp. The Illumina sequencing data are available from the ALLPATHS-LG website. For details, see "Finished bacterial genomes from shotgun sequence data", Genome Research, 2012.
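The total genome size follows directly from the replicon lengths given above. A minimal sketch that sums them (the variable names are illustrative, not part of any tool):

```shell
# Replicon lengths in bp as listed above: two chromosomes, then five plasmids.
replicons="3188609 943016 114045 114178 105284 100828 37100"

genome_size=0
for len in $replicons; do
    genome_size=$((genome_size + len))
done

echo "Total genome size: ${genome_size} bp"
```

This gives 4,603,060 bp, which is in line with the total assembly lengths reported in the evaluation tables.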

Website data

The Illumina and PacBio data were downloaded from the ALLPATHS-LG website: rhody_data.tar.gz

Fragment library
  • Read length: 101 bp
  • Number of reads: 4354215 × 2
  • Insert size: 180 bp
  • Coverage: 170.16X
Jumping library
  • Read length: 101 bp
  • Number of reads: 1974031 × 2
  • Insert size: 3000 bp
PacBio reads
  • Average read length: 1031.19 bp
  • Number of reads: 1994107
  • Coverage: 446.44X
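As a rough cross-check, coverage can be estimated as (number of reads × read length) / genome size. The sketch below assumes a genome size of 4603060 bp (the sum of the seven replicon lengths given on this page); `coverage` is a hypothetical helper, not part of ALLPATHS-LG. The figures listed above were presumably computed from somewhat different inputs, so this naive estimate should not be expected to reproduce them exactly.

```shell
# coverage READS READ_LEN GENOME_SIZE: print reads * read length / genome size.
coverage() {
    awk -v n="$1" -v l="$2" -v g="$3" 'BEGIN { printf "%.2f\n", n * l / g }'
}

genome_size=4603060   # assumption: sum of the replicon lengths listed on this page

# PacBio reads: 1994107 reads with an average length of 1031.19 bp.
coverage 1994107 1031.19 "$genome_size"

# Fragment library: 4354215 pairs, i.e. 2 * 4354215 reads of 101 bp each.
coverage $((2 * 4354215)) 101 "$genome_size"
```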

Raw data

The website data were derived from the following raw data in the Sequence Read Archive (SRA):

Fragment library
  • Accession: SRX000946
  • Read length: 101 bp
  • Number of reads: 11339101 × 2
  • Insert size: 180 bp
  • Coverage: 433.12X
Jumping library
  • Accession: SRX111018
PacBio reads
  • Accessions: SRX109847 (SRR386702), SRX109812, SRX109830, SRX109818 (SRR386746), SRX111329

Self-fraction data

Using prepare.sh, we randomly subsampled the raw fragment library to the same number of reads as the website data: 4354215 of 11339101 read pairs, i.e. a fraction of 0.384.

PrepareAllPathsInputs.pl \
    DATA_DIR=$PWD/test.genome/data \
    PLOIDY=1 \
    FRAG_FRAC=0.384 \
    IN_GROUPS_CSV=in_groups.csv \
    IN_LIBS_CSV=in_libs.csv \
    OVERWRITE=True \
    | tee prepare.out

100X fragment reads

Using prepare.sh, we randomly subsampled the raw fragment library to 100X coverage.

Fraction = 100/443.12 = 0.226

PrepareAllPathsInputs.pl \
    DATA_DIR=$PWD/test.genome/data \
    PLOIDY=1 \
    FRAG_FRAC=0.226 \
    IN_GROUPS_CSV=in_groups.csv \
    IN_LIBS_CSV=in_libs.csv \
    OVERWRITE=True \
    | tee prepare.out
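The subsampling fraction is the target coverage divided by the coverage of the full library. The sketch below uses a hypothetical helper, `frac_for_coverage`, to reproduce that arithmetic. Note that the calculation above divides by 443.12, whereas the raw fragment library is listed with 433.12X coverage; the page does not say which figure is correct, so both are shown.

```shell
# frac_for_coverage TARGET TOTAL: fraction of reads to keep so that a library
# with TOTAL-fold coverage is reduced to roughly TARGET-fold coverage.
frac_for_coverage() {
    awk -v t="$1" -v c="$2" 'BEGIN { printf "%.3f\n", t / c }'
}

frac_for_coverage 100 443.12   # denominator used in the calculation above
frac_for_coverage 100 433.12   # coverage listed for the raw fragment library
```

The first call prints 0.226, matching the FRAG_FRAC value passed to PrepareAllPathsInputs.pl; the second prints 0.231.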

Evaluation

  • Benchmark genome
R. sphaeroides 2.4.1
  • Evaluated by QUAST
QUAST v2.2
QUAST needs the reference sequence and gene annotations to compute these statistics. The reference has 4438 genes in total.
  • Score with QUAST: with PacBio long reads

Metric                          Raw Data          Website Data      Self-fraction Data  100X Coverage
Basic statistics
# contigs                       13                11                10                  11
Largest contig                  3188540           3188818           3188847             3188802
Total length                    4588701           4601792           4609235             4601762
N50                             3188540           3188818           3188847             3188802
Misassemblies
# misassemblies                 12                16                20                  19
Misassembled contigs length     4361060           4370092           4557570             4484253
Mismatches
# mismatches per 100 kbp        3.77              3.48              4.8                 6.43
# indels per 100 kbp            5.13              3.52              4.87                5.61
# N's per 100 kbp               0.09              0                 0.13                0.07
Genome statistics
Genome fraction (%)             99.683            99.932            99.948              99.945
Duplication ratio               1.005             1.011             1.009               1.007
# genes                         4369 + 10 part    4381 + 6 part     4380 + 7 part       4378 + 8 part
NGA50                           2938269           904505            2715665             3170709

  • Score with QUAST: without PacBio long reads

Metric                          Raw Data          Website Data      Self-fraction Data  100X Coverage
Basic statistics
# contigs                       1                 2                 5                   3
Largest contig                  4633080           4631220           4575759             4560636
Total length                    4633080           4633146           4698903             4713335
N50                             4633080           4631220           4575759             4590636
Misassemblies
# misassemblies                 7                 3                 8                   8
Misassembled contigs length     4633080           4631220           4577746             4711603
Mismatches
# mismatches per 100 kbp        1.42              1.19              2.84                1.89
# indels per 100 kbp            0.83              1.13              3.26                0.61
# N's per 100 kbp               1545.02           533.22            698.87              801.7
Genome statistics
Genome fraction (%)             98.343            99.345            99.265              99.284
Duplication ratio               1.016             1.012             1.021               1.028
# genes                         4395 + 31 part    4465 + 11 part    4460 + 14 part      4459 + 9 part
NGA50                           687701            3180483           654008              1295677
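
N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The sketch below is an illustrative re-implementation, not QUAST's own code, and `n50` is a hypothetical helper name. It reads one contig length per line.

```shell
# n50: read contig lengths (one per line) on stdin and print the N50.
n50() {
    sort -nr | awk '
        { len[NR] = $1; total += $1 }
        END {
            running = 0
            for (i = 1; i <= NR; i++) {
                running += len[i]
                # Stop at the first contig that brings the sum to half the total.
                if (running * 2 >= total) { print len[i]; exit }
            }
        }'
}

# Toy example with made-up lengths.
printf '%s\n' 50 40 30 20 10 | n50

# Raw Data column with PacBio reads: the largest contig (3188540) plus the
# remaining 1400161 bp lumped into one hypothetical contig.
printf '%s\n' 3188540 1400161 | n50
```

Whenever the largest contig alone covers at least half of the total length, N50 equals the largest contig, which is why the two rows coincide in most columns above. In the 100X column of the table without PacBio reads, the two values differ (4560636 and 4590636) although the largest contig exceeds half of the total length, so one of them may be a transcription error.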