We first assembled the raw [[Data|Dataset 4]] reads with SPAdes, and then scaffolded the resulting contigs with SSPACE-LongRead, using different subread depths from [[Data|Dataset 5]] and [[Pacbio Data|Dataset 9]].

We arbitrarily chose one to four SMRT cells:<br>
One SMRT cell: m120208_071634<br>
Two SMRT cells: m120228_210845 + m120208_122534<br>
Three SMRT cells: m120228_115504 + m120228_152936 + m120228_100807<br>
Four SMRT cells: m120228_171636 + m120228_223624 + m120228_100807 + m120228_190630<br>

spades.py -1 reads_1.fastq -2 reads_2.fastq -o output

SSPACE-LongRead.pl -c contig.fasta -p filter_subreads.fasta -b output

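The two commands above can be laid out per SMRT-cell subset. The following is a minimal dry-run sketch that only builds the command lines and does not execute anything. The per-subset file names (<code>filter_subreads_1cell.fasta</code>, ...) and output base names (<code>Miseq_1cell</code>, ...) are hypothetical, and it assumes the filtered subreads of each subset have already been pooled into one FASTA file, which the text above does not state. Only the flags shown in the commands above are used.

```python
import shlex
from typing import Dict, List

# SMRT cell subsets as listed above.
CELL_SETS: Dict[str, List[str]] = {
    "1cell": ["m120208_071634"],
    "2cell": ["m120228_210845", "m120208_122534"],
    "3cell": ["m120228_115504", "m120228_152936", "m120228_100807"],
    "4cell": ["m120228_171636", "m120228_223624", "m120228_100807", "m120228_190630"],
}


def spades_command(reads_1: str, reads_2: str, out_dir: str) -> List[str]:
    """Build the SPAdes call with the same flags as the command above."""
    return ["spades.py", "-1", reads_1, "-2", reads_2, "-o", out_dir]


def sspace_command(contigs: str, subreads: str, out_base: str) -> List[str]:
    """Build the SSPACE-LongRead call with the same flags as the command above."""
    return ["SSPACE-LongRead.pl", "-c", contigs, "-p", subreads, "-b", out_base]


def plan_runs(contigs: str = "contig.fasta") -> Dict[str, List[str]]:
    """Return one scaffolding command per SMRT-cell subset (nothing is run)."""
    runs = {}
    for label in CELL_SETS:
        # Hypothetical naming: one pooled subread FASTA per subset.
        subreads = f"filter_subreads_{label}.fasta"
        runs[label] = sspace_command(contigs, subreads, f"Miseq_{label}")
    return runs


if __name__ == "__main__":
    print(shlex.join(spades_command("reads_1.fastq", "reads_2.fastq", "output")))
    for label, command in plan_runs().items():
        print(label, shlex.join(command))
```

Keeping the commands as argument lists makes it straightforward to pass them to <code>subprocess.run</code> later, should the runs be automated.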
= Evaluation =
We evaluated the assemblies with [http://bioinf.spbau.ru/en/quast QUAST 2.3] (reference genome [ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_K_12_substr__MG1655_uid57779/ NC_000913] and [[Media: Ec_gene_result.ncbi | Ec_gene_list]]). [[more detail]]

{| {{table}} border="1"
| align="center" style="background:#f0f0f0;"|'''Statistics without reference'''
| align="center" style="background:#f0f0f0;"|'''Miseq_only'''
| align="center" style="background:#f0f0f0;"|'''Miseq_1cell'''
| align="center" style="background:#f0f0f0;"|'''Miseq_2cell'''
| align="center" style="background:#f0f0f0;"|'''Miseq_3cell'''
| align="center" style="background:#f0f0f0;"|'''Miseq_4cell'''
| align="center" style="background:#f0f0f0;"|'''Miseq_17cell'''
| align="center" style="background:#f0f0f0;"|'''Miseq_d9'''
|-
| # contigs||86||15||18||16||15||17||14
|-
| Largest contig||285889||2497845||1260980||2501081||3194637||1954649||3392211
|-
| Total length||4577132||4632009||4633058||4636174||4638657||4633857||4632677
|-
| N50||139882||2497845||1238868||2501081||3194637||1238635||3392211
|-
| style="background:#f0f0f0;"| Misassemblies||||||||||||||
|-
| # misassemblies||2||9||10||10||10||8||8
|-
| Misassembled contigs length||215581||3193893||3244631||2705788||3657574||3243566||4050696
|-
| style="background:#f0f0f0;"| Mismatches||||||||||||||
|-
| # mismatches per 100 kbp||3.02||7.15||6.2||6.9||7.17||6.05||6.43
|-
| # indels per 100 kbp||0.46||1.06||1||1.32||1.27||0.95||1.06
|-
| # N's per 100 kbp||0||97.89||67.67||77.33||77.37||91.03||123.88
|-
| style="background:#f0f0f0;"| Genome statistics||||||||||||||
|-
| Genome fraction (%)||98.451||99.498||99.483||99.664||99.748||99.432||99.587
|-
| Duplication ratio||1.001||1.002||1.002||1.002||1.002||1.003||1.001
|-
| # genes||4399 + 32 part||4467 + 14 part||4465 + 13 part||4476 + 11 part||4477 + 11 part||4467 + 11 part||4470 + 13 part
|-
| NGA50||133059||571664||425173||852639||1039467||1039472||1039654
|-
|}
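For readers unfamiliar with the N50 row, the following is a minimal sketch of how such a value can be computed from a list of contig lengths. It is an illustration and not QUAST's own code. The function name <code>n50</code> and the optional <code>genome_length</code> parameter are assumptions made for this example. Passing a reference length gives an NG50-style value. NGA50 as reported by QUAST is additionally based on aligned blocks (contigs broken at misassemblies), which this sketch does not model.

```python
from typing import Iterable, Optional


def n50(lengths: Iterable[int], genome_length: Optional[int] = None) -> Optional[int]:
    """Return the contig length L such that contigs of length >= L
    cover at least half of the target size.

    The target size is the assembly total by default (N50), or the
    given reference genome length (NG50-style) if one is supplied.
    Returns None if the contigs never reach half of the target size.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    target = genome_length if genome_length is not None else sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare doubled sum to avoid fractional halves of odd totals.
        if 2 * running >= target:
            return length
    return None


if __name__ == "__main__":
    # Toy lengths, not taken from the assemblies above.
    print(n50([2, 3, 4, 5, 6]))
    print(n50([2, 3, 4, 5, 6], genome_length=30))
```

This also explains why N50 equals the largest contig in several columns of the table (for example Miseq_1cell): once a single contig covers at least half of the total length, that contig alone determines N50.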