PBcR

Revision as of 12 December 2014 02:50 by admin



Performance

PBcR -pbCNS -length 500 -partitions 200 -l ecoli -s pacbio.spec -fastq filtered_subreads.fastq
PBcR -pbCNS -length 500 -partitions 200 -l ecoli -s pacbio.spec -fastq filtered_subreads.fastq genomeSize=4650000
PBcR -pbCNS -length 500 -partitions 200 -l ecoli -s pacbio.spec -fastq filtered_subreads.fastq genomeSize=4185000
PBcR -pbCNS -length 500 -partitions 200 -l ecoli -s pacbio.spec -fastq filtered_subreads.fastq genomeSize=5115000
PBcR -pbCNS -length 500 -partitions 200 -l ecoli -s pacbio.spec -fastq filtered_subreads.fastq genomeSize=3720000
PBcR -pbCNS -length 500 -partitions 200 -l ecoli -s pacbio.spec -fastq filtered_subreads.fastq genomeSize=5580000
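
The five explicit genomeSize values in the commands above are 0.8, 0.9, 1.0, 1.1 and 1.2 times 4650000. As a minimal sketch, the command lines could be generated as follows. The helper names `scaled_genome_size` and `build_commands` are hypothetical and only illustrate the arithmetic; the PBcR flags are copied verbatim from the commands above.

```python
# Estimate used in the commands above (E. coli K-12, about 4.65 Mbp).
BASE_GENOME_SIZE = 4_650_000

# Flags copied verbatim from the commands above; no new PBcR options are introduced.
BASE_COMMAND = (
    "PBcR -pbCNS -length 500 -partitions 200 -l ecoli "
    "-s pacbio.spec -fastq filtered_subreads.fastq"
)


def scaled_genome_size(percent, base=BASE_GENOME_SIZE):
    """Scale the base estimate by an integer percentage using exact integer math."""
    return base * percent // 100


def build_commands(percents=(100, 90, 110, 80, 120)):
    """Return a dict mapping each percentage to its PBcR command line."""
    return {
        p: f"{BASE_COMMAND} genomeSize={scaled_genome_size(p)}"
        for p in percents
    }


if __name__ == "__main__":
    for percent, command in build_commands().items():
        print(f"{percent / 100:.1f}x  {command}")
```

Integer percentages are used instead of floating-point ratios so that the generated values match the ones listed above exactly.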

Without genomeSize

Statistics without reference All Data 4 SMRT cells : 1st Set 4 SMRT cells : 2nd Set 4 SMRT cells : 3rd Set 6 SMRT cells : 1st Set 6 SMRT cells : 2nd Set 6 SMRT cells : 3rd Set 8 SMRT cells : 1st Set 8 SMRT cells : 2nd Set 8 SMRT cells : 3rd Set
# contigs 1 4 1 6 1 1 1 1 1 1
Largest contig 4 648 864 3 621 580 4 639 780 2 212 593 4 651 575 4 645 728 4 647 827 4 649 239 4 648 687 4 648 099
Total length 4 648 864 4 602 589 4 639 780 4 661 453 4 651 575 4 645 728 4 647 827 4 649 239 4 648 687 4 648 099
N50 4 648 864 3 621 580 4 639 780 887 256 4 651 575 4 645 728 4 647 827 4 649 239 4 648 687 4 648 099
Misassemblies
# misassemblies 10 8 8 8 8 8 10 9 8 8
Misassembled contigs length 4 648 864 3 621 580 4 639 780 3 857 726 4 651 575 4 645 728 4 647 827 4 649 239 4 648 687 4 648 099
Mismatches
# mismatches per 100kbp 0.19 0.54 0.37 0.34 0.17 0.09 0.15 0.34 0.09 0.11
# indels per 100kbp 1.59 24.54 22.42 15.82 6.38 10.22 5.63 4.38 5.32 6.06
# N's per 100kbp 0 0 0 0 0 0 0 0 0 0
Genome Statistics
Genome fraction(%) 100 99.169 99.965 99.981 100 100 100 100 100 100
Duplication ratio 1.002 1 1 1.003 1.003 1.001 1.002 1.002 1.002 1.002
# genes 4494+3 part 4458+8 part 4491+5 part 4491+6 part 4494+3 part 4494+3 part 4494+3 part 4494+3 part 4494+3 part 4493+3 part
NGA50 3 026 386 907 076 1 097 763 880 448 3 026 238 2 603 928 3 026 270 1 257 068 2 856 673 2 856 687
Running Time 12hr 14m 26m 16s 35m 50s 33m 3s 53m 36s 48m 9s 2hr 21m 2hr 43s 1hr 2m 53m 3s

genomeSize=4650000 (1.0 × 4650000)

Statistics without reference All Data 4 SMRT cells : 1st Set 4 SMRT cells : 2nd Set 4 SMRT cells : 3rd Set 6 SMRT cells : 1st Set 6 SMRT cells : 2nd Set 6 SMRT cells : 3rd Set 8 SMRT cells : 1st Set 8 SMRT cells : 2nd Set 8 SMRT cells : 3rd Set
# contigs 1 4 1 6 1 1 1 1 1 1
Largest contig 4 648 864 3 621 580 4 639 780 2 212 593 4 651 575 4 645 728 4 647 827 4 649 239 4 648 687 4 648 099
Total length 4 648 864 4 602 589 4 639 780 4 661 453 4 651 575 4 645 728 4 647 827 4 649 239 4 648 687 4 648 099
N50 4 648 864 3 621 580 4 639 780 887 256 4 651 575 4 645 728 4 647 827 4 649 239 4 648 687 4 648 099
Misassemblies
# misassemblies 10 8 8 8 8 8 10 9 8 8
Misassembled contigs length 4 648 864 3 621 580 4 639 780 3 857 726 4 651 575 4 645 728 4 647 827 4 649 239 4 648 687 4 648 099
Mismatches
# mismatches per 100kbp 0.19 0.54 0.37 0.34 0.17 0.09 0.15 0.34 0.09 0.11
# indels per 100kbp 1.59 24.54 22.42 15.82 6.38 10.22 5.63 4.38 5.32 6.06
# N's per 100kbp 0 0 0 0 0 0 0 0 0 0
Genome Statistics
Genome fraction(%) 100 99.169 99.965 99.981 100 100 100 100 100 100
Duplication ratio 1.002 1 1 1.003 1.003 1.001 1.002 1.002 1.002 1.002
# genes 4494+3 part 4458+8 part 4491+5 part 4491+6 part 4494+3 part 4494+3 part 4494+3 part 4494+3 part 4494+3 part 4493+3 part
NGA50 3 026 386 907 076 1 097 763 880 448 3 026 238 2 603 928 3 026 270 1 257 068 2 856 673 2 856 687
Running Time 12hr 14m 26m 16s 35m 50s 33m 3s 53m 36s 48m 9s 2hr 21m 2hr 43s 1hr 2m 53m 3s

genomeSize=4185000 (0.9 × 4650000)

Statistics without reference All Data 4 SMRT cells : 1st Set 4 SMRT cells : 2nd Set 4 SMRT cells : 3rd Set 6 SMRT cells : 1st Set 6 SMRT cells : 2nd Set 6 SMRT cells : 3rd Set 8 SMRT cells : 1st Set 8 SMRT cells : 2nd Set 8 SMRT cells : 3rd Set
# contigs 2 6 1 6 1 1 1 1 1 1
Largest contig 4 644 985 3 621 592 4 639 809 2 212 597 4 651 574 4 645 691 4 647 833 4 649 261 4 648 735 4 648 123
Total length 4 656 267 4 640 061 4 639 809 4 650 332 4 651 574 4 645 691 4 647 833 4 649 261 4 648 735 4 648 123
N50 4 644 985 3 621 592 4 639 809 887 254 4 651 574 4 645 691 4 647 833 4 649 261 4 648 735 4 648 123
Misassemblies
# misassemblies 12 8 8 8 8 8 10 9 8 8
Misassembled contigs length 4 656 267 3 621 592 4 639 809 3 857 728 4 651 574 4 645 691 4 647 833 4 649 261 4 648 735 4 648 123
Mismatches
# mismatches per 100kbp 0.24 0.26 0.32 0.39 0.19 0.09 0.22 0.3 0.06 0.06
# indels per 100kbp 2.11 27.51 22.29 15.74 6.36 10.91 5.35 3.51 4.94 5.6
# N's per 100kbp 0 0 0 0 0 0 0 0 0 0
Genome Statistics
Genome fraction(%) 100 99.9 99.965 99.981 100 100 100 100 100 100
Duplication ratio 1.004 1.001 1 1.003 1.003 1.001 1.002 1.002 1.002 1.002
# genes 4494+3 part 4486+8 part 4491+5 part 4491+6 part 4494+3 part 4494+3 part 4494+3 part 4494+3 part 4494+3 part 4494+3 part
NGA50 3 026 366 904 084 1 097 787 880 446 3 026 242 2 603 887 3 026 279 1 257 060 2 856 681 2 856 711
Running Time 10hr 1m 24m 51s 30m 28s 27m 44s 43m 13s 34m 39s 41m 29s 49m 7s 52m 10s 50m 29s

genomeSize=5115000 (1.1 × 4650000)

Statistics without reference All Data 4 SMRT cells : 1st Set 4 SMRT cells : 2nd Set 4 SMRT cells : 3rd Set 6 SMRT cells : 1st Set 6 SMRT cells : 2nd Set 6 SMRT cells : 3rd Set 8 SMRT cells : 1st Set 8 SMRT cells : 2nd Set 8 SMRT cells : 3rd Set
# contigs 1 5 1 6 1 2 1 1 1 1
Largest contig 4 649 007 3 621 673 4 639 792 2 212 738 4 648 429 4 645 722 4 647 825 4 649 266 4 648 723 4 648 107
Total length 4 649 007 4 636 546 4 639 792 4 650 567 4 648 429 4 645 722 4 647 825 4 649 266 4 648 723 4 648 107
N50 4 649 007 3 621 673 4 639 792 887 256 4 648 429 4 645 722 4 647 825 4 649 266 4 648 723 4 648 107
Misassemblies
# misassemblies 10 8 8 8 8 8 10 9 8 8
Misassembled contigs length 4 649 007 3 621 673 4 639 792 3 857 870 4 648 429 4 645 722 4 647 825 4 649 266 4 648 723 4 648 107
Mismatches
# mismatches per 100kbp 0.17 0.32 0.41 0.37 0.19 0.11 0.15 0.39 0.06 0.11
# indels per 100kbp 1.53 25.72 22.34 15.84 6.21 10.32 5.63 3.53 5.17 6.06
# N's per 100kbp 0 0 0 0 0 0 0 0 0 0
Genome Statistics
Genome fraction(%) 100 99.9 99.965 99.981 100 100 100 100 100 100
Duplication ratio 1.002 1 1 1.003 1.002 1.001 1.002 1.002 1.002 1.002
# genes 4494+3 part 4486+8 part 4491+5 part 4491+6 part 4494+3 part 4494+3 part 4494+3 part 4494+3 part 4494+3 part 4494+3 part
NGA50 3 026 390 907 197 1 097 774 880 448 3 026 239 2 603 918 3 026 271 1 257 073 2 856 674 2 856 693
Running Time 2hr 24m 26m 41s 33m 14s 30m 4s 47m 8s 37m 26s 46m 5s 54m 46s 59m 16s 56m 19s

genomeSize=3720000 (0.8 × 4650000)

Statistics without reference All Data 4 SMRT cells : 1st Set 4 SMRT cells : 2nd Set 4 SMRT cells : 3rd Set 6 SMRT cells : 1st Set 6 SMRT cells : 2nd Set 6 SMRT cells : 3rd Set 8 SMRT cells : 1st Set 8 SMRT cells : 2nd Set 8 SMRT cells : 3rd Set
# contigs 1 6 1 6 1 1 1 1 1 1
Largest contig 4 648 264 3 621 566 4 639 815 2 212 780 4 651 567 4 654 368 4 647 818 4 649 190 4 648 720 4 648 103
Total length 4 648 264 4 640 025 4 639 815 4 649 418 4 651 567 4 654 368 4 647 818 4 649 190 4 648 720 4 648 103
N50 4 648 264 3 621 566 4 639 815 887 256 4 651 567 4 654 368 4 647 818 4 649 190 4 648 720 4 648 103
Misassemblies
# misassemblies 10 8 8 8 8 8 10 9 8 8
Misassembled contigs length 4 648 264 3 621 566 4 639 815 3 857 911 4 651 567 4 654 368 4 647 818 4 649 190 4 648 720 4 648 103
Mismatches
# mismatches per 100kbp 0.17 0.28 0.39 0.43 0.22 0.28 0.19 0.37 0.13 0.11
# indels per 100kbp 1.72 25.24 22.1 15.82 6.79 11.7 5.43 4.44 5.24 6.19
# N's per 100kbp 0 0 0 0 0 0 0 0 0 0
Genome Statistics
Genome fraction(%) 100 99.9 99.965 99.981 100 100 100 100 100 100
Duplication ratio 1.002 1.001 1 1.002 1.003 1.003 1.002 1.002 1.002 1.002
# genes 4494+3 part 4486+8 part 4491+5 part 4490+7 part 4494+3 part 4495+2 part 4494+3 part 4494+3 part 4494+3 part 4494+3 part
NGA50 3 026 382 907 054 1 097 788 880 448 3 026 242 3 026 108 3 026 274 1 257 032 2 856 687 2 856 706
Running Time 1hr 45m 24m 2s 28m 45s 26m 23s 40m 12s 32m 31s 38m 38s 45m 55s 48m 40s 46m 47s

genomeSize=5580000 (1.2 × 4650000)

Statistics without reference All Data 4 SMRT cells : 1st Set 4 SMRT cells : 2nd Set 4 SMRT cells : 3rd Set 6 SMRT cells : 1st Set 6 SMRT cells : 2nd Set 6 SMRT cells : 3rd Set 8 SMRT cells : 1st Set 8 SMRT cells : 2nd Set 8 SMRT cells : 3rd Set
# contigs 1 4 1 6 1 1 1 1 1 1
Largest contig 4 649 007 3 622 415 4 639 778 3 100 853 4 651 559 4 652 494 4 647 813 4 649 253 4 648 694 4 648 107
Total length 4 649 007 4 638 349 4 639 778 4 660 255 4 651 559 4 652 494 4 647 813 4 649 253 4 648 694 4 648 107
N50 4 649 007 3 622 415 4 639 778 3 100 853 4 651 559 4 652 494 4 647 813 4 649 253 4 648 694 4 648 107
Misassemblies
# misassemblies 10 8 8 8 8 9 10 9 8 8
Misassembled contigs length 4 649 007 3 622 415 4 639 778 3 858 736 4 651 559 4 652 494 4 647 813 4 649 253 4 648 694 4 648 107
Mismatches
# mismatches per 100kbp 0.19 0.43 0.37 0.34 0.15 0.09 0.15 0.37 0.11 0.11
# indels per 100kbp 1.51 25.84 22.42 15.88 6.4 10.35 5.73 3.66 5.22 6.06
# N's per 100kbp 0 0 0 0 0 0 0 0 0 0
Genome Statistics
Genome fraction(%) 100 99.937 99.965 100 100 100 100 100 100 100
Duplication ratio 1.002 1 1 1.004 1.003 1.003 1.002 1.002 1.002 1.002
# genes 4494+3 part 4488+8 part 4491+5 part 4492+5 part 4494+3 part 4494+3 part 4494+3 part 4494+3 part 4494+3 part 4494+3 part
NGA50 3 026 391 907 173 1 097 770 880 460 3 026 223 1 252 894 3 026 257 1 257 068 2 856 688 2 856 692
Running Time 2hr 9m 28m 59s 33m 32s 32m 21s 50m 17s 38m 36s 49m 13s 57m 6s 1hr 24s 59m 7s


Dataset 6 (E. coli K-12 MG1655, 8 SMRT cells)

We assembled all eight SMRT cells, as well as three random subsets of four cells and three random subsets of six cells, and evaluated each assembly with QUAST against the reference genome (NC_000913) and Ec_gene_list.
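
The random selection of SMRT cells described above can be sketched as below. This is only an illustration: the cell names, the seed, and the assumption that the three sets are drawn independently (so they may share cells) are not stated in the source, and `draw_subsets` is a hypothetical helper.

```python
import random


def draw_subsets(cells, subset_size, n_sets=3, seed=None):
    """Draw n_sets random subsets of subset_size cells each.

    Each subset is sampled without replacement, but the subsets are drawn
    independently of one another (an assumption), so they may overlap.
    """
    rng = random.Random(seed)
    return [sorted(rng.sample(cells, subset_size)) for _ in range(n_sets)]


if __name__ == "__main__":
    # Placeholder names for the eight SMRT cells of this dataset.
    cells = [f"cell_{i}" for i in range(1, 9)]
    for size in (4, 6):
        subsets = draw_subsets(cells, size, seed=size)
        for index, subset in enumerate(subsets, start=1):
            print(f"{size} SMRT cells, set {index}: {', '.join(subset)}")
```

Fixing the seed makes the selection reproducible, which helps when the same subsets have to be reassembled with different parameters.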

Performance

Statistics without reference All Data 4 SMRT cells : 1st Set 4 SMRT cells : 2nd Set 4 SMRT cells : 3rd Set 6 SMRT cells : 1st Set 6 SMRT cells : 2nd Set 6 SMRT cells : 3rd Set
# contigs 2 8 10 14 1 1 4
Largest contig 4 278 957 2 277 010 1 213 670 984 459 4 641 350 4 640 250 3 162 440
Total length 4 650 771 4 648 304 4 644 602 4 656 274 4 641 350 4 640 250 4 653 394
N50 4 278 957 622 425 800 993 565 251 4 641 350 4 640 250 3 162 440
Misassemblies
# misassemblies 9 9 9 8 8 8 9
Misassembled contigs length 4 278 957 2 809 129 2 085 482 1 947 163 4 641 350 4 640 250 3 209 090
Mismatches
# mismatches per 100kbp 0.37 2.49 1.88 5.38 0.69 0.67 0.86
# indels per 100kbp 3.58 53.34 45.82 73.07 10.65 11.28 10.46
# N's per 100kbp 0 0.04 0.02 0.09 0 0 0
Genome Statistics
Genome fraction(%) 99.993 99.733 99.67 99.693 99.972 99.946 99.968
Duplication ratio 1.002 1.005 1.006 1.007 1.001 1.001 1.003
# genes 4492+5 part 4475+10 part 4467+12 part 4469+13 part 4492+4 part 4491+4 part 4492+4 part
NGA50 859 464 621 281 572 455 436 292 1 098 529 1 096 784 859 502
Running Time
pacBioToCA 20hr 03m 5hr 52m 6hr 05m 5hr 19m 15hr 53m 14hr 47m 15hr 38m
runCA 15hr 41m 7hr 32m 7hr 10m 5hr 42m 15hr 44m 16hr 02m 13hr 27m
Total 35hr 44m 13hr 24m 13hr 15m 11hr 01m 31hr 37m 30hr 49m 29hr 05m
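
As a quick consistency check, the Total row above should equal pacBioToCA plus runCA in every column. The sketch below parses the duration strings and verifies this; `parse_duration` and `totals_consistent` are hypothetical helpers, and the values are copied from the three rows above.

```python
import re

# Seconds per unit for the notation used in the tables ("hr", "m", "s").
_UNIT_SECONDS = {"hr": 3600, "m": 60, "s": 1}
_TOKEN = re.compile(r"(\d+)\s*(hr|m|s)")


def parse_duration(text):
    """Convert a string such as '35hr 44m' or '26m 16s' to seconds."""
    tokens = _TOKEN.findall(text)
    if not tokens:
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(int(value) * _UNIT_SECONDS[unit] for value, unit in tokens)


def totals_consistent(first, second, totals):
    """Return one boolean per column: does first + second equal total?"""
    return [
        parse_duration(a) + parse_duration(b) == parse_duration(t)
        for a, b, t in zip(first, second, totals)
    ]


# Rows copied from the Dataset 6 table above.
PACBIOTOCA = ["20hr 03m", "5hr 52m", "6hr 05m", "5hr 19m",
              "15hr 53m", "14hr 47m", "15hr 38m"]
RUNCA = ["15hr 41m", "7hr 32m", "7hr 10m", "5hr 42m",
         "15hr 44m", "16hr 02m", "13hr 27m"]
TOTAL = ["35hr 44m", "13hr 24m", "13hr 15m", "11hr 01m",
         "31hr 37m", "30hr 49m", "29hr 05m"]

if __name__ == "__main__":
    print(totals_consistent(PACBIOTOCA, RUNCA, TOTAL))
```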

Misassemblies (PDF for Adobe Reader).

Dataset 7 (M. ruber DSM1279, 4 SMRT cells)

We assembled all four SMRT cells and evaluated the assembly with QUAST against the reference genome (NC_013946) and Mr_gene_list.

Performance

Statistics without reference All Data
# contigs 2
Largest contig 2 974 307
Total length 3 100 289
N50 2 974 307
Misassemblies
# misassemblies 3
Misassembled contigs length 2 974 307
Mismatches
# mismatches per 100kbp 0.23
# indels per 100kbp 5.01
# N's per 100kbp 0.03
Genome Statistics
Genome fraction(%) 99.883
Duplication ratio 1.002
# genes 3093+4 part
NGA50 1 707 938
Running Time
pacBioToCA 7hr 35m
runCA 8hr 7m
Total 15hr 42m
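
For readers unfamiliar with the N50 row: N50 is the length of the contig at which the contigs of that length or longer cover at least half of the total assembly length. A minimal sketch follows, using the two contig lengths implied by the table above (largest contig 2 974 307, total 3 100 289, hence a second contig of 125 982). The function `n50` is a hypothetical helper, not QUAST's own code. NGA50 differs in that it is computed on aligned blocks relative to the reference length, which this sketch does not attempt.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running sum first reaches half the total length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("at least one contig length is required")


if __name__ == "__main__":
    # Lengths derived from the Dataset 7 table: largest contig and the remainder.
    contigs = [2_974_307, 3_100_289 - 2_974_307]
    print(n50(contigs))
```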

Dataset 8 (P. heparinus DSM2366, 7 SMRT cells)

We assembled all seven SMRT cells, as well as three random subsets of four cells, and evaluated each assembly with QUAST against the reference genome (NC_013061) and Ph_gene_list.

Performance

Statistics without reference All Data 4 SMRT cells : 1st Set 4 SMRT cells : 2nd Set 4 SMRT cells : 3rd Set
# contigs 1 3 3 3
Largest contig 5 163 983 2 232 679 2 236 613 2 237 949
Total length 5 163 983 5 161 276 5 165 518 5 166 563
N50 5 163 983 2 043 590 2 044 147 2 135 225
Misassemblies
# misassemblies 0 0 0 0
Misassembled contigs length 0 0 0 0
Mismatches
# mismatches per 100kbp 8.41 9.96 8.27 10.29
# indels per 100kbp 2.19 18.99 13.13 14.01
# N's per 100kbp 0 0 0 0
Genome Statistics
Genome fraction(%) 99.919 99.864 99.907 99.89
Duplication ratio 1 1 1.001 1.001
# genes 4335+3 part 4330+5 part 4333+5 part 4333+3 part
NGA50 5 163 983 532 2 043 590 2 044 147 2 135 225
Running Time
pacBioToCA 18hr 55m 6hr 27m 6hr 34m 6hr 31m
runCA 21hr 36m 11hr 39m 12hr 26m 12hr 12m
Total 40hr 31m 18hr 06m 19hr 00m 18hr 43m

Misassemblies (PDF for Adobe Reader).

Dataset 9 (E. coli K-12, P4-C2 chemistry, 20 Kbp, 1 SMRT cell)

We assembled the single SMRT cell and evaluated the assembly with QUAST against the reference genome (NC_000913) and Ec_gene_list.

Performance

Statistics without reference All Data
# contigs 1
Largest contig 4 656 257
Total length 4 656 257
N50 4 656 257
Misassemblies
# misassemblies 8
Misassembled contigs length 4 656 257
Mismatches
# mismatches per 100kbp 0.22
# indels per 100kbp 13.15
# N's per 100kbp 0
Genome Statistics
Genome fraction(%) 100
Duplication ratio 1.004
# genes 4494+3 part
NGA50 3 026 094
Running Time
pacBioToCA 13hr 01m
runCA 17hr 58m
Total 30hr 59m


We also used the latest version of the PBcR pipeline (wgs-8.2beta):

PBcR -pbCNS -length 500 -partitions 200 -l p4c2 -s pacbio.spec -fastq filtered_subreads.fastq genomeSize=4650000

wgs-8.2beta Performance

Statistics without reference All Data
# contigs 2
Largest contig 4 644 060
Total length 4 652 830
N50 4 644 060
Misassemblies
# misassemblies 8
Misassembled contigs length 4 644 060
Mismatches
# mismatches per 100kbp 0.17
# indels per 100kbp 31.7
# N's per 100kbp 0
Genome Statistics
Genome fraction(%) 100
Duplication ratio 1.003
# genes 4494+3 part
NGA50 3 025 484
Running time
Running time 23m