Hierarchical Genome Assembly Process (HGAP) was proposed in "Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data" (Nature Methods, 2013).
For each subset size (four, six and eight SMRT cells), we randomly selected three sets of cells, assembled each set, and assessed the assemblies with QUAST.
Statistics without reference | 4 SMRT cells : 1st Set | 4 SMRT cells : 2nd Set | 4 SMRT cells : 3rd Set | 6 SMRT cells : 1st Set | 6 SMRT cells : 2nd Set | 6 SMRT cells : 3rd Set | 8 SMRT cells : 1st Set | 8 SMRT cells : 2nd Set | 8 SMRT cells : 3rd Set |
# contigs | 5 | 10 | 4 | 11 | 7 | 8 | 6 | 10 | 5 |
Largest contig | 3 770 578 | 4 106 852 | 4 644 754 | 3 785 116 | 4 647 724 | 3 287 965 | 4 649 322 | 4 623 068 | 4 649 308 |
Total length | 4 684 069 | 4 723 363 | 4 671 153 | 4 736 342 | 4 711 060 | 4 708 831 | 4 706 433 | 4 731 334 | 4 691 736 |
N50 | 3 770 578 | 4 106 852 | 4 644 754 | 3 785 116 | 4 647 724 | 3 287 965 | 4 649 322 | 4 623 068 | 4 649 308 |
Misassemblies | |||||||||
# misassemblies | 10 | 13 | 13 | 15 | 12 | 11 | 11 | 16 | 12 |
Misassembled contigs length | 3 788 648 | 4 700 016 | 4 671 153 | 4 726 005 | 4 685 712 | 3 339 030 | 4 694 303 | 4 698 068 | 4 649 308 |
Mismatches | |||||||||
# mismatches per 100kbp | 0.47 | 0.56 | 0.37 | 0.19 | 0.11 | 0.15 | 0.13 | 0.43 | 0.17 |
# indels per 100kbp | 1.08 | 4.44 | 0.22 | 1.66 | 0.63 | 0.65 | 0.19 | 4.59 | 0.56 |
# N's per 100kbp | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Genome Statistics | |||||||||
Genome fraction(%) | 100 | 100 | 99.994 | 99.999 | 100 | 100 | 100 | 99.99 | 100 |
Duplication ratio | 1.01 | 1.018 | 1.007 | 1.021 | 1.031 | 1.015 | 1.012 | 1.02 | 1.011 |
# genes | 4495+2 part | 4495+2 part | 4493+3 part | 4494+3 part | 4495+2 part | 4495+2 part | 4495+2 part | 4494+3 part | 4495+2 part |
NGA50 | 1 207 217 | 2 558 505 | 1 640 882 | 2 888 022 | 2 834 458 | 1 298 912 | 1 477 605 | 1 344 200 | 2 995 586 |
Running Time | n/a | n/a | n/a | 21hr 05m | 19hr 32m | 21hr 01m | 26hr 46m | 27hr 52m | 26hr 13m |
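The replicate design above (three random draws per subset size) can be sketched as follows. This is a minimal illustration, not the authors' script: the cell identifiers and the total of twelve cells are hypothetical, since the page does not list the SMRT cells, and each set is assumed to be drawn independently, so two sets of the same size may share cells.

```python
import random


def sample_cell_subsets(cells, k, repeats=3, seed=None):
    """Draw `repeats` independent random sets of `k` distinct SMRT cells.

    Cells are sampled without replacement within a set; separate sets
    are drawn independently and may therefore share cells.
    """
    if k > len(cells):
        raise ValueError("k cannot exceed the number of available cells")
    rng = random.Random(seed)
    return [sorted(rng.sample(cells, k)) for _ in range(repeats)]


# Hypothetical cell IDs: the actual list of SMRT cells is not given in the text.
cells = [f"cell_{i:02d}" for i in range(1, 13)]
for k in (4, 6, 8):
    for n, subset in enumerate(sample_cell_subsets(cells, k, seed=k), start=1):
        print(f"{k} SMRT cells, set {n}: {subset}")
```

Fixing the seed makes a given draw reproducible, which is useful if an assembly has to be rerun on exactly the same cells.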
We then aligned the subreads to the contigs and discarded every contig with fewer than 100 aligned reads.
Statistics without reference | 4 SMRT cells : 1st Set | 4 SMRT cells : 2nd Set | 4 SMRT cells : 3rd Set | 6 SMRT cells : 1st Set | 6 SMRT cells : 2nd Set | 6 SMRT cells : 3rd Set | 8 SMRT cells : 1st Set | 8 SMRT cells : 2nd Set | 8 SMRT cells : 3rd Set |
# contigs | 2 | 6 | 1 | 5 | 2 | 4 | 2 | 3 | 2 |
Largest contig | 3 770 578 | 4 106 852 | 4 644 754 | 3 785 116 | 4 647 724 | 3 287 965 | 4 649 322 | 4 623 068 | 4 649 308 |
Total length | 4 651 736 | 4 691 077 | 4 644 754 | 4 675 943 | 4 660 074 | 4 671 197 | 4 664 502 | 4 661 980 | 4 661 084 |
N50 | 3 770 578 | 4 106 852 | 4 644 754 | 3 785 116 | 4 647 724 | 3 287 965 | 4 649 322 | 4 623 068 | 4 649 308 |
Misassemblies | |||||||||
# misassemblies | 8 | 10 | 10 | 10 | 8 | 7 | 8 | 9 | 9 |
Misassembled contigs length | 3 770 578 | 4 677 561 | 4 644 754 | 4 675 943 | 4 647 724 | 3 301 396 | 4 664 502 | 4 639 404 | 4 649 308 |
Mismatches | |||||||||
# mismatches per 100kbp | 0.15 | 0.5 | 0.37 | 0.22 | 0.11 | 0.15 | 0.13 | 0.22 | 0.17 |
# indels per 100kbp | 0.47 | 3.34 | 0.22 | 1.47 | 0.63 | 0.65 | 0.19 | 1.44 | 0.56 |
# N's per 100kbp | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Genome Statistics | |||||||||
Genome fraction(%) | 100 | 100 | 99.994 | 99.999 | 100 | 100 | 100 | 99.99 | 100 |
Duplication ratio | 1.003 | 1.011 | 1.002 | 1.008 | 1.005 | 1.007 | 1.005 | 1.005 | 1.005 |
# genes | 4494+3 part | 4495+2 part | 4493+3 part | 4493+4 part | 4495+2 part | 4495+2 part | 4495+2 part | 4493+4 part | 4495+2 part |
NGA50 | 1 207 217 | 2 558 505 | 1 640 882 | 2 888 022 | 2 834 458 | 1 298 912 | 1 477 605 | 1 344 200 | 2 995 586 |
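A minimal sketch of the read-support filter, assuming the subread alignments are available as SAM text (the aligner and file names are not stated on this page). It counts one alignment per primary record and skips unmapped, secondary and supplementary records; whether the original filter counted alignments this way is an assumption. The function names are hypothetical.

```python
from collections import Counter

# Flags for records that should not count as read support:
# unmapped (0x4), secondary (0x100) and supplementary (0x800).
SKIP_FLAGS = 0x4 | 0x100 | 0x800


def count_aligned_reads(sam_lines):
    """Count primary alignments per contig from SAM-formatted lines.

    Contig names come from the @SQ header lines, so contigs that
    received no alignments are still reported with a count of 0.
    """
    contigs = []
    counts = Counter()
    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@"):
            if line.startswith("@SQ"):
                tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
                contigs.append(tags["SN"])
            continue
        fields = line.split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & SKIP_FLAGS or rname == "*":
            continue
        counts[rname] += 1
    return {name: counts[name] for name in contigs}


def supported_contigs(read_counts, min_reads=100):
    """Keep contigs with at least `min_reads` aligned reads."""
    return sorted(name for name, n in read_counts.items() if n >= min_reads)


# Toy input: ctg1 receives 150 alignments, ctg2 only 40, plus one unmapped read.
header = ["@HD\tVN:1.6", "@SQ\tSN:ctg1\tLN:5000", "@SQ\tSN:ctg2\tLN:800"]
record = "{}\t0\t{}\t1\t60\t4M\t*\t0\t0\tACGT\t*"
body = [record.format(f"r{i}", "ctg1") for i in range(150)]
body += [record.format(f"s{i}", "ctg2") for i in range(40)]
body.append("u1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*")

counts = count_aligned_reads(header + body)
print(counts)                     # {'ctg1': 150, 'ctg2': 40}
print(supported_contigs(counts))  # ['ctg1']
```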
After discarding these weakly supported contigs, we trimmed the low-quality bases, which appear in lower case, from both ends of each remaining contig.
Statistics without reference | 4 SMRT cells : 1st Set | 4 SMRT cells : 2nd Set | 4 SMRT cells : 3rd Set | 6 SMRT cells : 1st Set | 6 SMRT cells : 2nd Set | 6 SMRT cells : 3rd Set | 8 SMRT cells : 1st Set | 8 SMRT cells : 2nd Set | 8 SMRT cells : 3rd Set |
# contigs | 2 | 6 | 1 | 4 | 2 | 4 | 2 | 3 | 2 |
Largest contig | 3 768 995 | 4 105 501 | 4 644 254 | 3 784 001 | 4 646 000 | 3 287 004 | 4 646 998 | 4 622 502 | 4 647 000 |
Total length | 4 649 500 | 4 678 503 | 4 644 254 | 4 660 999 | 4 655 498 | 4 667 500 | 4 660 992 | 4 660 836 | 4 656 000 |
N50 | 3 768 995 | 4 105 501 | 4 644 254 | 3 784 001 | 4 646 000 | 3 287 004 | 4 646 998 | 4 622 502 | 4 647 000 |
Misassemblies | |||||||||
# misassemblies | 8 | 10 | 10 | 9 | 8 | 7 | 8 | 9 | 8 |
Misassembled contigs length | 3 768 995 | 4 666 999 | 4 644 254 | 4 660 999 | 4 646 000 | 3 299 005 | 4 660 992 | 4 638 338 | 4 647 000 |
Mismatches | |||||||||
# mismatches per 100kbp | 0.15 | 0.5 | 0.37 | 0.19 | 0.11 | 0.11 | 0.13 | 0.22 | 0.17 |
# indels per 100kbp | 0.37 | 2.93 | 0.22 | 1.44 | 0.54 | 0.58 | 0.19 | 1.34 | 0.47 |
# N's per 100kbp | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Genome Statistics | |||||||||
Genome fraction(%) | 100 | 100 | 99.994 | 99.999 | 100 | 100 | 100 | 99.99 | 100 |
Duplication ratio | 1.002 | 1.008 | 1.002 | 1.005 | 1.004 | 1.006 | 1.005 | 1.005 | 1.004 |
# genes | 4494+3 part | 4494+3 part | 4493+3 part | 4493+4 part | 4495+2 part | 4495+2 part | 4495+2 part | 4493+4 part | 4495+2 part |
NGA50 | 1 207 217 | 2 558 154 | 1 640 382 | 2 888 022 | 2 833 234 | 1 298 912 | 1 476 281 | 1 344 200 | 2 995 586 |
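The end-trimming step can be sketched as below, assuming contigs are stored in FASTA and that low-quality bases are marked in lower case, as the text describes. Only lower-case runs at the two ends are removed; lower-case bases inside a contig are kept. `read_fasta` and `trim_lowercase_ends` are hypothetical helpers, not part of HGAP.

```python
def trim_lowercase_ends(seq):
    """Strip runs of lower-case (low-quality) bases from both contig ends.

    Lower-case bases inside the contig are left untouched.
    """
    start, end = 0, len(seq)
    while start < end and seq[start].islower():
        start += 1
    while end > start and seq[end - 1].islower():
        end -= 1
    return seq[start:end]


def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


fasta = [">contig1", "acgtACGTac", "GTACgt", ">contig2", "nnACGTnn"]
for name, seq in read_fasta(fasta):
    print(name, trim_lowercase_ends(seq))
# contig1 ACGTacGTAC
# contig2 ACGT
```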
We assembled the full data set (all SMRT cells) as well as three randomly selected sets each of four and six SMRT cells, and assessed the assemblies with QUAST.
Statistics without reference | All Data | 4 SMRT cells : 1st Set | 4 SMRT cells : 2nd Set | 4 SMRT cells : 3rd Set | 6 SMRT cells : 1st Set | 6 SMRT cells : 2nd Set | 6 SMRT cells : 3rd Set |
# contigs | 16 | 10 | 14 | 16 | 9 | 18 | 13 |
Largest contig | 2 198 457 | 3 484 877 | 1 936 831 | 1 948 632 | 2 104 087 | 1 169 224 | 1 439 551 |
Total length | 4 808 733 | 4 706 800 | 4 705 398 | 4 745 036 | 4 741 512 | 4 814 718 | 4 749 785 |
N50 | 1 005 770 | 3 484 877 | 966 809 | 1 434 284 | 1 655 500 | 676 526 | 1 268 010 |
Misassemblies | |||||||
# misassemblies | 19 | 9 | 12 | 15 | 14 | 17 | 11 |
Misassembled contigs length | 2 939 040 | 3 530 352 | 2 949 761 | 3 653 461 | 3 820 624 | 2 387 129 | 3 986 402 |
Mismatches | |||||||
# mismatches per 100kbp | 0.8 | 0.43 | 0.58 | 1.36 | 0.15 | 0.95 | 0.58 |
# indels per 100kbp | 5.71 | 2.98 | 4.45 | 9.56 | 1.77 | 8.02 | 6.88 |
# N's per 100kbp | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Genome Statistics | |||||||
Genome fraction(%) | 100 | 100 | 99.815 | 99.87 | 100 | 99.995 | 99.979 |
Duplication ratio | 1.037 | 1.016 | 1.017 | 1.025 | 1.022 | 1.038 | 1.025 |
# genes | 4494+3 part | 4494+3 part | 4480+7 part | 4485+9 part | 4494+3 part | 4493+4 part | 4492+5 part |
NGA50 | 615 234 | 1 205 052 | 572 342 | 875 953 | 844 482 | 633 220 | 1 267 242 |
Running Time | 19hr 06m | 13hr 34m | 13hr 21m | 12hr 38m | 21hr 28m | 22hr 56m | 22hr 07m |
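For reading the N50 rows, a short sketch of the standard N50 computation may help: sort contigs by length, accumulate from the longest down, and report the length at which the running total first reaches half of the assembly length. It follows that whenever the largest contig alone is at least half of the total length, N50 equals the largest contig. This is why the two rows coincide in every column of the first tables, while in the table above they differ wherever the largest contig is shorter than half the total. As we understand QUAST's definitions, NGA50 applies the same idea to aligned blocks, with half the reference length as the threshold. The contig lengths below are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length of the contig at which the cumulative length,
    taken from the longest contig downward, first reaches at least
    half of the total assembly length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
    return 0  # empty assembly


# Hypothetical contig lengths, chosen only to illustrate the two cases.
print(n50([5, 4, 3, 2, 1]))  # 4: 5 + 4 = 9 is the first sum >= 15 / 2
print(n50([10, 1, 1]))       # 10: the largest contig alone is >= half
```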
We then aligned the subreads to the contigs and discarded every contig with fewer than 100 aligned reads.
Statistics without reference | All Data | 4 SMRT cells : 1st Set | 4 SMRT cells : 2nd Set | 4 SMRT cells : 3rd Set | 6 SMRT cells : 1st Set | 6 SMRT cells : 2nd Set | 6 SMRT cells : 3rd Set |
# contigs | 7 | 8 | 10 | 12 | 4 | 9 | 12 |
Largest contig | 2 198 457 | 3 484 877 | 1 936 831 | 1 948 632 | 2 104 087 | 1 169 224 | 1 439 551 |
Total length | 4 706 061 | 4 674 582 | 4 659 277 | 4 682 754 | 4 680 475 | 4 702 993 | 4 739 366 |
N50 | 1 005 770 | 3 484 877 | 966 809 | 1 434 284 | 1 655 500 | 676 526 | 1 268 010 |
Misassemblies | |||||||
# misassemblies | 10 | 7 | 8 | 9 | 9 | 8 | 10 |
Misassembled contigs length | 2 836 368 | 3 498 134 | 2 903 640 | 3 591 179 | 3 759 587 | 2 275 404 | 3 975 983 |
Mismatches | |||||||
# mismatches per 100kbp | 0.8 | 0.43 | 0.45 | 1.27 | 0.15 | 0.75 | 0.58 |
# indels per 100kbp | 5.71 | 2.98 | 3.56 | 8.72 | 1.77 | 6.06 | 6.88 |
# N's per 100kbp | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Genome Statistics | |||||||
Genome fraction(%) | 100 | 100 | 99.798 | 99.87 | 100 | 99.995 | 99.979 |
Duplication ratio | 1.014 | 1.009 | 1.006 | 1.012 | 1.009 | 1.014 | 1.023 |
# genes | 4494+3 part | 4494+3 part | 4479+8 part | 4485+9 part | 4494+3 part | 4493+4 part | 4492+5 part |
NGA50 | 615 234 | 1 205 052 | 572 342 | 875 953 | 844 482 | 633 220 | 1 267 242 |
After discarding these weakly supported contigs, we trimmed the low-quality bases, which appear in lower case, from both ends of each remaining contig.
Statistics without reference | All Data | 4 SMRT cells : 1st Set | 4 SMRT cells : 2nd Set | 4 SMRT cells : 3rd Set | 6 SMRT cells : 1st Set | 6 SMRT cells : 2nd Set | 6 SMRT cells : 3rd Set |
# contigs | 7 | 8 | 10 | 12 | 4 | 9 | 12 |
Largest contig | 2 196 495 | 3 478 799 | 1 936 007 | 1 948 495 | 2 100 388 | 1 165 497 | 1 438 506 |
Total length | 4 694 972 | 4 662 655 | 4 649 216 | 4 657 587 | 4 668 899 | 4 681 301 | 4 714 790 |
N50 | 1 005 009 | 3 478 799 | 964 998 | 1 433 016 | 1 654 501 | 375 502 | 1 266 511 |
Misassemblies | |||||||
# misassemblies | 9 | 9 | 7 | 8 | 9 | 7 | 8 |
Misassembled contigs length | 2 210 994 | 3 490 490 | 2 901 005 | 3 496 520 | 3 754 889 | 2 256 498 | 3 197 010 |
Mismatches | |||||||
# mismatches per 100kbp | 0.63 | 0.28 | 0.22 | 0.91 | 0.15 | 0.54 | 0.47 |
# indels per 100kbp | 5.02 | 2.55 | 1.84 | 7.08 | 1.68 | 4.91 | 6.12 |
# N's per 100kbp | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Genome Statistics | |||||||
Genome fraction(%) | 100 | 99.842 | 99.776 | 99.889 | 100 | 99.985 | 99.979 |
Duplication ratio | 1.012 | 1.008 | 1.005 | 1.006 | 1.006 | 1.009 | 1.018 |
# genes | 4494+3 part | 4485+6 part | 4478+9 part | 4482+11 part | 4494+3 part | 4493+4 part | 4492+5 part |
NGA50 | 614 657 | 1 088 544 | 572 342 | 875 453 | 843 983 | 632 720 | 1 265 743 |
We assembled all SMRT cells together and assessed the assembly with QUAST.
Statistics without reference | All Data |
# contigs | 3 |
Largest contig | 2 548 031 |
Total length | 3 121 070 |
N50 | 2 548 031 |
Misassemblies | |
# misassemblies | 1 |
Misassembled contigs length | 2 548 031 |
Mismatches | |
# mismatches per 100kbp | 0.52 |
# indels per 100kbp | 2.71 |
# N's per 100kbp | 0 |
Genome Statistics | |
Genome fraction(%) | 99.986 |
Duplication ratio | 1.017 |
# genes | 3103+2 part |
NGA50 | 1 155 126 |
Running Time | 18hr 19m |
We then trimmed the low-quality bases, which appear in lower case, from both ends of each contig.
Statistics without reference | All Data |
# contigs | 3 |
Largest contig | 2 545 501 |
Total length | 3 115 015 |
N50 | 2 545 501 |
Misassemblies | |
# misassemblies | 1 |
Misassembled contigs length | 2 545 501 |
Mismatches | |
# mismatches per 100kbp | 0.42 |
# indels per 100kbp | 2.52 |
# N's per 100kbp | 0 |
Genome Statistics | |
Genome fraction(%) | 99.986 |
Duplication ratio | 1.015 |
# genes | 3103+2 part |
NGA50 | 1 153 096 |
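The mismatch and indel rows are normalized rates. As we understand QUAST's definition, each is the number of events per 100,000 aligned bases, so a rate can be converted back into an approximate event count. The sketch below does both for the 2.52 indels per 100 kbp reported for the trimmed assembly above. Using the total assembly length in place of the aligned length is an approximation introduced here, reasonable only when nearly all of the assembly aligns to the reference; the function names are hypothetical.

```python
def per_100kbp(event_count, aligned_bases):
    """Convert a raw event count into events per 100,000 aligned bases."""
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    return event_count * 100_000 / aligned_bases


def implied_count(rate_per_100kbp, aligned_bases):
    """Invert the normalization to estimate the raw number of events."""
    return rate_per_100kbp * aligned_bases / 100_000


# Approximation: the total length of the trimmed assembly stands in for aligned bases.
approx_indels = implied_count(2.52, 3_115_015)
print(round(approx_indels))  # 78
```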
We assembled the full data set (all SMRT cells) as well as three randomly selected sets of four SMRT cells, and assessed the assemblies with QUAST.
Statistics without reference | All Data | 4 SMRT cells : 1st Set | 4 SMRT cells : 2nd Set | 4 SMRT cells : 3rd Set |
# contigs | 3 | 3 | 3 | 6 |
Largest contig | 2 934 267 | 2 927 454 | 2 929 942 | 2 226 051 |
Total length | 5 178 932 | 5 176 592 | 5 176 771 | 5 182 410 |
N50 | 2 934 267 | 2 927 454 | 2 929 942 | 2 133 457 |
Misassemblies | ||||
# misassemblies | 0 | 1 | 0 | 1 |
Misassembled contigs length | 0 | 2 240 169 | 0 | 13 124 |
Mismatches | ||||
# mismatches per 100kbp | 0 | 0.02 | 0.06 | 6.45 |
# indels per 100kbp | 1.05 | 0.54 | 0.6 | 1.88 |
# N's per 100kbp | 0 | 0 | 0 | 0 |
Genome Statistics | ||||
Genome fraction(%) | 100 | 100 | 100 | 99.936 |
Duplication ratio | 1.003 | 1.003 | 1.003 | 1.006 |
# genes | 4338+1 part | 4338+1 part | 4338+1 part | 4335+4 part |
NGA50 | 2 934 267 | 2 927 454 | 2 929 942 | 2 133 457 |
Running Time | 24hr 56m | 17hr 41m | 18hr 14m | 17hr 04m |
We then aligned the subreads to the contigs and discarded every contig with fewer than 100 aligned reads.
Statistics without reference | All Data | 4 SMRT cells : 1st Set | 4 SMRT cells : 2nd Set | 4 SMRT cells : 3rd Set |
# contigs | 3 | 2 | 2 | 5 |
Largest contig | 2 934 267 | 2 927 454 | 2 929 942 | 2 226 051 |
Total length | 5 178 932 | 5 167 623 | 5 167 190 | 5 172 946 |
N50 | 2 934 267 | 2 927 454 | 2 929 942 | 2 133 457 |
Misassemblies | ||||
# misassemblies | 0 | 1 | 0 | 1 |
Misassembled contigs length | 0 | 2 240 169 | 0 | 13 124 |
Mismatches | ||||
# mismatches per 100kbp | 0 | 0.04 | 0.08 | 6.45 |
# indels per 100kbp | 1.05 | 0.68 | 0.6 | 1.82 |
# N's per 100kbp | 0 | 0 | 0 | 0 |
Genome Statistics | ||||
Genome fraction(%) | 100 | 99.951 | 99.916 | 99.878 |
Duplication ratio | 1.003 | 1.002 | 1.002 | 1.004 |
# genes | 4338+1 part | 4336+2 part | 4335+3 part | 4333+5 part |
NGA50 | 2 934 267 | 2 927 454 | 2 929 942 | 2 133 457 |
After discarding these weakly supported contigs, we trimmed the low-quality bases, which appear in lower case, from both ends of each remaining contig.
Statistics without reference | All Data | 4 SMRT cells : 1st Set | 4 SMRT cells : 2nd Set | 4 SMRT cells : 3rd Set |
# contigs | 3 | 2 | 2 | 5 |
Largest contig | 2 932 503 | 2 925 498 | 2 925 998 | 2 225 051 |
Total length | 5 175 001 | 5 163 999 | 5 162 498 | 5 161 405 |
N50 | 2 932 503 | 2 925 498 | 2 925 998 | 2 131 500 |
Misassemblies | ||||
# misassemblies | 0 | 1 | 0 | 0 |
Misassembled contigs length | 0 | 2 238 501 | 0 | 0 |
Mismatches | ||||
# mismatches per 100kbp | 0.02 | 0.06 | 9.98 | 6.42 |
# indels per 100kbp | 0.77 | 0.56 | 0.85 | 1.45 |
# N's per 100kbp | 0 | 0 | 0 | 0 |
Genome Statistics | ||||
Genome fraction(%) | 100 | 99.931 | 99.869 | 99.782 |
Duplication ratio | 1.002 | 1.001 | 1.002 | 1.002 |
# genes | 4338+1 part | 4336+2 part | 4331+4 part | 4328+7 part |
NGA50 | 2 932 503 | 2 925 498 | 2 925 998 | 2 131 500 |
We assembled all SMRT cells together and assessed the assembly with QUAST.
Statistics without reference | All Data |
# contigs | 2 |
Largest contig | 4 656 681 |
Total length | 4 672 546 |
N50 | 4 656 681 |
Misassemblies | |
# misassemblies | 9 |
Misassembled contigs length | 4 672 546 |
Mismatches | |
# mismatches per 100kbp | 0.15 |
# indels per 100kbp | 4.87 |
# N's per 100kbp | 0 |
Genome Statistics | |
Genome fraction(%) | 100 |
Duplication ratio | 1.007 |
# genes | 4494+3 part |
NGA50 | 2 995 500 |
Running Time | 16hr 40m |
We then aligned the subreads to the contigs and discarded every contig with fewer than 100 aligned reads.
Statistics without reference | All Data |
# contigs | 1 |
Largest contig | 4 656 681 |
Total length | 4 656 681 |
N50 | 4 656 681 |
Misassemblies | |
# misassemblies | 8 |
Misassembled contigs length | 4 656 681 |
Mismatches | |
# mismatches per 100kbp | 0.15 |
# indels per 100kbp | 4.87 |
# N's per 100kbp | 0 |
Genome Statistics | |
Genome fraction(%) | 100 |
Duplication ratio | 1.004 |
# genes | 4494+3 part |
NGA50 | 2 995 500 |
After discarding these weakly supported contigs, we trimmed the low-quality bases, which appear in lower case, from both ends of each remaining contig.
Statistics without reference | All Data |
# contigs | 1 |
Largest contig | 4 654 377 |
Total length | 4 654 377 |
N50 | 4 654 377 |
Misassemblies | |
# misassemblies | 8 |
Misassembled contigs length | 4 654 377 |
Mismatches | |
# mismatches per 100kbp | 0.15 |
# indels per 100kbp | 4.87 |
# N's per 100kbp | 0 |
Genome Statistics | |
Genome fraction(%) | 100 |
Duplication ratio | 1.003 |
# genes | 4494+3 part |
NGA50 | 2 995 500 |