The Hierarchical Genome Assembly Process (HGAP) was proposed in "Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data" (Nature Methods, 2013).
We assembled the data from all SMRT cells, and also from randomly selected subsets of four and six SMRT cells (three random draws for each subset size), and assessed the correctness of each assembly with QUAST.
Statistics without reference | All Data | 4 SMRT cells : 1st Set | 4 SMRT cells : 2nd Set | 4 SMRT cells : 3rd Set | 6 SMRT cells : 1st Set | 6 SMRT cells : 2nd Set | 6 SMRT cells : 3rd Set |
# contigs | 16 | 10 | 14 | 16 | 9 | 18 | 13 |
Largest contig | 2 198 457 | 3 484 877 | 1 936 831 | 1 948 632 | 2 104 087 | 1 169 224 | 1 439 551 |
Total length | 4 808 733 | 4 706 800 | 4 705 398 | 4 745 036 | 4 741 512 | 4 814 718 | 4 749 785 |
N50 | 1 005 770 | 3 484 877 | 966 809 | 1 434 284 | 1 655 500 | 676 526 | 1 268 010 |
Misassemblies | |||||||
# misassemblies | 19 | 9 | 12 | 15 | 14 | 17 | 11 |
Misassembled contigs length | 2 939 040 | 3 530 352 | 2 949 761 | 3 653 461 | 3 820 624 | 2 387 129 | 3 986 402 |
Mismatches | |||||||
# mismatches per 100kbp | 0.8 | 0.43 | 0.58 | 1.36 | 0.15 | 0.95 | 0.58 |
# indels per 100kbp | 5.71 | 2.98 | 4.45 | 9.56 | 1.77 | 8.02 | 6.88 |
# N's per 100kbp | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Genome Statistics | |||||||
Genome fraction(%) | 100 | 100 | 99.815 | 99.87 | 100 | 99.995 | 99.979 |
Duplication ratio | 1.037 | 1.016 | 1.017 | 1.025 | 1.022 | 1.038 | 1.025 |
# genes | 4494+3 part | 4494+3 part | 4480+7 part | 4485+9 part | 4494+3 part | 4493+4 part | 4492+5 part |
NGA50 | 615 234 | 1 205 052 | 572 342 | 875 953 | 844 482 | 633 220 | 1 267 242 |
Running Time | 19hr 06m | 13hr 34m | 13hr 21m | 12hr 38m |
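The subsampling described above (three random draws each of four and six SMRT cells) can be sketched as follows. This is a minimal illustration, not the procedure actually used: the cell names, the total of eight cells, the fixed seed, and the function name `draw_subsets` are all assumptions, and each draw is taken independently of the others.

```python
import random


def draw_subsets(cells, sizes=(4, 6), repeats=3, seed=0):
    """Return {size: [subset, ...]} with `repeats` random subsets per size.

    Each subset is drawn without replacement from `cells`; separate
    draws are independent, so two subsets may share cells.
    """
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    return {
        size: [sorted(rng.sample(cells, size)) for _ in range(repeats)]
        for size in sizes
    }


# Hypothetical cell names; the real number of SMRT cells is not stated here.
cells = [f"cell_{i:02d}" for i in range(1, 9)]
subsets = draw_subsets(cells)

for size, draws in subsets.items():
    for n, subset in enumerate(draws, start=1):
        print(f"{size} SMRT cells, set {n}: {', '.join(subset)}")
```

Each printed subset would then be assembled separately and evaluated with QUAST.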
We aligned the subreads to the contigs and discarded every contig with fewer than 100 aligned reads.
Statistics without reference | All Data | 4 SMRT cells : 1st Set | 4 SMRT cells : 2nd Set | 4 SMRT cells : 3rd Set | 6 SMRT cells : 1st Set | 6 SMRT cells : 2nd Set | 6 SMRT cells : 3rd Set |
# contigs | 7 | 8 | 10 | 12 | 4 | 9 | 12 |
Largest contig | 2 198 457 | 3 484 877 | 1 936 831 | 1 948 632 | 2 104 087 | 1 169 224 | 1 439 551 |
Total length | 4 706 061 | 4 674 582 | 4 659 277 | 4 682 754 | 4 680 475 | 4 702 993 | 4 739 366 |
N50 | 1 005 770 | 3 484 877 | 966 809 | 1 434 284 | 1 655 500 | 676 526 | 1 268 010 |
Misassemblies | |||||||
# misassemblies | 10 | 7 | 8 | 9 | 9 | 8 | 10 |
Misassembled contigs length | 2 836 368 | 3 498 134 | 2 903 640 | 3 591 179 | 3 759 587 | 2 275 404 | 3 975 983 |
Mismatches | |||||||
# mismatches per 100kbp | 0.8 | 0.43 | 0.45 | 1.27 | 0.15 | 0.75 | 0.58 |
# indels per 100kbp | 5.71 | 2.98 | 3.56 | 8.72 | 1.77 | 6.06 | 6.88 |
# N's per 100kbp | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Genome Statistics | |||||||
Genome fraction(%) | 100 | 100 | 99.798 | 99.87 | 100 | 99.995 | 99.979 |
Duplication ratio | 1.014 | 1.009 | 1.006 | 1.012 | 1.009 | 1.014 | 1.023 |
# genes | 4494+3 part | 4494+3 part | 4479+8 part | 4485+9 part | 4494+3 part | 4493+4 part | 4492+5 part |
NGA50 | 615 234 | 1 205 052 | 572 342 | 875 953 | 844 482 | 633 220 | 1 267 242 |
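The read-support filter can be sketched as below. The sketch assumes the alignments have already been reduced to (subread ID, contig name) pairs by whatever aligner was used; the function name `filter_contigs`, the input layout, and the choice to count distinct subreads rather than alignment records are assumptions for illustration.

```python
from collections import defaultdict


def filter_contigs(contigs, alignments, min_reads=100):
    """Keep contigs supported by at least `min_reads` distinct subreads.

    contigs:    dict mapping contig name -> sequence
    alignments: iterable of (subread_id, contig_name) pairs
    """
    reads_per_contig = defaultdict(set)
    for read_id, contig_name in alignments:
        reads_per_contig[contig_name].add(read_id)

    # Contigs with fewer than `min_reads` aligned subreads are discarded.
    return {
        name: seq
        for name, seq in contigs.items()
        if len(reads_per_contig.get(name, ())) >= min_reads
    }


# Toy input: contig_A has 100 aligned subreads, contig_B 99, contig_C none.
contigs = {"contig_A": "ACGT", "contig_B": "GGCC", "contig_C": "TTAA"}
alignments = [(f"read_{i}", "contig_A") for i in range(100)]
alignments += [(f"read_{i}", "contig_B") for i in range(100, 199)]

kept = filter_contigs(contigs, alignments)
print(sorted(kept))
```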
After discarding the poorly supported contigs, we trimmed the low-quality bases, which are written in lower case, from both ends of each remaining contig.
Statistics without reference | All Data | 4 SMRT cells : 1st Set | 4 SMRT cells : 2nd Set | 4 SMRT cells : 3rd Set | 6 SMRT cells : 1st Set | 6 SMRT cells : 2nd Set | 6 SMRT cells : 3rd Set |
# contigs | 7 | 8 | 10 | 12 | 4 | 9 | 12 |
Largest contig | 2 196 495 | 3 478 799 | 1 936 007 | 1 948 495 | 2 100 388 | 1 165 497 | 1 438 506 |
Total length | 4 694 972 | 4 662 655 | 4 649 216 | 4 657 587 | 4 668 899 | 4 681 301 | 4 714 790 |
N50 | 1 005 009 | 3 478 799 | 964 998 | 1 433 016 | 1 654 501 | 375 502 | 1 266 511 |
Misassemblies | |||||||
# misassemblies | 9 | 9 | 7 | 8 | 9 | 7 | 8 |
Misassembled contigs length | 2 210 994 | 3 490 490 | 2 901 005 | 3 496 520 | 3 754 889 | 2 256 498 | 3 197 010 |
Mismatches | |||||||
# mismatches per 100kbp | 0.63 | 0.28 | 0.22 | 0.91 | 0.15 | 0.54 | 0.47 |
# indels per 100kbp | 5.02 | 2.55 | 1.84 | 7.08 | 1.68 | 4.91 | 6.12 |
# N's per 100kbp | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Genome Statistics | |||||||
Genome fraction(%) | 100 | 99.842 | 99.776 | 99.889 | 100 | 99.985 | 99.979 |
Duplication ratio | 1.012 | 1.008 | 1.005 | 1.006 | 1.006 | 1.009 | 1.018 |
# genes | 4494+3 part | 4485+6 part | 4478+9 part | 4482+11 part | 4494+3 part | 4493+4 part | 4492+5 part |
NGA50 | 614 657 | 1 088 544 | 572 342 | 875 453 | 843 983 | 632 720 | 1 265 743 |
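The end-trimming step can be sketched as a small function that removes lower-case bases from both ends of a contig and leaves any lower-case bases in the interior untouched. The function name `trim_lowercase_ends` and the example sequence are hypothetical.

```python
def trim_lowercase_ends(seq):
    """Strip lower-case (low-quality) bases from both ends of `seq`.

    Lower-case bases inside the contig are kept; only the leading and
    trailing runs are removed.
    """
    start, end = 0, len(seq)
    while start < end and seq[start].islower():
        start += 1
    while end > start and seq[end - 1].islower():
        end -= 1
    return seq[start:end]


trimmed = trim_lowercase_ends("acgTTGaCAtt")
print(trimmed)
```

A contig made up entirely of lower-case bases would be trimmed to an empty string, so such contigs would need to be dropped afterwards.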
We assembled the data from all SMRT cells and assessed the correctness of the assembly with QUAST.
Statistics without reference | All Data |
# contigs | 16 |
Largest contig | 2 198 457 |
Total length | 4 808 733 |
N50 | 1 005 770 |
Misassemblies | |
# misassemblies | 19 |
Misassembled contigs length | 2 939 040 |
Mismatches | |
# mismatches per 100kbp | 0.8 |
# indels per 100kbp | 5.71 |
# N's per 100kbp | 0 |
Genome Statistics | |
Genome fraction(%) | 100 |
Duplication ratio | 1.037 |
# genes | 4494+3 part |
NGA50 | 615 234 |
Running Time | 19hr 06m |