Version Differences for HGAP

(Dataset 8 P. heparinus DSM2366, 7 SMRT cells)
Line 1:
  Hierarchical Genome Assembly Process (HGAP) was proposed in the [http://www.ncbi.nlm.nih.gov/pubmed/23644548 ref] (''Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.'' Nat Meth 2013).
       
       
    + =DataSet4=  
    + We randomly selected four, six and eight SMRT cells, three times each, and assessed the correctness of each assembly with QUAST.
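The random draws described above can be sketched roughly as follows. This is only an illustration: the function name `draw_cell_subsets`, the cell labels, and the fixed seed are assumptions, and the page does not say how the cells were actually sampled.

```python
import random


def draw_cell_subsets(cells, sizes=(4, 6, 8), repeats=3, seed=0):
    """Draw `repeats` random subsets of SMRT cells for each subset size.

    `cells` is a list of hypothetical cell identifiers. A fixed seed is
    used here only so that the draws can be reproduced.
    """
    rng = random.Random(seed)
    return {
        size: [sorted(rng.sample(cells, size)) for _ in range(repeats)]
        for size in sizes
    }


# Hypothetical cell labels; the real run names are not given on this page.
cells = [f"cell_{i}" for i in range(1, 11)]
for size, draws in draw_cell_subsets(cells).items():
    for set_index, subset in enumerate(draws, start=1):
        print(size, "SMRT cells, set", set_index, subset)
```

Each subset would then be assembled separately and scored, giving the nine columns of the tables in this section.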
       
    + ==Performance==  
    + {| {{table}} border="1"  
    + | align="center" style="background:#f0f0f0;"|'''Statistics without reference '''  
    + | align="center" style="background:#f0f0f0;"|'''4 SMRT cells : 1st Set'''  
    + | align="center" style="background:#f0f0f0;"|'''4 SMRT cells : 2nd Set'''  
    + | align="center" style="background:#f0f0f0;"|'''4 SMRT cells : 3rd Set'''  
    + | align="center" style="background:#f0f0f0;"|'''6 SMRT cells : 1st Set'''  
    + | align="center" style="background:#f0f0f0;"|'''6 SMRT cells : 2nd Set'''  
    + | align="center" style="background:#f0f0f0;"|'''6 SMRT cells : 3rd Set'''  
    + | align="center" style="background:#f0f0f0;"|'''8 SMRT cells : 1st Set'''  
    + | align="center" style="background:#f0f0f0;"|'''8 SMRT cells : 2nd Set'''  
    + | align="center" style="background:#f0f0f0;"|'''8 SMRT cells : 3rd Set'''  
    + |-  
    + |# contigs||5||10||4||11||7||8||6||10||5  
    + |-  
    + |Largest contig||3 770 578||4 106 852||4 644 754||3 785 116||4 647 724||3 287 965||4 649 322||4 623 068||4 649 308  
    + |-  
    + |Total length||4 684 069||4 723 363||4 671 153||4 736 342||4 711 060||4 708 831||4 706 433||4 731 334||4 691 736  
    + |-  
    + |N50||3 770 578||4 106 852||4 644 754||3 785 116||4 647 724||3 287 965||4 649 322||4 623 068||4 649 308  
    + |-  
    + | style="background:#f0f0f0;"| '''Misassemblies'''||||||||||||||||||  
    + |-  
    + |# misassemblies||10||13||13||15||12||11||11||16||12  
    + |-  
    + |Misassembled contigs length ||3 788 648||4 700 016||4 671 153||4 726 005||4 685 712||3 339 030||4 694 303||4 698 068||4 649 308  
    + |-  
    + | style="background:#f0f0f0;"| '''Mismatches'''||||||||||||||||||  
    + |-  
    + |# mismatches per 100kbp||0.47||0.56||0.37||0.19||0.11||0.15||0.13||0.43||0.17  
    + |-  
    + |# indels per 100kbp||1.08||4.44||0.22||1.66||0.63||0.65||0.19||4.59||0.56  
    + |-  
    + |# N's per 100kbp ||0||0||0||0||0||0||0||0||0  
    + |-  
    + | style="background:#f0f0f0;"| '''Genome Statistics'''||||||||||||||||||  
    + |-  
    + |Genome fraction(%) ||100||100||99.994||99.999||100||100||100||99.99||100  
    + |-  
    + |Duplication ratio ||1.01||1.018||1.007||1.021||1.031||1.015||1.012||1.02||1.011  
    + |-  
    + |# genes ||4495+2 part||4495+2 part||4493+3 part||4494+3 part||4495+2 part||4495+2 part||4495+2 part||4494+3 part||4495+2 part  
    + |-  
    + |NGA50 ||1 207 217||2 558 505||1 640 882||2 888 022||2 834 458||1 298 912||1 477 605||1 344 200||2 995 586  
    + |-  
    + |'''Running Time'''||?hr ?m||?hr ?m||?hr ?m||21hr 05m||19hr 32m||21hr 01m||26hr 46m||27hr 52m||26hr 13m
    + |-  
    + |}  
       
       
       
    + ==Discard Unconvincing Contigs==  
    + We aligned subreads to contigs, and discarded the contigs with fewer than 100 reads aligned.  
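A minimal sketch of this filter, assuming the alignments are already available as (read name, contig name) pairs. The function name and input layout are hypothetical, and counting each read once per contig is an assumption, since the text only says "fewer than 100 reads aligned".

```python
from collections import defaultdict

MIN_ALIGNED_READS = 100  # threshold stated in the text


def keep_supported_contigs(alignments, contig_names, min_reads=MIN_ALIGNED_READS):
    """Return the contigs that have at least `min_reads` distinct reads aligned.

    `alignments` is an iterable of (read_name, contig_name) pairs.
    Contigs with no alignments at all are discarded as well.
    """
    reads_by_contig = defaultdict(set)
    for read_name, contig_name in alignments:
        reads_by_contig[contig_name].add(read_name)
    return [
        name for name in contig_names
        if len(reads_by_contig[name]) >= min_reads
    ]


# Toy input: contig_a has 100 reads, contig_b has 99, contig_c has none.
alignments = [(f"read_{i}", "contig_a") for i in range(100)]
alignments += [(f"read_{i}", "contig_b") for i in range(99)]
print(keep_supported_contigs(alignments, ["contig_a", "contig_b", "contig_c"]))
```

In the toy input only `contig_a` reaches the threshold, so it is the only contig kept.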
    + ==Performance==  
    + {| {{table}} border="1"  
    + | align="center" style="background:#f0f0f0;"|'''Statistics without reference '''  
    + | align="center" style="background:#f0f0f0;"|'''4 SMRT cells : 1st Set'''  
    + | align="center" style="background:#f0f0f0;"|'''4 SMRT cells : 2nd Set'''  
    + | align="center" style="background:#f0f0f0;"|'''4 SMRT cells : 3rd Set'''  
    + | align="center" style="background:#f0f0f0;"|'''6 SMRT cells : 1st Set'''  
    + | align="center" style="background:#f0f0f0;"|'''6 SMRT cells : 2nd Set'''  
    + | align="center" style="background:#f0f0f0;"|'''6 SMRT cells : 3rd Set'''  
    + | align="center" style="background:#f0f0f0;"|'''8 SMRT cells : 1st Set'''  
    + | align="center" style="background:#f0f0f0;"|'''8 SMRT cells : 2nd Set'''  
    + | align="center" style="background:#f0f0f0;"|'''8 SMRT cells : 3rd Set'''  
    + |-  
    + |# contigs||2||6||1||5||2||4||2||3||2  
    + |-  
    + |Largest contig||3 770 578||4 106 852||4 644 754||3 785 116||4 647 724||3 287 965||4 649 322||4 623 068||4 649 308  
    + |-  
    + |Total length||4 651 736||4 691 077||4 644 754||4 675 943||4 660 074||4 671 197||4 664 502||4 661 980||4 661 084  
    + |-  
    + |N50||3 770 578||4 106 852||4 644 754||3 785 116||4 647 724||3 287 965||4 649 322||4 623 068||4 649 308  
    + |-  
    + | style="background:#f0f0f0;"| '''Misassemblies'''||||||||||||||||||  
    + |-  
    + |# misassemblies||8||10||10||10||8||7||8||9||9  
    + |-  
    + |Misassembled contigs length ||3 770 578||4 677 561||4 644 754||4 675 943||4 647 724||3 301 396||4 664 502||4 639 404||4 649 308  
    + |-  
    + | style="background:#f0f0f0;"| '''Mismatches'''||||||||||||||||||  
    + |-  
    + |# mismatches per 100kbp||0.15||0.5||0.37||0.22||0.11||0.15||0.13||0.22||0.17  
    + |-  
    + |# indels per 100kbp||0.47||3.34||0.22||1.47||0.63||0.65||0.19||1.44||0.56  
    + |-  
    + |# N's per 100kbp ||0||0||0||0||0||0||0||0||0  
    + |-  
    + | style="background:#f0f0f0;"| '''Genome Statistics'''||||||||||||||||||  
    + |-  
    + |Genome fraction(%) ||100||100||99.994||99.999||100||100||100||99.99||100  
    + |-  
    + |Duplication ratio ||1.003||1.011||1.002||1.008||1.005||1.007||1.005||1.005||1.005  
    + |-  
    + |# genes ||4494+3 part||4495+2 part||4493+3 part||4493+4 part||4495+2 part||4495+2 part||4495+2 part||4493+4 part||4495+2 part  
    + |-  
    + |NGA50 ||1 207 217||2 558 505||1 640 882||2 888 022||2 834 458||1 298 912||1 477 605||1 344 200||2 995 586  
    + |-  
    + |}  
       
       
    + ==Discard Lower-case bases ==  
    + After discarding unconvincing contigs, we trimmed the low-quality bases, which appear in lower case, from both ends of each contig.
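The end trimming can be illustrated with a short sketch. It assumes each contig is held as a plain string in which low-quality bases are lower case; the function name is hypothetical, and lower-case bases inside the contig are left untouched, because the text only mentions the two ends.

```python
def trim_lowercase_ends(sequence: str) -> str:
    """Strip the runs of lower-case bases from both ends of a contig."""
    start = 0
    end = len(sequence)
    # Advance past the lower-case run at the 5' end.
    while start < end and sequence[start].islower():
        start += 1
    # Step back over the lower-case run at the 3' end.
    while end > start and sequence[end - 1].islower():
        end -= 1
    return sequence[start:end]


print(trim_lowercase_ends("acgTTGcaAAgt"))  # -> TTGcaAA
```

A contig made up entirely of lower-case bases comes back as an empty string, so a caller would probably want to drop such contigs afterwards.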
       
    + ==Performance==  
    + {| {{table}} border="1"  
    + | align="center" style="background:#f0f0f0;"|'''Statistics without reference '''  
    + | align="center" style="background:#f0f0f0;"|'''4 SMRT cells : 1st Set'''  
    + | align="center" style="background:#f0f0f0;"|'''4 SMRT cells : 2nd Set'''  
    + | align="center" style="background:#f0f0f0;"|'''4 SMRT cells : 3rd Set'''  
    + | align="center" style="background:#f0f0f0;"|'''6 SMRT cells : 1st Set'''  
    + | align="center" style="background:#f0f0f0;"|'''6 SMRT cells : 2nd Set'''  
    + | align="center" style="background:#f0f0f0;"|'''6 SMRT cells : 3rd Set'''  
    + | align="center" style="background:#f0f0f0;"|'''8 SMRT cells : 1st Set'''  
    + | align="center" style="background:#f0f0f0;"|'''8 SMRT cells : 2nd Set'''  
    + | align="center" style="background:#f0f0f0;"|'''8 SMRT cells : 3rd Set'''  
    + |-  
    + |# contigs||2||6||1||4||2||4||2||3||2  
    + |-  
    + |Largest contig||3 768 995||4 105 501||4 644 254||3 784 001||4 646 000||3 287 004||4 646 998||4 622 502||4 647 000  
    + |-  
    + |Total length||4 649 500||4 678 503||4 644 254||4 660 999||4 655 498||4 667 500||4 660 992||4 660 836||4 656 000  
    + |-  
    + |N50||3 768 995||4 105 501||4 644 254||3 784 001||4 646 000||3 287 004||4 646 998||4 622 502||4 647 000  
    + |-  
    + | style="background:#f0f0f0;"| '''Misassemblies'''||||||||||||||||||  
    + |-  
    + |# misassemblies||8||10||10||9||8||7||8||9||8  
    + |-  
    + |Misassembled contigs length ||3 768 995||4 666 999||4 644 254||4 660 999||4 646 000||3 299 005||4 660 992||4 638 338||4 647 000  
    + |-  
    + | style="background:#f0f0f0;"| '''Mismatches'''||||||||||||||||||  
    + |-  
    + |# mismatches per 100kbp||0.15||0.5||0.37||0.19||0.11||0.11||0.13||0.22||0.17  
    + |-  
    + |# indels per 100kbp||0.37||2.93||0.22||1.44||0.54||0.58||0.19||1.34||0.47  
    + |-  
    + |# N's per 100kbp ||0||0||0||0||0||0||0||0||0  
    + |-  
    + | style="background:#f0f0f0;"| '''Genome Statistics'''||||||||||||||||||  
    + |-  
    + |Genome fraction(%) ||100||100||99.994||99.999||100||100||100||99.99||100  
    + |-  
    + |Duplication ratio ||1.002||1.008||1.002||1.005||1.004||1.006||1.005||1.005||1.004  
    + |-  
    + |# genes ||4494+3 part||4494+3 part||4493+3 part||4493+4 part||4495+2 part||4495+2 part||4495+2 part||4493+4 part||4495+2 part  
    + |-  
    + |NGA50 ||1 207 217||2 558 154||1 640 382||2 888 022||2 833 234||1 298 912||1 476 281||1 344 200||2 995 586  
    + |-  
    + |}  
       
       
  =Dataset 6 (''E. coli'' K-12 MG1655, 8 SMRT cells) =
  We used all SMRT cells, and also randomly selected four and six SMRT cells three times each, and assessed the correctness of each assembly with QUAST.
Line 365:
  |-
  |NGA50 ||2 932 503||2 925 498||2 925 998||2 131 500
- |-      
- |}      
       
       
       
       
- =DataSet4=      
- We randomly selected four, six and eight SMRT cells, three times each, and assessed the correctness of each assembly with QUAST.
       
- ==Performance==      
- {| {{table}} border="1"      
- | align="center" style="background:#f0f0f0;"|'''Statistics without reference '''      
- | align="center" style="background:#f0f0f0;"|'''4 SMRT cells : 1st Set'''      
- | align="center" style="background:#f0f0f0;"|'''4 SMRT cells : 2nd Set'''      
- | align="center" style="background:#f0f0f0;"|'''4 SMRT cells : 3rd Set'''      
- | align="center" style="background:#f0f0f0;"|'''6 SMRT cells : 1st Set'''      
- | align="center" style="background:#f0f0f0;"|'''6 SMRT cells : 2nd Set'''      
- | align="center" style="background:#f0f0f0;"|'''6 SMRT cells : 3rd Set'''      
- | align="center" style="background:#f0f0f0;"|'''8 SMRT cells : 1st Set'''      
- | align="center" style="background:#f0f0f0;"|'''8 SMRT cells : 2nd Set'''      
- | align="center" style="background:#f0f0f0;"|'''8 SMRT cells : 3rd Set'''      
- |-      
- |# contigs||5||10||4||11||7||8||6||10||5      
- |-      
- |Largest contig||3 770 578||4 106 852||4 644 754||3 785 116||4 647 724||3 287 965||4 649 322||4 623 068||4 649 308      
- |-      
- |Total length||4 684 069||4 723 363||4 671 153||4 736 342||4 711 060||4 708 831||4 706 433||4 731 334||4 691 736      
- |-      
- |N50||3 770 578||4 106 852||4 644 754||3 785 116||4 647 724||3 287 965||4 649 322||4 623 068||4 649 308      
- |-      
- | style="background:#f0f0f0;"| '''Misassemblies'''||||||||||||||||||      
- |-      
- |# misassemblies||10||13||13||15||12||11||11||16||12      
- |-      
- |Misassembled contigs length ||3 788 648||4 700 016||4 671 153||4 726 005||4 685 712||3 339 030||4 694 303||4 698 068||4 649 308      
- |-      
- | style="background:#f0f0f0;"| '''Mismatches'''||||||||||||||||||      
- |-      
- |# mismatches per 100kbp||0.47||0.56||0.37||0.19||0.11||0.15||0.13||0.43||0.17      
- |-      
- |# indels per 100kbp||1.08||4.44||0.22||1.66||0.63||0.65||0.19||4.59||0.56      
- |-      
- |# N's per 100kbp ||0||0||0||0||0||0||0||0||0      
- |-      
- | style="background:#f0f0f0;"| '''Genome Statistics'''||||||||||||||||||      
- |-      
- |Genome fraction(%) ||100||100||99.994||99.999||100||100||100||99.99||100      
- |-      
- |Duplication ratio ||1.01||1.018||1.007||1.021||1.031||1.015||1.012||1.02||1.011      
- |-      
- |# genes ||4495+2 part||4495+2 part||4493+3 part||4494+3 part||4495+2 part||4495+2 part||4495+2 part||4494+3 part||4495+2 part      
- |-      
- |NGA50 ||1 207 217||2 558 505||1 640 882||2 888 022||2 834 458||1 298 912||1 477 605||1 344 200||2 995 586      
- |-      
- |'''Running Time'''||?hr ?m||?hr ?m||?hr ?m||21hr 05m||19hr 32m||21hr 01m||26hr 46m||27hr 52m||26hr 13m
- |-      
- |}      
       
       
       
- ==Discard Unconvincing Contigs==      
- We aligned subreads to contigs, and discarded the contigs with fewer than 100 reads aligned.      
- ==Performance==      
- {| {{table}} border="1"      
- | align="center" style="background:#f0f0f0;"|'''Statistics without reference '''      
- | align="center" style="background:#f0f0f0;"|'''4 SMRT cells : 1st Set'''      
- | align="center" style="background:#f0f0f0;"|'''4 SMRT cells : 2nd Set'''      
- | align="center" style="background:#f0f0f0;"|'''4 SMRT cells : 3rd Set'''      
- | align="center" style="background:#f0f0f0;"|'''6 SMRT cells : 1st Set'''      
- | align="center" style="background:#f0f0f0;"|'''6 SMRT cells : 2nd Set'''      
- | align="center" style="background:#f0f0f0;"|'''6 SMRT cells : 3rd Set'''      
- | align="center" style="background:#f0f0f0;"|'''8 SMRT cells : 1st Set'''      
- | align="center" style="background:#f0f0f0;"|'''8 SMRT cells : 2nd Set'''      
- | align="center" style="background:#f0f0f0;"|'''8 SMRT cells : 3rd Set'''      
- |-      
- |# contigs||2||6||1||5||2||4||2||3||2      
- |-      
- |Largest contig||3 770 578||4 106 852||4 644 754||3 785 116||4 647 724||3 287 965||4 649 322||4 623 068||4 649 308      
- |-      
- |Total length||4 651 736||4 691 077||4 644 754||4 675 943||4 660 074||4 671 197||4 664 502||4 661 980||4 661 084      
- |-      
- |N50||3 770 578||4 106 852||4 644 754||3 785 116||4 647 724||3 287 965||4 649 322||4 623 068||4 649 308      
- |-      
- | style="background:#f0f0f0;"| '''Misassemblies'''||||||||||||||||||      
- |-      
- |# misassemblies||8||10||10||10||8||7||8||9||9      
- |-      
- |Misassembled contigs length ||3 770 578||4 677 561||4 644 754||4 675 943||4 647 724||3 301 396||4 664 502||4 639 404||4 649 308      
- |-      
- | style="background:#f0f0f0;"| '''Mismatches'''||||||||||||||||||      
- |-      
- |# mismatches per 100kbp||0.15||0.5||0.37||0.22||0.11||0.15||0.13||0.22||0.17      
- |-      
- |# indels per 100kbp||0.47||3.34||0.22||1.47||0.63||0.65||0.19||1.44||0.56      
- |-      
- |# N's per 100kbp ||0||0||0||0||0||0||0||0||0      
- |-      
- | style="background:#f0f0f0;"| '''Genome Statistics'''||||||||||||||||||      
- |-      
- |Genome fraction(%) ||100||100||99.994||99.999||100||100||100||99.99||100      
- |-      
- |Duplication ratio ||1.003||1.011||1.002||1.008||1.005||1.007||1.005||1.005||1.005      
- |-      
- |# genes ||4494+3 part||4495+2 part||4493+3 part||4493+4 part||4495+2 part||4495+2 part||4495+2 part||4493+4 part||4495+2 part      
- |-      
- |NGA50 ||1 207 217||2 558 505||1 640 882||2 888 022||2 834 458||1 298 912||1 477 605||1 344 200||2 995 586      
- |-      
- |}      
       
       
- ==Discard Lower-case bases ==      
- After discarding unconvincing contigs, we trimmed the low-quality bases, which appear in lower case, from both ends of each contig.
       
- ==Performance==      
- {| {{table}} border="1"      
- | align="center" style="background:#f0f0f0;"|'''Statistics without reference '''      
- | align="center" style="background:#f0f0f0;"|'''4 SMRT cells : 1st Set'''      
- | align="center" style="background:#f0f0f0;"|'''4 SMRT cells : 2nd Set'''      
- | align="center" style="background:#f0f0f0;"|'''4 SMRT cells : 3rd Set'''      
- | align="center" style="background:#f0f0f0;"|'''6 SMRT cells : 1st Set'''      
- | align="center" style="background:#f0f0f0;"|'''6 SMRT cells : 2nd Set'''      
- | align="center" style="background:#f0f0f0;"|'''6 SMRT cells : 3rd Set'''      
- | align="center" style="background:#f0f0f0;"|'''8 SMRT cells : 1st Set'''      
- | align="center" style="background:#f0f0f0;"|'''8 SMRT cells : 2nd Set'''      
- | align="center" style="background:#f0f0f0;"|'''8 SMRT cells : 3rd Set'''      
- |-      
- |# contigs||2||6||1||4||2||4||2||3||2      
- |-      
- |Largest contig||3 768 995||4 105 501||4 644 254||3 784 001||4 646 000||3 287 004||4 646 998||4 622 502||4 647 000      
- |-      
- |Total length||4 649 500||4 678 503||4 644 254||4 660 999||4 655 498||4 667 500||4 660 992||4 660 836||4 656 000      
- |-      
- |N50||3 768 995||4 105 501||4 644 254||3 784 001||4 646 000||3 287 004||4 646 998||4 622 502||4 647 000      
- |-      
- | style="background:#f0f0f0;"| '''Misassemblies'''||||||||||||||||||      
- |-      
- |# misassemblies||8||10||10||9||8||7||8||9||8      
- |-      
- |Misassembled contigs length ||3 768 995||4 666 999||4 644 254||4 660 999||4 646 000||3 299 005||4 660 992||4 638 338||4 647 000      
- |-      
- | style="background:#f0f0f0;"| '''Mismatches'''||||||||||||||||||      
- |-      
- |# mismatches per 100kbp||0.15||0.5||0.37||0.19||0.11||0.11||0.13||0.22||0.17      
- |-      
- |# indels per 100kbp||0.37||2.93||0.22||1.44||0.54||0.58||0.19||1.34||0.47      
- |-      
- |# N's per 100kbp ||0||0||0||0||0||0||0||0||0      
- |-      
- | style="background:#f0f0f0;"| '''Genome Statistics'''||||||||||||||||||      
- |-      
- |Genome fraction(%) ||100||100||99.994||99.999||100||100||100||99.99||100      
- |-      
- |Duplication ratio ||1.002||1.008||1.002||1.005||1.004||1.006||1.005||1.005||1.004      
- |-      
- |# genes ||4494+3 part||4494+3 part||4493+3 part||4493+4 part||4495+2 part||4495+2 part||4495+2 part||4493+4 part||4495+2 part      
- |-      
- |NGA50 ||1 207 217||2 558 154||1 640 382||2 888 022||2 833 234||1 298 912||1 476 281||1 344 200||2 995 586      
  |-
  |}