HGAP

Revision as of 9 October 2013 01:45 by admin

The Hierarchical Genome Assembly Process (HGAP) was proposed by Chin et al. in "Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data" (Nat Meth 2013).

DataSet1

We assembled the reads from all SMRT cells, and also from randomly selected subsets of four and six SMRT cells (three random draws for each subset size). The correctness of every assembly was assessed with QUAST.
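The subset draws described above can be sketched as follows. This is only an illustration, not the script that produced the logs on this page: the function name `draw_subsets` and the seed are hypothetical, and the list of movie names is collected from the selection logs listed below.

```python
import random

# Movie (SMRT cell) names as they appear in the selection logs on this page.
SMRT_CELLS = [
    "m121023_202553_42178_c100389662550000001523034410251200_s1_p0",
    "m121023_224605_42178_c100389662550000001523034410251201_s1_p0",
    "m121024_010654_42178_c100389662550000001523034410251202_s1_p0",
    "m121024_032737_42178_c100389662550000001523034410251203_s1_p0",
    "m121024_074656_42178_c100389662550000001523034410251204_s1_p0",
    "m121024_100442_42178_c100389662550000001523034410251205_s1_p0",
    "m121024_122509_42178_c100389662550000001523034410251206_s1_p0",
    "m121024_144608_42178_c100389662550000001523034410251207_s1_p0",
]


def draw_subsets(cells, sizes=(4, 6), repeats=3, seed=0):
    """Return {size: [subset, ...]} with `repeats` random draws per size.

    Each subset is drawn without replacement; the seed is a hypothetical
    value used only to make the draws reproducible.
    """
    rng = random.Random(seed)
    return {k: [rng.sample(cells, k) for _ in range(repeats)] for k in sizes}


if __name__ == "__main__":
    for size, subsets in draw_subsets(SMRT_CELLS).items():
        for i, subset in enumerate(subsets, start=1):
            print(f"{size} SMRT cells, set {i}")
            for name in subset:
                print(f"  Random Get @{name}")
```

A fixed seed makes the draws repeatable, so the same subsets can be regenerated if an assembly has to be rerun.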

Randomly Selected Four SMRT cells

First Set

Random Get @m121024_100442_42178_c100389662550000001523034410251205_s1_p0
Random Get @m121024_122509_42178_c100389662550000001523034410251206_s1_p0
Random Get @m121023_202553_42178_c100389662550000001523034410251200_s1_p0
Random Get @m121024_010654_42178_c100389662550000001523034410251202_s1_p0

Second Set

Random Get @m121024_122509_42178_c100389662550000001523034410251206_s1_p0
Random Get @m121023_202553_42178_c100389662550000001523034410251200_s1_p0
Random Get @m121024_032737_42178_c100389662550000001523034410251203_s1_p0
Random Get @m121024_144608_42178_c100389662550000001523034410251207_s1_p0

Third Set

Random Get @m121024_032737_42178_c100389662550000001523034410251203_s1_p0
Random Get @m121024_010654_42178_c100389662550000001523034410251202_s1_p0
Random Get @m121023_224605_42178_c100389662550000001523034410251201_s1_p0
Random Get @m121024_074656_42178_c100389662550000001523034410251204_s1_p0

Randomly Selected Six SMRT cells

First Set

Random Get @m121024_100442_42178_c100389662550000001523034410251205_s1_p0
Random Get @m121023_224605_42178_c100389662550000001523034410251201_s1_p0
Random Get @m121023_202553_42178_c100389662550000001523034410251200_s1_p0
Random Get @m121024_032737_42178_c100389662550000001523034410251203_s1_p0
Random Get @m121024_074656_42178_c100389662550000001523034410251204_s1_p0
Random Get @m121024_144608_42178_c100389662550000001523034410251207_s1_p0

Second Set

Random Get @m121024_074656_42178_c100389662550000001523034410251204_s1_p0
Random Get @m121023_224605_42178_c100389662550000001523034410251201_s1_p0
Random Get @m121024_032737_42178_c100389662550000001523034410251203_s1_p0
Random Get @m121024_144608_42178_c100389662550000001523034410251207_s1_p0
Random Get @m121024_010654_42178_c100389662550000001523034410251202_s1_p0
Random Get @m121024_100442_42178_c100389662550000001523034410251205_s1_p0

Third Set

Random Get @m121023_224605_42178_c100389662550000001523034410251201_s1_p0
Random Get @m121023_202553_42178_c100389662550000001523034410251200_s1_p0
Random Get @m121024_032737_42178_c100389662550000001523034410251203_s1_p0
Random Get @m121024_122509_42178_c100389662550000001523034410251206_s1_p0
Random Get @m121024_010654_42178_c100389662550000001523034410251202_s1_p0
Random Get @m121024_144608_42178_c100389662550000001523034410251207_s1_p0
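Once a set of SMRT cells is chosen, the reads of those cells have to be extracted. A minimal sketch, assuming the reads are stored as unwrapped four-line FASTQ records whose names start with `@<movie name>/` (which the `@m...` prefixes in the logs suggest, but which is not confirmed by this page). The function name `filter_fastq_by_movie` and the two demo reads are hypothetical.

```python
from itertools import islice


def filter_fastq_by_movie(lines, movies):
    """Yield 4-line FASTQ records whose read name belongs to one of `movies`.

    Assumes unwrapped FASTQ (exactly four lines per record) and read names
    of the form "@<movie>/<zmw>/<start>_<end>".
    """
    prefixes = tuple(f"@{movie}/" for movie in movies)
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:  # end of input (or a truncated trailing record)
            break
        if record[0].startswith(prefixes):
            yield record


# Hypothetical two-read input, for illustration only.
DEMO_FASTQ = [
    "@m121024_100442_42178_c100389662550000001523034410251205_s1_p0/8/0_4\n",
    "ACGT\n", "+\n", "!!!!\n",
    "@m121024_074656_42178_c100389662550000001523034410251204_s1_p0/9/0_4\n",
    "TTGA\n", "+\n", "!!!!\n",
]

if __name__ == "__main__":
    selected = ["m121024_100442_42178_c100389662550000001523034410251205_s1_p0"]
    for record in filter_fastq_by_movie(DEMO_FASTQ, selected):
        print("".join(record), end="")
```

In practice `lines` would be an open file handle, so the reads are streamed rather than loaded into memory.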


Performance

Statistics without reference | All Data | 4 SMRT cells: 1st Set | 4 SMRT cells: 2nd Set | 4 SMRT cells: 3rd Set | 6 SMRT cells: 1st Set | 6 SMRT cells: 2nd Set | 6 SMRT cells: 3rd Set
# contigs | 16 | 10 | 14 | 16 | 9 | 18 | 13
Largest contig | 2 198 457 | 3 484 877 | 1 936 831 | 1 948 632
Total length | 4 808 733 | 4 706 800 | 4 705 398 | 4 745 036
N50 | 1 005 770 | 3 484 877 | 966 809 | 1 434 284
Misassemblies
# misassemblies | 19 | 9 | 12 | 15
Misassembled contigs length | 2 939 040 | 3 530 352 | 2 949 761 | 3 653 461
Mismatches
# mismatches per 100 kbp | 0.8 | 0.43 | 0.58 | 1.36
# indels per 100 kbp | 5.71 | 2.98 | 4.45 | 9.56
# N's per 100 kbp | 0 | 0 | 0 | 0
Genome Statistics
Genome fraction (%) | 100 | 100 | 99.815 | 99.87
Duplication ratio | 1.037 | 1.016 | 1.017 | 1.025
# genes | 4494 + 3 part | 4494 + 3 part | 4480 + 7 part | 4485 + 9 part
NGA50 | depth: 12.76X | depth: 28.60X | depth: 46.03X | depth: 56.52X
Running Time
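For reading the N50 row: N50 is the contig length such that contigs of that length or longer cover at least half of the total assembly length. A minimal sketch of that definition (the function name `n50` and the contig lengths in the examples are hypothetical, not taken from the assemblies on this page):

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    total = sum(lengths)
    running = 0
    # Walk the contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


if __name__ == "__main__":
    # Toy example: total is 20, and the contigs 6 + 5 = 11 cover half of it.
    print(n50([2, 3, 4, 5, 6]))
    # One contig of 3 484 877 bp in a 4 706 800 bp assembly already covers
    # more than half, so N50 equals that contig's length. The remainder is
    # lumped into a single hypothetical contig here.
    print(n50([3_484_877, 4_706_800 - 3_484_877]))
```

This is why an N50 of 3 484 877 with a total length of 4 706 800 implies that a single contig spans most of the assembly.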






Discard Unconvincing Contigs

Discard Lower-case bases

DataSet2

Discard Contigs

Discard Lower-case bases

DataSet3

Discard Contigs

Discard Lower-case bases