HGAP

Revision as of 13 January 2015 20:01 by admin

The Hierarchical Genome Assembly Process (HGAP) was proposed in "Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data" (Nat Meth 2013).

Contents

1 Hierarchical genome-assembly process
2 Dataset 5 (E. coli K-12 MG1655, 17 SMRT cells)
- 2.1 Assembly
- 2.2 Postprocess by discarding unconvincing contigs
- 2.3 Postprocess by discarding lower-case bases
3 Dataset 6 (E. coli K-12 MG1655, 8 SMRT cells)
- 3.1 Assembly
- 3.2 Postprocess by discarding unconvincing contigs
- 3.3 Postprocess by discarding lower-case bases
4 Dataset 7 (M. ruber DSM1279, 4 SMRT cells)
- 4.1 Assembly
- 4.2 Postprocess by discarding lower-case bases
5 Dataset 8 (P. heparinus DSM2366, 7 SMRT cells)
- 5.1 Assembly
- 5.2 Postprocess by discarding unconvincing contigs
- 5.3 Postprocess by discarding lower-case bases
6 Dataset 9 (E. coli K-12, P4-C2 chemistry, 20 Kbp, 1 SMRT cell)
- 6.1 Assembly
- 6.2 Postprocess by discarding unconvincing contigs
- 6.3 Postprocess by discarding lower-case bases
7 HGAP 3.0 with Dataset 9
- 7.1 Assembly
- 7.2 Postprocess by discarding lower-case bases
- 7.3 Postprocess by discarding lower-case bases

Hierarchical genome-assembly process

We downloaded smrtanalysis-2.0.1 from DevNet. You can run the RS_HGAP_Assembly.1 and RS_Modification_and_Motif_Analysis.1 protocols in SMRT Portal or execute them from the command line.

Prepare data for the HGAP protocol
1. Build the input XML file (for detailed steps, please refer to the tutorial).
2. Build the HGAP parameters XML file, HGAP2.0.xml. We mostly used the default parameter settings, and set minSubReadLength = 50, readScore = 0.75, and minLength = 50.
3. Execute the HGAP protocol.

smrtpipe.py --params=HGAP.xml xml:input.xml

Import reference
1. After the HGAP protocol finishes, it generates polished_assembly.fasta.gz in the "data" folder. This draft assembly serves as the reference: the single-pass reads, selected by the original filter parameters, are mapped to it so that Quiver can generate a more accurate consensus sequence.
2. Import the reference through SMRT Portal.
3. SMRT Portal will generate a reference folder under /opt/smrtanalysis/common/userdata.d/references/XXXXXX. You can copy the whole folder to your working directory, or assign its path in Quiver.xml.

Prepare for Quiver
1. Build the Quiver parameters XML file, Quiver.xml. We set minSubReadLength = 50, readScore = 0.75, and minLength = 50, and used default values for the other parameters.
2. Execute the Quiver protocol.

smrtpipe.py --params=Quiver.xml xml:input.xml
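
The two command lines above differ only in the parameters file, so they can be scripted together. The following is a minimal sketch, assuming smrtpipe.py is on the PATH and that the file names match the commands shown on this page; the helper name build_smrtpipe_command is hypothetical and not part of smrtanalysis. It only prints the commands; the actual launch is left commented out.

```python
import shlex

def build_smrtpipe_command(params_xml, input_xml):
    """Return the argv list for one smrtpipe.py run, in the form used on this page."""
    return ["smrtpipe.py", f"--params={params_xml}", f"xml:{input_xml}"]

# File names taken from the command lines above; adjust to your own files.
steps = [("HGAP.xml", "input.xml"), ("Quiver.xml", "input.xml")]

for params_xml, input_xml in steps:
    command = build_smrtpipe_command(params_xml, input_xml)
    print(shlex.join(command))
    # To actually run a step (assumption: smrtanalysis environment is set up):
    # import subprocess; subprocess.run(command, check=True)
```

Remember that the reference must be imported through SMRT Portal between the two steps, so the Quiver step should not be launched automatically right after HGAP.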

Dataset 5 (E. coli K-12 MG1655, 17 SMRT cells)

We drew three random subsets each of four, six, and eight SMRT cells, and evaluated the assemblies with QUAST against the reference genome (NC_000913) and Ec_gene_list.
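
The subsampling could be reproduced along the following lines. This is only a sketch: the cell labels are hypothetical placeholders, and it assumes each of the three draws is made independently without replacement from all 17 cells, which this page does not state explicitly.

```python
import random

def pick_cell_subsets(cells, subset_size, repeats=3, seed=None):
    """Draw `repeats` random subsets of `subset_size` cells (no repeats within a subset)."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return [sorted(rng.sample(cells, subset_size)) for _ in range(repeats)]

# Hypothetical labels for the 17 SMRT cells of Dataset 5.
cells = [f"cell_{i:02d}" for i in range(1, 18)]

# Three sets each of four, six, and eight cells, as in the tables below.
subsets = {size: pick_cell_subsets(cells, size, seed=size) for size in (4, 6, 8)}

for size, sets in subsets.items():
    for number, chosen in enumerate(sets, start=1):
        print(f"{size} SMRT cells, set {number}: {', '.join(chosen)}")
```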

Assembly

Statistics without reference	4 SMRT cells : 1st Set	4 SMRT cells : 2nd Set	4 SMRT cells : 3rd Set	6 SMRT cells : 1st Set	6 SMRT cells : 2nd Set	6 SMRT cells : 3rd Set	8 SMRT cells : 1st Set	8 SMRT cells : 2nd Set	8 SMRT cells : 3rd Set
# contigs	5	10	4	11	7	8	6	10	5
Largest contig	3 770 578	4 106 852	4 644 754	3 785 116	4 647 724	3 287 965	4 649 322	4 623 068	4 649 308
Total length	4 684 069	4 723 363	4 671 153	4 736 342	4 711 060	4 708 831	4 706 433	4 731 334	4 691 736
N50	3 770 578	4 106 852	4 644 754	3 785 116	4 647 724	3 287 965	4 649 322	4 623 068	4 649 308
Misassemblies
# misassemblies	10	13	13	15	12	11	11	16	12
Misassembled contigs length	3 788 648	4 700 016	4 671 153	4 726 005	4 685 712	3 339 030	4 694 303	4 698 068	4 649 308
Mismatches
# mismatches per 100kbp	0.47	0.56	0.37	0.19	0.11	0.15	0.13	0.43	0.17
# indels per 100kbp	1.08	4.44	0.22	1.66	0.63	0.65	0.19	4.59	0.56
# N's per 100kbp	0	0	0	0	0	0	0	0	0
Genome Statistics
Genome fraction(%)	100	100	99.994	99.999	100	100	100	99.99	100
Duplication ratio	1.01	1.018	1.007	1.021	1.031	1.015	1.012	1.02	1.011
# genes	4495+2 part	4495+2 part	4493+3 part	4494+3 part	4495+2 part	4495+2 part	4495+2 part	4494+3 part	4495+2 part
NGA50	1 207 217	2 558 505	1 640 882	2 888 022	2 834 458	1 298 912	1 477 605	1 344 200	2 995 586
Running Time	?hr ?m	?hr ?m	?hr ?m	21hr 05m	19hr 32m	21hr 01m	26hr 46m	27hr 52m	26hr 13m

Postprocess by discarding unconvincing contigs

We aligned subreads to contigs, and discarded the contigs with fewer than 100 reads aligned.
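
A minimal sketch of this filter is shown below. It assumes the alignment step has already been reduced to a list with one contig name per aligned subread; the aligner, the file format, and the contig names are not specified here and are placeholders. A contig with exactly 100 aligned reads is kept, since only contigs with fewer than 100 are discarded.

```python
from collections import Counter

MIN_ALIGNED_READS = 100  # threshold stated in the text

def contigs_to_keep(alignment_targets, min_reads=MIN_ALIGNED_READS):
    """Return the set of contigs that have at least `min_reads` aligned reads.

    `alignment_targets` holds one contig name per aligned subread.
    Contigs with no alignments never appear and are therefore dropped.
    """
    counts = Counter(alignment_targets)
    return {contig for contig, n in counts.items() if n >= min_reads}

# Toy input: contig_1 has 250 aligned reads, contig_2 only 12.
targets = ["contig_1"] * 250 + ["contig_2"] * 12
kept = contigs_to_keep(targets)
print(sorted(kept))
```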

Statistics without reference	4 SMRT cells : 1st Set	4 SMRT cells : 2nd Set	4 SMRT cells : 3rd Set	6 SMRT cells : 1st Set	6 SMRT cells : 2nd Set	6 SMRT cells : 3rd Set	8 SMRT cells : 1st Set	8 SMRT cells : 2nd Set	8 SMRT cells : 3rd Set
# contigs	2	6	1	5	2	4	2	3	2
Largest contig	3 770 578	4 106 852	4 644 754	3 785 116	4 647 724	3 287 965	4 649 322	4 623 068	4 649 308
Total length	4 651 736	4 691 077	4 644 754	4 675 943	4 660 074	4 671 197	4 664 502	4 661 980	4 661 084
N50	3 770 578	4 106 852	4 644 754	3 785 116	4 647 724	3 287 965	4 649 322	4 623 068	4 649 308
Misassemblies
# misassemblies	8	10	10	10	8	7	8	9	9
Misassembled contigs length	3 770 578	4 677 561	4 644 754	4 675 943	4 647 724	3 301 396	4 664 502	4 639 404	4 649 308
Mismatches
# mismatches per 100kbp	0.15	0.5	0.37	0.22	0.11	0.15	0.13	0.22	0.17
# indels per 100kbp	0.47	3.34	0.22	1.47	0.63	0.65	0.19	1.44	0.56
# N's per 100kbp	0	0	0	0	0	0	0	0	0
Genome Statistics
Genome fraction(%)	100	100	99.994	99.999	100	100	100	99.99	100
Duplication ratio	1.003	1.011	1.002	1.008	1.005	1.007	1.005	1.005	1.005
# genes	4494+3 part	4495+2 part	4493+3 part	4493+4 part	4495+2 part	4495+2 part	4495+2 part	4493+4 part	4495+2 part
NGA50	1 207 217	2 558 505	1 640 882	2 888 022	2 834 458	1 298 912	1 477 605	1 344 200	2 995 586

Postprocess by discarding lower-case bases

After discarding unconvincing contigs, we trimmed the low-quality bases, which are shown in lower case, from both ends of each contig.
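
The trimming step can be sketched as follows. This is an illustration rather than the script we used: it assumes that only the contiguous lower-case runs at the two ends are removed and that lower-case bases inside a contig are left untouched.

```python
def trim_lowercase_ends(seq):
    """Strip the lower-case runs from both ends of a contig sequence."""
    start = 0
    end = len(seq)
    # Advance past the leading lower-case bases.
    while start < end and seq[start].islower():
        start += 1
    # Step back over the trailing lower-case bases.
    while end > start and seq[end - 1].islower():
        end -= 1
    return seq[start:end]

# Toy contig: lower-case ends are removed, the internal "ac" is kept.
print(trim_lowercase_ends("acgtACGTacGTTgca"))
```

A contig made up entirely of lower-case bases becomes empty under this rule and would have to be dropped afterwards.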

Statistics without reference	4 SMRT cells : 1st Set	4 SMRT cells : 2nd Set	4 SMRT cells : 3rd Set	6 SMRT cells : 1st Set	6 SMRT cells : 2nd Set	6 SMRT cells : 3rd Set	8 SMRT cells : 1st Set	8 SMRT cells : 2nd Set	8 SMRT cells : 3rd Set
# contigs	2	6	1	4	2	4	2	3	2
Largest contig	3 768 995	4 105 501	4 644 254	3 784 001	4 646 000	3 287 004	4 646 998	4 622 502	4 647 000
Total length	4 649 500	4 678 503	4 644 254	4 660 999	4 655 498	4 667 500	4 660 992	4 660 836	4 656 000
N50	3 768 995	4 105 501	4 644 254	3 784 001	4 646 000	3 287 004	4 646 998	4 622 502	4 647 000
Misassemblies
# misassemblies	8	10	10	9	8	8	8	9	8
Misassembled contigs length	3 768 995	4 666 999	4 644 254	4 660 999	4 646 000	3 299 005	4 660 992	4 638 338	4 647 000
Mismatches
# mismatches per 100kbp	0.15	0.5	0.37	0.19	0.11	0.11	0.13	0.22	0.17
# indels per 100kbp	0.32	2.76	0.22	1.44	0.5	0.58	0.19	1.34	0.47
# N's per 100kbp	0	0	0	0	0	0	0	0	0
Genome Statistics
Genome fraction(%)	100	100	99.994	99.999	100	100	100	99.99	100
Duplication ratio	1.002	1.008	1.002	1.005	1.003	1.006	1.005	1.005	1.004
# genes	4494+3 part	4494+3 part	4493+3 part	4493+4 part	4495+2 part	4495+2 part	4495+2 part	4493+4 part	4495+2 part
NGA50	1 207 217	2 558 154	1 640 382	2 888 022	2 833 234	1 298 912	1 476 281	1 344 200	2 995 586

Dataset 6 (E. coli K-12 MG1655, 8 SMRT cells)

We assembled all eight SMRT cells together and also drew three random subsets each of four and six SMRT cells, and evaluated the assemblies with QUAST against the reference genome (NC_000913) and Ec_gene_list.

Assembly

Statistics without reference	All Data	4 SMRT cells : 1st Set	4 SMRT cells : 2nd Set	4 SMRT cells : 3rd Set	6 SMRT cells : 1st Set	6 SMRT cells : 2nd Set	6 SMRT cells : 3rd Set
# contigs	16	10	14	16	9	18	13
Largest contig	2 198 457	3 484 877	1 936 831	1 948 632	2 104 087	1 169 224	1 439 551
Total length	4 808 733	4 706 800	4 705 398	4 745 036	4 741 512	4 814 718	4 749 785
N50	1 005 770	3 484 877	966 809	1 434 284	1 655 500	676 526	1 268 010
Misassemblies
# misassemblies	19	9	12	15	14	17	11
Misassembled contigs length	2 939 040	3 530 352	2 949 761	3 653 461	3 820 624	2 387 129	3 986 402
Mismatches
# mismatches per 100kbp	0.8	0.43	0.58	1.36	0.15	0.95	0.58
# indels per 100kbp	5.71	2.98	4.45	9.56	1.77	8.02	6.88
# N's per 100kbp	0	0	0	0	0	0	0
Genome Statistics
Genome fraction(%)	100	100	99.815	99.87	100	99.995	99.979
Duplication ratio	1.037	1.016	1.017	1.025	1.022	1.038	1.025
# genes	4494+3 part	4494+3 part	4480+7 part	4485+9 part	4494+3 part	4493+4 part	4492+5 part
NGA50	615 234	1 205 052	572 342	875 953	844 482	633 220	1 267 242
Running Time	19hr 06m	13hr 34m	13hr 21m	12hr 38m	21hr 28m	22hr 56m	22hr 07m

Postprocess by discarding unconvincing contigs

We aligned subreads to contigs, and discarded the contigs with fewer than 100 reads aligned.

Statistics without reference	All Data	4 SMRT cells : 1st Set	4 SMRT cells : 2nd Set	4 SMRT cells : 3rd Set	6 SMRT cells : 1st Set	6 SMRT cells : 2nd Set	6 SMRT cells : 3rd Set
# contigs	7	8	10	12	4	9	12
Largest contig	2 198 457	3 484 877	1 936 831	1 948 632	2 104 087	1 169 224	1 439 551
Total length	4 706 061	4 674 582	4 659 277	4 682 754	4 680 475	4 702 993	4 739 366
N50	1 005 770	3 484 877	966 809	1 434 284	1 655 500	676 526	1 268 010
Misassemblies
# misassemblies	10	7	8	9	9	8	10
Misassembled contigs length	2 836 368	3 498 134	2 903 640	3 591 179	3 759 587	2 275 404	3 975 983
Mismatches
# mismatches per 100kbp	0.8	0.43	0.45	1.27	0.15	0.75	0.58
# indels per 100kbp	5.71	2.98	3.56	8.72	1.77	6.06	6.88
# N's per 100kbp	0	0	0	0	0	0	0
Genome Statistics
Genome fraction(%)	100	100	99.798	99.87	100	99.995	99.979
Duplication ratio	1.014	1.009	1.006	1.012	1.009	1.014	1.023
# genes	4494+3 part	4494+3 part	4479+8 part	4485+9 part	4494+3 part	4493+4 part	4492+5 part
NGA50	615 234	1 205 052	572 342	875 953	844 482	633 220	1 267 242

Postprocess by discarding lower-case bases

After discarding unconvincing contigs, we trimmed the low-quality bases, which are shown in lower case, from both ends of each contig.

Statistics without reference	All Data	4 SMRT cells : 1st Set	4 SMRT cells : 2nd Set	4 SMRT cells : 3rd Set	6 SMRT cells : 1st Set	6 SMRT cells : 2nd Set	6 SMRT cells : 3rd Set
# contigs	7	8	10	12	4	9	12
Largest contig	2 196 495	3 478 799	1 936 007	1 948 495	2 100 388	1 165 497	1 438 506
Total length	4 694 972	4 662 655	4 649 216	4 657 587	4 668 899	4 681 301	4 714 790
N50	1 005 009	3 478 799	964 998	1 433 016	1 654 501	375 502	1 266 511
Misassemblies
# misassemblies	9	9	8	9	10	9	10
Misassembled contigs length	2 210 994	3 490 490	2 901 005	3 496 520	3 754 889	2 256 498	3 197 010
Mismatches
# mismatches per 100kbp	0.63	0.28	0.22	0.91	0.15	0.54	0.47
# indels per 100kbp	5	2.5	1.84	6.8	1.64	4.63	5.99
# N's per 100kbp	0	0	0	0	0	0	0
Genome Statistics
Genome fraction(%)	100	99.842	99.776	99.859	100	99.985	99.979
Duplication ratio	1.012	1.007	1.005	1.006	1.006	1.009	1.016
# genes	4494+3 part	4485+6 part	4478+9 part	4482+11 part	4494+3 part	4493+4 part	4492+5 part
NGA50	614 657	949 284	432 003	853 140	747 216	579 994	672 148

Dataset 7 (M. ruber DSM1279, 4 SMRT cells)

We used all SMRT cells for the assembly and evaluated it with QUAST against the reference genome (NC_013946) and Mr_gene_list.

Assembly

Statistics without reference	All Data
# contigs	3
Largest contig	2 548 031
Total length	3 121 070
N50	2 548 031
Misassemblies
# misassemblies	1
Misassembled contigs length	2 548 031
Mismatches
# mismatches per 100kbp	0.52
# indels per 100kbp	2.71
# N's per 100kbp	0
Genome Statistics
Genome fraction(%)	99.986
Duplication ratio	1.017
# genes	3103+2 part
NGA50	1 155 126
Running Time	18hr 19m

Postprocess by discarding lower-case bases

We trimmed the low-quality bases, which are shown in lower case, from both ends of each contig.

Statistics without reference	All Data
# contigs	3
Largest contig	2 545 501
Total length	3 115 015
N50	2 545 501
Misassemblies
# misassemblies	1
Misassembled contigs length	2 545 501
Mismatches
# mismatches per 100kbp	0.42
# indels per 100kbp	2.52
# N's per 100kbp	0
Genome Statistics
Genome fraction(%)	99.986
Duplication ratio	1.006
# genes	3103+2 part
NGA50	1 153 096

Dataset 8 (P. heparinus DSM2366, 7 SMRT cells)

We assembled all seven SMRT cells together and also drew three random subsets of four SMRT cells, and evaluated the assemblies with QUAST against the reference genome (NC_013061) and Ph_gene_list.

Assembly

Statistics without reference	All Data	4 SMRT cells : 1st Set	4 SMRT cells : 2nd Set	4 SMRT cells : 3rd Set
# contigs	3	3	3	6
Largest contig	2 934 267	2 927 454	2 929 942	2 226 051
Total length	5 178 932	5 176 592	5 176 771	5 182 410
N50	2 934 267	2 927 454	2 929 942	2 133 457
Misassemblies
# misassemblies	0	1	0	1
Misassembled contigs length	0	2 240 169	0	13 124
Mismatches
# mismatches per 100kbp	0	0.02	0.06	6.45
# indels per 100kbp	1.05	0.54	0.6	1.88
# N's per 100kbp	0	0	0	0
Genome Statistics
Genome fraction(%)	100	100	100	99.936
Duplication ratio	1.003	1.003	1.003	1.006
# genes	4338+1 part	4338+1 part	4338+1 part	4335+4 part
NGA50	2 934 267	2 927 454	2 929 942	2 133 457
Running Time	24hr 56m	17hr 41m	18hr 14m	17hr 04m

Postprocess by discarding unconvincing contigs

We aligned subreads to contigs, and discarded the contigs with fewer than 100 reads aligned.

Statistics without reference	All Data	4 SMRT cells : 1st Set	4 SMRT cells : 2nd Set	4 SMRT cells : 3rd Set
# contigs	3	2	2	5
Largest contig	2 934 267	2 927 454	2 929 942	2 226 051
Total length	5 178 932	5 167 623	5 167 190	5 172 946
N50	2 934 267	2 927 454	2 929 942	2 133 457
Misassemblies
# misassemblies	0	1	0	1
Misassembled contigs length	0	2 240 169	0	13 124
Mismatches
# mismatches per 100kbp	0	0.04	0.08	6.45
# indels per 100kbp	1.05	0.68	0.6	1.82
# N's per 100kbp	0	0	0	0
Genome Statistics
Genome fraction(%)	100	99.951	99.916	99.878
Duplication ratio	1.003	1.002	1.002	1.004
# genes	4338+1 part	4336+2 part	4335+3 part	4333+5 part
NGA50	2 934 267	2 927 454	2 929 942	2 133 457

Postprocess by discarding lower-case bases

After discarding unconvincing contigs, we trimmed the low-quality bases, which are shown in lower case, from both ends of each contig.

Statistics without reference	All Data	4 SMRT cells : 1st Set	4 SMRT cells : 2nd Set	4 SMRT cells : 3rd Set
# contigs	3	2	2	5
Largest contig	2 932 503	2 925 498	2 925 998	2 225 051
Total length	5 175 001	5 163 999	5 162 498	5 161 405
N50	2 932 503	2 925 498	2 925 998	2 131 500
Misassemblies
# misassemblies	0	1	0	0
Misassembled contigs length	0	2 238 501	0	0
Mismatches
# mismatches per 100kbp	0.02	0.06	9.98	6.42
# indels per 100kbp	0.77	0.52	0.85	1.44
# N's per 100kbp	0	0	0	0
Genome Statistics
Genome fraction(%)	100	99.931	99.869	99.782
Duplication ratio	1.001	1.001	1	1.001
# genes	4338+1 part	4336+2 part	4331+4 part	4328+7 part
NGA50	2 932 503	2 925 498	2 925 998	2 131 500

Dataset 9 (E. coli K-12, P4-C2 chemistry, 20 Kbp, 1 SMRT cell)

We evaluated the assemblies with QUAST against the reference genome (NC_000913) and Ec_gene_list.

Assembly

We assembled the single SMRT cell and assessed the correctness with QUAST.

Statistics without reference	All Data
# contigs	2
Largest contig	4 656 681
Total length	4 672 546
N50	4 656 681
Misassemblies
# misassemblies	9
Misassembled contigs length	4 672 546
Mismatches
# mismatches per 100kbp	0.15
# indels per 100kbp	4.87
# N's per 100kbp	0
Genome Statistics
Genome fraction(%)	100
Duplication ratio	1.007
# genes	4494+3 part
NGA50	2 995 500
Running Time	16hr 40m
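
As a consistency check on the table above, the N50 can be recomputed from the contig lengths. With two contigs, the second contig length follows from the table as total length minus largest contig, 4 672 546 - 4 656 681 = 15 865. The sketch below uses the usual N50 definition (the contig length at which the cumulative length, taken from the longest contig down, first reaches half of the total); it is an illustration and not the QUAST implementation.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths."""
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50 requires at least one contig")

# Contig lengths derived from the Dataset 9 assembly table above.
dataset9_contigs = [4656681, 4672546 - 4656681]
print(n50(dataset9_contigs))
```

The result equals the largest contig, which matches the N50 row of the table.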

Postprocess by discarding unconvincing contigs

We aligned subreads to contigs, and discarded the contigs with fewer than 100 reads aligned.

Statistics without reference	All Data
# contigs	1
Largest contig	4 656 681
Total length	4 656 681
N50	4 656 681
Misassemblies
# misassemblies	8
Misassembled contigs length	4 656 681
Mismatches
# mismatches per 100kbp	0.15
# indels per 100kbp	4.87
# N's per 100kbp	0
Genome Statistics
Genome fraction(%)	100
Duplication ratio	1.004
# genes	4494+3 part
NGA50	2 995 500

Postprocess by discarding lower-case bases

After discarding unconvincing contigs, we trimmed the low-quality bases, which are shown in lower case, from both ends of each contig.

Statistics without reference	All Data
# contigs	1
Largest contig	4 654 377
Total length	4 654 377
N50	4 654 377
Misassemblies
# misassemblies	8
Misassembled contigs length	4 654 377
Mismatches
# mismatches per 100kbp	0.15
# indels per 100kbp	4.81
# N's per 100kbp	0
Genome Statistics
Genome fraction(%)	100
Duplication ratio	1.003
# genes	4494+3 part
NGA50	3 026 319

HGAP 3.0 with Dataset 9

We ran Dataset 9 on SMRT Portal using the HGAP3.0.xml protocol.

Assembly

With different genomeSize settings (4 650 000 and the same value scaled by -10%, +10%, -20%, and +20%)

Statistics without reference	genomeSize=4650000	genomeSize=4185000	genomeSize=5115000	genomeSize=3720000	genomeSize=5580000
# contigs	1	1	1	1	1
Largest contig	4657584	4657584	4657492	4657578	4657479
Total length	4657584	4657584	4657492	4657578	4657479
N50	4657584	4657584	4657492	4657578	4657479
Misassemblies
# misassemblies	8	8	8	8	8
Misassembled contigs length	4657584	4657584	4657492	4657578	4657479
Mismatches
# mismatches per 100kbp	0.15	0.15	0.15	0.15	0.15
# indels per 100kbp	0.19	0.19	0.17	0.19	0.17
# N's per 100kbp	0	0	0	0	0
Genome Statistics
Genome fraction(%)	100	100	100	100	100
Duplication ratio	1.004	1.004	1.004	1.004	1.004
# genes	4494+3 part	4494+3 part	4494+3 part	4494+3 part	4494+3 part
NGA50	3026417	3026417	3026417	3026417	3026417
Running Time	2hr 21m	2hr 8m	2hr 13m	2hr 2m	2hr 22m

Postprocess by discarding lower-case bases

We trimmed the low-quality bases, which are shown in lower case, from both ends of each contig.

Statistics without reference	genomeSize=4650000	genomeSize=4185000	genomeSize=5115000	genomeSize=3720000	genomeSize=5580000
# contigs	1	1	1	1	1
Largest contig	4656344	4656344	4656242	4656345	4656234
Total length	4656344	4656344	4656242	4656345	4656234
N50	4656344	4656344	4656242	4656345	4656234
Misassemblies
# misassemblies	8	8	8	8	8
Misassembled contigs length	4656344	4656344	4656242	4656345	4656234
Mismatches
# mismatches per 100kbp	0.15	0.15	0.15	0.15	0.15
# indels per 100kbp	0.19	0.19	0.17	0.19	0.17
# N's per 100kbp	0	0	0	0	0
Genome Statistics
Genome fraction(%)	100	100	100	100	100
Duplication ratio	1.004	1.004	1.004	1.004	1.004
# genes	4494+3 part	4494+3 part	4494+3 part	4494+3 part	4494+3 part
NGA50	3026417	3026417	3026417	3026417	3026417

Without genomeSize

Statistics without reference	All Data
# contigs	1
Largest contig	4657553
Total length	4657553
N50	4657553
Misassemblies
# misassemblies	8
Misassembled contigs length	4657553
Mismatches
# mismatches per 100kbp	0.15
# indels per 100kbp	0.19
# N's per 100kbp	0
Genome Statistics
Genome fraction(%)	100
Duplication ratio	1.004
# genes	4494+3 part
NGA50	3026417

Postprocess by discarding lower-case bases

Statistics without reference	All Data
# contigs	1
Largest contig	4656299
Total length	4656299
N50	4656299
Misassemblies
# misassemblies	8
Misassembled contigs length	4656299
Mismatches
# mismatches per 100kbp	0.15
# indels per 100kbp	0.19
# N's per 100kbp	0
Genome Statistics
Genome fraction(%)	100
Duplication ratio	1.004
# genes	4494+3 part
NGA50	3026417