CISA integrates contigs in four phases.
Running CISA generates four folders, one per phase (CISA1 to CISA4).
In phase 1 (CISA1):
- The representative contigs and the contigs they explain are recorded in Explained.txt.
- Information about contig extensions is recorded in Extend_info.
- Entries take the following form, for example:
- Head: >Skeleton contig >Extended contig
- Tail: >Skeleton contig >Extended contig
- The processed contigs are written to R1_contigs.fa, which is placed outside the CISA1 folder.
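A minimal Python sketch of reading Extend_info, assuming each entry is a single line of the form "Head: >skeleton >extended" or "Tail: >skeleton >extended", as in the example above. The helper parse_extend_info, the Extension record, and the contig names ctg_1, ctg_3 and ctg_7 are hypothetical; the real file may carry additional fields, in which case the parser would need adjusting.

```python
from typing import NamedTuple


class Extension(NamedTuple):
    end: str        # "Head" or "Tail": which end of the skeleton was extended
    skeleton: str   # name of the skeleton contig
    extended: str   # name of the contig used for the extension


def parse_extend_info(lines):
    """Parse lines shaped like 'Head: >skeleton >extended' (assumed layout)."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        end, sep, rest = line.partition(":")
        if not sep or end not in ("Head", "Tail"):
            continue  # skip lines that do not follow the assumed layout
        names = [tok.lstrip(">") for tok in rest.split() if tok.startswith(">")]
        if len(names) == 2:
            records.append(Extension(end, names[0], names[1]))
    return records


example = [
    "Head: >ctg_1 >ctg_7",
    "Tail: >ctg_1 >ctg_3",
]
for rec in parse_extend_info(example):
    print(rec.end, rec.skeleton, "<-", rec.extended)  # e.g. Head ctg_1 <- ctg_7
```

Lines that do not match the assumed layout are skipped rather than raising, so the sketch degrades quietly if the real format differs.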
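In phase 2 (CISA2):
- Uncertain regions at the ends of contigs are clipped, and the clipping is recorded in clip_info.
- The clipped-out sequences are recorded in clip_out.
- Unalignable gaps are recorded in Gaps. The gap sizes are sorted in ascending order, and contigs whose gap size is larger than the 95th percentile (R2_gap=0.95) are clipped.
- Misassembled contigs, recorded in Remove_Info, are removed, and extra contigs are introduced if necessary.
- For example, in an E. coli dataset the representative contig CLC_100 (33932 bp) aligns to contigs from the other assemblers as shown below. Each line gives the start and end positions in CLC_100, the start and end positions in the other contig (start > end indicates a reverse-strand match), the aligned lengths in both contigs, the percent identity, and the two contig names: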
1 14641 | 36367 21727 | 14641 14641 | 99.99 | CLC_100_len:33932 Abyss_133_len:170886
14608 33932 | 58206 38882 | 19325 19325 | 100.00 | CLC_100_len:33932 Abyss_129_len:72302
1 14641 | 31800 17160 | 14641 14641 | 99.99 | CLC_100_len:33932 Edena_65_len:79603
14608 33932 | 58206 38882 | 19325 19325 | 100.00 | CLC_100_len:33932 Edena_51_len:126254
1 14641 | 31804 17164 | 14641 14641 | 99.99 | CLC_100_len:33932 Velvet_45_len:70264
14608 33932 | 13945 33269 | 19325 19325 | 100.00 | CLC_100_len:33932 Velvet_42_len:58831
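- The two parts of the representative contig CLC_100 (positions 1 to 14641 and 14608 to 33932) align to two different contigs in each of the Abyss, Edena, and Velvet assemblies, which indicates that CLC_100 is misassembled. We therefore removed this contig and introduced Abyss_129 as an extra representative contig in CISA2.
- The processed contigs are written to R2_contigs.fa, which is placed outside the CISA2 folder.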
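The two CISA2 checks described above can be sketched in Python as follows. This is an illustration under stated assumptions, not CISA's implementation: the 95th percentile is computed with the nearest-rank definition (CISA's exact definition is not stated here), gaps are given as hypothetical (contig_name, gap_size) pairs, and the misassembly test is our own heuristic that flags a representative contig when every other assembler covers it with two or more distinct contigs. The alignment lines are the E. coli lines shown above; all function names are hypothetical.

```python
import math
from collections import defaultdict


def gap_cutoff(gap_sizes, ratio=0.95):
    """Nearest-rank percentile of the gap sizes (an assumed definition)."""
    ordered = sorted(gap_sizes)  # ascending order, as described for CISA2
    # round() guards against float noise such as ratio * n landing just above n
    rank = max(1, math.ceil(round(ratio * len(ordered), 9)))
    return ordered[rank - 1]


def contigs_to_clip(gaps, ratio=0.95):
    """gaps: (contig_name, gap_size) pairs; return names above the cutoff."""
    if not gaps:
        return []
    cutoff = gap_cutoff([size for _, size in gaps], ratio)
    return [name for name, size in gaps if size > cutoff]


def query_names(coords_lines):
    """Take the second contig name from lines shaped like
    'S1 E1 | S2 E2 | LEN1 LEN2 | IDY | REF QRY'."""
    return [line.split("|")[4].split()[1] for line in coords_lines]


def looks_misassembled(coords_lines):
    """Heuristic (not necessarily CISA's rule): flag the representative
    contig when every other assembler covers it with 2+ distinct contigs."""
    hits = defaultdict(set)  # assembler prefix, e.g. "Abyss" -> contig names
    for name in query_names(coords_lines):
        hits[name.split("_")[0]].add(name)
    return bool(hits) and all(len(names) > 1 for names in hits.values())


coords = [
    "1 14641 | 36367 21727 | 14641 14641 | 99.99 | CLC_100_len:33932 Abyss_133_len:170886",
    "14608 33932 | 58206 38882 | 19325 19325 | 100.00 | CLC_100_len:33932 Abyss_129_len:72302",
    "1 14641 | 31800 17160 | 14641 14641 | 99.99 | CLC_100_len:33932 Edena_65_len:79603",
    "14608 33932 | 58206 38882 | 19325 19325 | 100.00 | CLC_100_len:33932 Edena_51_len:126254",
    "1 14641 | 31804 17164 | 14641 14641 | 99.99 | CLC_100_len:33932 Velvet_45_len:70264",
    "14608 33932 | 13945 33269 | 19325 19325 | 100.00 | CLC_100_len:33932 Velvet_42_len:58831",
]
print(looks_misassembled(coords))  # True
```

The sketch only flags CLC_100; choosing Abyss_129 as the replacement representative depends on CISA's own selection logic, which is not reproduced here.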
In phase 3 (CISA3):
- Several rounds of blastn are performed to merge contigs iteratively and to identify repetitive regions.
- In each round, contig extensions and repetitive regions are recorded in Extend_info and Repeat_Region.txt, respectively.
- The processed contigs of each round are written to temp.fa inside that round's folder.
- The size of the repetitive regions is estimated and recorded in info1, which is placed outside the CISA3 folder.
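How CISA3 estimates the size of the repetitive regions is not described here. As one plausible reading, the Python sketch below unions overlapping repeat intervals on each contig and reports the longest resulting region. The (contig, start, end) tuples stand in for Repeat_Region.txt entries, whose real layout is an assumption, as are the helper names and the 1-based inclusive coordinates.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Union overlapping or adjacent 1-based inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def estimate_max_repeat(regions):
    """regions: (contig, start, end) tuples; return the longest repeat length."""
    by_contig = defaultdict(list)
    for contig, start, end in regions:
        # normalise reverse-strand hits so that start <= end
        by_contig[contig].append((min(start, end), max(start, end)))
    longest = 0
    for intervals in by_contig.values():
        for start, end in merge_intervals(intervals):
            longest = max(longest, end - start + 1)
    return longest


regions = [("ctg_1", 100, 600), ("ctg_1", 550, 900), ("ctg_2", 10, 300)]
print(estimate_max_repeat(regions))  # 801
```

Overlapping hits are merged before measuring so that two blastn hits covering one repeat copy are not counted as two shorter repeats.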
In phase 4 (CISA4):
- As in CISA3, several rounds of blastn are performed, this time to merge contigs whose overlaps are longer than the maximum size of the repetitive regions.
- In each round, contig extensions and the processed contigs are recorded in Extend_info and temp.fa, respectively.
- The final contigs are written to CISA.fa (the name set by outfile=CISA.fa in CISA.config), which is placed outside the CISA4 folder.
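To illustrate the CISA4 merging rule, here is a toy Python sketch that substitutes exact suffix-prefix matching on the forward strand for blastn. Contigs are merged round by round until nothing changes, and a merge is accepted only when the overlap is longer than max_repeat, the maximum repeat size. The function names are hypothetical, and the sketch ignores mismatches and reverse complements, which blastn-based merging has to handle.

```python
def longest_overlap(a, b, min_len):
    """Longest suffix of a equal to a prefix of b, or 0 if shorter than min_len."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def merge_round(contigs, min_len):
    """One round: merge the first qualifying pair; return (contigs, merged?)."""
    for i, a in enumerate(contigs):
        for j, b in enumerate(contigs):
            if i == j:
                continue
            size = longest_overlap(a, b, min_len)
            if size:
                rest = [c for k, c in enumerate(contigs) if k not in (i, j)]
                return rest + [a + b[size:]], True
    return contigs, False


def merge_until_stable(contigs, max_repeat):
    """Repeat rounds until nothing merges; overlaps must exceed max_repeat."""
    contigs = list(contigs)
    merged = True
    while merged:
        contigs, merged = merge_round(contigs, max_repeat + 1)
    return contigs


print(merge_until_stable(["AAAAGGGG", "GGGGTTTT"], 3))  # ['AAAAGGGGTTTT']
print(merge_until_stable(["AAAAGGGG", "GGGGTTTT"], 4))  # ['AAAAGGGG', 'GGGGTTTT']
```

The rationale for the threshold, as we read it: an overlap longer than the longest repeat cannot lie entirely inside a shared repeat copy, so such merges are less likely to join unrelated regions. Each merge removes one contig, so the loop always terminates.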