Description

CISA integrates contigs in four phases.

Running CISA generates four folders, one per phase (CISA1 to CISA4).

In phase 1 (CISA1):

  • The representative contigs and the contigs they explain are recorded in the file Explained.txt.
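  • Contig extensions are recorded in the file Extend_info. Each record gives the extended end of the skeleton contig (Head or Tail), the skeleton contig, and the contig that extended it.
For example: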
Head: >Skeleton contig >Extended contig
Tail: >Skeleton contig >Extended contig
  • The processed contigs are written to R1_contigs.fa, which is placed outside the CISA1 folder.
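A minimal Python sketch of reading the Extend_info records, assuming each record sits on one line in the Head:/Tail: layout shown above and that contig names contain no whitespace. The function parse_extend_info, the Extension type, and the contig names in the example are hypothetical, not part of CISA.

```python
from typing import NamedTuple


class Extension(NamedTuple):
    end: str       # "Head" or "Tail": the end of the skeleton contig that was extended
    skeleton: str  # name of the skeleton contig
    extended: str  # name of the contig used to extend it


def parse_extend_info(lines):
    """Parse 'Head: >skeleton >extended' / 'Tail: ...' lines (layout assumed from the example)."""
    records = []
    for line in lines:
        end, sep, rest = line.strip().partition(":")
        if not sep or end not in ("Head", "Tail"):
            continue  # skip lines that are not Head/Tail records
        names = [tok.lstrip(">") for tok in rest.split() if tok.startswith(">")]
        if len(names) == 2:
            records.append(Extension(end, names[0], names[1]))
    return records


# Hypothetical contig names, for illustration only.
example = [
    "Head: >CLC_7 >Velvet_12",
    "Tail: >CLC_7 >Abyss_3",
]
for rec in parse_extend_info(example):
    print(rec.end, rec.skeleton, "<-", rec.extended)
# Head CLC_7 <- Velvet_12
# Tail CLC_7 <- Abyss_3
```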

In phase 2 (CISA2):

  • Uncertain regions at the ends of contigs are clipped, and the clipping is recorded in clip_info.
  • The clipped-out sequences are recorded in the file clip_out.
  • Unalignable gaps are recorded in the file Gaps. The gap sizes are sorted in ascending order, and contigs with gaps larger than the 95th percentile (R2_gap=0.95) are clipped.
  • Misassembled contigs, recorded in the file Remove_Info, are removed, and extra contigs are introduced if necessary.
For example, in the case of E. coli:
       [S1]     [E1]  |     [S2]     [E2]  |  [LEN 1]  [LEN 2]  |  [% IDY]  | [TAGS]
          1    14641  |    36367    21727  |    14641    14641  |    99.99  | CLC_100_len:33932	Abyss_133_len:170886
      14608    33932  |    58206    38882  |    19325    19325  |   100.00  | CLC_100_len:33932	Abyss_129_len:72302
          1    14641  |    31800    17160  |    14641    14641  |    99.99  | CLC_100_len:33932	Edena_65_len:79603
      14608    33932  |    58206    38882  |    19325    19325  |   100.00  | CLC_100_len:33932	Edena_51_len:126254
          1    14641  |    31804    17164  |    14641    14641  |    99.99  | CLC_100_len:33932	Velvet_45_len:70264
      14608    33932  |    13945    33269  |    19325    19325  |   100.00  | CLC_100_len:33932	Velvet_42_len:58831
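The representative contig CLC_100 is misassembled: its first segment (positions 1 to 14641) and its second segment (positions 14608 to 33932) align to two different contigs in each of the other assemblies (Abyss_133 and Abyss_129, Edena_65 and Edena_51, Velvet_45 and Velvet_42). We therefore removed this contig in CISA2 and introduced an extra representative contig, Abyss_129.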
  • The processed contigs are written to R2_contigs.fa, which is placed outside the CISA2 folder.
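The gap-based clipping rule can be sketched as a nearest-rank quantile over the sorted gap sizes. This is an assumption: the text does not say which quantile definition CISA uses, and the (contig, gap size) input pairs and the function names gap_threshold and contigs_to_clip are hypothetical stand-ins for the contents of the Gaps file.

```python
import math


def gap_threshold(gap_sizes, r2_gap=0.95):
    """Nearest-rank quantile of the gap sizes (an assumed definition, not CISA's code)."""
    if not gap_sizes:
        raise ValueError("no gaps recorded")
    ordered = sorted(gap_sizes)  # ascending order, as in the description
    rank = max(1, math.ceil(r2_gap * len(ordered)))
    return ordered[rank - 1]


def contigs_to_clip(gaps, r2_gap=0.95):
    """gaps: iterable of (contig_name, gap_size) pairs; returns contigs with a gap above the cutoff."""
    gaps = list(gaps)
    cutoff = gap_threshold([size for _, size in gaps], r2_gap)
    return sorted({name for name, size in gaps if size > cutoff})


# Hypothetical data: 25 gaps of 10, 20, ..., 250 bp.
gaps = [(f"ctg{i}", 10 * i) for i in range(1, 26)]
print(gap_threshold([size for _, size in gaps]))  # 240
print(contigs_to_clip(gaps))  # ['ctg25']
```

Other quantile definitions (for example, linear interpolation between ranks) would give a slightly different cutoff on small inputs.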
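The reasoning in the E. coli example can be expressed as a simple check: a representative contig is flagged when, in every other assembly, its aligned segments hit more than one contig. This is an illustrative heuristic, not CISA's documented rule. It assumes rows in the column layout of the E. coli example and contig names of the form <Assembler>_<id>_len:<length>; the function names are hypothetical.

```python
from collections import defaultdict


def parse_coords_row(row):
    """Split a row like the E. coli example into (s1, e1, s2, e2, ref, query)."""
    numbers, _, tags = row.rpartition("|")
    s1, e1, s2, e2 = (int(x) for x in numbers.replace("|", " ").split()[:4])
    ref, query = tags.split()
    return s1, e1, s2, e2, ref, query


def assembler_of(contig):
    # Assumes names such as "Abyss_129_len:72302": the assembler is the text before the first "_".
    return contig.split("_", 1)[0]


def flag_misassembled(rows):
    """Return reference contigs whose segments hit more than one contig in every other assembly."""
    hits = defaultdict(lambda: defaultdict(set))  # ref -> assembler -> query contigs hit
    for row in rows:
        *_, ref, query = parse_coords_row(row)
        hits[ref][assembler_of(query)].add(query)
    return sorted(
        ref
        for ref, by_assembler in hits.items()
        if all(len(queries) > 1 for queries in by_assembler.values())
    )


rows = [
    "1 14641 | 36367 21727 | 14641 14641 | 99.99 | CLC_100_len:33932 Abyss_133_len:170886",
    "14608 33932 | 58206 38882 | 19325 19325 | 100.00 | CLC_100_len:33932 Abyss_129_len:72302",
    "1 14641 | 31800 17160 | 14641 14641 | 99.99 | CLC_100_len:33932 Edena_65_len:79603",
    "14608 33932 | 58206 38882 | 19325 19325 | 100.00 | CLC_100_len:33932 Edena_51_len:126254",
    "1 14641 | 31804 17164 | 14641 14641 | 99.99 | CLC_100_len:33932 Velvet_45_len:70264",
    "14608 33932 | 13945 33269 | 19325 19325 | 100.00 | CLC_100_len:33932 Velvet_42_len:58831",
]
print(flag_misassembled(rows))  # ['CLC_100_len:33932']
```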

In phase 3 (CISA3):

  • Several rounds of blastn are run to merge contigs iteratively and to identify repetitive regions.
  • In each round, contig extensions and repetitive regions are recorded in the files Extend_info and Repeat_Region.txt, respectively.
  • The contigs processed in each round are written to temp.fa in that round's folder.
  • The size of the repetitive regions is estimated and recorded in the file info1, which is placed outside the CISA3 folder.
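One way to picture the repeat-size estimate is to merge overlapping repeat intervals on each contig and take the longest merged region, which is the quantity phase 4 relies on. This is a sketch under assumptions: the (contig, start, end) record format is hypothetical, and the text describes neither the layout of Repeat_Region.txt or info1 nor how CISA actually computes the estimate.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or adjacent [start, end] intervals (1-based, inclusive)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current region
        else:
            merged.append([start, end])  # start a new region
    return merged


def max_repeat_size(records):
    """records: hypothetical (contig, start, end) tuples collected across rounds."""
    per_contig = defaultdict(list)
    for contig, start, end in records:
        # Normalise reverse-oriented coordinates so that start <= end.
        per_contig[contig].append((min(start, end), max(start, end)))
    sizes = [
        end - start + 1
        for intervals in per_contig.values()
        for start, end in merge_intervals(intervals)
    ]
    return max(sizes, default=0)


# Hypothetical repeat records.
records = [("ctg1", 100, 600), ("ctg1", 550, 1400), ("ctg2", 10, 300), ("ctg3", 900, 700)]
print(max_repeat_size(records))  # 1301
```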

In phase 4 (CISA4):

  • As in CISA3, several rounds of blastn are run, this time to merge contigs whose overlap exceeds the maximum size of the repetitive regions.
  • In each round, contig extensions are recorded in the file Extend_info and the processed contigs in temp.fa.
  • The final contigs are written to the output file set in CISA.config (outfile=CISA.fa, for example), which is placed outside the CISA4 folder.
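The round-based merging of CISA4 can be illustrated with exact suffix-prefix overlaps standing in for blastn alignments: rounds continue until no pair of contigs overlaps by more than the maximum repeat size. This is a simplified sketch, not CISA's implementation. It ignores reverse complements and sequencing differences, merges one pair per round, and the function names and contig sequences are hypothetical.

```python
def suffix_prefix_overlap(a, b, min_len):
    """Longest k >= min_len with a's last k bases equal to b's first k bases; 0 if none."""
    # k stays below min(len(a), len(b)), so the overlap never spans a whole contig.
    for k in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge_round(contigs, min_len):
    """One round: merge the first ordered pair (a, b) whose overlap is at least min_len."""
    for a in contigs:
        for b in contigs:
            if a == b:
                continue
            k = suffix_prefix_overlap(contigs[a], contigs[b], min_len)
            if k:
                merged = dict(contigs)
                merged[a] = contigs[a] + contigs[b][k:]  # extend a's tail with b
                del merged[b]
                return merged, (a, b, k)
    return contigs, None


def merge_until_stable(contigs, max_repeat):
    """Run rounds until no pair overlaps by more than max_repeat bases; return contigs and a log."""
    log = []
    while True:
        contigs, extension = merge_round(contigs, max_repeat + 1)
        if extension is None:
            return contigs, log
        log.append(extension)


contigs = {"A": "ACGTACGTTTGCA", "B": "TTGCAGGCCAT", "C": "GGG"}
merged, log = merge_until_stable(contigs, max_repeat=3)
print(merged)  # {'A': 'ACGTACGTTTGCAGGCCAT', 'C': 'GGG'}
print(log)  # [('A', 'B', 5)]
```

With max_repeat=5 the same input stays unmerged, because the 5 bp overlap between A and B is no longer than a repeat and so is not trusted as evidence that the two contigs are adjacent.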