Description

Revision as of 05 May 2012 01:58 by admin

CISA consists of four phases for contig integration.

Four folders are generated after running CISA.


In phase 1 (CISA1):

  • The representative contigs and the contigs they explain are recorded in the file Explained.txt.
  • Information about contig extensions is recorded in the file Extend_info.
For example,
Head: >Skeleton contig >Extended contig
Tail: >Skeleton contig >Extended contig
  • The processed contigs are written to R1_contigs.fa, which is placed outside the CISA1 folder.
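
The Extend_info layout shown above can be read with a few lines of Python. This is only a minimal sketch: it assumes every record is a single line of the form "Head: >name >name" or "Tail: >name >name", and the function name parse_extend_line and the contig names in the sample are hypothetical, not taken from CISA.

```python
def parse_extend_line(line):
    """Split one 'Head: >A >B' or 'Tail: >A >B' record.

    Returns (end, skeleton, extended). The layout is assumed from the
    example in the text, not from the CISA source code.
    """
    end, sep, rest = line.partition(":")
    end = end.strip()
    if not sep or end not in ("Head", "Tail"):
        raise ValueError(f"unrecognised record: {line!r}")
    # Each contig name is introduced by '>', as in a FASTA header.
    names = [name.strip() for name in rest.split(">") if name.strip()]
    if len(names) != 2:
        raise ValueError(f"expected two contig names: {line!r}")
    skeleton, extended = names
    return end, skeleton, extended


# Hypothetical records, for illustration only.
sample = [
    "Head: >Velvet_12 >Abyss_7",
    "Tail: >Velvet_12 >Edena_3",
]
records = [parse_extend_line(line) for line in sample]
for end, skeleton, extended in records:
    print(f"{skeleton}: {end.lower()} extended by {extended}")
```

Splitting on ">" rather than on whitespace keeps contig names intact if they happen to contain spaces.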

In phase 2 (CISA2):

  • The uncertain regions located at the ends of contigs are clipped (clip_info).
  • The clipped-out sequences are recorded in the file clip_out.
  • The unalignable gaps are recorded in the file Gaps, and gaps whose size exceeds the 95th quantile (R2_gap=0.95) are clipped.
  • Misassembled contigs, which are recorded in the file Remove_Info, are removed, and extra contigs are introduced if necessary.
For example, in the case of E. coli:
          1    14641  |    36367    21727  |    14641    14641  |    99.99  | CLC_100_len:33932	Abyss_133_len:170886
      14608    33932  |    58206    38882  |    19325    19325  |   100.00  | CLC_100_len:33932	Abyss_129_len:72302
          1    14641  |    31800    17160  |    14641    14641  |    99.99  | CLC_100_len:33932	Edena_65_len:79603
      14608    33932  |    58206    38882  |    19325    19325  |   100.00  | CLC_100_len:33932	Edena_51_len:126254
          1    14641  |    31804    17164  |    14641    14641  |    99.99  | CLC_100_len:33932	Velvet_45_len:70264
      14608    33932  |    13945    33269  |    19325    19325  |   100.00  | CLC_100_len:33932	Velvet_42_len:58831
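The representative contig CLC_100 is misassembled: its two segments (1-14641 and 14608-33932) align to different contigs in each of the Abyss, Edena, and Velvet assemblies. We removed this contig and introduced an extra representative contig, Abyss_129, in CISA2.
  • The processed contigs are written to R2_contigs.fa, which is placed outside the CISA2 folder.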
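
The E. coli alignment rows above can be checked programmatically. The sketch below parses the six rows and lists, for each other assembler, the contigs that the representative contig aligns to. It rests on several assumptions that are not stated in the text: the columns are read as start and end on the first contig, start and end on the second contig, the two aligned lengths, and percent identity; the assembler is taken to be the part of the contig name before the first underscore; and the "more than one target" test is a simplified illustration, not the criterion CISA itself applies.

```python
from collections import defaultdict

# The six rows from the E. coli example (tags are tab-separated).
ROWS = [
    "          1    14641  |    36367    21727  |    14641    14641  |    99.99  | CLC_100_len:33932\tAbyss_133_len:170886",
    "      14608    33932  |    58206    38882  |    19325    19325  |   100.00  | CLC_100_len:33932\tAbyss_129_len:72302",
    "          1    14641  |    31800    17160  |    14641    14641  |    99.99  | CLC_100_len:33932\tEdena_65_len:79603",
    "      14608    33932  |    58206    38882  |    19325    19325  |   100.00  | CLC_100_len:33932\tEdena_51_len:126254",
    "          1    14641  |    31804    17164  |    14641    14641  |    99.99  | CLC_100_len:33932\tVelvet_45_len:70264",
    "      14608    33932  |    13945    33269  |    19325    19325  |   100.00  | CLC_100_len:33932\tVelvet_42_len:58831",
]


def parse_row(line):
    """Turn one '|'-delimited alignment row into a dict (column meaning assumed)."""
    coords_a, coords_b, _lengths, identity, tags = (
        part.split() for part in line.split("|")
    )
    return {
        "rep_start": int(coords_a[0]),
        "rep_end": int(coords_a[1]),
        "other_start": int(coords_b[0]),
        "other_end": int(coords_b[1]),
        "identity": float(identity[0]),
        "rep": tags[0],
        "other": tags[1],
    }


def split_hits(rows):
    """Report (representative, assembler) pairs whose hits span several contigs."""
    targets = defaultdict(set)
    for row in rows:
        assembler = row["other"].split("_")[0]  # assumed naming convention
        targets[(row["rep"], assembler)].add(row["other"])
    return {key: sorted(names) for key, names in targets.items() if len(names) > 1}


rows = [parse_row(line) for line in ROWS]
suspects = split_hits(rows)
for (rep, assembler), names in sorted(suspects.items()):
    print(rep, assembler, names)
```

On these rows, all three assemblers place the two segments of CLC_100 on different contigs. A split against a single assembler would not be convincing on its own, because a long contig can legitimately span two shorter ones.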

In phase 3 (CISA3):

  • Several rounds of blastn are performed to merge the contigs iteratively and to identify repetitive regions.
  • In each round, information about contig extensions and about repetitive regions is recorded in the files Extend_info and Repeat_Region.txt, respectively.