Description

Revision as of 05 May 2012 01:48 by admin

CISA consists of four phases for contig integration.

Four folders are generated after running CISA.

In phase 1 (CISA1):

  • The representative contigs and the contigs they explain are recorded in the file Explained.txt.
  • Information about contig extensions is recorded in the file Extend_info.
For example,
Head: >Skeleton contig >Extended contig
Tail: >Skeleton contig >Extended contig
  • The processed contigs are written to R1_contigs.fa, which is placed outside the CISA1 folder.
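
The Head/Tail records in Extend_info can be read with a few lines of Python. This is only a minimal sketch: it assumes each record is a single line laid out exactly like the example above (a `Head:` or `Tail:` label followed by two `>`-prefixed contig names without embedded whitespace). The function name `parse_extend_line` and the contig names in the usage line are hypothetical and not part of CISA.

```python
from typing import NamedTuple, Optional


class Extension(NamedTuple):
    end: str        # "Head" or "Tail"
    skeleton: str   # name of the skeleton contig
    extended: str   # name of the contig used for the extension


def parse_extend_line(line: str) -> Optional[Extension]:
    """Parse one 'Head: >skeleton >extended' record; return None if it does not match.

    Assumption: the layout follows the example in this page, one record per line.
    """
    label, sep, rest = line.partition(":")
    label = label.strip()
    if not sep or label not in ("Head", "Tail"):
        return None
    # Keep only the '>'-prefixed tokens and drop the marker itself.
    names = [tok.lstrip(">") for tok in rest.split() if tok.startswith(">")]
    if len(names) != 2:
        return None
    return Extension(label, names[0], names[1])


# Hypothetical contig names, for illustration only.
record = parse_extend_line("Head: >contigA >contigB")
print(record)
```

Lines that do not follow the assumed layout return `None`, so a real Extend_info file with a different format would be skipped rather than misread.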

In phase 2 (CISA2):

  • The uncertain regions located at the ends of contigs are clipped (clip_info).
  • The clipped-out sequences are recorded in the file clip_out.
  • The unalignable gaps are recorded in the file Gaps, and gaps larger than the 95th quantile of gap sizes (R2_gap=0.95) are clipped.
  • The misassembled contigs recorded in the file Remove_Info are removed, and extra contigs are introduced if necessary.
For example, in the case of E. coli:
1 14641 | 36367 21727 | 14641 14641 | 99.99 | CLC_100_len:33932 Abyss_133_len:170886
14608 33932 | 58206 38882 | 19325 19325 | 100.00 | CLC_100_len:33932 Abyss_129_len:72302
1 14641 | 31800 17160 | 14641 14641 | 99.99 | CLC_100_len:33932 Edena_65_len:79603
14608 33932 | 58206 38882 | 19325 19325 | 100.00 | CLC_100_len:33932 Edena_51_len:126254
1 14641 | 31804 17164 | 14641 14641 | 99.99 | CLC_100_len:33932 Velvet_45_len:70264
14608 33932 | 13945 33269 | 19325 19325 | 100.00 | CLC_100_len:33932 Velvet_42_len:58831
The representative contig CLC_100 is misassembled. We removed this contig and introduced an extra representative contig, Abyss_129.
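
One way to read the E. coli rows above is to group them by assembler and check where each segment of CLC_100 lands. The sketch below does this under stated assumptions: the first coordinate pair is taken to be the position on the first-named contig (CLC_100), the last column is taken to name the two aligned contigs, and the assembler is taken to be the part of the contig name before the first underscore. The helpers `parse_row` and `split_evidence` are hypothetical; this is not how CISA itself detects misassemblies.

```python
from collections import defaultdict

# Alignment rows copied from the E. coli example.
ROWS = """\
1 14641 | 36367 21727 | 14641 14641 | 99.99 | CLC_100_len:33932 Abyss_133_len:170886
14608 33932 | 58206 38882 | 19325 19325 | 100.00 | CLC_100_len:33932 Abyss_129_len:72302
1 14641 | 31800 17160 | 14641 14641 | 99.99 | CLC_100_len:33932 Edena_65_len:79603
14608 33932 | 58206 38882 | 19325 19325 | 100.00 | CLC_100_len:33932 Edena_51_len:126254
1 14641 | 31804 17164 | 14641 14641 | 99.99 | CLC_100_len:33932 Velvet_45_len:70264
14608 33932 | 13945 33269 | 19325 19325 | 100.00 | CLC_100_len:33932 Velvet_42_len:58831
"""


def parse_row(line):
    """Return ((start, end) on the first contig, first contig name, second contig name)."""
    fields = [part.split() for part in line.split("|")]
    start, end = map(int, fields[0])
    query, target = fields[4]
    # Drop the '_len:<n>' suffix to get the bare contig names.
    return (start, end), query.split("_len:")[0], target.split("_len:")[0]


def split_evidence(text):
    """Map assembler -> {segment of the first contig: contig it aligns to}."""
    by_assembler = defaultdict(dict)
    for line in text.strip().splitlines():
        segment, _query, target = parse_row(line)
        assembler = target.split("_")[0]  # assumed naming: <assembler>_<id>
        by_assembler[assembler][segment] = target
    return dict(by_assembler)


evidence = split_evidence(ROWS)
for assembler, segments in evidence.items():
    contigs = sorted(set(segments.values()))
    status = "split" if len(contigs) > 1 else "consistent"
    print(assembler, status, contigs)
```

For these rows, each of the three other assemblies places the two segments of CLC_100 (1 to 14641 and 14608 to 33932) on two different contigs, which is consistent with treating CLC_100 as misassembled.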