Manual2

Prerequisites

  • Linux 64-bit environment (such as CentOS)

  • Python 2.4.3 or higher
    - Under CentOS: yum install python

  • MUMmer 3.22 or higher

  • BLAST+ 2.2.25 or higher
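Before running CISA, it can help to confirm that the external tools it calls are reachable. The following Python 3 sketch is not part of CISA (the name `check_prerequisites` is hypothetical). It only checks whether `nucmer`, `makeblastdb` and `blastn` are on your PATH using `shutil.which`; it does not verify their versions.

```python
import shutil

# External executables named in the CISA.py configuration section of this manual.
REQUIRED_TOOLS = ("nucmer", "makeblastdb", "blastn")


def check_prerequisites(tools=REQUIRED_TOOLS):
    """Return one message per tool not found on PATH (empty list: all found).

    Sketch only: presence on PATH is checked; versions
    (MUMmer 3.22+, BLAST+ 2.2.25+) are NOT checked.
    """
    missing = []
    for tool in tools:
        # shutil.which returns None when no executable with this name is on PATH.
        if shutil.which(tool) is None:
            missing.append("%s not found on PATH" % tool)
    return missing
```

An empty list means every tool was found. Any tool reported as missing must either be added to PATH or given an explicit path in the CISA.py configuration.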


Installation

Installation only requires unpacking the tar file.

Available commands:

   tar xvf CISA.tar
   chmod -R 755 CISA1.0

Commands and configuration

Merge.py

Merge.py converts the format and IDs of the input datasets to fit CISA. This is an essential preprocessing step before running CISA.py.

Available commands:

    ./Merge.py myconfig

The content of the configuration file:

   count=3   the number of datasets to merge
   data=assembly1.fasta,title=Contig_m1
   data=assembly2.fasta,title=Contig_m2
   data=assembly3.fasta,title=Contig_m3
   Master_file=Contigs_m.fa   the merged output file (passed to CISA.py as infile)
   min_length=100   (default: 100)
   Gap=11   optional; not used in the example configurations below
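To make the format above concrete, here is a minimal Python 3 parser for a Merge.py configuration. It is a sketch inferred from the example, not Merge.py's actual parsing code: `parse_merge_config` is a hypothetical name, and it assumes one entry per line, comma-separated `key=value` pairs on `data=` lines, and that anything after the first whitespace on a line (such as the annotations above) is a comment.

```python
def parse_merge_config(lines):
    """Parse Merge.py-style configuration lines into a dict.

    Assumptions (inferred from this manual, not from Merge.py itself):
    one key=value entry per line, comma-separated key=value pairs on data=
    lines, and anything after the first whitespace treated as a comment.
    """
    config = {"data": [], "min_length": 100}  # documented default for min_length
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        entry = fields[0]  # drop trailing annotations such as "(default: 100)"
        if entry.startswith("data="):
            # "data=a.fasta,title=A" -> {"data": "a.fasta", "title": "A"}
            config["data"].append(dict(part.split("=", 1) for part in entry.split(",")))
        else:
            key, value = entry.split("=", 1)
            config[key] = value
    for key in ("count", "min_length", "Gap"):
        if key in config:
            config[key] = int(config[key])
    # count must match the number of data= lines.
    if config.get("count") != len(config["data"]):
        raise ValueError("count does not match the number of data= lines")
    return config
```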

min_length is the minimum contig length: only contigs longer than min_length (100 by default) are kept. Gap is optional.
If Gap is set, it is used to split scaffolds into contigs at runs of Gap consecutive Ns (11 in this example).
If Gap is absent, the program only merges the data.
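The min_length and Gap rules can be illustrated on a single sequence. This Python 3 sketch (the function `split_and_filter` is hypothetical, not taken from Merge.py) uses `re.split` and assumes that a split happens at any run of at least Gap consecutive N characters and that a piece is kept only if it is strictly longer than min_length. Merge.py may treat run lengths, lowercase n, or the length boundary differently.

```python
import re


def split_and_filter(sequence, min_length=100, gap=None):
    """Return the pieces of `sequence` kept after Gap splitting and length filtering.

    Assumptions: split at runs of >= gap N/n characters; keep pieces with
    len > min_length. With gap=None the sequence is not split (merge only).
    """
    if gap is None:
        pieces = [sequence]
    else:
        if gap < 1:
            raise ValueError("gap must be a positive integer")
        pieces = re.split("[Nn]{%d,}" % gap, sequence)
    # Empty pieces (e.g. from a leading N run) are dropped by the length test.
    return [piece for piece in pieces if len(piece) > min_length]
```

Shorter runs of N (fewer than Gap) stay inside a piece, so only long gaps break a scaffold.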
Merge.py prints the genome size of each dataset on screen and also saves it to genome_size.txt so that you can look it up later.
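A minimal sketch of such a report follows. It assumes that the "genome size" of an assembly is the total number of bases across all its contigs, and it uses a hypothetical tab-separated layout for genome_size.txt; the real file written by Merge.py may be formatted differently. The function names are illustrative only.

```python
def fasta_total_length(lines):
    """Total number of sequence characters in FASTA lines (header lines skipped)."""
    return sum(len(line.strip()) for line in lines if not line.startswith(">"))


def report_genome_sizes(assemblies, out_path="genome_size.txt"):
    """Print each assembly's total length and save the same report to out_path.

    assemblies maps a title (e.g. "Abyss") to a FASTA path. Treating the total
    length as the "genome size", and the tab-separated layout, are assumptions.
    """
    sizes = {}
    for title, path in assemblies.items():
        with open(path) as handle:
            sizes[title] = fasta_total_length(handle)
    report = "\n".join("%s\t%d" % (title, size) for title, size in sizes.items())
    print(report)  # show on screen ...
    with open(out_path, "w") as handle:  # ... and keep a copy for later
        handle.write(report + "\n")
    return sizes
```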

CISA.py

Available commands:

    ./CISA.py myconfig

The content of the myconfig file:

   genome=genome size   The genome size of each assembly is listed in genome_size.txt; we recommend using the largest one here.
   infile=Contigs.fa
   nucmer=path/nucmer   Replace "path" with the directory where nucmer is installed.
   R2_Gap=0.9   (default: 0.9)
   CISA=path/CISA1.0   Replace "path" with the directory where CISA is installed.
   makeblastdb=path/makeblastdb   Replace "path" with the directory where makeblastdb is installed.
   blastn=path/blastn   Replace "path" with the directory where blastn is installed.
  1. genome
    We suggest setting genome to the largest genome size among the input assemblies.
    CISA sets its break point to 1.1 * genome.
  2. infile
    Name of the input file, i.e. the Master_file produced by Merge.py.
  3. nucmer
    Path to the nucmer executable. If nucmer is already on your PATH, this variable can be omitted.
  4. makeblastdb
    Path to the makeblastdb executable. If makeblastdb is already on your PATH, this variable can be omitted.
  5. blastn
    Path to the blastn executable. If blastn is already on your PATH, this variable can be omitted.
  6. CISA
    Home directory of CISA.
  7. R2_Gap
    Amount of gap tolerated during the CISA2 step (default: 0.9).
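The parameter rules above (R2_Gap defaulting to 0.9, executables falling back to PATH, break point = 1.1 * genome) can be sketched in Python 3 as follows. `load_cisa_config` and `break_point` are hypothetical names, not CISA.py's own code. The sketch assumes one `key=value` per line with anything after the first whitespace ignored, requires a real number for genome, and uses `shutil.which` to emulate the PATH fallback.

```python
import shutil


def load_cisa_config(lines):
    """Parse CISA.py-style key=value lines and apply the defaults described above.

    Assumes one key=value per line; anything after the first whitespace is ignored.
    """
    config = {}
    for raw in lines:
        fields = raw.split()
        if fields and "=" in fields[0]:
            key, value = fields[0].split("=", 1)
            config[key] = value
    config["genome"] = int(config["genome"])
    config["R2_Gap"] = float(config.get("R2_Gap", 0.9))  # default: 0.9
    # nucmer, makeblastdb and blastn may be omitted when they are on PATH.
    for tool in ("nucmer", "makeblastdb", "blastn"):
        if tool not in config:
            found = shutil.which(tool)
            if found is None:
                raise ValueError("%s is neither configured nor on PATH" % tool)
            config[tool] = found
    return config


def break_point(genome):
    """Break point as stated in this manual: 1.1 * genome."""
    return 1.1 * genome
```

For the E. coli example below, genome=4626205 gives a break point of roughly 5.09 million.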

Example

Data Set

Ecoli

Merge Contigs

The content of the configuration file:

 count=5                              
 data=Abyss_contigs.fa,title=Abyss
 data=Edena_contigs.fa,title=Edena
 data=SOAPdenovo_contigs.fa,title=SOAP
 data=CLC_contigs.fa,title=CLC
 data=Velvet_contigs.fa,title=Velvet
 Master_file=Contigs.fa
  • Command:

./Merge.py myconfig
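If you prefer to generate the configuration rather than type it, the Python 3 sketch below produces the same five-assembler file. `merge_config_text` and `ECOLI_DATASETS` are hypothetical names; deriving `count` from the list keeps it consistent with the number of `data=` lines.

```python
def merge_config_text(datasets, master_file="Contigs.fa", min_length=None, gap=None):
    """Build Merge.py configuration text from (fasta_file, title) pairs.

    count is derived from len(datasets) so it always matches the data= lines;
    min_length and Gap are written only when given (Gap is optional).
    """
    lines = ["count=%d" % len(datasets)]
    lines += ["data=%s,title=%s" % (fasta, title) for fasta, title in datasets]
    lines.append("Master_file=%s" % master_file)
    if min_length is not None:
        lines.append("min_length=%d" % min_length)
    if gap is not None:
        lines.append("Gap=%d" % gap)
    return "\n".join(lines) + "\n"


# The five E. coli assemblies used in this example.
ECOLI_DATASETS = [
    ("Abyss_contigs.fa", "Abyss"),
    ("Edena_contigs.fa", "Edena"),
    ("SOAPdenovo_contigs.fa", "SOAP"),
    ("CLC_contigs.fa", "CLC"),
    ("Velvet_contigs.fa", "Velvet"),
]
```

Writing `merge_config_text(ECOLI_DATASETS)` to a file named myconfig reproduces the configuration above, ready for `./Merge.py myconfig`.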


Integrate the contigs

The content of the configuration file:

 genome=4626205
 infile=Contigs.fa
 nucmer=path/nucmer
 R2_Gap=0.9
 CISA=path/CISA1.0
 makeblastdb=path/makeblastdb
 blastn=path/blastn
  • The genome variable is set to 4626205, the largest genome size among the results of the 5 assemblers.
  • Command:

python CISA.py CISA.config
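Choosing genome and writing the CISA configuration can likewise be scripted. In this Python 3 sketch, `cisa_config_text` is a hypothetical helper that takes the per-assembly genome sizes (for instance as read from genome_size.txt) as a dict, uses the largest as `genome`, and writes tool paths only when given, since they may be omitted when the tools are on PATH.

```python
def cisa_config_text(genome_sizes, infile="Contigs.fa", r2_gap=0.9,
                     cisa_home="path/CISA1.0", tool_paths=None):
    """Build CISA.py configuration text.

    genome_sizes: dict mapping assembler title -> genome size.
    tool_paths: optional dict with any of nucmer/makeblastdb/blastn; tools
    left out are assumed to be on PATH and are not written.
    """
    if not genome_sizes:
        raise ValueError("at least one genome size is required")
    # The manual recommends the largest genome size among the assemblies.
    genome = max(genome_sizes.values())
    tool_paths = tool_paths or {}
    lines = ["genome=%d" % genome, "infile=%s" % infile]
    if "nucmer" in tool_paths:
        lines.append("nucmer=%s" % tool_paths["nucmer"])
    lines.append("R2_Gap=%s" % r2_gap)
    lines.append("CISA=%s" % cisa_home)
    for tool in ("makeblastdb", "blastn"):
        if tool in tool_paths:
            lines.append("%s=%s" % (tool, tool_paths[tool]))
    return "\n".join(lines) + "\n"
```

The line order mirrors the example configuration above; whether CISA.py depends on that order is not stated in this manual.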

Test-mammer

  • Prepare the datasets: the contigs generated by the different assemblers you would like to integrate

Ecoli

  • Convert the format and IDs of your datasets:

The content of the configuration file (Merge.config):

count=5                                
data=Abyss_contigs.fa,title=Abyss         
data=Edena_contigs.fa,title=Edena  
data=SOAPdenovo_contigs.fa,title=SOAP  
data=CLC_contigs.fa,title=CLC  
data=Velvet_contigs.fa,title=Velvet  
Master_file=Contigs.fa
  • Command:

>python Merge.py Merge.config

  • Integrate the contigs:

The content of the configuration file (CISA.config):

genome=4626205  
infile=Contigs.fa 
nucmer=path/nucmer  
R2_Gap=0.95  
CISA=path/CISA1.0 
makeblastdb=path/makeblastdb  
blastn=path/blastn

The genome variable is set to 4626205, the largest genome size among the results of the 5 assemblers.

  • Command:

>python CISA.py CISA.config
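The two commands above can be chained from Python so that CISA.py only runs if Merge.py succeeds. This Python 3 sketch uses `subprocess.run(..., check=True)`. The helper names, the assumption that Merge.py and CISA.py sit in the working directory, the assumption that both scripts exit with a non-zero status on failure, and the default interpreter name `python` are all assumptions; CISA targets Python 2.4.3 or higher, so point `python` at a suitable interpreter.

```python
import subprocess


def pipeline_commands(merge_config="Merge.config", cisa_config="CISA.config",
                      python="python"):
    """Return the two commands of this example in the order they must run."""
    return [
        [python, "Merge.py", merge_config],
        [python, "CISA.py", cisa_config],
    ]


def run_pipeline(workdir=None, **kwargs):
    """Run Merge.py, then CISA.py, in workdir (assumed to contain both scripts).

    check=True makes subprocess.run raise CalledProcessError as soon as a step
    exits with a non-zero status, so CISA.py is not started after a failed merge.
    """
    for command in pipeline_commands(**kwargs):
        subprocess.run(command, cwd=workdir, check=True)
```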