Instruction

Revision as of 14 September 2012 03:15 by admin

Prerequisites

  • Linux 64-bit environment

CentOS

  • Python 2.X

Python
- Under CentOS: yum install python

  • MUMmer 3.22 or higher

MUMmer

  • Blast 2.2.25+ or higher

Blast+
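
Before configuring CISA, it can help to confirm that the external tools listed above are reachable. The following is a minimal sketch, not part of the CISA distribution: the helper names find_on_path and check_tools are hypothetical, and the code only searches the PATH environment variable for the three executables that CISA.config refers to (nucmer, makeblastdb, blastn).

```python
import os


def find_on_path(program):
    """Return the full path of `program` if it is an executable file on PATH, else None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def check_tools(tools=("nucmer", "makeblastdb", "blastn")):
    """Map each tool name to its resolved path, or None when it was not found."""
    return dict((tool, find_on_path(tool)) for tool in tools)


if __name__ == "__main__":
    for tool, path in sorted(check_tools().items()):
        # A missing tool is not fatal: its full path can be given in CISA.config instead.
        print("%-12s %s" % (tool, path or "not on PATH (set its full path in CISA.config)"))
```

A tool reported as missing can still be used by writing its full path into the corresponding CISA.config variable.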


Installation

Please unpack the tar file.
CISA - 2012.09.07
CISA - 2012.04.25

Available commands:

   >tar xvf CISA.tar

Execute access:

   >chmod 755 -R CISA1.0

Commands and configuration

Merge.py

Merge.py converts your data to the format that CISA expects. This is an essential preprocessing step.

Available commands:

   >python Merge.py Merge.config

The content of configuration file:

   count=3 
   data=assembly1.fasta,title=Contig_m1  
   data=assembly2.fasta,title=Contig_m2
   data=assembly3.fasta,title=Contig_m3
   Master_file=Contigs_m.fa
   min_length=100 (default:100)
   Gap=11
  1. count
    The number of assemblies you would like to merge
  2. min_length
    Contigs of length smaller than min_length (e.g. 100 bp) will be discarded.
  3. Gap
    An optional variable.
    When it is set (Gap=11 in the example above), the assemblies are split into contigs at runs of more than 10 Ns.
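
The effect of min_length and Gap can be illustrated with a short sketch. This is not the code used by Merge.py; the function names split_at_gaps and filter_short are hypothetical, and the sketch assumes that Gap=11 means "split at runs of 11 or more Ns", which matches the ">10 Ns" description above.

```python
import re


def split_at_gaps(sequence, gap=11):
    """Split a sequence at every run of at least `gap` N characters (assumed meaning of Gap)."""
    pattern = re.compile("N{%d,}" % gap, re.IGNORECASE)
    # Drop empty pieces that appear when a sequence starts or ends with a gap.
    return [piece for piece in pattern.split(sequence) if piece]


def filter_short(contigs, min_length=100):
    """Discard contigs shorter than `min_length` bases."""
    return [contig for contig in contigs if len(contig) >= min_length]


if __name__ == "__main__":
    scaffold = "A" * 150 + "N" * 11 + "C" * 40 + "N" * 5 + "G" * 200
    pieces = split_at_gaps(scaffold, gap=11)
    kept = filter_short(pieces, min_length=100)
    print("pieces after splitting: %d, kept after filtering: %d" % (len(pieces), len(kept)))
```

In this sketch a run of 10 or fewer Ns stays inside the contig, so short gaps do not fragment the assembly.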

Please note that dos2unix may be necessary if your data are in DOS/MAC format.

   >dos2unix assembly1.fasta assembly2.fasta assembly3.fasta
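
To decide whether the conversion is needed, the line endings of a file can be inspected first. The sketch below is an illustration only; the function name line_ending_style and the 1 MiB sample size are assumptions, not part of CISA or dos2unix.

```python
def line_ending_style(path, sample_size=1 << 20):
    """Classify line endings in the first `sample_size` bytes as 'dos', 'mac', 'unix' or 'none'."""
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    if b"\r\n" in sample:
        return "dos"   # CR+LF line endings
    if b"\r" in sample:
        return "mac"   # bare CR line endings
    if b"\n" in sample:
        return "unix"  # LF only, no conversion needed
    return "none"      # no line ending found in the sample
```

A result of 'dos' or 'mac' for any input assembly suggests converting that file before running Merge.py.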

CISA.py

Available commands:

    >python CISA.py CISA.config

The content of configuration file:

   genome=genome size
   infile=file
   outfile=file
   nucmer=your installed path/nucmer
   R2_Gap=0.95 (default:0.95)
   CISA=your installed path/CISA1.0
   makeblastdb=your installed path/makeblastdb
   blastn=your installed path/blastn
  1. genome
    Enter the estimated genome size. We recommend using the largest total length among your input assemblies.
    CISA sets its break point to 1.1 * genome.
  2. infile
    The file containing the set of contigs you want to integrate.
  3. nucmer
    The nucmer executable. If nucmer is already on your PATH, this variable can be omitted.
  4. makeblastdb
    The makeblastdb executable. If makeblastdb is already on your PATH, this variable can be omitted.
  5. blastn
    The blastn executable. If blastn is already on your PATH, this variable can be omitted.
  6. CISA
    Home directory of CISA.
  7. R2_Gap
    A threshold used in phase 2 of CISA.
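
As an illustration of the key=value layout and of the break point rule, the sketch below reads a CISA.config-style text and computes 1.1 * genome. It is not the parser used by CISA.py; parse_config and break_point are hypothetical names, and the sketch assumes one key=value pair per line with no trailing annotations.

```python
def parse_config(text):
    """Parse 'key=value' lines into a dict; blank lines and lines without '=' are skipped."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def break_point(settings):
    """Return 1.1 * genome, the break point described for the genome variable."""
    return 1.1 * int(settings["genome"])


if __name__ == "__main__":
    example = "genome=4626205\ninfile=Contigs.fa\noutfile=CISA.fa\nR2_Gap=0.95\n"
    config = parse_config(example)
    print("infile=%s, break point=%.1f" % (config["infile"], break_point(config)))
```

This parser is only suited to CISA.config; Merge.config repeats the data key on several lines, which a plain dictionary would not preserve.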

Example

  • Prepare datasets: At least three assemblies are required for contig integration using CISA.
In the case of E. coli, we used five assemblers (Abyss, CLC, Edena, SOAPdenovo and Velvet) to generate five assemblies.
  • Convert the format and ID of your datasets:

The content of the configuration file (Merge.config):

   count=5
   data=Abyss_contigs.fa,title=Abyss
   data=Edena_contigs.fa,title=Edena
   data=SOAPdenovo_contigs.fa,title=SOAP
   data=CLC_contigs.fa,title=CLC
   data=Velvet_contigs.fa,title=Velvet
   Master_file=Contigs.fa
  • Command:

   >python Merge.py Merge.config

Input files: Abyss_contigs.fa, Edena_contigs.fa, SOAPdenovo_contigs.fa, CLC_contigs.fa, Velvet_contigs.fa

Statistics of the input assemblies:

Abyss_contigs.fa
Number of contigs: 133
Length of the longest contig: 222425
whole:4626205
N50: 96511
Edena_contigs.fa
Number of contigs: 211
Length of the longest contig: 186686
whole:4569446
N50: 57790
SOAPdenovo_contigs.fa
Number of contigs: 553
Length of the longest contig: 103369
whole:4547211
N50: 17944
CLC_contigs.fa
Number of contigs: 378
Length of the longest contig: 107342
whole:4546827
N50: 29905
Velvet_contigs.fa
Number of contigs: 283
Length of the longest contig: 166094
whole:4550675
N50: 54359

Output file: Contigs.fa
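
Statistics of this kind (number of contigs, longest contig, total length, N50) can be recomputed from any FASTA file. The sketch below is an independent illustration, not the code Merge.py uses: contig_lengths and assembly_stats are hypothetical names, and N50 follows the common definition (the length of the contig at which the running total of contigs, sorted from longest to shortest, first reaches half of the total length), which is assumed to match the values reported above.

```python
def contig_lengths(lines):
    """Return the length of each record in an iterable of FASTA lines."""
    lengths = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            lengths.append(0)          # a header line starts a new contig
        elif line and lengths:
            lengths[-1] += len(line)   # sequence lines extend the current contig
    return lengths


def assembly_stats(lengths):
    """Compute contig count, longest contig, total length and N50 from contig lengths."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = 0
    for length in ordered:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "contigs": len(ordered),
        "longest": ordered[0] if ordered else 0,
        "total": total,
        "n50": n50,
    }
```

To use it on a file, pass an open file handle to contig_lengths and the resulting list to assembly_stats.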

  • Start to integrate contigs:

The content of the configuration file (CISA.config):

   genome=4626205 
   infile=Contigs.fa
   outfile=CISA.fa
   nucmer=path/nucmer
   R2_Gap=0.95
   CISA=path/CISA1.0
   makeblastdb=path/makeblastdb
   blastn=path/blastn

The genome is set to 4626205, the total length ("whole") of the Abyss assembly, which is the largest of the five input assemblies.
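
The choice of the genome value can be reproduced from the "whole" values in the statistics above. This is a minimal sketch of that selection; the name pick_genome is hypothetical, and the totals are copied from the statistics listed for the five input assemblies.

```python
# Total lengths ("whole") of the five input assemblies, as listed in the statistics.
TOTALS = {
    "Abyss": 4626205,
    "Edena": 4569446,
    "SOAP": 4547211,
    "CLC": 4546827,
    "Velvet": 4550675,
}


def pick_genome(totals):
    """Return (name, total_length) of the assembly with the largest total length."""
    name = max(totals, key=totals.get)
    return name, totals[name]


if __name__ == "__main__":
    name, genome = pick_genome(TOTALS)
    # The genome line for CISA.config, derived from the largest assembly.
    print("genome=%d (from %s)" % (genome, name))
```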

  • Command:

   >python CISA.py CISA.config

Input file: Contigs.fa

Output file: CISA.fa
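
The two steps of the example can also be chained from a small driver script. The sketch below is an assumption about how one might automate the workflow, not a tool shipped with CISA: pipeline_commands and run_pipeline are hypothetical names, and the cwd argument assumes the scripts and configuration files sit in the same directory.

```python
import subprocess


def pipeline_commands(merge_config="Merge.config", cisa_config="CISA.config", python="python"):
    """Build the two commands of the example: Merge.py first, then CISA.py."""
    return [
        [python, "Merge.py", merge_config],
        [python, "CISA.py", cisa_config],
    ]


def run_pipeline(commands, cwd=None):
    """Run each command in order and stop at the first one that fails."""
    for command in commands:
        # check_call raises CalledProcessError on a non-zero exit status.
        subprocess.check_call(command, cwd=cwd)
```

Calling run_pipeline(pipeline_commands()) from the directory that holds Merge.py, CISA.py and both configuration files would run the same two commands as above.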