= Prerequisites =
* Linux 64-bit environment
[http://www.centos.org CentOS]
* Python 2.X
[http://www.python.org Python]<br>
- Under CentOS: yum install python
* MUMmer 3.22 or higher
[http://mummer.sourceforge.net MUMmer]
* BLAST+ 2.2.25 or higher
[http://blast.ncbi.nlm.nih.gov Blast+]

= Installation =
Please unpack the tar file.<br>
[[Media:CISA_20120425.tar|CISA - 2012.04.25]]

Available commands:
>tar xvf CISA_20120425.tar

Grant execute permission:
>chmod 755 -R CISA1.0

== Commands and configuration ==

=== Merge.py ===

Merge.py converts the input assemblies into the format that CISA requires. '''This is an essential preprocessing step'''.

Available commands:
>python Merge.py Merge.config

The content of the configuration file:

count=3
data=assembly1.fasta,title=Contig_m1
data=assembly2.fasta,title=Contig_m2
data=assembly3.fasta,title=Contig_m3
Master_file=Contigs_m.fa
min_length=100 (default:100)
Gap=11

#count<br>The number of assemblies you would like to merge.

#min_length<br>Contigs shorter than min_length (e.g. 100 bp) will be discarded.

#Gap<br>An optional variable.<br>When it is set, the assemblies are split into contigs at runs of more than 10 Ns.
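
The effect of Gap and min_length can be illustrated with a short sketch. It is not the code used by Merge.py; it assumes that Gap=11 means "split at every run of at least 11 consecutive Ns" and that contigs whose length equals min_length are kept. The function names and the test sequence are hypothetical, and the sketch is written for Python 3.

```python
import re


def split_at_gaps(sequence, gap=11):
    """Split a sequence at every run of at least `gap` consecutive Ns (assumed meaning of Gap)."""
    pattern = re.compile("[Nn]{%d,}" % gap)
    # Drop empty pieces that appear when a sequence starts or ends with a gap.
    return [piece for piece in pattern.split(sequence) if piece]


def filter_short(contigs, min_length=100):
    """Discard contigs shorter than min_length."""
    return [contig for contig in contigs if len(contig) >= min_length]


# Hypothetical scaffold: gaps of 11 and 20 Ns are split, the run of 5 Ns is not.
scaffold = "A" * 150 + "N" * 11 + "C" * 50 + "N" * 20 + "G" * 120 + "N" * 5 + "T" * 10
pieces = split_at_gaps(scaffold, gap=11)
kept = filter_short(pieces, min_length=100)
print([len(p) for p in pieces])
print([len(p) for p in kept])
```

In this example the 50 bp piece is discarded, while the piece that still contains the short run of 5 Ns is kept as a single contig.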

Please note that '''dos2unix''' may be necessary if your data are in DOS/MAC format.
>dos2unix assembly1.fasta assembly2.fasta assembly3.fasta

=== CISA.py ===

Available commands:
>python CISA.py CISA.config

The content of the configuration file:
genome=genome size
infile=file
outfile=file
nucmer=your installed path/nucmer
R2_Gap=0.95 (default:0.95)
CISA=your installed path/CISA1.0
makeblastdb=your installed path/makeblastdb
blastn=your installed path/blastn

#genome<br/>Enter the estimated genome size here. The total length of the largest input assembly is recommended. <br/>The break point of CISA will be set to 1.1 * genome.
#infile<br/>The file containing the set of contigs you want to integrate.
#nucmer<br/>The nucmer executable. If nucmer is already on your PATH, this variable can be omitted.
#makeblastdb<br/>The makeblastdb executable. If makeblastdb is already on your PATH, this variable can be omitted.
#blastn<br/>The blastn executable. If blastn is already on your PATH, this variable can be omitted.
#CISA<br/>Home directory of CISA.
#R2_Gap<br/>A threshold used in phase 2 of CISA.
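
Before running CISA.py, it may help to check which of the three external programs still need an explicit path in CISA.config. The following is a minimal sketch, not part of CISA: it assumes Python 3 (for shutil.which), and the dictionary layout, function names and the example path are hypothetical.

```python
import shutil

TOOLS = ("nucmer", "makeblastdb", "blastn")


def resolve_tool(name, configured_path=None):
    """Return the configured path if one is given, otherwise search PATH."""
    if configured_path:
        return configured_path
    return shutil.which(name)  # None when the tool is not on PATH


def missing_tools(config):
    """List the tools that are neither configured nor found on PATH."""
    return [tool for tool in TOOLS if resolve_tool(tool, config.get(tool)) is None]


# Hypothetical settings: nucmer is given explicitly, the others rely on PATH.
config = {"nucmer": "/opt/mummer/nucmer"}
print("Tools that need a path in CISA.config:", missing_tools(config))
```

The sketch only checks whether a value is available; it does not verify that a configured path actually points to an executable.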

== Example ==

<ul><li><strong>Prepare datasets:</strong> At least three assemblies are required for contig integration using CISA.</li></ul>

<blockquote>In the case of [[Ecoli|Ecoli]], we used five assemblers (Abyss, CLC, Edena, SOAPdenovo and Velvet) to generate five assemblies.</blockquote>

<ul><li><strong>Convert the format and ID of your datasets:</strong><br/></li></ul>
<blockquote><p> The content of the configuration file (Merge.config):</p></blockquote>
<table width="457" border="1">
<tr>
<td height="152" align="left" bgcolor="#66CCFF">
count=5
data=Abyss_contigs.fa,title=Abyss
data=Edena_contigs.fa,title=Edena
data=SOAPdenovo_contigs.fa,title=SOAP
data=CLC_contigs.fa,title=CLC
data=Velvet_contigs.fa,title=Velvet
Master_file=Contigs.fa
</td>
</tr>
</table>
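
When many assemblies are involved, a configuration like the one above can be generated instead of typed by hand. The sketch below only reproduces the key=value layout shown in this manual; the helper name merge_config is hypothetical and is not shipped with CISA.

```python
def merge_config(assemblies, master_file, min_length=None, gap=None):
    """Build the text of a Merge.config from (file, title) pairs."""
    lines = ["count=%d" % len(assemblies)]
    for path, title in assemblies:
        lines.append("data=%s,title=%s" % (path, title))
    lines.append("Master_file=%s" % master_file)
    # min_length and Gap are only written when the caller sets them.
    if min_length is not None:
        lines.append("min_length=%d" % min_length)
    if gap is not None:
        lines.append("Gap=%d" % gap)
    return "\n".join(lines) + "\n"


assemblies = [
    ("Abyss_contigs.fa", "Abyss"),
    ("Edena_contigs.fa", "Edena"),
    ("SOAPdenovo_contigs.fa", "SOAP"),
    ("CLC_contigs.fa", "CLC"),
    ("Velvet_contigs.fa", "Velvet"),
]
print(merge_config(assemblies, "Contigs.fa"))
```

The printed text can be saved as Merge.config and passed to Merge.py.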

<ul><li><strong>Command:</strong><br /></li></ul><blockquote><p>>python Merge.py Merge.config</p></blockquote>

Input files: [[Media:Abyss contigs.fa|Abyss_contigs.fa]], [[Media:Edena contigs.fa|Edena_contigs.fa]], [[Media:SOAPdenovo contigs.fa|SOAPdenovo_contigs.fa]], [[Media:CLC contigs.fa|CLC_contigs.fa]], [[Media:Velvetcontigs.fa|Velvet_contigs.fa]]

Statistics of the input assemblies:
<blockquote>
<table border="1">
<tr><th>Assembly</th><th>Number of contigs</th><th>Length of the longest contig</th><th>Whole (total length)</th><th>N50</th></tr>
<tr><td>Abyss_contigs.fa</td><td>133</td><td>222425</td><td>4626205</td><td>96511</td></tr>
<tr><td>Edena_contigs.fa</td><td>211</td><td>186686</td><td>4569446</td><td>57790</td></tr>
<tr><td>SOAPdenovo_contigs.fa</td><td>553</td><td>103369</td><td>4547211</td><td>17944</td></tr>
<tr><td>CLC_contigs.fa</td><td>378</td><td>107342</td><td>4546827</td><td>29905</td></tr>
<tr><td>Velvet_contigs.fa</td><td>283</td><td>166094</td><td>4550675</td><td>54359</td></tr>
</table>
</blockquote>
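
Statistics of this kind can be recomputed for your own assemblies with a few lines of Python. The sketch below uses the common definition of N50 (the length of the contig at which the running total of the contigs, sorted from longest to shortest, first reaches half of the whole length); whether the figures above were produced with exactly this definition is an assumption. The function names are hypothetical.

```python
def contig_lengths(fasta_lines):
    """Return the length of every record in an iterable of FASTA lines."""
    lengths = []
    current = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(lengths):
    """Number of contigs, longest contig, whole length and N50."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = 0
    for length in ordered:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "contigs": len(ordered),
        "longest": ordered[0] if ordered else 0,
        "whole": total,
        "n50": n50,
    }


# Tiny in-memory example; for a real file, pass open("Abyss_contigs.fa") instead.
example = [">c1", "ACGTACGT", "ACGT", ">c2", "ACGTAC", ">c3", "AC"]
print(assembly_stats(contig_lengths(example)))
```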
Output file: [[Media:Set1 Contigs.fa|Contigs.fa]]

<ul><li><strong>Start to integrate contigs:</strong><br /></li></ul><blockquote><p>The content of the configuration file (CISA.config):</p></blockquote>

<table width="457" border="1">
<tr>
<td height="152" align="left" bgcolor="#66CCFF">
genome=4626205
infile=Contigs.fa
outfile=CISA.fa
nucmer=path/nucmer
R2_Gap=0.95
CISA=path/CISA1.0
makeblastdb=path/makeblastdb
blastn=path/blastn
</td>
</tr>
</table>

<blockquote><p>genome is set to 4626205, the total length (whole) of the Abyss assembly, which is the largest of the five input assemblies.<br /></p></blockquote>
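
The choice of the genome value and the resulting break point can be written out as a small calculation. The totals are the whole values from the statistics of the input assemblies above; the variable names are hypothetical, and the factor 1.1 is the one stated in the description of the genome variable.

```python
# Whole (total) length of each input assembly, taken from the statistics above.
totals = {
    "Abyss": 4626205,
    "Edena": 4569446,
    "SOAP": 4547211,
    "CLC": 4546827,
    "Velvet": 4550675,
}

# Recommended genome value: the largest total length among the input assemblies.
largest = max(totals, key=totals.get)
genome = totals[largest]

# CISA sets its break point to 1.1 * genome.
break_point = 1.1 * genome

print("genome=%d (from %s)" % (genome, largest))
print("break point is about %.0f bp" % break_point)
```

For this dataset the break point is roughly 5.09 Mb.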
<ul><li><strong>Command:</strong><br /></li></ul><blockquote><p>>python CISA.py CISA.config</p></blockquote>

Input file: [[Media:Set1 Contigs.fa|Contigs.fa]]

Output file: [[Media:CISA_Set1_Contigs.fa|CISA.fa]]