Guide
= Prerequisites =
* Linux 64-bit environment [http://www.centos.org CentOS]
* Python 2.X [http://www.python.org Python]<br> - Under CentOS: yum install python
* MUMmer 3.22 or higher [http://mummer.sourceforge.net MUMmer]
* Blast 2.2.25+ or higher [http://blast.ncbi.nlm.nih.gov Blast+]

= Installation =
Unpack the tar file:<br>
[[Media:CISA_20120425.tar|CISA - 2012.04.25]]

Available commands:
 >tar xvf CISA.tar
Grant execute permission:
 >chmod 755 -R CISA1.0

== Commands and configuration ==
=== Merge.py ===
Merge.py converts the input assemblies into the format that CISA expects. '''This is an essential preprocessing step'''.

Available commands:
 >python Merge.py Merge.config
The content of the configuration file:
 count=3
 data=assembly1.fasta,title=Contig_m1
 data=assembly2.fasta,title=Contig_m2
 data=assembly3.fasta,title=Contig_m3
 Master_file=Contigs_m.fa
 min_length=100 (default:100)
 Gap=11
#count<br>The number of assemblies you would like to merge.
#min_length<br>Contigs shorter than min_length (e.g. 100 bp) are discarded.
#Gap<br>An optional variable.<br>When it is set (here to 11), the assemblies are split into contigs at runs of more than 10 Ns.
Please note that '''dos2unix''' may be necessary if your data are in DOS/MAC format.
 >dos2unix assembly1.fasta assembly2.fasta assembly3.fasta

=== CISA.py ===
Available commands:
 >python CISA.py CISA.config
The content of the configuration file:
 genome=genome size
 infile=file
 outfile=file
 nucmer=your installed path/nucmer
 R2_Gap=0.95 (default:0.95)
 CISA=your installed path/CISA1.0
 makeblastdb=your installed path/makeblastdb
 blastn=your installed path/blastn
#genome<br/>The estimated genome size. If it is unknown, the largest total length among your input assemblies is recommended.<br/>The break point of CISA is set to 1.1 * the genome variable.
#infile<br/>The file containing the set of contigs you want to integrate.
#nucmer<br/>The nucmer executable. If nucmer is already on the path, this variable can be omitted.
#makeblastdb<br/>The makeblastdb executable. If makeblastdb is already on the path, this variable can be omitted.
#blastn<br/>The blastn executable. If blastn is already on the path, this variable can be omitted.
#CISA<br/>Home directory of CISA.
#R2_Gap<br/>A threshold used in phase 2 of CISA.

== Example ==
<ul><li><strong>Prepare datasets:</strong> At least three assemblies are required for contig integration using CISA.</li></ul>
<blockquote>In the case of [[Ecoli|Ecoli]], we used five assemblers (Abyss, CLC, Edena, SOAPdenovo and Velvet) to generate five assemblies.</blockquote>
<ul><li><strong>Convert the format and ID of your datasets:</strong><br/></li></ul>
<blockquote><p> The content of the configuration file (Merge.config):</p></blockquote>
<table width="457" border="1">
<tr>
<td height="152" align="left" bgcolor="#66CCFF">
count=5<br />
data=Abyss_contigs.fa,title=Abyss<br />
data=Edena_contigs.fa,title=Edena<br />
data=SOAPdenovo_contigs.fa,title=SOAP<br />
data=CLC_contigs.fa,title=CLC<br />
data=Velvet_contigs.fa,title=Velvet<br />
Master_file=Contigs.fa
</td>
</tr>
</table>
<ul><li><strong>Command:</strong><br /></li></ul><blockquote><p>>python Merge.py Merge.config</p></blockquote>
Input files: [[Media:Abyss contigs.fa|Abyss_contigs.fa]], [[Media:Edena contigs.fa|Edena_contigs.fa]], [[Media:SOAPdenovo contigs.fa|SOAPdenovo_contigs.fa]], [[Media:CLC contigs.fa|CLC_contigs.fa]], [[Media:Velvetcontigs.fa|Velvet_contigs.fa]]

Statistics of the input assemblies:
<blockquote>Abyss_contigs.fa
<br>Number of contigs: 133
<br>Length of the longest contig: 222425
<br>whole:4626205
<br>N50: 96511
<br>Edena_contigs.fa
<br>Number of contigs: 211
<br>Length of the longest contig: 186686
<br>whole:4569446
<br>N50: 57790
<br>SOAPdenovo_contigs.fa
<br>Number of contigs: 553
<br>Length of the longest contig: 103369
<br>whole:4547211
<br>N50: 17944
<br>CLC_contigs.fa
<br>Number of contigs: 378
<br>Length of the longest contig: 107342
<br>whole:4546827
<br>N50: 29905
<br>Velvet_contigs.fa
<br>Number of contigs: 283
<br>Length of the longest contig: 166094
<br>whole:4550675
<br>N50: 54359
</blockquote>
Output file: [[Media:Set1 Contigs.fa|Contigs.fa]]
<ul><li><strong>Start to integrate contigs:</strong><br /></li></ul><blockquote><p>The content of the configuration file (CISA.config):</p></blockquote>
<table width="457" border="1">
<tr>
<td height="152" align="left" bgcolor="#66CCFF">
genome=4626205<br />
infile=Contigs.fa<br />
nucmer=path/nucmer<br />
R2_Gap=0.95<br />
CISA=path/CISA1.0<br />
makeblastdb=path/makeblastdb<br />
blastn=path/blastn
</td>
</tr>
</table>
<blockquote><p>The genome variable is set to 4626205, the largest total length ("whole") among the assemblies produced by the five assemblers.<br /></p></blockquote>
<ul><li><strong>Command:</strong><br /></li></ul><blockquote><p>>python CISA.py CISA.config</p></blockquote>
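
The per-assembly statistics (number of contigs, longest contig, whole, N50) and the choice of the genome value can be reproduced with a short script. The following is only a minimal sketch and is not part of CISA: all function names are hypothetical, and it assumes that "whole" is the sum of all contig lengths and that N50 follows the common definition (the length of the contig at which the running total, taken from the longest contig downwards, reaches half of "whole"). CISA's own scripts may compute these values differently.

```python
def read_fasta_lengths(path):
    """Return the length of every sequence in a FASTA file (hypothetical helper)."""
    lengths = []
    current = 0
    in_record = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if in_record:
                    lengths.append(current)
                current = 0
                in_record = True
            elif in_record:
                current += len(line)
    if in_record:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Common N50 definition (assumed, not taken from CISA)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def assembly_stats(lengths):
    """Summarise one assembly with the same four fields as listed in the guide."""
    return {
        "contigs": len(lengths),
        "longest": max(lengths) if lengths else 0,
        "whole": sum(lengths),
        "N50": n50(lengths),
    }


def suggest_genome(whole_by_assembly):
    """Pick the largest total length, as recommended for the genome variable."""
    return max(whole_by_assembly.values())


if __name__ == "__main__":
    # "whole" values copied from the statistics of the five input assemblies.
    whole = {
        "Abyss": 4626205,
        "Edena": 4569446,
        "SOAP": 4547211,
        "CLC": 4546827,
        "Velvet": 4550675,
    }
    print("genome=%d" % suggest_genome(whole))  # prints genome=4626205
```

To obtain the statistics for a file of your own, call assembly_stats(read_fasta_lengths("Abyss_contigs.fa")) for each assembly and pass the resulting "whole" values to suggest_genome.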