OAF - Ornithine decarboxylase Antizyme Finder

Download

You can download any version of OAF and install it locally. The latest version is recommended.

Installation Howto

Download, then unpack the tar file. For example:
>bunzip2 oaf-x.x.tar.bz2
>tar xvf oaf-x.x.tar
>cd OAF-x.x
Now issue the make commands:
>perl Makefile.PL
>make
>make install
To run 'make install' you need write permission in the perl5/site_perl/source area. This usually requires becoming root, so talk to your system administrator if you do not have the necessary privileges.

System Requirements

perl 5.005 or later.
Bioperl modules: OAF uses functionality provided in Bioperl modules. (see http://www.bioperl.org/)
HMMER is mandatory since OAF is based on HMM profiles. (see http://hmmer.janelia.org/)
FASTA is mandatory if you plan to use sequences longer than 20 kb. (see ftp://ftp.ebi.ac.uk/pub/software/unix/fasta/)
BLAST is mandatory if you plan to search through a database, except if you use the NCBI remote BLAST server. (see http://www.ncbi.nlm.nih.gov/BLAST/download.shtml)

After installing the HMMER, FASTA and BLAST packages on your system, set the environment variables $HMMERDIR, $FASTADIR and $BLASTDIR, or your $PATH variable, to point to the HMMER, FASTA and BLAST directories respectively. Ensure that users have execute privileges for those programs.
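Before running OAF, it can help to confirm that the external programs are reachable and executable. The following Python sketch is only an illustration, not part of OAF: it assumes that a directory variable such as $HMMERDIR is checked first and $PATH second, and the program names (hmmsearch, fasta, blastall) are placeholders that you should replace with the executables your installation actually provides.

```python
import os
import shutil


def find_tool(program, dir_variable):
    """Return the path of an executable, or None if it cannot be found.

    The directory named by the environment variable dir_variable is tried
    first; if that fails, the normal $PATH lookup is used.
    (This lookup order is an assumption made for this sketch.)
    """
    directory = os.environ.get(dir_variable)
    if directory:
        candidate = os.path.join(directory, program)
        # The file must exist and the current user must be allowed to run it.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return shutil.which(program)


if __name__ == "__main__":
    # Placeholder program names: adjust them to your installation.
    for program, variable in [("hmmsearch", "HMMERDIR"),
                              ("fasta", "FASTADIR"),
                              ("blastall", "BLASTDIR")]:
        path = find_tool(program, variable)
        print(f"{program}: {path if path else 'NOT FOUND'}")
```

A program reported as NOT FOUND points to a missing package, a wrong directory variable, or missing execute privileges.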

Quick start

For newcomers and anyone who wants to evaluate quickly whether this package is worth using, a simple command-line application gives easy access to OAF's functionality. The Bio::Tools::OAZ module provides all the functions. The following example retrieves an OAZ gene from a single sequence file and writes it out in GenBank format.

Example of FASTA format:

>my_sequence
TGAACCTCTGCGACTTATCCGCTGAACCTCACTTTGCCGCAGGGAGGCACTGAACAGAGAAACTGCCTTG
TAACAGGTCCCGCCCCTCTCTCTACTCCCTTTCTTATATCAAGAGGGGAAAAACACGGAACTATCTCTAT
CCATTCTGGTCACCATTCGCCTATTACCTCTACTGTTACAAATACCGGATCACCCTCCGGGAGAAGATGC
TGCCTTGTTGTTACAAAAGCATCACTTACAAGGAACAGGAGGACCTGACTCTCCGGCCCCATTGCTGCCT
CCCGTGCTCCTGCCTCCCGTGCTCCTGCCTCCAGTGCTCCTGAGTCCCTAGGAGGCCTCCAGGTGGGTAG
GAGCACTGCACAGGAAAAAGACCACAGCCAGCTTAAAGAACTCTATTCAGCTGGGAACCTGACAGTGCTA
TCAACTGACCCCCTGCTTCACCAAGATCCAGTTCAGTTAGACTTCCACTTTCGTCTTACCCCCCATTCCT
CTGCTCATTGGCACGGCCTTCTGTGTGATCACCGACTCTTCCTGGATATCCCATATCAGGCCTTGGATCA
AGGCAACCGAGAAAGCTTGACAGCAACACTGGAGTATGTGGAGGAGAAAACCAATGTGGACTCTGTGTTT
GTGAACTTCCAAATCGATCGGAAGGACAGAGGTGCCCTGCTGCGAGCCTTTAGCTACATGGGCTTCGAGG
TGGTTAGACCAGATCATCCTGCCCTCCCTCCCTGGGACAATGTCATCTTCATGGTGTATCCCCTTGAAAG
GGACCTTGGCCACCCTGGCCAGTGAGCCTCCCTAAACATGTTCCATCTCTGTGAGGGGTTGGAAACCTCA
ACACACGGGACTCTGAGGCCCAGGATGTGATTTAAGATACTTCCATCCTAGGAAATAAAGGGTAGTGCAA
TC

Your data set should include at least one DNA or RNA sequence in FASTA format. A sequence in FASTA format consists of a single-line description followed by lines of sequence data. The description line starts with a greater-than (">") symbol in the first column. Lower-case and upper-case letters are both accepted. The full standard IUPAC nucleic acid code is not supported: only the symbols A, C, G, T and U are recognized. Digits (0 to 9), hyphens (-) and dots (.) are not accepted.
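The input rules above can be checked before a file is submitted. The following Python sketch is an independent illustration and not OAF's own parser; the function name check_fasta is hypothetical, and it reports any symbol other than A, C, G, T or U (in either case) as unsupported.

```python
# Symbols the text says are recognized, in upper and lower case.
ALLOWED = set("ACGTUacgtu")


def check_fasta(text):
    """Return a list of problems found in FASTA text (empty list = no problem)."""
    problems = []
    seen_header = False
    seen_sequence = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.rstrip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith(">"):
            seen_header = True  # description line, '>' in the first column
            continue
        if not seen_header:
            problems.append(f"line {number}: sequence data before a '>' description line")
        seen_sequence = True
        bad = sorted(set(line) - ALLOWED)
        if bad:
            problems.append(f"line {number}: unsupported symbols {''.join(bad)}")
    if not seen_header:
        problems.append("no '>' description line found")
    if not seen_sequence:
        problems.append("no sequence data found")
    return problems


if __name__ == "__main__":
    print(check_fasta(">my_sequence\nTGAACCTCTGCGACTT\n"))
    print(check_fasta(">my_sequence\nTGAANNCT-12\n"))
```

The first call prints an empty list; the second reports the ambiguity code N, the hyphen and the digits on line 2.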

Example of command-line:
>./oaf.pl --format=genbank --sequence=myseq.fasta

Example of result:

     gene            207..795
                     /locus_tag="my_sequence"
                     /gene="OAZ3"
     CDS             join(207..320,322..795)
                     /gene="OAZ3"
                     /locus_tag="my_sequence"
                     /note="OAZ3 ORF0: 1.3e-18 ORF1: 9.9e-129"
                     /inference="FS site: 1"
                     /ribosomal_slippage
                     /codon_start=1
                     /transl_table=1
                     /product="Ornithine decarboxylase antizyme 3"
                     /translation="MLPCCYKSITYKEQEDLTLRPHCCLPCSCLPCSCLQCSESLGGL
                     QVGRSTAQEKDHSQLKELYSAGNLTVLSTDPLLHQDPVQLDFHFRLTPHSSAHWHGLL
                     CDHRLFLDIPYQALDQGNRESLTATLEYVEEKTNVDSVFVNFQIDRKDRGALLRAFSY
                     MGFEVVRPDHPALPPWDNVIFMVYPLERDLGHPGQ"

For each input sequence, Ornithine decarboxylase Antizyme Finder (OAF) returns a putative Ornithine decarboxylase Antizyme (OAZ). The result is formatted as a FASTA record, a GenBank entry fragment, a raw sequence, or an XML record with a Document Type Definition (DTD).

If no OAZ is detected, then the message No Hit is displayed.
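The CDS location in the GenBank result above, join(207..320,322..795), joins two segments that leave out position 321, which is consistent with the /ribosomal_slippage qualifier. The following Python sketch shows one way to read such a location string; it is an illustration only, handles just plain ranges and a single join(...), and the function names are hypothetical.

```python
import re


def parse_location(location):
    """Parse 'a..b' or 'join(a..b,c..d,...)' into a list of (start, end) pairs."""
    location = location.strip()
    match = re.fullmatch(r"join\((.*)\)", location)
    body = match.group(1) if match else location
    segments = []
    for part in body.split(","):
        start, end = part.strip().split("..")
        segments.append((int(start), int(end)))
    return segments


def skipped_positions(segments):
    """Return the 1-based positions lying between consecutive segments."""
    skipped = []
    for (_, previous_end), (next_start, _) in zip(segments, segments[1:]):
        skipped.extend(range(previous_end + 1, next_start))
    return skipped


segments = parse_location("join(207..320,322..795)")
print(segments)                                         # [(207, 320), (322, 795)]
print(skipped_positions(segments))                      # [321]
print(sum(end - start + 1 for start, end in segments))  # 588
```

The joined segments cover 588 nucleotides, that is 196 codons including the stop codon.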

Advanced Query

Accessing sequence data from the GenBank database is straightforward. Data can be accessed by the sequence's accession number or id. For example, to retrieve a GenBank entry and report the result as XML:
>./oaf.pl --format=xml --genbank=NM_004152

Example of result:

<oafxml version="1.2">
 <analysis>
  <program>
   <prog-name>OAF.pm</prog-name>
   <prog-version>1.0.1</prog-version>
  </program>
  <date>
   <day>8</day>
   <month>3</month>
   <year>2007</year>
  </date>
  <parameter>
   <evalue>1e-40</evalue>
   <table>1</table>
  </parameter>
 </analysis>
 <sequence id="NM_004152.seq">
  <input>
   <seq type="dna" length="1146">TTT [cut] CGA</seq>
  </input>
  <output id="NM_004152">
   <gene id="NM_004152.1">
    <coord>79..766</coord>
    <name>OAZ1</name>
    <seq type="dna" length="688">ATG [cut] TAG</seq>
   </gene>
   <cds id="NM_004152.2">
    <coord>join(79..282,284..766)</coord>
    <name>OAZ1</name>
    <note>OAZ1 ORF0: 4.8e-55 ORF1: 4.8e-136</note>
    <product>Ornithine decarboxylase antizyme 1</product>
    <seq type="prt" length="228">MVK [cut] EEE</seq>
    <model hmm="OAZ1_ORF1">4.8e-136</model>
    <model hmm="OAZ1_ORF0">4.8e-55</model>
   </cds>
   <frameshift id="NM_004152.3" evalue="0.039">
    <psite>TCC</psite>
    <asite>TGA</asite>
    <downstream>T</downstream>
   </frameshift>
  </output>
 </sequence>
</oafxml>
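An XML result like the one above can be read with a standard XML parser. The following Python sketch uses the standard library's xml.etree.ElementTree on a shortened copy of the example record. It is an illustration only: the element and attribute names are taken from the example, and the function name summarize is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example record shown above.
SAMPLE = """<oafxml version="1.2">
 <sequence id="NM_004152.seq">
  <output id="NM_004152">
   <gene id="NM_004152.1">
    <coord>79..766</coord>
    <name>OAZ1</name>
   </gene>
   <cds id="NM_004152.2">
    <coord>join(79..282,284..766)</coord>
    <name>OAZ1</name>
    <product>Ornithine decarboxylase antizyme 1</product>
    <model hmm="OAZ1_ORF1">4.8e-136</model>
    <model hmm="OAZ1_ORF0">4.8e-55</model>
   </cds>
   <frameshift id="NM_004152.3" evalue="0.039">
    <psite>TCC</psite>
    <asite>TGA</asite>
    <downstream>T</downstream>
   </frameshift>
  </output>
 </sequence>
</oafxml>"""


def summarize(xml_text):
    """Return one dictionary per <output> element that contains a <cds>."""
    root = ET.fromstring(xml_text)
    hits = []
    for output in root.iter("output"):
        cds = output.find("cds")
        if cds is None:
            continue
        frameshift = output.find("frameshift")
        hits.append({
            "id": output.get("id"),
            "name": cds.findtext("name"),
            "coord": cds.findtext("coord"),
            "product": cds.findtext("product"),
            # E-value reported for each HMM profile, keyed by profile name.
            "models": {m.get("hmm"): float(m.text) for m in cds.findall("model")},
            "frameshift_evalue": (float(frameshift.get("evalue"))
                                  if frameshift is not None else None),
        })
    return hits


if __name__ == "__main__":
    for hit in summarize(SAMPLE):
        print(hit["id"], hit["name"], hit["coord"], hit["models"])
```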

More complex Query

Very large databases present special problems to automated projects. Bioperl's ortholog mode addresses this situation. The ortholog mode is a compliant process that retrieves only a subset of sequences before running the OAZ search. The aim is to enable searches in very large sequence databases without running out of memory while preserving usability. As a result, from the user's perspective, using the ortholog mode is almost identical to using the other modes. The principal difference is that the input file must be an amino acid sequence file. Another difference is that the input sequence is used as a query against a database of nucleic acids. These differences are illustrated in the following example:

Example of FASTA format:

>a_known_oaz_protein
MINTQDSSILPLSKCPQLQCCRHIVPGPLWCSDAPHPLSKIPGGRGGGRD
PSLSALIYKDEKLTVTQDLPVNDGKPHIVHFQYEVTEVKVSSWDAVLSSQ
SLFVEIPDGLLADGSKEGLLALLEFAEEKMKVNYVFICFRKGREDRAPLL
KTFSFLGFEIVRPGHPCVPSRPDVMFMVYPLDQNLSDED

Example of command-line:
>./oaf.pl --expect=2e-5 -r --bank=refseq_rna --orhto=example.aa

Example of result:

>gi|93141218|ref|NM_008753.4|OAZ1
MVKSSLQRILNSHCFAREKEGDKRSATLHASRTMPLLSQHSRGGCSSESSRVALNCCSNLGPGPRWCSDVPHPPLKIPGG
RGNSQRDHSLSASILYSDERLNVTEEPTSNDKTRVLSIQSTLTEAKQVTWRAVWSGGGLYIELPAGPLPEGSKDSFAALL
EFAEEQLQADHVFICFPKNREDRAALLRTFSFLGFEIVRPGHPLVPKRPDACFMVYTLEREDPGEED

>gi|37596299|ref|NM_010952.2|OAZ2
MINTQDSSILPLSKCPQLQCCRHIVPGPLWCSDAPHPLSKIPGGRGGGRDPSLSALIYKDEKLTVTQDLPVNDGKPHIVH
FQYEVTEVKVSSWDAVLSSQSLFVEIPDGLLADGSKEGLLALLEFAEEKMKVNYVFICFRKGREDRAPLLKTFSFLGFEI
VRPGHPCVPSRPDVMFMVYPLDQNLSDED

>gi|8567381|ref|NM_016901.1|OAZ3
MLPCCYKSITYKEQEDLTLRPHCCLPCSCLPCSCLQCSESLGGLQVGRSTAQEKDHSQLKELYSAGNLTVLSTDPLLHQD
PVQLDFHFRLTPHSSAHWHGLLCDHRLFLDIPYQALDQGNRESLTATLEYVEEKTNVDSVFVNFQIDRKDRGALLRAFSY
MGFEVVRPDHPALPPWDNVIFMVYPLERDLGHPGQ

...

In this example, oaf.pl uses the NCBI remote BLAST server and the 'refseq_rna' database to retrieve the putative ortholog sequences, with the expected value (E-value) set to 2e-5.
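The FASTA result of the ortholog search can be split into records for further processing. The following Python sketch is an illustration only: it assumes that every description line follows the layout seen in the example above (gi|number|ref|accession|name), and the function names are hypothetical.

```python
def _make_record(header, chunks):
    """Build one record from a description line (without '>') and its sequence lines."""
    fields = header.split("|")
    # Assumed layout, taken from the example: gi|<number>|ref|<accession>|<name>
    if len(fields) != 5 or fields[0] != "gi" or fields[2] != "ref":
        raise ValueError(f"unexpected description line: {header!r}")
    return {
        "gi": fields[1],
        "accession": fields[3],
        "name": fields[4],
        "protein": "".join(chunks),
    }


def read_hits(text):
    """Split FASTA text into a list of record dictionaries."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "...":
            continue  # skip blank lines and the elision marker
        if line.startswith(">"):
            if header is not None:
                records.append(_make_record(header, chunks))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append(_make_record(header, chunks))
    return records


if __name__ == "__main__":
    example = """>gi|37596299|ref|NM_010952.2|OAZ2
MINTQDSSILPLSKCPQLQCCRHIVPGPLWCSDAPHPLSKIPGGRGGGRDPSLSALIYKDEKLTVTQDLPVNDGKPHIVH
VRPGHPCVPSRPDVMFMVYPLDQNLSDED
"""
    for record in read_hits(example):
        print(record["accession"], record["name"], len(record["protein"]))
```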