ulla
文件大小: unknow
源码售价: 5 个金币 积分规则     积分充值
资源说明:A new ESST generator
= ulla

http://www-cryst.bioc.cam.ac.uk/ulla


== Description

'ulla' is a program for calculating environment-specific substitution tables from user providing environmental class definitions and sequence alignments with the annotations of the environment classes.


== Features

* Environment-specific substitution table generation based on user providing environmental class definition
* Entropy-based smoothing procedures to cope with sparse data problem
* BLOSUM-like weighting procedures using PID threshold
* Heat Map generation for substitution tables


== Requirements

* ruby 1.8.7 or above (1.9.0 or above recommended, http://www.ruby-lang.org)
* rubygems 1.2.0 or above (http://rubyforge.org/projects/rubygems)

Following RubyGems will be automatically installed if you have rubygems installed on your machine

* narray (http://narray.rubyforge.org)
* bio (http://bioruby.open-bio.org)
* RMagick (http://rmagick.rubyforge.org)


== Installation

    ~user $ sudo gem install ulla


== Basic Usage

It's pretty much the same as Kenji's subst (http://mordred.bioc.cam.ac.uk/~kenji/subst/), so in most cases, you can swap 'subst' with 'ulla'.

    ~user $ ulla -l TEMLIST-file -c classdef.dat
                or
    ~user $ ulla -f TEM-file -c classdef.dat


== Options
    --tem-file (-f) FILE: a tem file
    --tem-list (-l) FILE: a list for tem files
    --classdef (-c) FILE: a file for the defintion of environmental class
                          if no definition file provided, --cys (-y) 2 and --nosmooth options automatcially applied
    --outfile (-o) FILE: output filename (default 'allmat.dat')
    --weight (-w) INTEGER: clustering level (PID) for the BLOSUM-like weighting (default: 60)
    --noweight: calculate substitution counts with no weights
    --environment (-e) INTEGER:
        0 for considering only substituted amino acids' environments (default)
        1 for considering both substituted and substituting amino acids' environments
    --smooth (-s) INTEGER:
        0 for partial smoothing (default)
        1 for full smoothing
    --p1smooth: perform smoothing for p1 probability calculation when partial smoothing
    --nosmooth: perform no smoothing operation
    --cys (-y) INTEGER:
        0 for using C and J only for structure (default)
        1 for both structure and sequence
        2 for using only C for both (must be set when you have no 'disulphide' or 'disulfide' annotation in templates)
    --output INTEGER:
        0 for raw counts (no smoothing performed)
        1 for probabilities
        2 for log-odds (default)
    --noroundoff: do not round off log odds ratio
    --scale INTEGER: log-odds matrices in 1/n bit units (default 3)
    --sigma DOUBLE: change the sigma value for smoothing (default 5.0)
    --autosigma: automatically adjust the sigma value for smoothing
    --add DOUBLE: add this value to raw counts when deriving log-odds without smoothing (default 0)
    --pidmin DOUBLE: count substitutions only for pairs with PID equal to or greater than this value (default none)
    --pidmax DOUBLE: count substitutions only for pairs with PID smaller than this value (default none)
    --heatmap INTEGER:
        0 create a heat map file for each substitution table
        1 create one big file containing all heat maps from substitution tables
        2 do both 0 and 1
    --heatmap-format INTEGER:
        0 for Portable Network Graphics (PNG) Format (default)
        1 for Graphics Interchange Format (GIF)
        2 for Joint Photographic Experts Group (JPEG) Format
        3 for Microsoft Windows bitmap (BMP) Format
        4 for Portable Document Format (PDF)
    --heatmap-columns INTEGER: number of tables to print in a row when --heatmap 1 or 2 set (default: sqrt(no. of tables))
    --heatmap-stem STRING: stem for a file name when --heatmap 1 or 2 set (default: 'heatmap')
    --heatmap-values: print values in the cells when generating heat maps
    --verbose (-v) INTEGER
        0 for ERROR level
        1 for WARN or above level (default)
        2 for INFO or above level
        3 for DEBUG or above level
    --version: print version
    --help (-h): show help


== Usage

1. Prepare an environmental class definition file. For more details, please check this notes (http://mordred.bioc.cam.ac.uk/~kenji/subst/NOTES). You can download a sample environmental class definition file from http://mordred.bioc.cam.ac.uk/~kenji/subst/classdef.dat 

        ~user $ cat classdef.dat
        #
        # name of feature (string); values adopted in .tem file (string); class labels assigned for each value (string);
        # constrained or not (T or F); silent (used as masks)? (T or F)
        #
        secondary structure and phi angle;HEPC;HEPC;T;F
        solvent accessibility;TF;Aa;F;F

2. Prepare structural alignments and their annotations of above environmental classes in PIR format. You can download sample alignments from http://mordred.bioc.cam.ac.uk/~kenji/subst/alltem-allmask.tar.gz or from http://www-cryst.bioc.cam.ac.uk/ESST/

        ~user $ cat sample1.tem
        >P1;1mnma
        sequence
        QKERRKIEIKFIENKTRRHVTFSKRKHGIMKKAFELSVLTGTQVLLLVVSETGLVYTFSTPKFEPIVTQQEGRNL
        IQACLNAPDD*
        >P1;1egwa
        sequence
        --GRKKIQITRIMDERNRQVTFTKRKFGLMKKAYELSVLCDCEIALIIFNSSNKLFQYASTDMDKVLLKYTEY--
        ----------*
        >P1;1mnma
        secondary structure and phi angle
        CPCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHPCCCEEEEECCCPCEEEEECCCCCHHHHCHHHHHH
        HHHHHCCCCP*
        >P1;1egwa
        secondary structure and phi angle
        --CCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHCPCCCEEEEECCCPCEEEEECCCHHHHHHHHHHC--
        ----------*
        >P1;1mnma
        solvent accessibility
        TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTFTTTTTTTTTTTTTTTT
        TTTTTTTTTT*
        >P1;1egwa
        solvent accessibility
        --TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTFTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT--
        ----------*
        ... 

3. When you have two or more alignment files, you should make a separate file containing all the paths for the alignment files.

        ~user $ ls -1 *.tem > TEMLIST
        ~user $ cat TEMLIST
        sample1.tem
        sample2.tem
        ...

4. To produce substitution count matrices,

        ~user $ ulla -l TEMLIST --output 0 -o substcount.mat

5. To produce substitution probability matrices,

        ~user $ ulla -l TEMLIST --output 1 -o substprob.mat

6. To produce log odds ratio matrices,

        ~user $ ulla -l TEMLIST --output 2 -o substlogo.mat

7. To produce substitution probability matrices from the sequence pairs within a certain PID range (if you don't provide any name for output, 'allmat.dat' will be used.),

        ~user $ ulla -l TEMLIST --pidmin 60 --pidmax 80 --output 1

8. To change the clustering level (default 60) to PID 80,

        ~user $ ulla -l TEMLIST --weight 80 --output 1

9. In case positions are masked with the character 'X' in any environmental features, all mutations from/to the position will be excluded from substitution counts.

10. Then, it will produce a file containing all the matrices, which will look like the one below. For more details, please check this notes (http://mordred.bioc.cam.ac.uk/~kenji/subst/NOTES).

        # Environment-specific amino acid substitution matrices
        # Creator: ulla version 0.0.5
        # Creation Date: 05/02/2009 17:29
        #
        # Definitions for structural environments:
        # 2 features used
        #
        # secondary structure and phi angle;HEPC;HEPC;F;F
        # solvent accessibility;TF;Aa;F;F
        # (read in from classdef.dat)
        #
        # Number of alignments: 1187
        # (list of .tem files read in from TEMLIST)
        #
        # Total number of environments: 8
        #
        # There are 21 amino acids considered.
        # ACDEFGHIKLMNPQRSTVWYJ
        # 
        # C: Cystine (the disulfide-bonded form)
        # J: Cysteine (the free thiol form)
        #
        # Weighting scheme: clustering at PID 60 level
        # ...
        #
        >HA 0
        #        A      C      D      E      F      G      H      I      K      L      M      N      P      Q      R      S      T      V      W      Y      J
        A        3     -5      0      0     -1      2      0      0      1      0      0      0      1      1      0      1      1      1     -1      0      2
        C      -16     19    -16    -18    -11    -14    -13    -13    -14    -14    -14    -11    -17    -16    -13    -16    -14    -12    -12    -10     -4
        D        1     -7      6      3     -3      1      0     -3      1     -3     -2      2      1      2      0      1      0     -2     -3     -2     -2
        E        3     -7      5      7     -1      2      2      0      3      0      0      3      2      4      3      3      2      1     -1      0     -1
        F       -4     -4     -6     -6      7     -5     -1      0     -4      1      0     -5     -5     -4     -4     -4     -3     -1      3      3      0
        G       -2     -6     -3     -4     -5      5     -4     -5     -4     -5     -4     -2     -3     -4     -4     -2     -3     -5     -6     -4     -3
        H        0     -6      0      0      1      0      8     -1      0      0      0      1     -2      1      1      0      0      0      1      3      0
        I       -3     -7     -6     -5      0     -5     -3      4     -4      1      1     -5     -4     -4     -3     -5     -2      2     -2     -1      0
        K        2     -6      2      2     -1      1      2      0      5      1      1      2      0      3      4      2      2      0     -2      0     -1
        L       -2     -6     -5     -4      1     -4     -2      2     -3      4      2     -3     -4     -3     -2     -4     -2      1      0      0      1
        M       -2     -7     -4     -3      1     -2     -1      2     -2      2      6     -3     -4     -2     -1     -2     -1      1      0      0      1
        N        0     -5      1      0     -3      1      1     -3      0     -2     -2      6     -2      0      0      1      1     -2     -3     -1     -1
        P       -1     -7     -1     -2     -4     -1     -3     -3     -2     -3     -4     -2      9     -2     -3      0     -1     -2     -4     -4     -4
        Q        2     -7      2      2     -1      1      2     -1      2      0      0      2      0      5      2      1      1      0     -2     -1      0
        R        1     -6      1      1     -1      0      2      0      3      0      1      1     -1      2      6      1      1      0     -1      0      0
        S        0     -6     -1     -1     -3      0     -2     -3     -1     -3     -3      0      0     -1     -1      3      1     -2     -4     -3      0
        T       -1     -7     -2     -2     -3     -2     -2     -2     -2     -2     -2     -1     -2     -2     -2      0      3     -1     -3     -3      0
        V       -3     -6     -6     -5     -1     -4     -3      1     -4      0      0     -5     -3     -4     -4     -4     -2      2     -2     -2      0
        W       -4     -6     -6     -5      2     -6     -2     -2     -5     -1     -2     -5     -5     -4     -4     -5     -4     -2     12      2     -3
        Y       -3     -5     -5     -5      3     -4      1     -1     -3     -1     -1     -3     -5     -3     -3     -4     -3     -2      3      7     -1
        J       -2      0     -4     -5      0     -2     -1      0     -3      0      0     -3     -6     -2     -2     -1     -1      0     -1      0      9
        U       -5     16     -7     -8     -3     -5     -4     -3     -6     -3     -3     -5     -9     -6     -5     -4     -4     -3     -4     -3      6
        ...

11. To generate a heat map for each table with values in it,

        ~user $ ulla -l TEMLIST --heatmap 0 --heatmap-values

    which will look like this,

    http://mordred.bioc.cam.ac.uk/~semin/images/0.HA.png

12. To generate one big figure, 'myheatmaps.gif' containing all the heat maps (4 maps in a row),

        ~user $ ulla -l TEMLIST --heatmap 1 --heatmap-stem myheatmaps --heatmap-format 1 --heatmap-columns 4

    which will look like this,

    http://mordred.bioc.cam.ac.uk/~semin/images/myheatmaps.gif

== Repository

You can download a pre-built RubyGems package from

* rubyforge: http://rubyforge.org/projects/ulla

or, You can fetch the source from

* github: http://github.com/semin/ulla/tree/master

== Reference

* {Lee S., Blundell T.L. (2009) Ulla: a program for calculating environment-specific amino acid substitution tables. Bioinformatics. 25(15):1976-1977; doi:10.1093/bioinformatics/btp300}[http://bioinformatics.oxfordjournals.org/cgi/content/full/25/15/1976]

== Contact

Comments are welcome, please send an email to me (seminlee at gmail dot com). 


== License

http://i.creativecommons.org/l/by-nc/2.0/uk/88x31.png
This work is licensed under a {Creative Commons Attribution-Noncommercial 2.0 UK: England & Wales License}[http://creativecommons.org/licenses/by-nc/2.0/uk/].

本源码包内暂不包含可直接显示的源代码文件,请下载源码包。