# hyperfreq

A Bayesian [APOBEC3G](http://en.wikipedia.org/wiki/APOBEC3G)-induced hypermutation analysis tool implemented in Python.


## CLI usage

The following examples illustrate some of the basic usage of hyperfreq.
Note that if one navigates to the root of this code base, these commands should all execute using the test data in `tests/data`.

### Analysis

To run an analysis, use the `hyperfreq analyze` command.
Running `hyperfreq analyze -h` will give you a full list of options.
Here are some examples to get you started and give you a rough sense of hyperfreq's capabilities.

    # Simple analysis comparing each sequence to a consensus sequence constructed from the entire alignment
    hyperfreq analyze tests/data/alignment.fasta

    # Instead, compare each sequence to the consensus for a cluster specified in a clusters file
    hyperfreq analyze tests/data/alignment.fasta -c tests/data/clusters.csv

    # Specify the reference sequence(s) you want each sequence to be compared to
    hyperfreq analyze tests/data/alignment.fasta -r tests/data/ref_seqs.fasta

These commands will all output data to a file named `hyperfreq_analysis.call.csv`.
The prefix and location of this file can be specified using the `-P` and `-o` flags.

### Multiple pattern analysis

By default, these analyses look for GG context hypermutation, suggestive of APOBEC3G activity.
One can specify multiple contexts for analysis using the `-p` or `--patterns` flag(s).
Pattern options include

 **pattern** | **associated with**
------------ | ------------------------------------------------------------------------
        `GG` | A3G activity
        `GA` | A3F (and other A3) activity in humans
        `GR` | combined A3G and A3F activity (as often observed in hypermutated HIV)
        `GM` | rhesus macaque A3DE activity (as observed in XMRV and SFV infections) \*
        `GV` | combined rhesus A3DE and A3G activity

Note that R, M and V are IUPAC degenerate codes for A or G; A or C; and A, C or G, respectively.
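
To make the degenerate codes concrete, here is a minimal sketch that expands each pattern from the table into the explicit dinucleotide contexts it covers. The `IUPAC` table and the `expand_pattern` helper are illustrative names invented for this example; they are not part of hyperfreq's API.

```python
from itertools import product

# Subset of the IUPAC nucleotide codes. R, M and V are the degenerate
# codes used in the pattern table above.
IUPAC = {
    'A': 'A', 'C': 'C', 'G': 'G', 'T': 'T',
    'R': 'AG',   # A or G
    'M': 'AC',   # A or C
    'V': 'ACG',  # A, C or G
}


def expand_pattern(pattern):
    """Return every concrete context covered by a (possibly degenerate) pattern.

    Hypothetical helper for illustration only.
    """
    choices = [IUPAC[code] for code in pattern]
    return [''.join(bases) for bases in product(*choices)]


for pattern in ('GG', 'GA', 'GR', 'GM', 'GV'):
    print("%s -> %s" % (pattern, ', '.join(expand_pattern(pattern))))
```

For instance, `GR` expands to `GA` and `GG`, which is why it captures combined A3G and A3F activity.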

When running multiple patterns, the `call.csv` file contains a column called `call_pattern` which represents the pattern in which the evidence of hypermutation appears to be strongest.
The other columns in the `call.csv` file contain counts and statistics for the call pattern only.
If the `-F/--full-output` flag is specified, a separate file is output for each pattern analyzed (for example `hyperfreq_analysis.GG.csv`).
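
As a rough sketch of how one might summarize a multiple-pattern run, the following tallies how many sequences were assigned to each call pattern. It relies only on the `call_pattern` column described above; the function name `count_call_patterns` is made up for this example, and the file name in the usage comment assumes the default output prefix.

```python
import csv
from collections import Counter


def count_call_patterns(lines):
    """Count rows per value of the `call_pattern` column.

    `lines` may be an open file handle or any iterable of CSV lines.
    Hypothetical helper, not part of hyperfreq.
    """
    reader = csv.DictReader(lines)
    return Counter(row['call_pattern'] for row in reader)


# Usage (assumes the default output file is in the current directory):
# with open('hyperfreq_analysis.call.csv') as handle:
#     for pattern, n in count_call_patterns(handle).most_common():
#         print("%s\t%d" % (pattern, n))
```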

\* [Zhang et al](http://www.sciencedirect.com/science/article/pii/S0042682211004375)

   
### Splitting sequences for HM free alignments

Given an alignment, we can cut out sites/columns suspected of hypermutation by using the `split` command.
Running this command requires an alignment and a CSV file with a column named `column`, which specifies the positions in the alignment to cut out.
In addition to the `hyperfreq_analysis.call.csv` file, `hyperfreq analyze` also produces a `hyperfreq_analysis.sites.csv` file which has such a column, as well as information regarding which sequences were hypermutated at which sequence positions.

For example,

    # for an alignment with hypermutated columns removed
    hyperfreq split alignment.fasta hypermutated_columns.csv

For more thorough usage, run `hyperfreq split -h` at the command line.
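
If you want to cut out a hand-picked set of positions rather than those reported in the `sites.csv` file, a CSV with the required `column` header can be generated along these lines. This is a sketch under assumptions: the helper name `columns_csv_text` is invented here, the positions are made-up values, and the indexing convention (0-based or 1-based) should be checked against a `sites.csv` file produced by `hyperfreq analyze`.

```python
def columns_csv_text(positions):
    """Build the text of a CSV file with a single `column` header.

    Hypothetical helper: duplicates are dropped and positions are sorted.
    The indexing convention expected by `hyperfreq split` is not verified here.
    """
    rows = ['column'] + [str(pos) for pos in sorted(set(positions))]
    return '\n'.join(rows) + '\n'


# Usage with made-up positions:
# with open('hypermutated_columns.csv', 'w') as handle:
#     handle.write(columns_csv_text([301, 12, 57, 57]))
```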


## Library usage

If you want to write your own scripts, you can do so by importing the appropriate modules

    from hyperfreq import Alignment
    from Bio import SeqIO

    # Create a hyperfreq alignment object
    seqs = SeqIO.parse('some_file.fasta', 'fasta')
    aln = Alignment(seqs)

    # Obtain an analysis generator which can be iterated over.
    analysis = aln.analyze()

    # Iterate over each sequence in the analysis, and do whatever you like!
    for seq_result in analysis:
        # A formatted string prints the same way under Python 2 and Python 3
        print("%s hm status: %s" % (seq_result['sequence'], seq_result['hm_pos']))

It's also possible to define your own mutation patterns using the `MutPattern` and `MutPatternSet` classes.
It may be possible in the future to specify patterns more flexibly via the CLI, but for now, doing so requires using this code base as a library in your own scripts.
If having this functionality available via the command line is important to you, please submit an [issue](https://github.com/fhcrc/hyperfreq/issues), and we'll see what we can do.


## Interactive usage

It is also now possible to run a `hyperfreq analyze` command with the `-N / --interactive` flag.
This causes the program to load up an interactive python session with the `analysis` generator, instead of simply writing the results to file.
You can enter `dir()` to see what namespaces and data have been included for you, and obviously load any libraries (numpy, biopython, etc.) that might be helpful.
Do note though that `analysis` is a generator, so if you need to make more than one pass through the results interactively, you'll need to put them into a list.

    analysis = [result for result in analysis]
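
The reason for this is easy to see with a stand-in generator. The sketch below does not call hyperfreq at all: `fake_analysis` is a made-up substitute that yields dictionaries with the `sequence` and `hm_pos` keys used in the library example, and treating `hm_pos` as a boolean is an assumption.

```python
def fake_analysis():
    """Stand-in for the `analysis` generator; yields made-up result dicts."""
    for name, hm_pos in [('seq1', True), ('seq2', False)]:
        yield {'sequence': name, 'hm_pos': hm_pos}


# A generator can only be consumed once.
analysis = fake_analysis()
first_pass = [result['sequence'] for result in analysis]
second_pass = [result['sequence'] for result in analysis]
print(first_pass)   # ['seq1', 'seq2']
print(second_pass)  # []

# Materializing the results in a list allows repeated passes.
analysis = list(fake_analysis())
hypermutated = [r['sequence'] for r in analysis if r['hm_pos']]
total = len(analysis)
print("%d of %d sequences flagged" % (len(hypermutated), total))
```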


## Installing

Currently, Hyperfreq has only been tested on Linux systems, but it should be possible to get it set up on OSX fairly easily.
If you do get it set up on OSX or Windows and have any tips to share, please feel free to add a page to [the wiki](https://github.com/fhcrc/hyperfreq/wiki).

### Pre-requisites

Hyperfreq depends on the following python libraries

* `biopython`
* `fisher`
* [`betarat`](https://github.com/fhcrc/betarat)
* [`alnclst`](https://github.com/fhcrc/alnclst)

We recommend installing betarat and alnclst first, using the directions in the links above.
Since alnclst requires biopython, you should now only have to install fisher.
Assuming you have pip and the python-dev libraries installed (which you should after following the instructions above), you should be able to run

    # (sudo may not be necessary, depending on how you set up your python environment)
    sudo pip install fisher

### With that out of the way...

Now you should be able to download and install hyperfreq

    # I like to download things to a src directory in my home folder
    mkdir -p ~/src
    cd ~/src

    # Download and unzip the source code
    wget https://github.com/fhcrc/hyperfreq/archive/betarat-refactor.zip -O hyperfreq.zip
    unzip hyperfreq.zip
    # GitHub branch archives unzip to a directory named <repo>-<branch>
    cd hyperfreq-betarat-refactor

    # Install (sudo may not be necessary, depending on how you set up your python environment)
    sudo python setup.py install

And there you have it!
You can try running `hyperfreq -h` from the command line to test your installation.

### Problems?

If you have any trouble installing, please submit an [issue](https://github.com/fhcrc/hyperfreq/issues), so we can try to help and update the documentation.


