#`biotools`

A bunch of bioinformatics utilities.

###`biotools.align`

This module is used to align sequences. Currently, only a single alignment
algorithm is implemented; it is a hybrid between Needleman-Wunsch and
Smith-Waterman and is used to find the subsequence within a larger sequence
that best aligns to a reference.

####`biotools.align.OptimalCTether(reference, translation, extend=1, create=10)`

This function takes two sequences: a `reference` sequence and another
protein sequence (`translation`; usually an open reading frame that has
been translated). Needleman-Wunsch alignment is performed, and the
substring of `translation` with the highest identity that begins with a
start codon [default: `['ATG']`] is reported.

This function returns a dictionary of relevant information from the
alignment; specifically, the aligned sequences themselves [keys: `query`,
`subject`], the score [key: `score`], the length of the alignment [key:
`length`], the length of the substring of `translation` used [key:
`sublength`], the number of identities [key: `identities`], and the number
of gaps [key: `gaps`].
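
The tethering idea can be illustrated with a minimal sketch. It reads the name as "anchored at the C-terminal end of the translation while the start may move", which is an interpretation rather than something the source confirms. It assumes a simple scoring scheme (match +1, mismatch -1, linear gap penalty -1) instead of the function's `extend`/`create` gap parameters, treats `'M'` in the protein as the start residue, and uses hypothetical function names. Every suffix of the translation that begins with a start residue is scored against the reference with a global (Needleman-Wunsch) alignment, and the best one is kept.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment score with a linear gap penalty."""
    # Row for the empty prefix of `a`: aligning it to b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def best_tethered_start(reference, translation, starts=("M",)):
    """Return (offset, score) for the start-residue suffix of `translation`
    that aligns best to `reference`, or None if no start residue occurs."""
    best = None
    for offset, residue in enumerate(translation):
        if residue in starts:
            # The suffix always runs to the end (C-terminus) of the translation.
            score = nw_score(reference, translation[offset:])
            if best is None or score > best[1]:
                best = (offset, score)
    return best


print(best_tethered_start("MKV", "MAMKV"))  # (2, 3)
```

Trying each start and re-running a full global alignment is quadratic per start, which keeps the sketch simple but is slower than a single dynamic-programming pass with free leading gaps.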

###`biotools.annotation`

This module is used to create annotation files (currently, only GFF files).
Annotations can be organized into a hierarchy (e.g., genes contain exons,
introns, etc.).

####`biotools.annotation.Annotation(self, ref, src, type, start, end, score, strand, phase, attr, name_token='ID', gff_token='=')`

An object to help with reading and writing GFF files.
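
For reference, a GFF3 feature line has nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes), in the same order as the constructor's first nine arguments. The sketch below is a hypothetical helper, not part of biotools: it formats one such line, joining attributes as `key=value` pairs separated by `;`, which matches the default `gff_token='='`. It does not percent-encode reserved characters, which the GFF3 specification requires for some attribute values.

```python
def gff3_line(ref, src, type_, start, end, score, strand, phase, attr, token="="):
    """Format one GFF3 feature line from its nine columns."""
    attributes = ";".join(f"{key}{token}{value}" for key, value in attr.items())
    columns = [ref, src, type_, start, end, score, strand, phase, attributes]
    return "\t".join(str(column) for column in columns)


line = gff3_line("chr1", "phytozome7.0", "gene", 1, 900, ".", "+", ".",
                 {"ID": "gene1", "Name": "abc1"})
print(line)
```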

###`biotools.BLAST`

A module to manage BLAST databases and interface with the BLAST+ standalone
program available from NCBI.

####`biotools.BLAST.Result(self, file)`

A class that takes the raw output from BLAST and generates dictionaries
from it, one per hit. These data include the alignment, percent identity,
gaps, e-value, score, length of subject, length of query, and start and
stop positions for both sequences. This class should be used in a for
loop like so:

```python
from biotools.BLAST import Result

for res in Result(file_or_data):
    pass  # each `res` is a dictionary describing one BLAST hit
```

The class instance has one other property, `headers`, which holds the
lines in the BLAST output that precede the hits (e.g., citation info).

####`biotools.BLAST.run(db, sfile, mega_blast=False, **kwargs)`

Takes a database and a query and runs the appropriate type of BLAST on
them. The database can be an existing BLAST database or a FASTA/FASTQ
file. If it is a sequence file, this function looks in the places where
BLAST would look for an existing database created from that file and uses
that instead. If there is no such database, this function creates one and
then runs BLAST against the newly created database.

The only optional keyword arguments currently supported are `evalue`,
`num_threads`, `gapopen`, and `gapextend`. These correspond to the BLAST
options of the same name.
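
As a rough illustration of how those keyword arguments map onto BLAST+ command-line options, here is a sketch that only builds the argument list; it does not run BLAST or locate or create databases. The helper name is hypothetical, and the program (`blastn`, `blastp`, and so on) is passed in explicitly rather than chosen automatically as `run` does.

```python
ALLOWED = ("evalue", "num_threads", "gapopen", "gapextend")


def build_blast_command(program, db, query, mega_blast=False, **kwargs):
    """Return a BLAST+ argument list; only the documented keyword options are accepted."""
    unknown = set(kwargs) - set(ALLOWED)
    if unknown:
        raise ValueError(f"unsupported BLAST options: {sorted(unknown)}")
    cmd = [program, "-db", db, "-query", query]
    if program == "blastn":
        # In BLAST+, blastn defaults to the megablast task, so set the task explicitly.
        cmd += ["-task", "megablast" if mega_blast else "blastn"]
    for name in ALLOWED:  # a fixed order keeps the command reproducible
        if name in kwargs:
            cmd += [f"-{name}", str(kwargs[name])]
    return cmd


print(build_blast_command("blastn", "genomes.fa", "query.fa",
                          mega_blast=True, evalue=1e-30, num_threads=16))
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` once the database exists (for example, after `makeblastdb`).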

###`biotools.clustal`

**Needs documentation**

####`biotools.clustal.run(infile, outfile, **kwargs)`

**Needs documentation**

###`biotools.complement`

**Needs documentation**

####`biotools.complement.complement(s)`

Creates the complement of a sequence; if the reverse complement is needed,
reverse the result with `[::-1]`. This function accepts either `Sequence`
objects or strings.
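
A minimal sketch of the same idea for plain strings, assuming an alphabet of `A`, `C`, `G`, `T`, and `N` in either case (IUPAC ambiguity codes are not handled). The function name is hypothetical.

```python
# Map each base to its Watson-Crick partner; N stays N, case is preserved.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def complement_dna(seq):
    """Return the base-by-base complement of a DNA string."""
    return seq.translate(COMPLEMENT)


print(complement_dna("AACG"))        # TTGC
print(complement_dna("AACG")[::-1])  # CGTT (reverse complement)
```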

###`biotools.sequence`

**Needs documentation**

####`biotools.sequence.annotation(seq, source, type, **kwargs)`

Creates an `Annotation` object for the given sequence from a source
(e.g., "phytozome7.0") of a particular type (e.g., "gene").

####`biotools.sequence.chop(seq, length=70)`

Yields successive chunks of a sequence, each no more than `length`
characters long; it is meant to be used when printing FASTA files.
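
A minimal sketch of that chunking behavior on a plain string; the function name is hypothetical.

```python
def chop_sequence(seq, length=70):
    """Yield successive slices of `seq`, each at most `length` characters."""
    for i in range(0, len(seq), length):
        yield seq[i:i + length]


print(list(chop_sequence("ACGTACGTAC", 4)))  # ['ACGT', 'ACGT', 'AC']
```

Joining the chunks with `"\n".join(...)` gives the wrapped body of a FASTA record.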

####`biotools.sequence.Sequence(self, name, seq, **kwargs)`

A wrapper class for sequences.

#####`biotools.sequence.Sequence.upper(self)`

**Needs documentation**

###`biotools.translate`

**Needs documentation**

####`biotools.translate.translate(sequence)`

Translate a nucleotide sequence using the standard genetic code. The
`sequence` parameter can be either a string or a `Sequence` object. Stop
codons are denoted with an asterisk (`*`).
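
A minimal sketch of codon-by-codon translation with the standard genetic code (NCBI translation table 1), built from the usual 64-letter string in `TCAG` codon order. Accepting `U`, mapping unrecognized codons to `X`, and ignoring a trailing partial codon are assumptions of this sketch, not documented biotools behavior; the function name is hypothetical.

```python
from itertools import product

# Standard genetic code; codons are enumerated TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_dna(seq):
    """Translate a DNA/RNA string codon by codon; stop codons become '*'."""
    seq = seq.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


print(translate_dna("ATGGCCTAA"))  # MA*
```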

##`biotools.analysis`

**Needs documentation**

###`biotools.analysis.cluster`

**Needs documentation**

####`biotools.analysis.cluster.run(direc, inputs)`

Takes a collection of files generated by gene prediction, creates clusters
based on the genes that are homologous to those predicted genes, and
writes new FASTA files to the `clusters` subdirectory of the given
directory, separated according to whether they contain nucleotide or amino
acid sequences. These new FASTA files are then used to create ClustalW
alignments of the genes whenever a file contains more than one sequence.

###`biotools.analysis.loaddata`

This is a pretty simple JSON-like parser. Specifically, it can load Python-like
dictionary, list, and other literals, i.e., the sort of stuff you'd get if you
dumped the string representation of some data into a file.

The main difference from JSON is that each value must be assigned to a
variable name, e.g.:

```python
my_stuff = { ... }
```

The assignments don't need to be on separate lines; you should be able to
omit any and all whitespace. The result of a successful parse is a dictionary:

```python
{'my_stuff': { ... }}
```

The parser only supports `None`, `True`, `False`, numbers, strings,
dictionaries, and lists.
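
If a stricter input format is acceptable, Python's standard `ast` module can do much the same job. This is a swapped-in technique, not how `loaddata` works: unlike the module, it requires assignments to be separated by newlines or semicolons, and `ast.literal_eval` also accepts tuples and sets. The function name is hypothetical.

```python
import ast


def parse_assignments(text):
    """Parse `name = literal` statements into a {name: value} dictionary."""
    result = {}
    for node in ast.parse(text, mode="exec").body:
        is_simple = (isinstance(node, ast.Assign)
                     and len(node.targets) == 1
                     and isinstance(node.targets[0], ast.Name))
        if not is_simple:
            raise ValueError("only `name = literal` statements are supported")
        # literal_eval raises ValueError for anything that is not a plain literal.
        result[node.targets[0].id] = ast.literal_eval(node.value)
    return result


print(parse_assignments("my_stuff = {'a': [1, 2.5, None]}; flag = True"))
# {'my_stuff': {'a': [1, 2.5, None]}, 'flag': True}
```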

####`biotools.analysis.loaddata.parse(ipt)`

**Needs documentation**

###`biotools.analysis.options`

**Needs documentation**

####`biotools.analysis.options.debug(msg)`

**Needs documentation**

####`biotools.analysis.options.help()`

Prints the usage.

####`biotools.analysis.options.parse(pargs)`

Parses `pargs` and sets global variables to be accessible to other
modules.

These variables are:
* `LENGTH_ERR`
* `MIN_IDENTITY`
* `MAX_EVALUE`
* `NUM_THREADS`
* `NUM_PROCESSES`
* `START_CODONS`
* `DIRECTORY`
* `PLOTTER`
* `args`

###`biotools.analysis.plot`

**Needs documentation**

####`biotools.analysis.plot.axes(bottom, side, bound, fig, **kwargs)`

**Needs documentation**

####`biotools.analysis.plot.draw(x, y, ax, color, **kwargs)`

**Needs documentation**

####`biotools.analysis.plot.models(starts, ends, counts, bound, ax, **kwargs)`

**Needs documentation**

####`biotools.analysis.plot.plot(plotdata, directory, bottom=True, side=True, legend=True, save=True, filename='untitled.pdf', upperbound=0.05, factor=21, fig=, **kwargs)`

**Needs documentation**

####`biotools.analysis.plot.report(ntvar, aavar, lnt, laa)`

**Needs documentation**

####`biotools.analysis.plot.smoothed(unsmoothed, factor)`

**Needs documentation**

###`biotools.analysis.predict`

**Needs documentation**

####`biotools.analysis.predict.GeneFromBLAST(db, sequences, pref, names)`

BLASTs the database against `sequences` and, for each result that passes
the length and percent identity requirements, attempts to locate the full
gene that corresponds to that BLAST hit. Genes that are found are saved in
the `sequences` subdirectory of the given directory, divided according to
whether the sequence is amino acid or nucleotide.
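
The length and identity requirements correspond to the `-L/--length` and `-I/--identity` options of `prok-geneseek` (defaults 0.2 and 0.45). A minimal sketch of such a filter follows. The hit fields (`identities`, `length`, `query_length`, `subject_length`) and measuring the length error relative to the query length are assumptions made for illustration, not biotools' actual data layout.

```python
def passes_filters(hit, min_identity=0.45, length_err=0.2):
    """Keep a hit whose identity fraction is high enough and whose subject
    length is within `length_err` (relative) of the query length."""
    identity = hit["identities"] / hit["length"]
    relative_error = abs(hit["subject_length"] - hit["query_length"]) / hit["query_length"]
    return identity >= min_identity and relative_error <= length_err


hits = [
    {"identities": 60, "length": 100, "query_length": 300, "subject_length": 330},
    {"identities": 30, "length": 100, "query_length": 300, "subject_length": 300},
    {"identities": 90, "length": 100, "query_length": 300, "subject_length": 400},
]
print([passes_filters(h) for h in hits])  # [True, False, False]
```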

####`biotools.analysis.predict.ORFGenerator(sequ)`

Scans both strands of the given sequence and yields each longest
subsequence that starts with a start codon and contains no stop codon
other than its final codon.
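
A simplified sketch of that scan, using the `prok-geneseek` default start (`ATG`) and stop (`TAG`, `TAA`, `TGA`) codons. For each reading frame on each strand it remembers the first start codon seen since the previous stop and yields the stretch from there through the next stop codon, so each ORF is the longest one ending at that stop. The function name and the `min_length` parameter (compare `-O/--orflen`, default 300) are illustrative assumptions.

```python
COMP = str.maketrans("ACGT", "TGCA")


def find_orfs(seq, starts=("ATG",), stops=("TAG", "TAA", "TGA"), min_length=0):
    """Yield ORFs (start codon through stop codon) from both strands of `seq`."""
    seq = seq.upper()
    reverse_complement = seq.translate(COMP)[::-1]
    for strand in (seq, reverse_complement):
        for frame in range(3):
            start = None  # position of the first start codon since the last stop
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon in starts:
                    start = i
                elif start is not None and codon in stops:
                    orf = strand[start:i + 3]
                    if len(orf) >= min_length:
                        yield orf
                    start = None


print(list(find_orfs("ATGATGTAACC")))  # ['ATGATGTAA']
```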

####`biotools.analysis.predict.run(subject, query, prefix, names)`

**Needs documentation**

####`biotools.analysis.predict.ThreadQueue`

**Needs documentation**

###`biotools.analysis.renamer`

**Needs documentation**

####`biotools.analysis.renamer.rename(direc, db, files)`

This isn't really a bioinformatics tool; it is part of the pipeline and
renames the files generated by `cluster.py` with a little human interaction.

###`biotools.analysis.report`

**Needs documentation**

####`biotools.analysis.report.plot(plotdata, directory, bottom=True, side=True, legend=True, save=True, filename='untitled.pdf', upperbound=0.05, factor=21, fig=, **kwargs)`

**Needs documentation**

####`biotools.analysis.report.report(plotdata, **kwargs)`

**Needs documentation**

###`biotools.analysis.run`

**Needs documentation**

####`biotools.analysis.run.run(infile, strains)`

Run several instances of `genepredict.run` at once.

###`biotools.analysis.variance`

**Needs documentation**

####`biotools.analysis.variance.SaySNPs(input)`

Takes a ClustalW alignment and returns a dictionary of data relevant to
plotting the sequence variance for the sequences in that alignment. These
data are:
* `var`: the measure of sequence variation,
* `starts`: the starting positions for each gene model in amino acids,
* `ends`: the ending positions for each gene model in amino acids, and
* `count`: the number of sequences with a particular gene model.

The values given in `starts`, `ends`, and `counts` are sorted so that the
nth element of `starts` corresponds to the nth element of `ends` and the
nth element of `counts`.

####`biotools.analysis.variance.var(strain, fmt)`

Returns plot data and metadata for plotting later on in the pipeline.

##`biotools.IO`

A module for reading and writing sequence and annotation files. Currently
supported file types are FASTA, FASTQ, CLUSTAL alignments, and GFF3 files.

###`biotools.IO.clustal`

Methods for manipulating clustalw alignment files.

####`biotools.IO.clustal.probe(fh)`

**Needs documentation**

####`biotools.IO.clustal.read(fh)`

**Needs documentation**

####`biotools.IO.clustal.rhook(fh)`

**Needs documentation**

###`biotools.IO.fasta`

Functions for manipulating FASTA files.

####`biotools.IO.fasta.probe(fh)`

Probe a file to determine whether or not it is a FASTA file. That is,
the first non-empty line should begin with a greater-than sign ('>'). If
it does not, we conclude that the file is not in FASTA format and return
False; otherwise, we return a dictionary with information relevant to the
FASTA file type.
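
A minimal sketch of that check on a text file handle. The returned dictionary (`{'type': 'fasta'}`) and rewinding the handle afterwards are assumptions of the sketch; the function name is hypothetical.

```python
import io


def probe_fasta(fh):
    """Return {'type': 'fasta'} if the first non-empty line starts with '>', else False."""
    try:
        for line in fh:
            if line.strip():
                return {"type": "fasta"} if line.startswith(">") else False
        return False  # the file has no non-empty lines
    finally:
        fh.seek(0)  # rewind so the file can be read from the start afterwards


print(probe_fasta(io.StringIO("\n>seq1 example\nACGT\n")))  # {'type': 'fasta'}
print(probe_fasta(io.StringIO("@read1\nACGT\n+\nIIII\n")))  # False
```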

####`biotools.IO.fasta.read(fh)`

Read sequences in FASTA format; identifiers (names and definition lines)
are on lines that begin with a greater-than sign ('>'), and the sequence is
on the lines between them. This function is a generator that yields
`Sequence` objects.
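
A minimal sketch of the same parsing loop that yields plain `(name, defline, sequence)` tuples instead of `Sequence` objects. Splitting the name from the definition line at the first whitespace is a common convention assumed here; the function name is hypothetical.

```python
import io


def read_fasta(fh):
    """Yield (name, defline, sequence) tuples from a FASTA file handle."""
    name, defline, chunks = None, "", []
    for line in fh:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, defline, "".join(chunks)
            # The name is the first word after '>'; the rest is the definition line.
            header = line[1:].split(None, 1)
            name = header[0] if header else ""
            defline = header[1] if len(header) > 1 else ""
            chunks = []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, defline, "".join(chunks)


data = ">seq1 first example\nACGT\nTTGA\n>seq2\nGG\n"
print(list(read_fasta(io.StringIO(data))))
# [('seq1', 'first example', 'ACGTTTGA'), ('seq2', '', 'GG')]
```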

####`biotools.IO.fasta.write(fh, s)`

Write sequences in FASTA format, i.e.,

```
>name defline
sequence ...
```

Sequences are wrapped to 70 characters by default.

###`biotools.IO.fastq`

Functions for manipulating FASTQ files.

####`biotools.IO.fastq.probe(fh)`

Probe a file to determine whether or not it is a FASTQ file. That is,
the first non-empty line should begin with an at sign ('@'), and the third
line following that first non-empty line (the quality line) should contain
no character with an ordinal value less than 32. If none of its characters
has an ordinal value less than 64, the file is guessed to be encoded in
Phred64; otherwise, it is assumed to be encoded in Phred32. This function
returns False if the file is not in FASTQ format, and a dictionary with
the Phred encoding and type ('fastq') if it is.
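
The encoding rule above can be written down directly. This sketch follows the documented thresholds and the `32`/`64` labels used here; it does not confirm which offset biotools actually subtracts (the common Sanger/Illumina 1.8+ convention is usually described as Phred+33). Treating an empty quality line as invalid is an assumption, and the function name is hypothetical.

```python
def guess_phred_offset(quality_line):
    """Apply the probe's rule to one quality line: None if any character is
    below chr(32), 64 if none is below chr(64), otherwise 32."""
    codes = [ord(c) for c in quality_line.rstrip("\n")]
    if not codes:
        return None  # an empty quality line is not plausible FASTQ
    lowest = min(codes)
    if lowest < 32:
        return None  # control characters: not a FASTQ quality line
    return 64 if lowest >= 64 else 32


print(guess_phred_offset("hhhhfff"))  # 64
print(guess_phred_offset("IIII##!"))  # 32
```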

####`biotools.IO.fastq.read(fh)`

Read sequences in FASTQ format; identifiers are on lines that begin with
an at sign ('@'), the sequence follows on the next line, then comes a line
that begins with a plus sign ('+'), and finally the quality scores are on
the line after that. Quality scores are encoded in Phred format, the type
of which (either 32 or 64) is determined when the file is probed on
opening. The scores are decoded into a list of integers. This function is
a generator that yields `Sequence` objects.
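
A minimal sketch of reading four-line FASTQ records and decoding the quality characters into integers. It assumes each sequence fits on a single line and takes the Phred offset as an explicit argument; it yields plain tuples rather than `Sequence` objects, and the function name is hypothetical.

```python
import io


def read_fastq(fh, offset):
    """Yield (name, sequence, qualities) from four-line FASTQ records,
    decoding each quality character as ord(char) - offset."""
    while True:
        header = fh.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip blank lines between records
        sequence = fh.readline().strip()
        plus = fh.readline()
        quality = fh.readline().strip()
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        yield header[1:].strip(), sequence, [ord(c) - offset for c in quality]


data = "@read1\nACGT\n+\nhhhf\n"
print(list(read_fastq(io.StringIO(data), 64)))  # [('read1', 'ACGT', [40, 40, 40, 38])]
```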

####`biotools.IO.fastq.write(fh, s)`

Write sequences in FASTQ format, i.e.,

```
@name
sequence ...
+
quality scores
```

###`biotools.IO.get_methods()`

**Needs documentation**

###`biotools.IO.gff`

**Needs documentation**

####`biotools.IO.gff.probe(fh)`

**Needs documentation**

####`biotools.IO.gff.read(fh)`

**Needs documentation**

####`biotools.IO.gff.whook(fh)`

**Needs documentation**

####`biotools.IO.gff.write(fh, a)`

**Needs documentation**

###`biotools.IO.IOBase(self, name, mode)`

Generic IO class for sequence files.

####`biotools.IO.IOBase.close(self)`

Close the file handle.

####`biotools.IO.IOBase.format(self, fmt)`

Forces a file to be parsed as a particular format. By default, `fmt` can
be any recognized format (e.g., 'fasta').

###`biotools.IO.manager`

This module is home to the IOManager class, which manages the various input
and output formats (specifically, FASTA, FASTQ, CLUSTAL alignments, and GFF
files, currently).

####`biotools.IO.manager.IOManager(self, methods=None)`

A class used by the `IOBase` class to manage the various input and output
methods for the different file types. Additional file types can be added 
to the manager by using 

```python
manager[format] = methods
```

Here, `methods` is a dictionary with keys `rhook`, `read`, `whook`,
`write`, and `probe`. Each value must be a callable object:
* `rhook` => takes a file handle opened for reading; called before reading
    of the file has begun,
* `whook` => takes a file handle opened for writing; called before writing
    to the file has begun,
* `read`  => takes a file handle opened for reading; should be a generator
    that yields entries,
* `write` => takes a file handle opened for writing and a single entry;
    writes the entry to the file,
* `probe` => takes a file handle opened for reading; returns a dictionary
    of attributes to be applied to the `IOBase` instance.

This class behaves like a dictionary, except that its `get` method falls
back to the default methods (which do nothing) if no truthy second
argument is passed.
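
A minimal sketch of that registry behavior, assuming a plain `dict` subclass. The class and helper names are hypothetical, and checking for the five required callables on assignment is an addition for illustration, not documented `IOManager` behavior.

```python
def _do_nothing(*args, **kwargs):
    """Default hook, probe, and write method: ignore the arguments."""
    return None


def _read_nothing(fh):
    """Default reader: yields no entries."""
    return iter(())


DEFAULT_METHODS = {"rhook": _do_nothing, "whook": _do_nothing, "read": _read_nothing,
                   "write": _do_nothing, "probe": _do_nothing}


class FormatRegistry(dict):
    """Maps a format name to its methods, with a no-op fallback in get()."""

    REQUIRED = ("rhook", "read", "whook", "write", "probe")

    def __setitem__(self, fmt, methods):
        # Only `registry[fmt] = methods` is validated; dict.update() bypasses this.
        missing = [k for k in self.REQUIRED if not callable(methods.get(k))]
        if missing:
            raise ValueError(f"{fmt!r} is missing callables for: {missing}")
        super().__setitem__(fmt, methods)

    def get(self, key, default=None):
        if key in self:
            return self[key]
        return default if default else DEFAULT_METHODS


registry = FormatRegistry()
print(registry.get("fasta") is DEFAULT_METHODS)  # True
registry["fasta"] = dict(DEFAULT_METHODS)
print(registry.get("fasta") is DEFAULT_METHODS)  # False
```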

#####`biotools.IO.manager.IOManager.get(self, key, default=None)`

Try to get a set of methods via format (e.g., 'fasta') or fall-back
to the default methods (which do nothing).

###`biotools.IO.open(filename, mode='r')`

Open a file for parsing or creation. Returns either a Reader or Writer
object, depending on the open mode.

###`biotools.IO.Reader(self, filename, mode='r')`

A class that wraps IOBase and restricts the ability to write.

####`biotools.IO.Reader.next(self)`

Reads a single entry in the file and returns it.

####`biotools.IO.Reader.read(self, n=None)`

If `n` is provided, the next (up to) `n` entries are parsed and 
returned. Otherwise, all remaining entries are parsed and returned.

###`biotools.IO.Writer(self, filename, mode='w')`

A class that wraps IOBase and restricts the ability to read.

####`biotools.IO.Writer.write(self, sequence)`

Writes `sequence` to the file in the appropriate format.

#`prok-geneseek`

```
Usage: prok-geneseek [options]  

Options:
  -h, --help            show this help message and exit
  -S START, --start=START
                        define a start codon [default: -S ATG]
  -E STOP, --stop=STOP  define a stop codon [default: -E TAG -E TAA -E TGA]
  -j THREADS, --threads=THREADS
                        number of threads [default: 16]
  -p PROCESSES, --processes=PROCESSES
                        number of parallel processes to run [default: 2]
  -e EVALUE, --evalue=EVALUE
                        maximum e-value [default: 1e-30]
  -I IDENTITY, --identity=IDENTITY
                        minimum percent identity [default: 0.45]
  -L FRACTION, --length=FRACTION
                        allowable relative error in hit length [default: 0.2]
  -O bases, --orflen=bases
                        minimum allowable length for ORFs [default: 300]
  -d DIRECTORY, --directory=DIRECTORY
                        set working directory [default: current]
  -P PLOTTER, --plotter=PLOTTER
                        plotting module [default: biotools.analysis.plot]
  -v, --verbose         print debug messages [default: False]
  --no-plots            suppress the drawing of plots [default: False]
  --no-predict          don't predict genes, instead treat the input files as
                        predicted genes [default: False]
  --no-cluster          don't cluster the sequences, instead treat the input
                        files as alignments [default: False]
  --no-rename           don't rename the fasta and clustal files [default:
                        False]
  --no-reports          don't generate files for variance data [default:
                        False]
  --no-calculation      don't calculate sequence variance [default: False]
```

#`grepseq`

```
Usage: grepseq [options]  

Options:
  -h, --help            show this help message and exit
  -c, --count           Suppress normal output; instead print a count of
                        matching lines for each input file. With the -v,
                        --invert-match option (see below), count non-matching
                        lines.
  -H, --with-filename   Print the filename for each match.
  -i, --ignore-case     Ignore case distinctions in both the pattern and
                        input files.
  -m NUM, --max-count=NUM
                        Stop reading a file after NUM matching lines. When
                        the -c or --count option is also used, grepseq does
                        not output a count greater than NUM. When the -v or
                        --invert-match option is also used, grep stops after
                        outputting NUM non-matching lines.
  -N, --names-only      Search only sequence names. Cannot be used with -S.
  -S, --sequences-only  Search only sequences. Cannot be used with -N.
  -v, --invert-match    Invert the sense of matching, to select non-matching
                        lines.
```
