Good bioinformatics paper: New fast and versatile tools for detection of consensus matches in nucleotide sequence data
File size: 1187k
Resource description:
The identification of potential regulatory motifs in new sequence data is increasingly important for experimental design. These motifs are commonly located by matches to IUPAC strings derived from consensus sequences. Although this method is simple and widely used, a major drawback of IUPAC strings is that they necessarily remove much of the information originally present in the set of sequences. Nucleotide distribution matrices retain most of this information and are thus better suited to evaluating new potential sites.
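The information loss described above can be illustrated with a small Python sketch. It builds a nucleotide count matrix from a toy alignment (hypothetical data, not a real binding-site set), collapses each column to an IUPAC letter, and then scores two candidate sites both ways. The helper names (`count_matrix`, `iupac_consensus`, `iupac_match`, `matrix_score`) and the rule "use the IUPAC letter covering every base observed in a column" are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter

BASES = "ACGT"

# Standard IUPAC nucleotide codes, keyed by the set of bases each letter stands for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}
ALLOWED = {code: bases for bases, code in IUPAC.items()}  # IUPAC letter -> allowed bases


def count_matrix(sites):
    """Nucleotide distribution matrix: base counts at each aligned position."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to the same length")
    return [Counter(s[i] for s in sites) for i in range(length)]


def iupac_consensus(matrix):
    """Hypothetical rule: each column becomes the IUPAC letter covering every base seen in it."""
    return "".join(IUPAC[frozenset(b for b in BASES if col[b] > 0)] for col in matrix)


def iupac_match(pattern, seq):
    """True if every base of seq is allowed by the IUPAC letter at that position."""
    return len(pattern) == len(seq) and all(b in ALLOWED[p] for p, b in zip(pattern, seq))


def matrix_score(matrix, seq):
    """Unweighted score: how often each base of seq was observed at its position."""
    return sum(col[b] for col, b in zip(matrix, seq))


if __name__ == "__main__":
    sites = ["TGACTCA", "TGACTCA", "TGACTCA", "TGAGTCA"]  # toy alignment, not real data
    matrix = count_matrix(sites)
    consensus = iupac_consensus(matrix)
    print("consensus:", consensus)
    for cand in ("TGACTCA", "TGAGTCA"):
        print(cand, iupac_match(consensus, cand), matrix_score(matrix, cand))
```

Run as a script, this prints the consensus TGASTCA; both candidates match it, but their matrix scores are 27 and 25. The IUPAC letter S records only that C or G occurred at the fourth position, while the count matrix still records that C was seen three times as often as G.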
However, sufficiently large libraries of pre-compiled matrices are a prerequisite for the practical application of any matrix-based approach, and such libraries are only beginning to emerge. Here we present a set of tools for molecular biologists that allows the generation of new matrices and the detection of potential sequence matches by automatic searches with a library of pre-compiled matrices. We also supply a large library (>200) of transcription factor binding site matrices, compiled on the basis of published matrices as well as entries from the TRANSFAC database, with emphasis on sequences with experimentally verified binding capacity. Our search method weights matrix positions according to the information content of each individual position and calculates a relative matrix similarity.
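The abstract does not state the weighting or similarity formulas, so the Python sketch below only illustrates the idea under stated assumptions. Each position receives a conservation weight derived from the Shannon entropy of its base frequencies, rescaled so that a fully conserved column scores 100 and a uniform column scores 0. A candidate's weighted score is then rescaled between the lowest and highest scores the matrix permits, giving a relative similarity between 0 and 1. The function names (`frequency_matrix`, `ci_vector`, `matrix_similarity`) and the toy alignment are hypothetical.

```python
import math

BASES = "ACGT"


def frequency_matrix(sites):
    """Relative frequency of each base at each aligned position."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to the same length")
    n = len(sites)
    return [{b: sum(s[i] == b for s in sites) / n for b in BASES} for i in range(length)]


def ci_vector(freqs):
    """Assumed per-position weight: rescaled Shannon entropy (natural log, 4 bases),
    100 for a fully conserved column and 0 for a uniform one."""
    weights = []
    for col in freqs:
        plogp = sum(p * math.log(p) for p in col.values() if p > 0)  # between -ln 4 and 0
        weights.append(100.0 * plogp / math.log(4) + 100.0)
    return weights


def matrix_similarity(freqs, weights, seq):
    """Assumed relative similarity: (score - min) / (max - min), in [0, 1]."""
    if len(seq) != len(freqs):
        raise ValueError("candidate length must equal matrix length")
    score = sum(w * col[b] for col, w, b in zip(freqs, weights, seq))
    best = sum(w * max(col.values()) for col, w in zip(freqs, weights))
    worst = sum(w * min(col.values()) for col, w in zip(freqs, weights))
    if best == worst:
        raise ValueError("matrix has no informative positions")
    return (score - worst) / (best - worst)


if __name__ == "__main__":
    sites = ["TGACTCA", "TGACTCA", "TGACTCA", "TGAGTCA"]  # toy alignment, not real data
    freqs = frequency_matrix(sites)
    weights = ci_vector(freqs)
    for cand in ("TGACTCA", "TGAGTCA", "TGATTCA", "AGACTCA"):
        print(cand, round(matrix_similarity(freqs, weights, cand), 3))
```

For this toy alignment the printed similarities are 1.0, 0.954, 0.931 and 0.845. A mismatch at the variable fourth position (weight about 59.4) costs less than a mismatch at a fully conserved position (weight 100), which is the intended effect of weighting positions by information content.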
We show several examples suggesting that this matrix similarity is useful in estimating the functional potential of matrix matches and thus provides a valuable basis for designing appropriate experiments.