fminer
Description: Molecular feature miner for path- and tree-shaped structures, from "Large Scale Graph Mining using Backbone Refinement Classes"
 _   _ ______ ______   ___   _____  _____ 
| | | || ___ \|  _  \ / _ \ |_   _||  ___|
| | | || |_/ /| | | |/ /_\ \  | |  | |__  
| | | ||  __/ | | | ||  _  |  | |  |  __| 
| |_| || |    | |/ / | | | |  | |  | |___ 
 \___/ \_|    |___/  \_| |_/  \_/  \____/ 

A NEWER VERSION OF FMINER IS AVAILABLE AT
--- http://github.com/amaunz/fminer2 ----


 Welcome to Fminer.

 This is Fminer (application), available from http://github.com/amaunz/fminer/tree/master .
 It depends on LibFminer, available from http://github.com/amaunz/libfminer/tree/master . 
 The official website with documentation is http://www.maunz.de/libfminer-doc .

 For installation and documentation see INSTALL.
 For license information see LICENSE.


 File Formats:
 =============
 Graphs are accepted in SMILES and gSpan format. This supports general graph mining while making it more convenient to mine molecular data. For instance, SMILES input offers a switch (-a) to enable aromatic ring perception, whereas gSpan input would have to be re-formatted to achieve the same effect.
  
  SMILES file line format is defined by 
    "ID\t SMILES", e.g.:
     1    COC1=CC=C(C=C1)C2=NC(=C([NH]2)C3=CC=CC=C3)C4=CC=CC=C4
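 A minimal Python sketch of reading this line format. The function names are hypothetical and not part of Fminer; the sketch assumes that ID and SMILES are separated by whitespace (a tab, optionally followed by spaces) and that blank lines can be skipped.

```python
def parse_smiles_line(line):
    """Split one 'ID<TAB>SMILES' line into (id, smiles).

    Assumption: the ID contains no whitespace, so splitting on the
    first run of whitespace separates the two fields.
    """
    ident, smiles = line.split(None, 1)
    return ident.strip(), smiles.strip()


def read_smiles(lines):
    """Return {id: smiles} for all non-empty lines."""
    result = {}
    for line in lines:
        if line.strip():
            ident, smiles = parse_smiles_line(line)
            result[ident] = smiles
    return result
```

 Any iterable of lines works as input, for example an open file handle.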

  gSpan file block format (see also: http://illimine.cs.uiuc.edu/resources/readme_plain.php?program=gSpan) is defined by:
    "t # ID" starts a new graph with identifier ID,
    "v M L" means that the Mth vertex in this graph has label L,
    "e P Q L" means that there is an edge connecting the Pth vertex with the Qth vertex. The edge has label L.

  Activity file line format is defined by 
    "ID\t endpoint\t activity", e.g.:
     1    Salmonella Mutagenicity    1
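 A minimal Python sketch of reading activity lines. The function name is hypothetical; the sketch assumes the three fields are tab-separated, since the endpoint name itself may contain spaces, and it keeps the activity value as a string because its allowed values are not specified here.

```python
def read_activities(lines):
    """Return {id: (endpoint, activity)} from 'ID<TAB>endpoint<TAB>activity' lines."""
    activities = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # Split on tabs only: the endpoint may contain spaces.
        ident, endpoint, activity = (field.strip() for field in line.split("\t"))
        activities[ident] = (endpoint, activity)
    return activities
```

 A line with more or fewer than three tab-separated fields raises a ValueError, which makes malformed input visible early.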

 The set of IDs in a gSpan or SMILES format file must be a subset of the set of activity IDs in the respective activity format file, i.e., every graph ID must be matched by an activity ID, but not every activity ID must be matched by a graph ID.
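 The gSpan block format and the ID constraint can be sketched together in Python. The function names are hypothetical and not part of Fminer; the reader assumes whitespace-separated fields, keeps IDs and labels as strings, and ignores lines that do not start with t, v or e.

```python
def read_gspan(lines):
    """Parse gSpan blocks into {graph_id: {"vertices": {...}, "edges": [...]}}."""
    graphs = {}
    current = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        kind = fields[0]
        if kind == "t":                      # "t # ID" starts a new graph
            current = {"vertices": {}, "edges": []}
            graphs[fields[2]] = current
        elif kind in ("v", "e"):
            if current is None:
                raise ValueError("vertex or edge line before first 't' line")
            if kind == "v":                  # "v M L": vertex M has label L
                current["vertices"][fields[1]] = fields[2]
            else:                            # "e P Q L": edge P-Q has label L
                current["edges"].append((fields[1], fields[2], fields[3]))
    return graphs


def graph_ids_without_activity(graph_ids, activity_ids):
    """Return the sorted graph IDs that have no matching activity ID."""
    return sorted(set(graph_ids) - set(activity_ids))
```

 An empty result from graph_ids_without_activity means the subset requirement holds; activity IDs without a graph are deliberately not reported, since they are allowed.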


 Switches (run Fminer without arguments to see allowable parameter ranges, defaults and explanations):
 ========
 Constraint parameters:
 -f Minimum frequency constraint, used for anti-monotonic pruning.
 -l Subgraph type, choices are paths and trees.
 -p Chi-square significance level, used for statistical upper-bound pruning.

 Constraint switches:
 -d Deactivate dynamic adjustment of upper bound (less efficient, use only for performance evaluation). Can only be set when -c is not set.
 -b Switch off mining of only the most significant/most general representative of each backbone.
 -m Switch on mining of maximal trees, as induced by constraints.
 -u Deactivate upper bound pruning for chi-square constraint (less efficient, use only for performance evaluation).

 General switches:
 -s Refine fragments with frequency 1.
 -a Enable aromatic ring perception for SMILES input.
 -r Insert empty vectors in result vector to indicate BBRC borders.
 -n Output line numbers from input file instead of IDs in result file.
 -o Switch off output.


 Environment variables:
 ======================
 Define the FMINER_SMARTS environment variable to produce output in SMARTS format (e.g. export FMINER_SMARTS=1).
 Additionally define the FMINER_LAZAR environment variable to produce output in linfrag format which can be used as input to Lazar (e.g. export FMINER_LAZAR=1).
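 When Fminer is launched from a script, the two variables can be prepared programmatically. A minimal Python sketch follows; the helper name is hypothetical, the value "1" simply mirrors the export examples above, and setting FMINER_SMARTS whenever FMINER_LAZAR is requested is an assumption based on the word "additionally".

```python
import os


def fminer_env(smarts=False, lazar=False, base=None):
    """Return a copy of the environment with the Fminer output variables set."""
    env = dict(os.environ if base is None else base)
    if smarts or lazar:
        # linfrag output is described as an addition to SMARTS output,
        # so FMINER_SMARTS is set in both cases (assumption).
        env["FMINER_SMARTS"] = "1"
    if lazar:
        env["FMINER_LAZAR"] = "1"
    return env
```

 The returned mapping could be passed as the env argument of subprocess.run when starting the fminer executable.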


 Remarks:
 ========
 Structures are output to STDOUT in the specified format.
 Output consists of blocks of gSpan graphs.
 When using SMARTS output (environment variable FMINER_SMARTS is defined), each line is a YAML sequence containing the SMARTS fragment, the p-value, and two sequences listing the occurrences in the positive and negative class (compound IDs by default; see also switch -n).
 When FMINER_LAZAR is also set, the linfrag output format is used.


 Examples:
 =========

 In any case, 1-frequent patterns are not refined further unless you use the -s switch.

 Use BBRC mining (default):

 # BBRC representatives (min frequency 2, min significance 95%), using dynamic UB pruning
 ./fminer   
 # same as above, but much slower (explicitly no dynamic UB pruning)
 ./fminer -d                                          

 Switch off BBRC mining:

 # all 2-frequent and 95%-significant features
 # Note that -d is mandatory (no dynamic UB pruning possible here)!
 ./fminer -d -b  
 # All 20-frequent patterns (standard frequent pattern mining)
 ./fminer -f 20 

 
Andreas Maunz, 2008.
