README

****************** FailMon Quick Start Guide ***********************

This document is a guide to quickly setting up and running FailMon.
For more information and details, please see the FailMon User Manual.

***** Building FailMon *****

Normally, FailMon lies under <hadoop-dir>/src/contrib/failmon, where
<hadoop-dir> is the Hadoop project root folder. To compile it, one
can either run ant for the whole Hadoop project, i.e.:

$ cd <hadoop-dir>
$ ant

or run ant only for FailMon:

$ cd <hadoop-dir>/src/contrib/failmon
$ ant

The above will compile FailMon and place all class files under
<hadoop-dir>/build/contrib/failmon/classes.

By invoking:

$ cd <hadoop-dir>/src/contrib/failmon
$ ant tar

FailMon is packaged as a standalone jar application in
<hadoop-dir>/src/contrib/failmon/failmon.tar.gz.

***** Deploying FailMon *****

There are two ways FailMon can be deployed in a cluster:

a) Within Hadoop, in which case the whole Hadoop package is uploaded
to the cluster nodes. In that case, nothing else needs to be done on
individual nodes.

b) Independently of the Hadoop deployment, i.e., by uploading
failmon.tar.gz to all nodes and uncompressing it. In that case, the
bin/failmon.sh script needs to be edited; the environment variable
HADOOPDIR should point to the root directory of the Hadoop
distribution. Also, the location of the Hadoop configuration files
should be pointed to by the property 'hadoop.conf.path' in the file
conf/failmon.properties. Note that these files refer to the HDFS in
which we want to store the FailMon data (which can potentially be
different from the one on the cluster we are monitoring).
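
For illustration, a standalone deployment on one node might look like
the following, where /opt/failmon and /opt/hadoop are placeholder
paths, not prescribed locations:

$ mkdir -p /opt/failmon
$ tar xzf failmon.tar.gz -C /opt/failmon
$ vi /opt/failmon/bin/failmon.sh    # set HADOOPDIR=/opt/hadoop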

We assume that either way FailMon is placed in the same directory on
all nodes, which is typical for most clusters. If this is not
feasible, one should create the same symbolic link on all nodes of
the cluster, pointing to the FailMon directory of each node.
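
For example, if the actual FailMon directory differs across nodes, a
link created on each node (the path names below are illustrative)
gives all of them a common path:

$ ln -s /data1/failmon /opt/failmon     # on a node using /data1
$ ln -s /scratch/failmon /opt/failmon   # on a node using /scratch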

One should also edit the conf/failmon.properties file on each node to
set their own property values. However, the default values are
expected to serve most practical cases. Refer to the FailMon User
Manual for details on the various properties and configuration
parameters.
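
As a sketch, the two properties mentioned in this guide might be set
as follows in conf/failmon.properties (the values are placeholders):

# Location of the Hadoop configuration files for the target HDFS
hadoop.conf.path = /opt/hadoop/conf

# HDFS folder where FailMon uploads its data (see the merging section)
hdfs.upload.dir = /failmon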

***** Running FailMon *****

In order to run FailMon using a node to do the ad-hoc scheduling of
monitoring jobs, one needs to edit the hosts.list file to specify the
list of machine hostnames on which FailMon is to be run. Also, in the
file conf/global.config, the username used to connect to the machines
has to be specified (passwordless SSH is assumed) in the property
'ssh.username'. In the property 'failmon.dir', the path to the FailMon
folder has to be specified as well (it is assumed to be the same on
all machines in the cluster; see the example below). Then one only
needs to invoke the command:

$ cd <hadoop-dir>
$ bin/scheduler.py

to start the system.
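
As a sketch, hosts.list and the relevant conf/global.config entries
might look like the following; the hostnames, username, and path are
placeholders, and the exact key/value syntax of global.config is
assumed here, so check the FailMon User Manual:

hosts.list:

node01.example.com
node02.example.com
node03.example.com

conf/global.config (excerpt):

ssh.username = failmon
failmon.dir = /opt/failmon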

***** Merging HDFS files *****

For the purpose of merging the files created on HDFS by FailMon, the
following command can be used:

$ cd <hadoop-dir>
$ bin/failmon.sh --mergeFiles

This will concatenate all files in the HDFS folder (pointed to by the
'hdfs.upload.dir' property in the conf/failmon.properties file) into
a single file, which will be placed in the same folder. Here, too,
the location of the Hadoop configuration files should be pointed to
by the property 'hadoop.conf.path' in the file
conf/failmon.properties. Note that these files refer to the HDFS in
which we have stored the FailMon data (which can potentially be
different from the one on the cluster we are monitoring). Also, the
scheduler.py script can be set up to merge the HDFS files when their
number surpasses a configurable limit (see the 'conf/global.config'
file).
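
After a merge, the result can be sanity-checked with the standard
Hadoop shell; the HDFS path below is a placeholder matching the
hdfs.upload.dir example above:

$ cd <hadoop-dir>
$ bin/hadoop fs -ls /failmon    # the merged file is placed in this folder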

Please refer to the FailMon User Manual for more details.