ClusteringAlgorithms
Description: Implementation of various clustering algorithms for the Data Mining course [CSE 601]
Part A) For All Algorithms except KMeans Map Reduce:

The folder contains:
	1.	project2.jar - runnable .jar file for all three algorithms and the cluster ensemble
	2.	cse601 - source code for project 2, excluding the Hadoop/MapReduce code; it can be imported directly into Eclipse for testing.
	We use two external libraries, which can be found in cse601/lib.
	Import these folders into Eclipse so that all three algorithms run correctly.

To run the code, execute the following command:

java -jar project2.jar file_name
E.g.: java -jar project2.jar cho.txt

The output contains:
	1.	External index value (Jaccard coefficient) for the following algorithms:
	⁃	KMeans
	⁃	Hierarchical clustering using MIN/single link as the distance criterion
	⁃	DBSCAN
	2.	Internal index for all of the above algorithms: cor(X, Y) = Σ[(xi - E(X))(yi - E(Y))] / [s(X)s(Y)], where E(X) and E(Y) are the means of the X and Y values and s(X), s(Y) are their standard deviations. Note that the correlation is not divided by the size of the data.
	3.	External and internal index values after the cluster ensemble algorithm (relabeling and voting)
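For reference, the external index in item 1 can be sketched as a pair-counting Jaccard coefficient: the number of point pairs placed in the same cluster by both the ground truth and the algorithm, divided by the number of pairs placed together by at least one of them. This is a minimal sketch, not the code in cse601; the class and method names are hypothetical, and counting each unordered pair once (i < j) is an assumption about how the project computes it.

```java
// Sketch only: pair-counting Jaccard coefficient between two labelings.
// JaccardIndex and jaccard are hypothetical names, not taken from the cse601 sources.
final class JaccardIndex {

    /**
     * truth[i] and predicted[i] are the cluster ids of point i in the
     * ground truth and in the algorithm's output, respectively.
     */
    static double jaccard(int[] truth, int[] predicted) {
        if (truth.length != predicted.length) {
            throw new IllegalArgumentException("label arrays must have equal length");
        }
        long sameInBoth = 0;    // pairs grouped together in both labelings
        long sameInOneOnly = 0; // pairs grouped together in exactly one labeling
        for (int i = 0; i < truth.length; i++) {
            for (int j = i + 1; j < truth.length; j++) {
                boolean sameTruth = truth[i] == truth[j];
                boolean samePred = predicted[i] == predicted[j];
                if (sameTruth && samePred) {
                    sameInBoth++;
                } else if (sameTruth || samePred) {
                    sameInOneOnly++;
                }
            }
        }
        long denominator = sameInBoth + sameInOneOnly;
        // Assumption: report 0 when no pair shares a cluster in either labeling.
        return denominator == 0 ? 0.0 : (double) sameInBoth / denominator;
    }

    public static void main(String[] args) {
        int[] truth = {1, 1, 2, 2};
        int[] predicted = {1, 1, 1, 2};
        System.out.println(jaccard(truth, predicted)); // prints 0.25
    }
}
```

Only cluster membership matters here, so the two labelings may use different ids for the same group.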
The algorithms are described in the report.
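The internal index formula in item 2 can be written out as follows. This is a sketch under stated assumptions rather than the project's code: the class and method names are hypothetical, s(X) and s(Y) are taken to be population standard deviations (dividing by n), and what X and Y hold (for example, a flattened distance matrix and a cluster incidence matrix) is left to the caller because the README does not say.

```java
// Sketch only: the correlation from item 2, without dividing by the data size.
// UnnormalizedCorrelation and cor are hypothetical names.
final class UnnormalizedCorrelation {

    static double cor(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("inputs must be non-empty and of equal length");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double cross = 0.0; // sum of (xi - E(X)) * (yi - E(Y))
        double sqX = 0.0;   // sum of squared deviations of x
        double sqY = 0.0;   // sum of squared deviations of y
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cross += dx * dy;
            sqX += dx * dx;
            sqY += dy * dy;
        }
        // Assumption: population standard deviation (divide by n, not n - 1).
        double sX = Math.sqrt(sqX / n);
        double sY = Math.sqrt(sqY / n);
        // The numerator is deliberately not divided by n, as the note in item 2 states.
        return cross / (sX * sY); // NaN if either input is constant
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3};
        double[] y = {2, 4, 6};
        System.out.println(cor(x, y));
    }
}
```

Under these assumptions the value equals n times the usual Pearson correlation, so a perfectly linear relation between three points gives roughly 3 rather than 1.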

Part B)
1. All required files are in the KMeans-MapReduce folder of the submission.
2. Source files are in KMeansMR/src.
3. KMeansMR.jar is in the KMeans-MapReduce folder.
4. To execute KMeansMR.jar, use the run.sh script:
	sh run.sh
5. Output is generated in the output folder.
6. The most recent output is in the folder with the highest iteration number for a particular job,
    e.g. for cho.txt it is in output/kmeans-cho-output/11 for 12 iterations.
   This folder contains the output file, which lists all the cluster ids and the genes assigned to each.
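Locating the most recent output from step 6 can be sketched as below. This is a hypothetical helper, not part of KMeansMR; it assumes that each iteration writes to a subfolder whose name is a plain number, as in the cho.txt example. The names must be compared as numbers, because a text sort would place "2" after "11".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.OptionalInt;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch only: LatestIteration is a hypothetical helper, not part of KMeansMR.
final class LatestIteration {

    /** Returns the largest purely numeric name, ignoring everything else. */
    static OptionalInt highestIteration(List<String> names) {
        return names.stream()
                .filter(name -> name.matches("\\d{1,9}")) // numeric folder names only
                .mapToInt(Integer::parseInt)
                .max();
    }

    /** Lists the subfolders of one job's output folder and picks the highest number. */
    static OptionalInt highestIteration(Path jobOutputDir) throws IOException {
        try (Stream<Path> children = Files.list(jobOutputDir)) {
            List<String> names = children
                    .filter(Files::isDirectory)
                    .map(p -> p.getFileName().toString())
                    .collect(Collectors.toList());
            return highestIteration(names);
        }
    }

    public static void main(String[] args) throws IOException {
        // Default path follows the cho.txt example above.
        Path jobDir = Paths.get(args.length > 0 ? args[0] : "output/kmeans-cho-output");
        if (!Files.isDirectory(jobDir)) {
            System.out.println("No such directory: " + jobDir);
            return;
        }
        OptionalInt latest = highestIteration(jobDir);
        if (latest.isPresent()) {
            System.out.println(jobDir.resolve(String.valueOf(latest.getAsInt())));
        } else {
            System.out.println("No iteration folders found in " + jobDir);
        }
    }
}
```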
