car-datascience-toolkit
Description: Simple implementations of data science tools for use by newspaper reporters.
h1. CAR Data Science Toolkit

The CAR Data Science Toolkit is a collection of common data science tools and algorithms, implemented and documented as simply as possible so that data journalists can learn from and understand them.

h3. Tools currently implemented include:

* Clustering algorithms: DBSCAN; k-means clustering
* Classification: Naive Bayes classifier; k-nearest neighbors
* Similarity metrics: Euclidean distance; Jaccard similarity; cosine similarity; Pearson similarity; Hamming distance
* MapReduce: a workflow that calculates pairwise document similarity based on TF-IDF weights
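
To give a feel for the similarity metrics listed above, here is a minimal sketch in plain Python. It is not taken from the toolkit's source: the function names, the signatures, and the conventions for edge cases (empty sets, zero-length vectors) are assumptions made for illustration only.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union of two collections."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumed convention: an all-zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def pearson_similarity(a, b):
    """Pearson correlation, computed as cosine similarity of mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))
```

For example, @euclidean_distance([0, 0], [3, 4])@ measures the hypotenuse of a 3-4-5 triangle, and @hamming_distance("karolin", "kathrin")@ counts the letters that differ between two names of the same length. Note that the two distance functions grow as items become less alike, while the three similarity functions grow as items become more alike.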
