Resource description: Simple implementations of data science tools for use by newspaper reporters.
h1. CAR Data Science Toolkit

The CAR data science toolkit is a collection of common data science tools and algorithms, implemented and documented as simply as possible so that data journalists can learn from and understand them.

h3. Tools currently implemented include:

* Clustering algorithms: DBSCAN; k-means clustering
* Classification: Naive Bayes classifier; k-nearest neighbors
* Similarity metrics: Euclidean distance; Jaccard similarity; cosine similarity; Pearson similarity; Hamming distance
* MapReduce workflow that calculates pairwise document similarity based on TF-IDF weights.
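
To give a feel for how small these tools can be, here is a minimal sketch of four of the similarity metrics listed above, written in plain Python with only the standard library. It is not the toolkit's own code: the function names, argument types, and edge-case handling (empty sets, zero vectors) are assumptions chosen for illustration.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        # Assumption: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(a | b)


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: similarity with a zero vector is reported as 0.
        return 0.0
    return dot / (norm_a * norm_b)


def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))
```

For example, @jaccard_similarity({"budget", "council", "vote"}, {"council", "vote", "mayor"})@ compares two sets of keywords by the share of terms they have in common, while @hamming_distance("karolin", "kathrin")@ counts the character positions where two same-length strings differ.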