Resource description: Contents
Foreword
Acknowledgments
About the Author
1. Introduction: Why Look Beyond Hadoop Map-Reduce?
Hadoop Suitability
Big Data Analytics: Evolution of Machine Learning Realizations
Closing Remarks
References
2. What Is the Berkeley Data Analytics Stack (BDAS)?
Motivation for BDAS
BDAS Design and Architecture
Spark: Paradigm for Efficient Data Processing on a Cluster
Shark: SQL Interface over a Distributed System
Mesos: Cluster Scheduling and Management System
Closing Remarks
References
3. Realizing Machine Learning Algorithms with Spark
Basics of Machine Learning
Logistic Regression: An Overview
Logistic Regression Algorithm in Spark
Support Vector Machine (SVM)
PMML Support in Spark
Machine Learning on Spark with MLbase
References
4. Realizing Machine Learning Algorithms in Real Time
Introduction to Storm
Design Patterns in Storm
Implementing Logistic Regression Algorithm in Storm
Implementing Support Vector Machine Algorithm in Storm
Naive Bayes PMML Support in Storm
Real-Time Analytic Applications
Spark Streaming
References
5. Graph Processing Paradigms
Pregel: Graph-Processing Framework Based on BSP
Open Source Pregel Implementations
GraphLab
References
6. Conclusions: Big Data Analytics Beyond Hadoop Map-Reduce
Overview of Hadoop YARN
Other Frameworks over YARN
What Does the Future Hold for Big Data Analytics?
References
A. Code Sketches
Code for Naive Bayes PMML Scoring in Spark
Code for Linear Regression PMML Support in Spark
Page Rank in GraphLab
SGD in GraphLab