Resource description: Spark for Python Developers aims to combine the elegance and flexibility of Python with
the power and versatility of Apache Spark. Spark is written in Scala and runs on the Java
virtual machine. It is nevertheless polyglot and offers bindings and APIs for Java, Scala,
Python, and R. Python is a well-designed language with an extensive set of specialized
libraries. This book looks at PySpark within the PyData ecosystem. Some of the
prominent PyData libraries include Pandas, Blaze, Scikit-Learn, Matplotlib, Seaborn, and
Bokeh. These libraries are open source. They are developed, used, and maintained by the
data science and Python developer communities. PySpark integrates well with the PyData
ecosystem, as endorsed by the Anaconda Python distribution. The book lays out a
path to building data-intensive apps, along with an architectural blueprint that covers the
following steps. First, set up the base infrastructure with Spark. Second, acquire, collect,
process, and store the data. Third, gain insights from the collected data. Fourth, stream live
data and process it in real time. Finally, visualize the information.