Spark: The Definitive Guide: Big Data Processing Made Simple
<div><p>Learn how to use, deploy, and maintain Apache Spark with this comprehensive guide, written by the creators of the open-source cluster-computing framework. With an emphasis on improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break down Spark topics into distinct sections, each with unique goals.</p><p>You’ll explore the basic operations and common functions of Spark’s structured APIs, as well as Structured Streaming, a new high-level API for building end-to-end streaming applications. Developers and system administrators will learn the fundamentals of monitoring, tuning, and debugging Spark, and explore machine learning techniques and scenarios for employing MLlib, Spark’s scalable machine-learning library.</p><ul><li>Get a gentle overview of big data and Spark</li><li>Learn about DataFrames, SQL, and Datasets—Spark’s core APIs—through worked examples</li><li>Dive into Spark’s low-level APIs, RDDs, and the execution of SQL and DataFrames</li><li>Understand how Spark runs on a cluster</li><li>Debug, monitor, and tune Spark clusters and applications</li><li>Learn the power of Structured Streaming, Spark’s stream-processing engine</li><li>Learn how you can apply MLlib to a variety of problems, including classification and recommendation</li></ul></div>