Advanced Analytics with Spark: Patterns for Learning from Data at Scale
<div><p>In this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example.</p><p>You'll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques (classification, collaborative filtering, and anomaly detection, among others) to fields such as genomics, security, and finance. If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you'll find these patterns useful for working on your own data applications.</p><p>Patterns include:</p><ul><li>Recommending music and the Audioscrobbler data set</li><li>Predicting forest cover with decision trees</li><li>Anomaly detection in network traffic with K-means clustering</li><li>Understanding Wikipedia with Latent Semantic Analysis</li><li>Analyzing co-occurrence networks with GraphX</li><li>Geospatial and temporal data analysis on the New York City Taxi Trips data</li><li>Estimating financial risk through Monte Carlo simulation</li><li>Analyzing genomics data and the BDG project</li><li>Analyzing neuroimaging data with PySpark and Thunder</li></ul></div>