Before We Begin 7
Objectives and Expectations 7
Assumptions 7
Formatting 8
Beyond this Book 8
What, Why, How 9
What is Apache Spark? 9
Why Spark? 9
Fundamentals of Apache Spark 9
How to Be Productive with Spark? 10
Apache Spark Ecosystem Components 10
Conclusion: What about Hadoop? 10
Spark RDDs: A Two-Minute Guide For Beginners 11
What is a Spark RDD? 11
How are Spark RDDs created? 11
Why Spark RDDs? 11
When to use Spark RDDs? 12
Apache Spark: The Building Blocks 13
Overview 13
Requirements 13
Spark with Scala: First Tutorial 13
Spark Context and Resilient Distributed Datasets 15
Actions and Transformations 16
Looking Ahead 17
Apache Spark: Examples Of Transformations 18
Transformations Part 1 18
Transformations Part 2 22
Transformations Part 3 23
Apache Spark: Examples Of Actions 26
Conclusion 30
Spark Clusters 31
Apache Spark Cluster Part 1: Run Standalone 31
Running a Spark Standalone Cluster 31
Spark Cluster Part 2: Deploy Scala Program To Spark Cluster 35
Requirements 35
Steps to Deploy Scala Program to Spark Cluster 35
Conclusion 37
Further Reference 37
Spark SQL with Scala 38
SQL 38
DataFrames 38
Datasets 38
Looking ahead 38
Spark SQL CSV Examples 39
Overview 39
Methodology 39
Spark SQL CSV Example Tutorial Part 1 39
Spark SQL CSV Example Tutorial Part 2 41
Spark SQL JSON Examples 43
Overview 43
Methodology 43
Spark SQL JSON Example Tutorial Part 1 43
Spark SQL JSON Example Tutorial Part 2 44
Spark SQL MySQL Example With JDBC 47
Overview 47
Requirements 47
Quick Setup 47
Methodology 48
Spark SQL with MySQL (JDBC) Example Tutorial 48
Conclusion: Spark SQL with MySQL (JDBC) 49
Spark Streaming with Scala 50
DStreams 50
Architecture and Abstraction 50
Transformations 50
Input Sources 51
Checkpointing 51
Streaming Processing Guarantees 51
Streaming UI 51
Performance Considerations 51
Spark Streaming With Scala 52
Overview 52
Steps 52
Making and Running Our Own NetworkWordCount 52
Steps 52
Spark Streaming With Scala Part 1: Conclusion 53
Spark Streaming – Let’s Stream From Slack 54
Spark Streaming Example Overview 54
Resources 61
Spark Streaming Automated Testing With Scala 63
Pre-requisites 63
Overview 63
Steps 63
Conclusion 69
Additional Resources 69
Spark Machine Learning 70
Overview 70
Apache Spark Machine Learning Example With Scala 70
Apache Spark Machine Learning Example 71
Apache Spark Machine Learning Scala Source Code Review 71
Resources 75
Special Recipes 76
Spark With Amazon S3 77
Apache Spark with Amazon S3 Examples 77
Example: Load Text File from S3 Written from Hadoop Library 78
S3 from Spark Text File Interoperability 79
References 79
Apache Spark, Cassandra And Game Of Thrones 80
Overview 80
Requirements 80
Steps 80
Conclusion 86
Spark Cassandra Tutorial Resources 86