Before you begin your journey as an Apache Spark programmer, you should have a solid understanding of the Spark application architecture and how applications are executed on a Spark cluster. This article closely examines the components of a Spark application, how these components work together, and how Spark applications run on standalone and YARN clusters.
Anatomy of a Spark application
A Spark application contains several components, all of which exist whether you’re running Spark on a single machine or across a cluster of hundreds or thousands of nodes.
Each component has a specific role in executing a Spark program. Some of these roles, such as the client components, are passive during execution; others, such as the components that execute computation functions, actively take part in running the program.
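To see why the same components apply in both cases, consider the minimal PySpark sketch below. It is illustrative only: the application name and the master URLs are assumptions, and you would normally let spark-submit supply the master rather than hard-coding it. The point is that the application code itself does not change between a single machine and a cluster; only the master URL, which selects where the driver's work is scheduled, differs.

    from pyspark.sql import SparkSession

    # "local[*]" runs the driver and executors inside one JVM on this
    # machine; swapping it for, say, "spark://host:7077" (standalone)
    # or "yarn" runs the identical code on a cluster.
    spark = (SparkSession.builder
             .appName("ComponentDemo")   # illustrative name
             .master("local[*]")
             .getOrCreate())

    df = spark.range(1_000_000)                 # work is defined on the driver...
    print(df.selectExpr("sum(id)").collect())   # ...and carried out by executors
    spark.stop()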
The components of a Spark application are: