Computing a job is equivalent to computing the partitions of the RDD on which the action was executed. The number of partitions in a job depends on the type of its final stage: ResultStage or ShuffleMapStage.
A job starts with a single target RDD, but can ultimately include other RDDs that are all part of the target RDD’s lineage graph.
The parent stages are instances of ShuffleMapStage.
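To make the lineage idea concrete, here is a minimal toy model in plain Python. It is not Spark code: ToyRDD, lineage, and shuffle_parents are hypothetical names introduced only for this illustration. The sketch assumes each RDD records its parents together with a flag saying whether the dependency is a shuffle dependency, and it treats every shuffle dependency as the point where a parent stage would end.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyRDD:
    """Hypothetical stand-in for an RDD: a name plus (parent, is_shuffle) links."""
    name: str
    deps: tuple = ()  # tuple of (ToyRDD, is_shuffle) pairs


def lineage(target):
    """Return every RDD reachable from the target, target first (depth-first)."""
    seen, order, stack = set(), [], [target]
    while stack:
        rdd = stack.pop()
        if rdd.name in seen:
            continue
        seen.add(rdd.name)
        order.append(rdd)
        stack.extend(parent for parent, _ in rdd.deps)
    return order


def shuffle_parents(target):
    """RDDs sitting directly behind a shuffle dependency in the lineage."""
    return [parent
            for rdd in lineage(target)
            for parent, is_shuffle in rdd.deps
            if is_shuffle]


# A small lineage: source -> mapped -(shuffle)-> reduced -> final
source = ToyRDD("source")
mapped = ToyRDD("mapped", ((source, False),))
reduced = ToyRDD("reduced", ((mapped, True),))
final = ToyRDD("final", ((reduced, False),))

print([r.name for r in lineage(final)])
print([r.name for r in shuffle_parents(final)])
```

The job is started on the single target RDD (final), yet the walk pulls in every ancestor. In this toy model the one shuffle dependency marks the boundary behind which a parent stage would be built.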
Note that a ResultStage does not always have to compute all of its partitions; some actions need only a subset of them.
Internally, a job is represented by an instance of private[spark] class org.apache.spark.scheduler.ActiveJob.
A job can be one of two logical types (distinguished only by the internal finalStage field of ActiveJob):
Map-stage job that computes the map output files for a ShuffleMapStage (for submitMapStage) before any downstream stages are submitted.
It is also used for Adaptive Query Planning / Adaptive Scheduling, to look at map output statistics before submitting later stages.
Result job that computes a ResultStage to execute an action.
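The two job types can be sketched with a small toy model in plain Python. This is an illustration only, not Spark's implementation: ToyResultStage, ToyShuffleMapStage, ToyJob, and their fields are hypothetical names. The sketch assumes that a result stage lists just the partitions the action asked for, while a map stage covers every partition of its RDD.

```python
from dataclasses import dataclass


@dataclass
class ToyResultStage:
    partitions: list  # indices of the partitions the action asked for


@dataclass
class ToyShuffleMapStage:
    num_partitions: int  # assumed: a map stage covers all partitions of its RDD


@dataclass
class ToyJob:
    job_id: int
    final_stage: object  # the only thing that tells the two job types apart

    @property
    def kind(self):
        """Derive the logical job type from the type of the final stage."""
        if isinstance(self.final_stage, ToyResultStage):
            return "result"
        return "map-stage"

    @property
    def num_partitions(self):
        """Number of partitions this job has to compute."""
        if isinstance(self.final_stage, ToyResultStage):
            return len(self.final_stage.partitions)
        return self.final_stage.num_partitions


# A result job that needs only one partition, and a map-stage job over eight.
result_job = ToyJob(0, ToyResultStage(partitions=[0]))
map_job = ToyJob(1, ToyShuffleMapStage(num_partitions=8))

print(result_job.kind, result_job.num_partitions)
print(map_job.kind, map_job.num_partitions)
```

Keeping a single final_stage field and deriving the job type from it mirrors the description above: there is no separate "job type" flag to keep in sync.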
Jobs track how many partitions have already been computed (using the finished array of ActiveJob).
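Partition tracking of this kind can be sketched as follows. This is a toy model in plain Python, not Spark's code: ToyJobProgress, mark_finished, num_finished, and done are hypothetical names, and the sketch assumes the array holds one boolean flag per partition with a running counter kept alongside it.

```python
class ToyJobProgress:
    """Hypothetical tracker: one boolean per partition plus a running count."""

    def __init__(self, num_partitions):
        self.finished = [False] * num_partitions
        self.num_finished = 0

    def mark_finished(self, index):
        """Record a completed partition; repeated reports are counted once."""
        if not self.finished[index]:
            self.finished[index] = True
            self.num_finished += 1

    @property
    def done(self):
        """True once every partition of the job has been computed."""
        return self.num_finished == len(self.finished)


progress = ToyJobProgress(3)
progress.mark_finished(0)
progress.mark_finished(0)  # duplicate report, e.g. from a retried task
progress.mark_finished(2)

print(progress.num_finished, progress.done)
```

Checking the flag before incrementing the counter keeps the count correct even if the same partition is reported more than once.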