SchedulerBackend — Pluggable Task Scheduling Systems

SchedulerBackend is the abstraction of task scheduling backends in Apache Spark that acquire resource offers from various cluster managers (e.g. the built-in Spark Standalone, Hadoop YARN, Kubernetes, Apache Mesos, and Spark local).

SchedulerBackend abstracts away the differences between cluster managers with regard to resource offers and task scheduling modes.

Note
A scheduler backend assumes an Apache Mesos-like scheduling model in which an application receives resource offers as machines become available, so tasks can be launched on them. Once the required resources are allocated, the scheduler backend can start executors.
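The offer-based model described in the note can be illustrated with a small, self-contained sketch. All names here (SimpleOffer, assignTasks) are made up for illustration and are not Spark's API:

```scala
// Hypothetical, simplified model of offer-based scheduling: as machines
// become available, they are presented as offers of free cores, and the
// scheduler assigns pending (one-core) tasks onto those offers in order.
case class SimpleOffer(executorId: String, host: String, freeCores: Int)

// Expand each offer into per-core slots, then pair pending tasks with slots.
// Tasks that do not fit in the current round remain unassigned.
def assignTasks(offers: Seq[SimpleOffer], taskIds: Seq[Long]): Map[Long, String] = {
  val slots = offers.flatMap(o => Seq.fill(o.freeCores)(o.executorId))
  taskIds.zip(slots).toMap
}
```

With offers of 2 and 1 free cores, only three of four pending tasks get assigned in one round; the fourth waits for the next round of offers.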
Table 1. SchedulerBackend Contract
Method Description

applicationAttemptId

applicationAttemptId(): Option[String]

Execution attempt ID of the Spark application

Default: None (undefined)

Used exclusively when TaskSchedulerImpl is requested for the execution attempt ID of a Spark application

applicationId

applicationId(): String

Unique identifier of the Spark Application

Default: spark-application-[currentTimeMillis]

Used exclusively when TaskSchedulerImpl is requested for the unique identifier of a Spark application

defaultParallelism

defaultParallelism(): Int

Default level of parallelism, used as a hint for the number of tasks in a stage when sizing jobs

Used exclusively when TaskSchedulerImpl is requested for the default parallelism

getDriverLogUrls

getDriverLogUrls: Option[Map[String, String]]

Driver log URLs

Default: None (undefined)

Used exclusively when SparkContext is requested to postApplicationStart

isReady

isReady(): Boolean

Indicates whether the SchedulerBackend is ready (true) or not (false)

Default: true

Used exclusively when TaskSchedulerImpl is requested to wait until scheduling backend is ready

killTask

killTask(
  taskId: Long,
  executorId: String,
  interruptThread: Boolean,
  reason: String): Unit

Kills a given task

Default: Throws an UnsupportedOperationException

Used when TaskSchedulerImpl is requested to kill a task attempt or all attempts of a task

maxNumConcurrentTasks

maxNumConcurrentTasks(): Int

Maximum number of concurrent tasks that can be launched now

Used exclusively when SparkContext is requested to maxNumConcurrentTasks

reviveOffers

reviveOffers(): Unit

Handles resource allocation offers (from the scheduling system)

Used when TaskSchedulerImpl is requested to submit tasks, handle a failed task, or check for speculatable tasks

start

start(): Unit

Starts the SchedulerBackend

Used exclusively when TaskSchedulerImpl is requested to start

stop

stop(): Unit

Stops the SchedulerBackend

Used when TaskSchedulerImpl is requested to stop
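The contract above can be sketched as a Scala trait. This is a simplified, self-contained rendition based on the table (the real org.apache.spark.scheduler.SchedulerBackend is a private[spark] trait and differs in details):

```scala
// Simplified sketch of the SchedulerBackend contract (illustrative only).
trait SchedulerBackendSketch {
  // Abstract: every backend must say how to start, stop, revive offers,
  // report its default parallelism, and its current concurrent-task capacity.
  def start(): Unit
  def stop(): Unit
  def reviveOffers(): Unit
  def defaultParallelism(): Int
  def maxNumConcurrentTasks(): Int

  // Default: killing tasks is unsupported unless a backend overrides it.
  def killTask(
      taskId: Long,
      executorId: String,
      interruptThread: Boolean,
      reason: String): Unit =
    throw new UnsupportedOperationException("killTask is not supported")

  // Default: ready immediately.
  def isReady(): Boolean = true

  // Default: spark-application-[currentTimeMillis]
  def applicationId(): String = s"spark-application-${System.currentTimeMillis}"

  // Defaults: None (undefined)
  def applicationAttemptId(): Option[String] = None
  def getDriverLogUrls: Option[Map[String, String]] = None
}
```

A concrete backend only has to implement the abstract members; the defaults for isReady, applicationId, applicationAttemptId, getDriverLogUrls, and killTask match the table above.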

Table 2. SchedulerBackends (Direct Implementations and Extensions)
SchedulerBackend Description

CoarseGrainedSchedulerBackend

Base SchedulerBackend for coarse-grained scheduling systems

LocalSchedulerBackend

Spark local

MesosFineGrainedSchedulerBackend

Fine-grained scheduling system for Apache Mesos

Note
org.apache.spark.scheduler.SchedulerBackend is a private[spark] Scala trait in Spark.
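Per the isReady entry in Table 1, TaskSchedulerImpl waits until the backend reports ready before scheduling begins. A simplified sketch of such a wait loop (waitBackendReady is our illustrative name, not Spark's exact code):

```scala
// Poll a backend's readiness predicate until it returns true or the
// attempt budget is exhausted. Returns the final readiness state.
def waitBackendReady(isReady: () => Boolean, maxAttempts: Int = 100): Boolean = {
  var attempts = 0
  while (!isReady() && attempts < maxAttempts) {
    attempts += 1
    // A real implementation would sleep between polls (e.g. Thread.sleep(100)).
  }
  isReady()
}
```

A backend whose default isReady returns true (as in the contract) passes this check immediately; backends that must first register executors would flip to ready later.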
