SchedulerBackend — Pluggable Scheduler Backends

SchedulerBackend is a pluggable interface that allows Spark to support various cluster managers, e.g. Apache Mesos, Hadoop YARN, Spark's own Spark Standalone, and Spark local mode.

As cluster managers differ in their task scheduling modes and resource-offer mechanisms, Spark abstracts the differences behind the SchedulerBackend contract.

Table 1. Built-In (Direct and Indirect) SchedulerBackends per Cluster Environment

Cluster Environment                 | SchedulerBackends
Local mode                          | LocalSchedulerBackend
(base for custom SchedulerBackends) | CoarseGrainedSchedulerBackend
Spark Standalone                    | StandaloneSchedulerBackend
Spark on YARN                       | YarnSchedulerBackend (YarnClientSchedulerBackend and YarnClusterSchedulerBackend)
Spark on Mesos                      | MesosCoarseGrainedSchedulerBackend

A scheduler backend is created and started as part of SparkContext’s initialization (when TaskSchedulerImpl is started - see Creating Scheduler Backend and Task Scheduler).
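The backend that gets created depends on the master URL passed to SparkContext. The following is a simplified, standalone sketch of that selection (the mapping mirrors Spark's built-in backends, but backendForMaster is a hypothetical name for illustration, not Spark's actual code):

```scala
// Hypothetical sketch: map a master URL to the name of the SchedulerBackend
// Spark would use. This simplifies the pattern match that
// SparkContext.createTaskScheduler performs during initialization.
def backendForMaster(master: String): String = master match {
  case "local"                       => "LocalSchedulerBackend"
  case m if m.startsWith("local[")   => "LocalSchedulerBackend"
  case m if m.startsWith("spark://") => "StandaloneSchedulerBackend"
  case "yarn"                        => "YarnSchedulerBackend"
  case m if m.startsWith("mesos://") => "MesosCoarseGrainedSchedulerBackend"
  case m => throw new IllegalArgumentException(s"Could not parse Master URL: '$m'")
}

println(backendForMaster("local[4]"))       // LocalSchedulerBackend
println(backendForMaster("spark://h:7077")) // StandaloneSchedulerBackend
```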

FIXME Image how it gets created with SparkContext in play here or in SparkContext doc.

Scheduler backends are started and stopped as part of TaskSchedulerImpl’s initialization and stopping.

Being a scheduler backend in Spark assumes an Apache Mesos-like model in which "an application" gets resource offers as machines become available and can launch tasks on them. Once a scheduler backend obtains the resource allocation, it can start executors.
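To make the contract concrete, here is a minimal sketch: a standalone copy of the trait's method signatures (the real trait is private[spark], so it cannot be extended directly) with a no-op implementation. NoopSchedulerBackend and all method bodies are illustrative assumptions, not a real Spark backend:

```scala
// Standalone copy of the SchedulerBackend method signatures (the real trait
// lives in org.apache.spark.scheduler and is private[spark]).
trait SchedulerBackendShape {
  def applicationId(): String
  def applicationAttemptId(): Option[String]
  def defaultParallelism(): Int
  def isReady(): Boolean
  def killTask(taskId: Long, executorId: String, interruptThread: Boolean): Unit
  def reviveOffers(): Unit
  def start(): Unit
  def stop(): Unit
}

// Hypothetical no-op backend: becomes "ready" as soon as it is started.
class NoopSchedulerBackend extends SchedulerBackendShape {
  private var started = false
  def applicationId(): String = "app-noop-0000"
  def applicationAttemptId(): Option[String] = None  // only YARN supports attempts
  def defaultParallelism(): Int = Runtime.getRuntime.availableProcessors()
  def isReady(): Boolean = started
  def killTask(taskId: Long, executorId: String, interruptThread: Boolean): Unit =
    throw new UnsupportedOperationException  // mirrors the trait's default behavior
  def reviveOffers(): Unit = ()              // nothing to offer in this sketch
  def start(): Unit = { started = true }
  def stop(): Unit = { started = false }
}
```

A real backend would talk to a cluster manager in start, release resources in stop, and launch tasks in response to reviveOffers.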

Understanding how Apache Mesos works can greatly improve your understanding of Spark.

SchedulerBackend Contract

trait SchedulerBackend {
  def applicationId(): String
  def applicationAttemptId(): Option[String]
  def defaultParallelism(): Int
  def getDriverLogUrls: Option[Map[String, String]]
  def isReady(): Boolean
  def killTask(taskId: Long, executorId: String, interruptThread: Boolean): Unit
  def reviveOffers(): Unit
  def start(): Unit
  def stop(): Unit
}

NOTE: org.apache.spark.scheduler.SchedulerBackend is a private[spark] Scala trait in Spark.
Table 2. SchedulerBackend Contract
Method | Description

applicationId

Unique identifier of the Spark application.

Used when TaskSchedulerImpl is asked for the unique identifier of a Spark application (that is actually a part of the TaskScheduler contract).

applicationAttemptId

Attempt id of the Spark application.

Only supported by the YARN cluster scheduler backend as the YARN cluster manager supports multiple application attempts.

Used when…​

NOTE: applicationAttemptId is also a part of the TaskScheduler contract and TaskSchedulerImpl directly calls the SchedulerBackend’s applicationAttemptId.

defaultParallelism

Used when TaskSchedulerImpl finds the default level of parallelism (as a hint for sizing jobs).

getDriverLogUrls

Returns no URLs by default and is only supported by YarnClusterSchedulerBackend.

isReady

Controls whether the SchedulerBackend is ready (i.e. true) or not (i.e. false). Enabled (i.e. true) by default.

Used when TaskSchedulerImpl waits until SchedulerBackend is ready (which happens just before SparkContext is fully initialized).

killTask

Throws an UnsupportedOperationException by default.

Used when…​

reviveOffers

Requests resource offers (to run tasks).

Used when…​

start

Starts the SchedulerBackend.

Used when TaskSchedulerImpl is started.

stop

Stops the SchedulerBackend.

Used when TaskSchedulerImpl is stopped.
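TaskSchedulerImpl's waiting on isReady can be sketched as a simple polling loop. The helper below is a simplified, self-contained illustration (the real waitBackendReady also checks whether the SparkContext has been stopped and waits 100 ms between checks):

```scala
// Simplified sketch of TaskSchedulerImpl-style waiting on SchedulerBackend.isReady.
// waitBackendReady here is a hypothetical standalone helper, not Spark's code.
def waitBackendReady(isReady: () => Boolean, timeoutMs: Long = 1000): Boolean = {
  val deadline = System.nanoTime() + timeoutMs * 1000000L
  while (!isReady()) {
    if (System.nanoTime() > deadline) return false  // gave up waiting
    Thread.sleep(10)                                // poll until ready
  }
  true
}

// Usage: a "backend" that reports ready on its third check.
var checks = 0
val ready = () => { checks += 1; checks >= 3 }
assert(waitBackendReady(ready))
```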
