FairSchedulableBuilder — SchedulableBuilder for FAIR Scheduling Mode

FairSchedulableBuilder is a SchedulableBuilder that is created exclusively for TaskSchedulerImpl for FAIR scheduling mode (when spark.scheduler.mode configuration property is FAIR).

FairSchedulableBuilder takes the following to be created:

Once created, TaskSchedulerImpl requests the FairSchedulableBuilder to build the pools.

FairSchedulableBuilder reads the pool definitions from an allocation pools configuration file: the file given by the spark.scheduler.allocation.file configuration property or, when the property is not set, the default fairscheduler.xml (that is expected to be available on a Spark application’s class path).

Tip
Use conf/fairscheduler.xml.template as a template for the allocation pools configuration file.
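
As an illustration of the configuration properties above, the following is a minimal sketch of a Spark application that turns FAIR scheduling mode on. It assumes Spark is on the class path; the object name, the application name, and the local[*] master are hypothetical choices made for the example only.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical demo application (names chosen for illustration only).
object FairModeExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local[*]")               // assumption: local run
      .setAppName("fair-scheduling-demo")  // hypothetical application name
      .set("spark.scheduler.mode", "FAIR") // FAIR scheduling mode
    // Optionally, set spark.scheduler.allocation.file to the path of
    // an allocation pools configuration file. It is left unset here,
    // so the default fairscheduler.xml is looked up on the class path.

    val sc = new SparkContext(conf)
    try {
      // Jobs submitted from this thread go to the production pool.
      sc.setLocalProperty("spark.scheduler.pool", "production")
      println(sc.parallelize(1 to 100).count())
    } finally {
      sc.stop()
    }
  }
}
```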

FairSchedulableBuilder always makes sure that the default pool exists (and registers it itself unless the allocation pools configuration file already defines it).

When requested to addTaskSetManager, FairSchedulableBuilder uses the spark.scheduler.pool local property for the name of the pool to use (default: the pool named default).

Note
Use SparkContext.setLocalProperty to set properties per thread (aka local properties). Setting the spark.scheduler.pool local property makes FairSchedulableBuilder submit the jobs of that thread to the named (non-default) pool, so jobs from different threads can be grouped into logical groups.

scala> :type sc
org.apache.spark.SparkContext

sc.setLocalProperty("spark.scheduler.pool", "production")

// whatever is executed afterwards is submitted to production pool

Tip
Enable ALL logging level for org.apache.spark.scheduler.FairSchedulableBuilder logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.scheduler.FairSchedulableBuilder=ALL

Refer to Logging.

Allocation Pools Configuration File

The allocation pools configuration file is an XML file.

The default conf/fairscheduler.xml.template is as follows:

<?xml version="1.0"?>
<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="test">
    <schedulingMode>FIFO</schedulingMode>
    <weight>2</weight>
    <minShare>3</minShare>
  </pool>
</allocations>
Tip
The top-level element’s name allocations can be anything. Spark does not insist on allocations and accepts any name.

Building (Tree of) Pools of Schedulables — buildPools Method

buildPools(): Unit
Note
buildPools is part of the SchedulableBuilder Contract to build a tree of pools (of Schedulables).

buildPools prints out the following INFO message to the logs when the configuration file (per the spark.scheduler.allocation.file configuration property) could be read:

Creating Fair Scheduler pools from [file]

buildPools prints out the following INFO message to the logs when the spark.scheduler.allocation.file configuration property was not used to define the configuration file and the default configuration file is used instead:

Creating Fair Scheduler pools from default file: [DEFAULT_SCHEDULER_FILE]

When the spark.scheduler.allocation.file configuration property is not set and the default configuration file is not available either, buildPools prints out the following WARN message to the logs:

Fair Scheduler configuration file not found so jobs will be scheduled in FIFO order. To use fair scheduling, configure pools in [DEFAULT_SCHEDULER_FILE] or set spark.scheduler.allocation.file to a file that contains the configuration.
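
The three cases above can be sketched as a small decision function. This is a simplified model and not Spark's code: AllocationFileLookup, the Outcome types, and the defaultOnClasspath flag are names invented for this example, and the file itself is never opened.

```scala
// Hypothetical model of how the configuration file is chosen.
object AllocationFileLookup {
  // Default file name, as mentioned in this document.
  val DefaultSchedulerFile = "fairscheduler.xml"

  sealed trait Outcome
  // INFO: Creating Fair Scheduler pools from [file]
  final case class FromConfiguredFile(file: String) extends Outcome
  // INFO: Creating Fair Scheduler pools from default file: [file]
  final case class FromDefaultFile(file: String) extends Outcome
  // WARN: configuration file not found, jobs are scheduled in FIFO order
  case object NoFile extends Outcome

  // configured: value of spark.scheduler.allocation.file, if set
  // defaultOnClasspath: whether the default file is on the class path
  def resolve(configured: Option[String], defaultOnClasspath: Boolean): Outcome =
    configured match {
      case Some(file)                 => FromConfiguredFile(file)
      case None if defaultOnClasspath => FromDefaultFile(DefaultSchedulerFile)
      case None                       => NoFile
    }
}
```

The configured file always wins in this sketch; the default file is considered only when the property is not set.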

addTaskSetManager Method

addTaskSetManager(manager: Schedulable, properties: Properties): Unit
Note
addTaskSetManager is part of the SchedulableBuilder Contract to register a new Schedulable with the rootPool.

addTaskSetManager takes the pool name from the spark.scheduler.pool property (in the given Properties) and falls back to the name of the default pool when the property is undefined.

addTaskSetManager then requests the root pool to find the Schedulable by that name.

Unless found, addTaskSetManager creates a new Pool with the default configuration (as if the default pool were used) and requests the root pool to register it. addTaskSetManager then prints out the following WARN message to the logs:

A job was submitted with scheduler pool [poolName], which has not been configured. This can happen when the file that pools are read from isn't set, or when that file doesn't contain [poolName]. Created [poolName] with default configuration (schedulingMode: [mode], minShare: [minShare], weight: [weight])

addTaskSetManager then requests the pool (found or newly-created) to register the given Schedulable.

In the end, addTaskSetManager prints out the following INFO message to the logs:

Added task set [name] tasks to pool [poolName]
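
The lookup-or-create flow above can be sketched with plain Scala collections. This is a simplified model under stated assumptions: SimplePool and SimpleRootPool are hypothetical stand-ins for Spark's pool classes, task sets are represented by their names only, and logging is left out.

```scala
import java.util.Properties
import scala.collection.mutable

// Hypothetical model of addTaskSetManager (not Spark's classes).
object PoolLookupSketch {
  val PoolProperty = "spark.scheduler.pool"
  val DefaultPoolName = "default"

  // Stand-in for a pool: a name and the task sets registered with it.
  final class SimplePool(val name: String) {
    val taskSets = mutable.ListBuffer.empty[String]
  }

  // Stand-in for the root pool that holds the named pools.
  final class SimpleRootPool {
    private val pools = mutable.LinkedHashMap.empty[String, SimplePool]
    def find(name: String): Option[SimplePool] = pools.get(name)
    def register(pool: SimplePool): Unit = pools.update(pool.name, pool)
    def poolNames: List[String] = pools.keys.toList
  }

  // Returns the name of the pool the task set was added to.
  def addTaskSet(root: SimpleRootPool, taskSetName: String, properties: Properties): String = {
    // Pool name from the local property, or the default pool name.
    val poolName =
      if (properties == null) DefaultPoolName
      else properties.getProperty(PoolProperty, DefaultPoolName)

    // Find the pool, or create and register one on the fly.
    val pool = root.find(poolName).getOrElse {
      val created = new SimplePool(poolName)
      root.register(created)
      created
    }

    pool.taskSets += taskSetName
    poolName
  }
}
```

With this model, a task set submitted with spark.scheduler.pool set to an unknown name ends up in a newly registered pool of that name, and a task set without the property ends up in the default pool.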

Registering Default Pool — buildDefaultPool Method

buildDefaultPool(): Unit

buildDefaultPool requests the root pool to find the default pool (one with the default name).

Unless already available, buildDefaultPool creates a schedulable pool with the following:

  • default pool name

  • FIFO scheduling mode

  • 0 for the initial minimum share

  • 1 for the initial weight

In the end, buildDefaultPool requests the root pool to register the new pool and prints out the following INFO message to the logs:

Created default pool: [name], schedulingMode: [mode], minShare: [minShare], weight: [weight]
Note
buildDefaultPool is used exclusively when FairSchedulableBuilder is requested to build the pools.
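
The register-unless-present behaviour can be sketched as follows. PoolSpec and ensureDefaultPool are hypothetical names used for this illustration, and a mutable map stands in for the root pool.

```scala
import scala.collection.mutable

// Hypothetical model of buildDefaultPool (not Spark's classes).
object DefaultPoolSketch {
  // Simplified pool description with the four settings listed above.
  final case class PoolSpec(name: String, schedulingMode: String, minShare: Int, weight: Int)

  // default name, FIFO scheduling mode, minimum share 0, weight 1
  val DefaultPool: PoolSpec = PoolSpec("default", "FIFO", minShare = 0, weight = 1)

  // Registers the default pool only when no pool with that name exists.
  // Returns true when this call created the pool.
  def ensureDefaultPool(pools: mutable.Map[String, PoolSpec]): Boolean =
    if (pools.contains(DefaultPool.name)) false
    else {
      pools.update(DefaultPool.name, DefaultPool)
      true
    }
}
```

A default pool that is already registered (e.g. one defined in the allocation pools configuration file) is left untouched in this sketch.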

Building Pools from XML Allocations File — buildFairSchedulerPool Internal Method

buildFairSchedulerPool(
  is: InputStream,
  fileName: String): Unit

buildFairSchedulerPool starts by loading the XML file from the given InputStream.

For every pool element, buildFairSchedulerPool creates a schedulable pool with the following:

  • Pool name per name attribute

  • Scheduling mode per schedulingMode element (case-insensitive with FIFO as the default)

  • Initial minimum share per minShare element (default: 0)

  • Initial weight per weight element (default: 1)

For every such pool, buildFairSchedulerPool requests the root pool to register the pool and prints out the following INFO message to the logs:

Created pool: [name], schedulingMode: [mode], minShare: [minShare], weight: [weight]
Note
buildFairSchedulerPool is used exclusively when FairSchedulableBuilder is requested to build the pools.
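
The per-pool defaults above can be sketched with the XML parser that ships with the JDK. This is an illustrative model only: AllocationsParserSketch and PoolSpec are hypothetical names, the XML is passed in as a string rather than an InputStream, and validation is left out (a non-numeric minShare or weight makes this sketch throw, and the scheduling mode is not checked against the known modes).

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import java.util.Locale
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

// Hypothetical model of reading pools from an allocations file.
object AllocationsParserSketch {
  final case class PoolSpec(name: String, schedulingMode: String, minShare: Int, weight: Int)

  // Trimmed text of the first child element with the given tag, if any.
  private def childText(pool: Element, tag: String): Option[String] = {
    val nodes = pool.getElementsByTagName(tag)
    if (nodes.getLength == 0) None
    else Option(nodes.item(0).getTextContent).map(_.trim).filter(_.nonEmpty)
  }

  def parse(xml: String): List[PoolSpec] = {
    val in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))
    val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in)
    // Only pool elements are looked up; the top-level element name is ignored.
    val pools = doc.getElementsByTagName("pool")
    (0 until pools.getLength).toList.map { i =>
      val pool = pools.item(i).asInstanceOf[Element]
      PoolSpec(
        name = pool.getAttribute("name"),
        // case-insensitive, FIFO when the element is missing
        schedulingMode = childText(pool, "schedulingMode")
          .map(_.toUpperCase(Locale.ROOT)).getOrElse("FIFO"),
        // 0 when the element is missing
        minShare = childText(pool, "minShare").map(_.toInt).getOrElse(0),
        // 1 when the element is missing
        weight = childText(pool, "weight").map(_.toInt).getOrElse(1)
      )
    }
  }
}
```

Calling AllocationsParserSketch.parse with the contents of an allocations file gives one PoolSpec per pool element, in document order.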
