MetricsSystem — Registry of Metrics Sources and Sinks of Spark Subsystem

MetricsSystem is a registry of metrics sources and sinks of a Spark subsystem, e.g. the driver of a Spark application.

spark metrics MetricsSystem driver.png
Figure 1. Creating MetricsSystem for Driver

MetricsSystem may have at most one MetricsServlet JSON metrics sink (which is registered by default).

When created, MetricsSystem requests MetricsConfig to initialize.

spark metrics MetricsSystem.png
Figure 2. Creating MetricsSystem
Table 1. Metrics Instances (Subsystems) and MetricsSystems
Name When Created

applications

Spark Standalone’s Master is created.

driver

SparkEnv is created for the driver.

executor

SparkEnv is created for an executor.

master

Spark Standalone’s Master is created.

mesos_cluster

Spark on Mesos' MesosClusterScheduler is created.

shuffleService

ExternalShuffleService is created.

worker

Spark Standalone’s Worker is created.

MetricsSystem uses MetricRegistry as the integration point to Dropwizard Metrics library.

Table 2. MetricsSystem’s Internal Registries and Counters
Name Description

metricsConfig

MetricsConfig

Initialized when MetricsSystem is created.

Used when MetricsSystem registers sinks and sources.

metricsServlet

MetricsServlet JSON metrics sink that is only available for the metrics instances with a web UI, i.e. the driver of a Spark application and Spark Standalone’s Master.

Initialized when MetricsSystem registers sinks (and finds a configuration entry with servlet sink name).

Used exclusively when MetricsSystem is requested for a JSON servlet handler.

registry

Dropwizard Metrics' MetricRegistry

Used when MetricsSystem is requested to:

running

Flag that indicates whether MetricsSystem has been started (true) or not (false)

Default: false

sinks

Metrics sinks in a Spark application.

Used when MetricsSystem registers a new metrics sink and starts them eventually.

sources

Metrics sources in a Spark application.

Used when MetricsSystem registers a new metrics source.

Tip

Enable WARN or ERROR logging levels for org.apache.spark.metrics.MetricsSystem logger to see what happens in MetricsSystem.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.metrics.MetricsSystem=WARN

Refer to Logging.

"Static" Metrics Sources for Spark SQL — StaticSources

Caution
FIXME

Registering Metrics Source — registerSource Method

registerSource(source: Source): Unit

registerSource adds source to sources internal registry.

registerSource creates an identifier for the metrics source and registers it with MetricRegistry.

Note
registerSource uses Metrics' MetricRegistry.register to register a metrics source under a given name.

When registerSource tries to register a name more than once, you should see the following INFO message in the logs:

INFO Metrics already registered
Note

registerSource is used when:

  • SparkContext registers metrics sources for:

  • MetricsSystem is started (and registers the "static" metrics sources — CodegenMetrics and HiveCatalogMetrics) and does registerSources.

  • Executor is created (and registers a ExecutorSource)

  • ExternalShuffleService is started (and registers ExternalShuffleServiceSource)

  • Spark Structured Streaming’s StreamExecution runs batches as data arrives (when metrics are enabled).

  • Spark Streaming’s StreamingContext is started (and registers StreamingSource)

  • Spark Standalone’s Master and Worker start (and register their MasterSource and WorkerSource, respectively)

  • Spark Standalone’s Master registers a Spark application (and registers a ApplicationSource)

  • Spark on Mesos' MesosClusterScheduler is started (and registers a MesosClusterSchedulerSource)

Building Metrics Source Identifier — buildRegistryName Method

buildRegistryName(source: Source): String
Note
buildRegistryName is used to build the metrics source identifiers for a Spark application’s driver and executors, but also for other Spark framework’s components (e.g. Spark Standalone’s master and workers).
Note
buildRegistryName uses spark.metrics.namespace and spark.executor.id Spark properties to differentiate between a Spark application’s driver and executors, and the other Spark framework’s components.

(only when instance is driver or executor) buildRegistryName builds metrics source name that is made up of spark.metrics.namespace, spark.executor.id and the name of the source.

Note
buildRegistryName uses Dropwizard Metrics' MetricRegistry to build metrics source identifiers.
Caution
FIXME Finish for the other components.
Note
buildRegistryName is used when MetricsSystem registers or removes a metrics source.

Registering Metrics Sources for Spark Instance — registerSources Internal Method

registerSources(): Unit

registerSources finds metricsConfig configuration for the metrics instance.

Note
instance is defined when MetricsSystem is created.

registerSources finds the configuration of all the metrics sources for the subsystem (as described with source. prefix).

For every metrics source, registerSources finds class property, creates an instance, and in the end registers it.

When registerSources fails, you should see the following ERROR message in the logs followed by the exception.

ERROR Source class [classPath] cannot be instantiated
Note
registerSources is used exclusively when MetricsSystem is started.

Requesting JSON Servlet Handler — getServletHandlers Method

getServletHandlers: Array[ServletContextHandler]

If the MetricsSystem is running and the MetricsServlet is defined for the metrics system, getServletHandlers simply requests the MetricsServlet for the JSON servlet handler.

When MetricsSystem is not running getServletHandlers throws an IllegalArgumentException.

Can only call getServletHandlers on a running MetricsSystem
Note

getServletHandlers is used when:

  • SparkContext is created

  • Spark Standalone’s Master and Worker are requested to start (as onStart)

Registering Metrics Sinks — registerSinks Internal Method

registerSinks(): Unit

registerSinks requests the MetricsConfig for the configuration of the instance.

registerSinks requests the MetricsConfig for the configuration of all metrics sinks (i.e. configuration entries that match ^sink\\.(.)\\.(.) regular expression).

For every metrics sink configuration, registerSinks takes class property and (if defined) creates an instance of the metric sink using an constructor that takes the configuration, MetricRegistry and SecurityManager.

For a single servlet metrics sink, registerSinks converts the sink to a MetricsServlet and sets the metricsServlet internal registry.

For all other metrics sinks, registerSinks adds the sink to the sinks internal registry.

In case of an Exception, registerSinks prints out the following ERROR message to the logs:

Sink class [classPath] cannot be instantiated
Note
registerSinks is used exclusively when MetricsSystem is requested to start.

stop Method

stop(): Unit

stop…​FIXME

Note
stop is used when…​FIXME

getSourcesByName Method

getSourcesByName(sourceName: String): Seq[Source]

getSourcesByName…​FIXME

Note
getSourcesByName is used when…​FIXME

removeSource Method

removeSource(source: Source): Unit

removeSource…​FIXME

Note
removeSource is used when…​FIXME

Creating MetricsSystem Instance

MetricsSystem takes the following when created:

MetricsSystem initializes the internal registries and counters.

When created, MetricsSystem requests MetricsConfig to initialize.

Note
createMetricsSystem is used to create a new MetricsSystems instance instead.

Creating MetricsSystem Instance For Subsystem — createMetricsSystem Factory Method

createMetricsSystem(
  instance: String
  conf: SparkConf
  securityMgr: SecurityManager): MetricsSystem

createMetricsSystem returns a new MetricsSystem.

Note
createMetricsSystem is used when a metrics instance is created.

Requesting Sinks to Report Metrics — report Method

report(): Unit

report simply requests the registered metrics sinks to report metrics.

Note
report is used when SparkContext, Executor, Spark Standalone’s Master and Worker, Spark on Mesos' MesosClusterScheduler are requested to stop

Starting MetricsSystem — start Method

start(): Unit

start turns running flag on.

Note
start can only be called once and throws an IllegalArgumentException when called multiple times.

start registers the "static" metrics sources for Spark SQL, i.e. CodegenMetrics and HiveCatalogMetrics.

start then registers the configured metrics sources and sinks for the Spark instance.

In the end, start requests the registered metrics sinks to start.

start throws an IllegalArgumentException when running flag is on.

requirement failed: Attempting to start a MetricsSystem that is already running
Note

start is used when:

  • SparkContext is created

  • SparkEnv is created (on executors)

  • ExternalShuffleService is requested to start

  • Spark Standalone’s Master and Worker, and Spark on Mesos' MesosClusterScheduler are requested to start

results matching ""

    No results matching ""