CacheManager — In-Memory Cache for Tables and Views

CacheManager is an in-memory cache (registry) for structured queries (by their logical plans).

CacheManager is shared across SparkSessions through SharedState.

val spark: SparkSession = ...
spark.sharedState.cacheManager
Note
A Spark developer uses CacheManager indirectly, by caching Datasets with the cache or persist operators.
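As a minimal sketch of the note above, the following registers a Dataset with CacheManager through the cache operator and checks the registry before and after. It assumes a local SparkSession (the `local[*]` master and the application name are illustrative choices only) and uses `lookupCachedData`, described later on this page. `CacheOperatorDemo` is a hypothetical name, and member visibility may differ between Spark versions.

```scala
import org.apache.spark.sql.SparkSession

object CacheOperatorDemo {
  // Returns (registered before cache, registered after cache).
  def run(spark: SparkSession): (Boolean, Boolean) = {
    val cacheManager = spark.sharedState.cacheManager
    val ds = spark.range(5)

    val before = cacheManager.lookupCachedData(ds).isDefined
    ds.cache() // registers the query with CacheManager; data is materialized lazily
    val after = cacheManager.lookupCachedData(ds).isDefined

    ds.unpersist() // leave the shared cache as we found it
    (before, after)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session, only for demonstration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cache-operator-demo")
      .getOrCreate()
    println(run(spark))
    spark.stop()
  }
}
```

In spark-shell the body of `run` can be pasted directly, since a `spark` session is already available there.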

CacheManager uses the cachedData internal registry to manage cached structured queries and their InMemoryRelation cached representation.

CacheManager can be empty, i.e. hold no cached structured queries.

CacheManager uses the CachedData data structure to manage cached structured queries. A CachedData pairs the LogicalPlan (of a structured query) with the corresponding InMemoryRelation leaf logical operator.

Tip

Enable ALL logging level for org.apache.spark.sql.execution.CacheManager logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.sql.execution.CacheManager=ALL

Refer to Logging.

Cached Structured Queries — cachedData Internal Registry

cachedData: LinkedList[CachedData]

cachedData is a collection of CachedData.

A new CachedData is added when CacheManager is requested to cache a structured query (cacheQuery).

A CachedData is removed when CacheManager is requested to uncache a structured query (uncacheQuery).

All CachedData are removed (cleared) when CacheManager is requested to clearCache.
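The life cycle of the registry can be pictured with a toy model in plain Scala. This is a deliberately simplified sketch and not Spark's implementation: `Plan`, `Relation`, `Entry` and `ToyCacheManager` are hypothetical stand-ins for LogicalPlan, InMemoryRelation, CachedData and CacheManager, and plain equality replaces Spark's comparison of logical plans.

```scala
// Hypothetical stand-ins; none of these are Spark classes.
final case class Plan(description: String)
final case class Relation(plan: Plan)
final case class Entry(plan: Plan, cachedRepresentation: Relation)

final class ToyCacheManager {
  // The registry: one entry per cached plan.
  private var cachedData: List[Entry] = Nil

  def isEmpty: Boolean = cachedData.isEmpty

  def size: Int = cachedData.size

  def lookupCachedData(plan: Plan): Option[Entry] =
    cachedData.find(_.plan == plan)

  // Adds an entry unless the plan is already registered.
  def cacheQuery(plan: Plan): Unit = {
    if (lookupCachedData(plan).isEmpty) {
      cachedData = Entry(plan, Relation(plan)) :: cachedData
    }
  }

  // Removes the entry for the given plan, if any.
  def uncacheQuery(plan: Plan): Unit = {
    cachedData = cachedData.filterNot(_.plan == plan)
  }

  // Removes all entries.
  def clearCache(): Unit = {
    cachedData = Nil
  }
}
```

The model keeps at most one entry per plan, which mirrors the behavior of ignoring a request to cache an already-cached query.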

lookupCachedData Method

lookupCachedData(query: Dataset[_]): Option[CachedData]
lookupCachedData(plan: LogicalPlan): Option[CachedData]

lookupCachedData…​FIXME

Note

lookupCachedData is used when:

Un-caching Dataset — uncacheQuery Method

uncacheQuery(
  query: Dataset[_],
  cascade: Boolean,
  blocking: Boolean = true): Unit
uncacheQuery(
  spark: SparkSession,
  plan: LogicalPlan,
  cascade: Boolean,
  blocking: Boolean): Unit

uncacheQuery…​FIXME

Note

uncacheQuery is used when:

isEmpty Method

isEmpty: Boolean

isEmpty says whether the cachedData internal registry has no CachedData entries.

Caching Dataset — cacheQuery Method

cacheQuery(
  query: Dataset[_],
  tableName: Option[String] = None,
  storageLevel: StorageLevel = MEMORY_AND_DISK): Unit

cacheQuery adds the analyzed logical plan of the input Dataset to the cachedData internal registry of cached queries.

Internally, cacheQuery requests the Dataset for the analyzed logical plan and creates an InMemoryRelation with the following properties:

cacheQuery then creates a CachedData (for the analyzed query plan and the InMemoryRelation) and adds it to the cachedData internal registry.

If the input query has already been cached, cacheQuery prints the following WARN message to the logs and does nothing else:

Asked to cache already cached data.
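The already-cached case can be observed from user code. The sketch below assumes that `Dataset.persist` ends up in cacheQuery and that `Dataset.storageLevel` reports the level of the registered entry. `CacheTwiceDemo` is a hypothetical name and the SparkSession is supplied by the caller.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object CacheTwiceDemo {
  // Persists the same Dataset twice and returns the storage level in effect afterwards.
  def run(spark: SparkSession): StorageLevel = {
    val ds = spark.range(3)

    ds.persist(StorageLevel.MEMORY_ONLY)
    // The query is already cached, so this request is expected to log
    // the WARN message and leave the existing entry untouched.
    ds.persist(StorageLevel.DISK_ONLY)

    val level = ds.storageLevel
    ds.unpersist() // clean up the shared cache
    level
  }
}
```

Under these assumptions the first storage level stays in effect. To change the level, unpersist the Dataset first and persist it again.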
Note

cacheQuery is used when:

Removing All Cached Logical Plans — clearCache Method

clearCache(): Unit

clearCache takes every CachedData from the cachedData internal registry and requests it for the InMemoryRelation to access the CachedRDDBuilder. clearCache requests the CachedRDDBuilder to clearCache.

In the end, clearCache removes all CachedData entries from the cachedData internal registry.

Note
clearCache is used exclusively when CatalogImpl is requested to clear the cache.
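A short sketch of that entry point, assuming the user-facing `spark.catalog.clearCache()` call is what requests CatalogImpl to clear the cache. `ClearCacheDemo` is a hypothetical name and the SparkSession is supplied by the caller.

```scala
import org.apache.spark.sql.SparkSession

object ClearCacheDemo {
  // Returns (isEmpty before clearing, isEmpty after clearing).
  def run(spark: SparkSession): (Boolean, Boolean) = {
    val cacheManager = spark.sharedState.cacheManager

    // Register two different queries.
    spark.range(1).cache()
    spark.range(2).cache()
    val before = cacheManager.isEmpty

    // Drops every cached query, not only the two registered here.
    spark.catalog.clearCache()
    (before, cacheManager.isEmpty)
  }
}
```

Because CacheManager is shared across SparkSessions, clearing the cache in one session also affects queries cached through other sessions that use the same SharedState.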

Re-Caching Structured Query — recacheByCondition Internal Method

recacheByCondition(spark: SparkSession, condition: LogicalPlan => Boolean): Unit

recacheByCondition…​FIXME

Note
recacheByCondition is used when CacheManager is requested to uncache a structured query, recacheByPlan, and recacheByPath.

recacheByPlan Method

recacheByPlan(spark: SparkSession, plan: LogicalPlan): Unit

recacheByPlan…​FIXME

Note
recacheByPlan is used exclusively when InsertIntoDataSourceCommand logical command is executed.

recacheByPath Method

recacheByPath(spark: SparkSession, resourcePath: String): Unit

recacheByPath…​FIXME

Note
recacheByPath is used exclusively when CatalogImpl is requested to refreshByPath.

Replacing Segments of Logical Query Plan With Cached Data — useCachedData Method

useCachedData(plan: LogicalPlan): LogicalPlan

useCachedData…​FIXME

Note
useCachedData is used exclusively when QueryExecution is requested for a cached logical query plan.
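The effect of the replacement can be inspected from a Dataset's QueryExecution. The sketch below assumes that the `withCachedData` property of QueryExecution is the cached logical query plan referred to above, and that a replaced segment shows up as an InMemoryRelation operator. `UseCachedDataDemo` is a hypothetical name and the SparkSession is supplied by the caller.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.columnar.InMemoryRelation

object UseCachedDataDemo {
  // Returns true when the plan with cached data contains an InMemoryRelation.
  def run(spark: SparkSession): Boolean = {
    val ds = spark.range(10)
    ds.cache()

    // A new query built on top of the cached one.
    val derived = ds.filter("id > 5")

    // The logical plan after cached segments have been substituted.
    val withCache = derived.queryExecution.withCachedData
    val found = withCache.collect { case r: InMemoryRelation => r }.nonEmpty

    ds.unpersist() // clean up the shared cache
    found
  }
}
```

Printing `derived.queryExecution.withCachedData` in spark-shell is a quick way to see which part of the plan was replaced.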

lookupAndRefresh Internal Method

lookupAndRefresh(
  plan: LogicalPlan,
  fs: FileSystem,
  qualifiedPath: Path): Boolean

lookupAndRefresh…​FIXME

Note
lookupAndRefresh is used exclusively when CacheManager is requested to recacheByPath.
