CacheManager — In-Memory Cache for Tables and Views

CacheManager is an in-memory cache for tables and views (as logical plans). It uses the internal cachedData collection of CachedData to track logical plans and their cached InMemoryRelation representation.

CacheManager is shared across SparkSessions through SharedState.

val spark: SparkSession = ...
spark.sharedState.cacheManager

Note
A Spark developer uses CacheManager indirectly, by caching Datasets with the cache or persist operators.
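
The following is a minimal sketch of how caching a Dataset shows up in CacheManager. It assumes Spark 2.x or 3.x on the classpath and a throwaway local session; the object name, application name and master URL are illustrative choices only.

```scala
import org.apache.spark.sql.SparkSession

object CacheManagerDemo {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumed setup, not from the original text).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cache-manager-demo")
      .getOrCreate()

    val cacheManager = spark.sharedState.cacheManager
    val ds = spark.range(5)

    // Nothing has been cached for this plan yet.
    println(cacheManager.lookupCachedData(ds).isDefined)

    // cache registers the analyzed logical plan with CacheManager;
    // no data is materialized until an action runs.
    ds.cache()
    println(cacheManager.lookupCachedData(ds).isDefined)

    // Clean up so the shared cache is left as it was found.
    ds.unpersist()
    spark.stop()
  }
}
```

In a fresh session the first lookup should find nothing and the second should find a CachedData entry.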

Cached Queries — cachedData Internal Registry

cachedData is a collection of CachedData with logical plans and their cached InMemoryRelation representation.

cachedData is cleared when…​FIXME

invalidateCachedPath Method

Caution
FIXME

invalidateCache Method

Caution
FIXME

lookupCachedData Method

Caution
FIXME

uncacheQuery Method

Caution
FIXME

isEmpty Method

Caution
FIXME

Caching Dataset (Registering Analyzed Logical Plan as InMemoryRelation) — cacheQuery Method

cacheQuery(
  query: Dataset[_],
  tableName: Option[String] = None,
  storageLevel: StorageLevel = MEMORY_AND_DISK): Unit

cacheQuery adds the analyzed logical plan of the input query to the cachedData internal registry of cached queries.

Internally, cacheQuery first requests the input query for the analyzed logical plan and creates an InMemoryRelation for it.

cacheQuery then creates a CachedData (for the analyzed query plan and the InMemoryRelation) and adds it to the cachedData internal registry.

If the input query has already been cached, cacheQuery prints the following WARN message to the logs and returns without changing cachedData:

WARN CacheManager: Asked to cache already cached data.
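
The control flow described above can be sketched in plain Scala, without Spark. Plan, Relation, Entry and SimpleCacheRegistry are hypothetical stand-ins for LogicalPlan, InMemoryRelation, CachedData and CacheManager. Plain equality replaces Spark's plan comparison, the storage level is a string, and cacheQuery returns a Boolean only to make the outcome visible (the real method returns Unit).

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-ins for LogicalPlan, InMemoryRelation and CachedData.
final case class Plan(description: String)
final case class Relation(plan: Plan, storageLevel: String, tableName: Option[String])
final case class Entry(plan: Plan, cachedRepresentation: Relation)

class SimpleCacheRegistry {
  private val cachedData = ArrayBuffer.empty[Entry]

  // Assumption: plain equality stands in for Spark's plan comparison.
  def lookupCachedData(plan: Plan): Option[Entry] =
    cachedData.find(_.plan == plan)

  // Returns true when a new entry was registered, false when the plan was already cached.
  def cacheQuery(
      plan: Plan,
      tableName: Option[String] = None,
      storageLevel: String = "MEMORY_AND_DISK"): Boolean =
    if (lookupCachedData(plan).nonEmpty) {
      println("WARN CacheManager: Asked to cache already cached data.")
      false
    } else {
      cachedData += Entry(plan, Relation(plan, storageLevel, tableName))
      true
    }

  def size: Int = cachedData.size
}

object CacheQuerySketch {
  def main(args: Array[String]): Unit = {
    val registry = new SimpleCacheRegistry
    val plan = Plan("range(5)")
    registry.cacheQuery(plan) // registers the plan
    registry.cacheQuery(plan) // already cached: only the WARN message is printed
    println(registry.size)    // the registry still holds a single entry
  }
}
```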

Note
cacheQuery is used when:

Removing All Cached Tables From In-Memory Cache — clearCache Method

clearCache(): Unit

clearCache acquires a write lock and unpersists RDD[CachedBatch]s of the queries in cachedData before removing them altogether.

Note
clearCache is used when the CatalogImpl is requested to clearCache.
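
As a rough illustration of the locking pattern, the sketch below guards a registry with a read-write lock and clears it under the write lock. It is plain Scala without Spark. CachedEntry and LockedCache are hypothetical names, and the use of java.util.concurrent's ReentrantReadWriteLock is an assumption made for this sketch rather than something the text above states.

```scala
import java.util.concurrent.locks.ReentrantReadWriteLock
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a cached query whose batches can be unpersisted.
final class CachedEntry(val name: String) {
  @volatile var persisted: Boolean = true
  def unpersist(): Unit = persisted = false
}

class LockedCache {
  private val lock = new ReentrantReadWriteLock
  private val cachedData = ArrayBuffer.empty[CachedEntry]

  // Runs body while holding the write lock; always releases it.
  private def withWriteLock[A](body: => A): A = {
    val w = lock.writeLock()
    w.lock()
    try body finally w.unlock()
  }

  // Runs body while holding the read lock; always releases it.
  private def withReadLock[A](body: => A): A = {
    val r = lock.readLock()
    r.lock()
    try body finally r.unlock()
  }

  def add(entry: CachedEntry): Unit = withWriteLock { cachedData += entry; () }

  def isEmpty: Boolean = withReadLock(cachedData.isEmpty)

  // Unpersist every entry first, then drop them all from the registry.
  def clearCache(): Unit = withWriteLock {
    cachedData.foreach(_.unpersist())
    cachedData.clear()
  }
}

object ClearCacheSketch {
  def main(args: Array[String]): Unit = {
    val cache = new LockedCache
    val entry = new CachedEntry("t1")
    cache.add(entry)
    cache.clearCache()
    println(s"empty=${cache.isEmpty}, persisted=${entry.persisted}")
  }
}
```

Holding the write lock for both steps means no reader can observe an entry that is still registered but already unpersisted.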

CachedData

Caution
FIXME

recacheByCondition Internal Method

recacheByCondition(spark: SparkSession, condition: LogicalPlan => Boolean): Unit

recacheByCondition…​FIXME

Note
recacheByCondition is used when CacheManager is requested to recacheByPlan or recacheByPath.

recacheByPlan Method

recacheByPlan(spark: SparkSession, plan: LogicalPlan): Unit

recacheByPlan…​FIXME

Note
recacheByPlan is used exclusively when the InsertIntoDataSourceCommand logical command is executed.

recacheByPath Method

recacheByPath(spark: SparkSession, resourcePath: String): Unit

recacheByPath…​FIXME

Note
recacheByPath is used exclusively when CatalogImpl is requested to refreshByPath.

Replacing Logical Query Segments With Cached Query Plans — useCachedData Method

useCachedData(plan: LogicalPlan): LogicalPlan

useCachedData…​FIXME

Note
useCachedData is used exclusively when QueryExecution is requested for a cached logical query plan.
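
One way to observe the substitution from user code is through QueryExecution's withCachedData property, which appears to be the "cached logical query plan" mentioned above in Spark 2.x and 3.x. The sketch assumes Spark on the classpath and a local session; the object and application names are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.columnar.InMemoryRelation

object UseCachedDataDemo {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumed setup).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("use-cached-data-demo")
      .getOrCreate()

    val cached = spark.range(5).cache()

    // A new query built on top of the cached Dataset.
    val query = cached.filter("id > 2")

    // Logical plan after cached segments have been substituted.
    val plan = query.queryExecution.withCachedData
    println(plan.numberedTreeString)

    // Check whether any segment was replaced by an InMemoryRelation.
    val usesCache = plan.collect { case r: InMemoryRelation => r }.nonEmpty
    println(s"Uses cached data: $usesCache")

    cached.unpersist()
    spark.stop()
  }
}
```

The printed tree would be expected to show an InMemoryRelation in place of the Range operator, with the filter left on top of it.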
