CacheManager — In-Memory Cache for Tables and Views
CacheManager is an in-memory cache (registry) for structured queries (by their logical plans).
CacheManager is shared across SparkSessions through SharedState.
val spark: SparkSession = ...
spark.sharedState.cacheManager
CacheManager uses the cachedData internal registry to manage cached structured queries and their InMemoryRelation cached representation.
CacheManager can be empty.
CacheManager uses the CachedData data structure to manage cached structured queries. A CachedData pairs the LogicalPlan of a structured query with the corresponding InMemoryRelation leaf logical operator.
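One way to observe the sharing is to compare the CacheManager instances of two sessions that belong to the same SparkContext. The following is a minimal sketch, assuming a local session (for example in spark-shell); the `other` and `sameCacheManager` names and the application name are only for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the active session if there is one (e.g. in spark-shell).
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("cache-manager-sharing-demo")
  .getOrCreate()

// newSession gives a session with its own SQL configuration and temporary views
// that still reuses the SharedState of the parent session.
val other = spark.newSession()

// Reference equality: do both sessions point at the same CacheManager object?
val sameCacheManager =
  spark.sharedState.cacheManager eq other.sharedState.cacheManager
println(s"Same CacheManager instance: $sameCacheManager")
```

A practical consequence is that a query cached through one session can be found by the other session as well.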
Tip: Enable logging for CacheManager to see what happens inside. Refer to Logging.
Cached Structured Queries — cachedData Internal Registry
cachedData: LinkedList[CachedData]
cachedData is a collection of CachedData.
A new CachedData is added when CacheManager is requested to cacheQuery or recacheByCondition.
A CachedData is removed when CacheManager is requested to uncacheQuery or recacheByCondition.
All CachedData entries are removed (cleared) when CacheManager is requested to clearCache.
lookupCachedData Method
lookupCachedData(query: Dataset[_]): Option[CachedData]
lookupCachedData(plan: LogicalPlan): Option[CachedData]
lookupCachedData…FIXME
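As an illustration of the Dataset variant, the sketch below looks a query up before and after it is cached. It assumes a local session and that Dataset.cache registers the query with CacheManager; the variable names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("lookup-cached-data-demo")
  .getOrCreate()
val cacheManager = spark.sharedState.cacheManager

val numbers = spark.range(0, 7)

// Nothing has been cached for this query yet.
val foundBefore = cacheManager.lookupCachedData(numbers).isDefined

// cache() only registers the query; no Spark job runs at this point.
numbers.cache()
val foundAfter = cacheManager.lookupCachedData(numbers).isDefined

println(s"found before caching: $foundBefore, after caching: $foundAfter")

// Clean up so the entry does not leak into other experiments.
numbers.unpersist()
```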
Un-caching Dataset — uncacheQuery Method
uncacheQuery(
  query: Dataset[_],
  cascade: Boolean,
  blocking: Boolean = true): Unit
uncacheQuery(
  spark: SparkSession,
  plan: LogicalPlan,
  cascade: Boolean,
  blocking: Boolean): Unit
uncacheQuery…FIXME
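The effect of un-caching can be observed from the public API. The sketch below assumes that Dataset.unpersist ends up in uncacheQuery (as it does in the Spark 2.4 sources) and uses hypothetical variable names.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("uncache-query-demo")
  .getOrCreate()
val cacheManager = spark.sharedState.cacheManager

val evens = spark.range(0, 20, 2)
evens.cache()
val cachedBefore = cacheManager.lookupCachedData(evens).isDefined

// unpersist is the public entry point for un-caching a Dataset;
// blocking = true waits until the cached blocks are removed.
evens.unpersist(blocking = true)
val cachedAfter = cacheManager.lookupCachedData(evens).isDefined

println(s"cached before unpersist: $cachedBefore, after unpersist: $cachedAfter")
```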
isEmpty Method
isEmpty: Boolean
isEmpty is true when the cachedData internal registry has no CachedData entries, and false otherwise.
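A short sketch of how isEmpty changes as queries are cached, assuming a local session where it is acceptable to drop everything cached so far (the first step clears the cache); variable names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("is-empty-demo")
  .getOrCreate()
val cacheManager = spark.sharedState.cacheManager

// Start from a clean registry. Note that this drops ALL cached queries.
spark.catalog.clearCache()
val emptyAtStart = cacheManager.isEmpty

// Registering a single query is enough to make the registry non-empty.
val tiny = spark.range(3)
tiny.cache()
val emptyAfterCache = cacheManager.isEmpty

println(s"empty at start: $emptyAtStart, empty after cache: $emptyAfterCache")

tiny.unpersist()
```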
Caching Dataset — cacheQuery Method
cacheQuery(
  query: Dataset[_],
  tableName: Option[String] = None,
  storageLevel: StorageLevel = MEMORY_AND_DISK): Unit
cacheQuery adds the analyzed logical plan of the input Dataset to the cachedData internal registry of cached queries.
Internally, cacheQuery requests the Dataset for the analyzed logical plan and creates an InMemoryRelation with the following properties:
- spark.sql.inMemoryColumnarStorage.compressed (enabled by default)
- spark.sql.inMemoryColumnarStorage.batchSize (default: 10000)
- Input storageLevel storage level (default: MEMORY_AND_DISK)
- Optimized physical query plan (after requesting SessionState to execute the analyzed logical plan)
- Input tableName
- Statistics of the analyzed query plan
cacheQuery then creates a CachedData (for the analyzed query plan and the InMemoryRelation) and adds it to the cachedData internal registry.
If the input query has already been cached, cacheQuery only prints the following WARN message to the logs and does nothing else:
Asked to cache already cached data.
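The behaviour described above can be exercised through Dataset.persist, which (in the Spark 2.4 sources) hands the Dataset over to cacheQuery. The sketch assumes a local session; the `squares` query and the other variable names are made up for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("cache-query-demo")
  .getOrCreate()

// The two SQL properties used when the InMemoryRelation is created.
val compressed = spark.conf.get("spark.sql.inMemoryColumnarStorage.compressed")
val batchSize = spark.conf.get("spark.sql.inMemoryColumnarStorage.batchSize")
println(s"compressed=$compressed, batchSize=$batchSize")

val squares = spark.range(10).selectExpr("id * id AS square")

// Use an explicit storage level instead of the MEMORY_AND_DISK default.
squares.persist(StorageLevel.MEMORY_ONLY)
val levelAfterFirstCache = squares.storageLevel

// Caching an already cached query changes nothing; only the WARN message is logged.
squares.cache()
val levelAfterSecondCache = squares.storageLevel

println(s"storage level: $levelAfterFirstCache, then $levelAfterSecondCache")

squares.unpersist()
```

Because the second request is ignored, the storage level recorded by the first request stays in effect.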
Removing All Cached Logical Plans — clearCache Method
clearCache(): Unit
For every CachedData in the cachedData internal registry, clearCache requests the CachedData for its InMemoryRelation and then requests the InMemoryRelation's CachedRDDBuilder to clearCache.
In the end, clearCache removes all CachedData entries from the cachedData internal registry.
Note: clearCache is used exclusively when CatalogImpl is requested to clear the cache.
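The public route to clearCache is Catalog.clearCache. The sketch below caches two unrelated queries and clears them in one go; it assumes a local session where dropping all cached data is acceptable, and the variable names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("clear-cache-demo")
  .getOrCreate()
val cacheManager = spark.sharedState.cacheManager

val first = spark.range(100).cache()
val second = spark.range(200).selectExpr("id + 1 AS next").cache()
val emptyBeforeClear = cacheManager.isEmpty

// Removes every cached query, not only the two registered above.
spark.catalog.clearCache()

val emptyAfterClear = cacheManager.isEmpty
// A Dataset that is no longer cached reports StorageLevel.NONE.
val firstLevel = first.storageLevel
val secondLevel = second.storageLevel
println(s"empty before: $emptyBeforeClear, empty after: $emptyAfterClear")
```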
Re-Caching Structured Query — recacheByCondition Internal Method
recacheByCondition(spark: SparkSession, condition: LogicalPlan => Boolean): Unit
recacheByCondition…FIXME
Note: recacheByCondition is used when CacheManager is requested to uncache a structured query, recacheByPlan, and recacheByPath.
recacheByPlan Method
recacheByPlan(spark: SparkSession, plan: LogicalPlan): Unit
recacheByPlan…FIXME
Note: recacheByPlan is used exclusively when InsertIntoDataSourceCommand logical command is executed.
recacheByPath Method
recacheByPath(spark: SparkSession, resourcePath: String): Unit
recacheByPath…FIXME
Note: recacheByPath is used exclusively when CatalogImpl is requested to refreshByPath.
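From the user's side, recacheByPath is reached through Catalog.refreshByPath. The sketch below only shows the call and checks that the query is still registered afterwards; it does not demonstrate what the refresh does internally. The temporary directory and all variable names are hypothetical.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("recache-by-path-demo")
  .getOrCreate()
val cacheManager = spark.sharedState.cacheManager

// Hypothetical scratch location for the demo data.
val path = Files.createTempDirectory("recache-by-path-demo").resolve("numbers").toString
spark.range(5).write.parquet(path)

val numbers = spark.read.parquet(path)
numbers.cache()
numbers.count() // runs a job, which materializes the cache

// Ask Spark to refresh cached data that reads from the path,
// e.g. after another application changed the files under it.
spark.catalog.refreshByPath(path)

val stillCached = cacheManager.lookupCachedData(numbers).isDefined
println(s"still cached after refreshByPath: $stillCached")

numbers.unpersist()
```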
Replacing Segments of Logical Query Plan With Cached Data — useCachedData Method
useCachedData(plan: LogicalPlan): LogicalPlan
useCachedData…FIXME
Note: useCachedData is used exclusively when QueryExecution is requested for a cached logical query plan.
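The replacement can be inspected through QueryExecution.withCachedData, which holds the logical plan after cached fragments have been substituted. The sketch below assumes a local session; the `base` and `query` Datasets are made up for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.columnar.InMemoryRelation

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("use-cached-data-demo")
  .getOrCreate()

val base = spark.range(50)
base.cache()

// A new query built on top of the cached one.
val query = base.filter("id % 5 = 0")

// Collect the InMemoryRelation operators that stand in for cached fragments.
val planWithCache = query.queryExecution.withCachedData
val cachedFragments = planWithCache.collect { case r: InMemoryRelation => r }

println(planWithCache.numberedTreeString)
println(s"cached fragments in the plan: ${cachedFragments.size}")

base.unpersist()
```

Only the fragment that matches the cached query is replaced; the filter on top of it stays a regular logical operator.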
lookupAndRefresh Internal Method
lookupAndRefresh(
  plan: LogicalPlan,
  fs: FileSystem,
  qualifiedPath: Path): Boolean
lookupAndRefresh…FIXME
Note: lookupAndRefresh is used exclusively when CacheManager is requested to recacheByPath.