CacheManager — In-Memory Cache for Tables and Views
CacheManager is an in-memory cache (registry) for structured queries (by their logical plans).
CacheManager is shared across SparkSessions through SharedState.
val spark: SparkSession = ...
spark.sharedState.cacheManager
CacheManager uses the cachedData internal registry to manage cached structured queries and their InMemoryRelation cached representation.
CacheManager can be empty.
CacheManager uses the CachedData data structure to manage cached structured queries: each entry pairs the LogicalPlan (of a structured query) with the corresponding InMemoryRelation leaf logical operator.
Tip: Enable logging for CacheManager. Refer to Logging.
Cached Structured Queries — cachedData Internal Registry
cachedData: LinkedList[CachedData]
cachedData is a collection of CachedData.
A new CachedData is added when CacheManager is requested to cache a structured query (cacheQuery).
A CachedData is removed when CacheManager is requested to uncache a structured query (uncacheQuery).
All CachedData are removed (cleared) when CacheManager is requested to clearCache.
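The add, remove and clear life cycle of the registry can be pictured with a small, self-contained model. This is only a sketch and not Spark's implementation: Plan, Relation, Entry and ToyCacheManager are hypothetical stand-ins for LogicalPlan, InMemoryRelation, CachedData and CacheManager, and plans are compared by plain equality, which is an assumption made to keep the example short.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-ins (not Spark classes) for LogicalPlan, InMemoryRelation and CachedData.
final case class Plan(description: String)
final case class Relation(plan: Plan, storageLevel: String)
final case class Entry(plan: Plan, cachedRepresentation: Relation)

// Toy registry that mirrors the add / remove / clear life cycle of cachedData.
final class ToyCacheManager {
  private val cachedData = ListBuffer.empty[Entry]

  // True when nothing is registered.
  def isEmpty: Boolean = cachedData.isEmpty

  // Finds the entry registered for the given plan, if any.
  def lookupCachedData(plan: Plan): Option[Entry] =
    cachedData.find(_.plan == plan)

  // Adds a new entry unless the plan is already registered.
  def cacheQuery(plan: Plan, storageLevel: String = "MEMORY_AND_DISK"): Unit = {
    if (lookupCachedData(plan).isDefined) {
      println("Asked to cache already cached data.")
    } else {
      cachedData += Entry(plan, Relation(plan, storageLevel))
    }
  }

  // Removes the entries registered for the given plan.
  def uncacheQuery(plan: Plan): Unit = {
    val matching = cachedData.filter(_.plan == plan)
    cachedData --= matching
  }

  // Removes every entry.
  def clearCache(): Unit = cachedData.clear()
}
```

In this model, calling cacheQuery twice with the same Plan keeps a single entry; the second call only prints the message.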
lookupCachedData Method
lookupCachedData(query: Dataset[_]): Option[CachedData]
lookupCachedData(plan: LogicalPlan): Option[CachedData]
lookupCachedData …FIXME
Un-caching Dataset — uncacheQuery Method
uncacheQuery(
  query: Dataset[_],
  cascade: Boolean,
  blocking: Boolean = true): Unit
uncacheQuery(
  spark: SparkSession,
  plan: LogicalPlan,
  cascade: Boolean,
  blocking: Boolean): Unit
uncacheQuery …FIXME
isEmpty Method
isEmpty: Boolean
isEmpty is true when the cachedData internal registry has no CachedData entries (and false otherwise).
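The registry state can be observed from user code through the public Dataset API. The following is a minimal sketch, assuming a local Spark 2.4 or 3.x session with spark-sql on the classpath; RegistryInspection and its run method are names made up for this example, and Dataset.cache and Dataset.unpersist are used as the public entry points rather than calling CacheManager directly.

```scala
import org.apache.spark.sql.SparkSession

object RegistryInspection {
  // Returns (emptyAtStart, emptyAfterCache, foundAfterCache, foundAfterUnpersist).
  def run(): (Boolean, Boolean, Boolean, Boolean) = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cache-manager-registry")
      .getOrCreate()
    val cacheManager = spark.sharedState.cacheManager

    // Start from a clean registry; it is shared by all sessions in this JVM.
    spark.catalog.clearCache()
    val emptyAtStart = cacheManager.isEmpty

    // cache() registers the query right away; the data itself is materialized lazily.
    val ds = spark.range(5).cache()
    val emptyAfterCache = cacheManager.isEmpty
    val foundAfterCache = cacheManager.lookupCachedData(ds).isDefined

    // unpersist() removes the registry entry for this query.
    ds.unpersist()
    val foundAfterUnpersist = cacheManager.lookupCachedData(ds).isDefined

    (emptyAtStart, emptyAfterCache, foundAfterCache, foundAfterUnpersist)
  }
}
```

The initial clearCache call is there because other code in the same JVM may already have cached queries in the shared CacheManager.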
Caching Dataset — cacheQuery Method
cacheQuery(
  query: Dataset[_],
  tableName: Option[String] = None,
  storageLevel: StorageLevel = MEMORY_AND_DISK): Unit
cacheQuery adds the analyzed logical plan of the input Dataset to the cachedData internal registry of cached queries.
Internally, cacheQuery requests the Dataset for the analyzed logical plan and creates an InMemoryRelation with the following properties:
- spark.sql.inMemoryColumnarStorage.compressed (enabled by default)
- spark.sql.inMemoryColumnarStorage.batchSize (default: 10000)
- The input storageLevel (default: MEMORY_AND_DISK)
- Optimized physical query plan (after requesting SessionState to execute the analyzed logical plan)
- The input tableName
- Statistics of the analyzed query plan
cacheQuery then creates a CachedData (for the analyzed query plan and the InMemoryRelation) and adds it to the cachedData internal registry.
If the input query has already been cached, cacheQuery only prints the following WARN message to the logs and does nothing else:
Asked to cache already cached data.
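Some of the inputs that cacheQuery works with can be inspected from user code. The sketch below assumes a local Spark 2.4 or 3.x session with default settings; CacheQueryInputs is a hypothetical name, and Dataset.persist is used as the public entry point instead of calling cacheQuery directly.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object CacheQueryInputs {
  // Returns (compressed setting, batchSize setting, storage level of the cached Dataset).
  def run(): (String, String, StorageLevel) = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cache-query-inputs")
      .getOrCreate()

    // The two configuration properties read when the InMemoryRelation is created.
    val compressed = spark.conf.get("spark.sql.inMemoryColumnarStorage.compressed")
    val batchSize = spark.conf.get("spark.sql.inMemoryColumnarStorage.batchSize")

    // Cache with an explicit storage level instead of the default.
    val ds = spark.range(5)
    ds.persist(StorageLevel.MEMORY_ONLY)

    // A second request for the same query is expected to log the WARN message only.
    ds.persist(StorageLevel.MEMORY_ONLY)

    // Storage level recorded for the cached query.
    val level = ds.storageLevel

    ds.unpersist()
    (compressed, batchSize, level)
  }
}
```

Whether the WARN line is visible depends on the logging configuration in use.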
Removing All Cached Logical Plans — clearCache Method
clearCache(): Unit
For every CachedData in the cachedData internal registry, clearCache requests the InMemoryRelation for its CachedRDDBuilder and requests the CachedRDDBuilder to clearCache.
In the end, clearCache removes all CachedData entries from the cachedData internal registry.
Note: clearCache is used exclusively when CatalogImpl is requested to clear the cache.
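From user code, the usual way to reach clearCache is the public Catalog API. A minimal sketch, assuming a local Spark 2.4 or 3.x session; ClearCacheDemo is a hypothetical name used only for this example.

```scala
import org.apache.spark.sql.SparkSession

object ClearCacheDemo {
  // Returns (registry empty before clearing, registry empty after clearing).
  def run(): (Boolean, Boolean) = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("clear-cache")
      .getOrCreate()
    val cacheManager = spark.sharedState.cacheManager

    // Register two different queries in the cache.
    spark.range(3).cache()
    spark.range(7).filter("id % 2 = 0").cache()
    val emptyBefore = cacheManager.isEmpty

    // Catalog.clearCache drops every cached query, not only the two above.
    spark.catalog.clearCache()
    val emptyAfter = cacheManager.isEmpty

    (emptyBefore, emptyAfter)
  }
}
```

Because the CacheManager is shared across SparkSessions, clearing the cache from one session affects queries cached by the others as well.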
Re-Caching Structured Query — recacheByCondition Internal Method
recacheByCondition(spark: SparkSession, condition: LogicalPlan => Boolean): Unit
recacheByCondition …FIXME
Note: recacheByCondition is used when CacheManager is requested to uncache a structured query, recacheByPlan, and recacheByPath.
recacheByPlan Method
recacheByPlan(spark: SparkSession, plan: LogicalPlan): Unit
recacheByPlan …FIXME
Note: recacheByPlan is used exclusively when InsertIntoDataSourceCommand logical command is executed.
recacheByPath Method
recacheByPath(spark: SparkSession, resourcePath: String): Unit
recacheByPath …FIXME
Note: recacheByPath is used exclusively when CatalogImpl is requested to refreshByPath.
Replacing Segments of Logical Query Plan With Cached Data — useCachedData Method
useCachedData(plan: LogicalPlan): LogicalPlan
useCachedData …FIXME
Note: useCachedData is used exclusively when QueryExecution is requested for a cached logical query plan.
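The effect of the replacement can be observed through QueryExecution. The sketch below assumes a local Spark 2.4 or 3.x session and relies on QueryExecution.withCachedData, the logical plan after cached segments have been substituted; UseCachedDataDemo and cachedSegments are hypothetical names. It counts the InMemoryRelation operators in the plan of a query built on top of a Dataset, before and after that Dataset is cached.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.columnar.InMemoryRelation

object UseCachedDataDemo {
  // Returns the number of InMemoryRelation operators before and after caching.
  def run(): (Int, Int) = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("use-cached-data")
      .getOrCreate()
    spark.catalog.clearCache()

    val base = spark.range(10)

    // Builds a fresh query each time so that its QueryExecution is computed anew.
    def cachedSegments(): Int =
      base.filter("id > 5")
        .queryExecution
        .withCachedData
        .collect { case r: InMemoryRelation => r }
        .size

    val before = cachedSegments()
    base.cache()
    val after = cachedSegments()

    base.unpersist()
    (before, after)
  }
}
```

The filter itself is not cached here; only the segment of the plan that matches the cached base query is expected to be replaced.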
lookupAndRefresh Internal Method
lookupAndRefresh(
  plan: LogicalPlan,
  fs: FileSystem,
  qualifiedPath: Path): Boolean
lookupAndRefresh …FIXME
Note: lookupAndRefresh is used exclusively when CacheManager is requested to recacheByPath.