CacheManager is an in-memory cache for tables and views (as logical plans). It uses the internal cachedData collection of CachedData to track logical plans and their cached InMemoryRelation representation.
CacheManager is shared across SparkSessions through SharedState.
cacheQuery(query: Dataset[_], tableName: Option[String] = None, storageLevel: StorageLevel = MEMORY_AND_DISK): Unit
cacheQuery creates an InMemoryRelation with the following properties:
If, however, the input query has already been cached, cacheQuery simply prints the following WARN message to the logs and exits:
WARN CacheManager: Asked to cache already cached data.
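The already-cached path can be observed from user code. The following is a minimal sketch, assuming Spark 2.x or 3.x is on the classpath and that Dataset.cache ends up in cacheQuery with the default MEMORY_AND_DISK storage level (as Dataset.persist does in those versions). The local master, the application name, and the object name CacheQueryDemo are arbitrary choices made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object CacheQueryDemo {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are arbitrary.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cache-query-demo")
      .getOrCreate()
    try {
      val ds = spark.range(5)

      // First request: the query is registered with the shared cache.
      ds.cache()

      // Second request for the same query: nothing new is cached,
      // and the WARN message quoted above is expected in the logs.
      ds.cache()

      // The storage level reported for the query is the default one.
      println(ds.storageLevel == StorageLevel.MEMORY_AND_DISK)

      // Clean up so the demo leaves no cached data behind.
      ds.unpersist()
    } finally spark.stop()
  }
}
```

Whether the WARN line is visible depends on the logging configuration in use; the sketch only relies on public Dataset methods and does not call CacheManager directly.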
clearCache acquires a write lock and unpersists RDD[CachedBatch]s of the queries in cachedData before removing them altogether.
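The bookkeeping described above can be sketched with a toy model that uses only the Scala and Java standard libraries. This is not Spark's source code: ToyCacheManager, Entry, isCached, and the string-based "plans" are hypothetical stand-ins, the materialized flag merely stands in for a persisted RDD[CachedBatch], and taking the write lock inside cacheQuery is an assumption of the sketch.

```scala
import java.util.concurrent.locks.{Lock, ReentrantReadWriteLock}
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for CachedData: a plan plus a flag that models
// whether its cached representation is still held.
final case class Entry(plan: String, var materialized: Boolean)

final class ToyCacheManager {
  private val lock = new ReentrantReadWriteLock()
  private val cachedData = ArrayBuffer.empty[Entry]

  // Runs `body` while holding the given lock, always releasing it afterwards.
  private def withLock[A](l: Lock)(body: => A): A = {
    l.lock()
    try body finally l.unlock()
  }

  // Registers `plan`; returns false (after a warning) if it is already cached.
  def cacheQuery(plan: String): Boolean = withLock(lock.writeLock()) {
    if (cachedData.exists(_.plan == plan)) {
      println("WARN CacheManager: Asked to cache already cached data.")
      false
    } else {
      cachedData += Entry(plan, materialized = true)
      true
    }
  }

  // Read-only lookup under the read lock.
  def isCached(plan: String): Boolean = withLock(lock.readLock()) {
    cachedData.exists(_.plan == plan)
  }

  // Releases every entry's data first, then removes all entries,
  // all under the write lock. Returns the released entries.
  def clearCache(): Seq[Entry] = withLock(lock.writeLock()) {
    val released = cachedData.toList
    released.foreach(e => e.materialized = false) // models unpersisting
    cachedData.clear()
    released
  }
}

object ToyCacheManagerDemo {
  def main(args: Array[String]): Unit = {
    val cm = new ToyCacheManager
    println(cm.cacheQuery("range(5)"))  // true: newly registered
    println(cm.cacheQuery("range(5)"))  // false: warning printed first
    println(cm.clearCache().size)       // 1: one entry released and removed
    println(cm.isCached("range(5)"))    // false: nothing is tracked any more
  }
}
```

Holding the write lock for the whole of clearCache keeps readers from seeing an entry whose data has already been released but which has not yet been removed from the collection.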