User-Friendly Names Of Cached Queries in web UI’s Storage Tab

As you may have noticed, web UI’s Storage tab displays some cached queries with user-friendly RDD names (e.g. "In-memory table [name]") while others do not (e.g. "Scan JDBCRelation…").

Figure 1. Cached Queries in web UI (Storage Tab)

"In-memory table [name]" RDD names are the result of SQL’s CACHE TABLE or of requesting Catalog to cache a table (e.g. with spark.catalog.cacheTable).

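import org.apache.spark.storage.StorageLevel

// register Dataset as temporary view (table)
spark.range(1).createOrReplaceTempView("one")
// caching is lazy and won't happen until an action is executed
val one = spark.table("one").cache
// The following gives "*Range (0, 1, step=1, splits=8)"
// WHY?!
one.count

scala> spark.catalog.isCached("one")
res0: Boolean = true

// uncache first; otherwise cacheTable finds the query already cached and keeps its name
one.unpersist

// caching is lazy
spark.catalog.cacheTable("one", StorageLevel.MEMORY_ONLY)
// The following gives "In-memory table one"
spark.table("one").count

// register another Dataset as temporary view (table)
spark.range(100).createOrReplaceTempView("hundred")

// SQL's CACHE TABLE is eager
// The following gives "In-memory table `hundred`"
// WHY backticks?
spark.sql("CACHE TABLE hundred")

// register Dataset under name
val ds = spark.range(20)
spark.sharedState.cacheManager.cacheQuery(ds, Some("twenty"))
// trigger an action
ds.count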
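To check the names without opening the web UI, you can list the RDDs that SparkContext tracks as persistent; their names should match what the Storage tab displays. The following is a minimal sketch, assuming a local Spark 2.x or 3.x setup. ListCachedRddNames and cachedRddNames are hypothetical names. The sketch relies on the public SparkContext.getPersistentRDDs and RDD.name, plus the internal CacheManager.cacheQuery shown earlier, whose signature may change between Spark versions.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper: prints the names of the RDDs that SparkContext
// currently tracks as persistent (cached queries end up there once materialized).
object ListCachedRddNames {
  // Returns the RDD names sorted alphabetically; unnamed RDDs get a placeholder
  def cachedRddNames(spark: SparkSession): Seq[String] =
    spark.sparkContext.getPersistentRDDs.values
      .map(rdd => Option(rdd.name).getOrElse("<unnamed>"))
      .toSeq
      .sorted

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cached-rdd-names")
      .getOrCreate()
    try {
      // cache a query under a user-friendly name, then materialize it
      val ds = spark.range(20)
      spark.sharedState.cacheManager.cacheQuery(ds, Some("twenty"))
      ds.count()

      // cache a Dataset without a name, then materialize it
      val ten = spark.range(10).cache()
      ten.count()

      cachedRddNames(spark).foreach(println)
    } finally {
      spark.stop()
    }
  }
}
```

Running it should print "In-memory table twenty" for the named query and a plan-based name for ten.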

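The other RDD names come from caching a Dataset directly (e.g. with Dataset.cache or persist). Without a table name, the RDD is named after the cached query’s physical plan, e.g. "*Range (0, 1, step=1, splits=8)" in the example above.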

// caching is lazy; the action below materializes the cached data
val ten = spark.range(10).cache
ten.count
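The naming rule behind both kinds of names can be sketched in plain Scala. This is a minimal sketch, assuming it mirrors how Spark (InMemoryRelation in Spark 2.x) names the RDD of a cached query: a table name yields "In-memory table [name]", otherwise the physical plan’s text is used, abbreviated to 1024 characters. CachedRddNaming, rddName and abbreviate are hypothetical names, and abbreviate imitates commons-lang3’s StringUtils.abbreviate rather than depending on it.

```scala
// Hypothetical model of the RDD naming rule for cached queries
object CachedRddNaming {
  // Imitates StringUtils.abbreviate(s, maxWidth): strings longer than maxWidth
  // are cut so that the result, including the trailing "...", is maxWidth long
  def abbreviate(s: String, maxWidth: Int): String = {
    require(maxWidth >= 4, "maxWidth must leave room for the \"...\" marker")
    if (s.length <= maxWidth) s else s.take(maxWidth - 3) + "..."
  }

  // A table name wins; otherwise fall back to the (abbreviated) physical plan text
  def rddName(tableName: Option[String], physicalPlan: String): String =
    tableName
      .map(n => s"In-memory table $n")
      .getOrElse(abbreviate(physicalPlan, 1024))
}
```

For example, rddName(Some("one"), "*Range (0, 1, step=1, splits=8)") returns "In-memory table one", while rddName(None, "*Range (0, 1, step=1, splits=8)") returns the plan text unchanged.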
