InMemoryFileIndex

InMemoryFileIndex is a PartitioningAwareFileIndex for a partition schema and file list.

InMemoryFileIndex is created when:

HiveMetastoreCatalog is requested to inferIfNeeded (when requested to convert a HiveTableRelation)
CatalogFileIndex is requested for the partitions by the given predicate expressions for a non-partitioned Hive table
DataSource is requested to createInMemoryFileIndex
Spark Structured Streaming’s FileStreamSource is used

Creating InMemoryFileIndex Instance

InMemoryFileIndex takes the following to be created:

SparkSession
Root paths (as Hadoop Paths)
Options for partition discovery
Optional user-defined schema
FileStatusCache (default: NoopCache)

InMemoryFileIndex initializes the internal properties.
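
The constructor arguments listed above can be tried out directly, for example in a small test program. What follows is a minimal sketch, assuming Spark SQL (2.x or later) is on the classpath and a local SparkSession is acceptable. InMemoryFileIndex is an internal class, so its constructor may differ between Spark releases. The InMemoryFileIndexSketch object, the demoIndex helper, and the year=2024 directory layout are hypothetical names made up for illustration.

```scala
import java.nio.file.Files

import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.InMemoryFileIndex

object InMemoryFileIndexSketch {

  // Hypothetical helper: creates a tiny partitioned directory and indexes it.
  def demoIndex(spark: SparkSession): InMemoryFileIndex = {
    val root = Files.createTempDirectory("in-memory-file-index")
    val partitionDir = Files.createDirectories(root.resolve("year=2024"))
    Files.write(partitionDir.resolve("data.txt"), "hello".getBytes("UTF-8"))

    new InMemoryFileIndex(
      spark,
      Seq(new Path(root.toUri)),  // root paths (as Hadoop Paths)
      Map.empty[String, String],  // options for partition discovery
      None)                       // no user-defined schema; FileStatusCache is left at its default
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("InMemoryFileIndexSketch")
      .getOrCreate()
    try {
      val index = demoIndex(spark)
      println(index.rootPaths)                   // the root paths that were indexed
      println(index.partitionSchema.treeString)  // partition columns inferred from directory names
      index.allFiles().foreach(status => println(status.getPath))
    } finally {
      spark.stop()
    }
  }
}
```

Leaving out the last argument relies on the FileStatusCache default (NoopCache), so nothing is cached between index instances in this sketch.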

Internal Properties

rootPaths: The root paths, excluding any _spark_metadata streaming metadata directories (these directories are written by Spark Structured Streaming’s FileStreamSink and are encountered when reading the output of a streaming query)

Note: rootPaths is part of the FileIndex contract.
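
The exclusion of _spark_metadata directories can be pictured with a small, Spark-free sketch. It is a simplified stand-in rather than Spark's own code: it assumes that a path is dropped when the path itself or any of its ancestors is named _spark_metadata, and it uses java.nio.file.Path in place of Hadoop Paths. The RootPathsSketch object and its methods are hypothetical names.

```scala
import java.nio.file.{Path, Paths}

object RootPathsSketch {
  val MetadataDir = "_spark_metadata"

  // True when the path itself or any of its ancestors is named _spark_metadata.
  def underMetadataDir(path: Path): Boolean =
    Iterator.iterate(path)(_.getParent)
      .takeWhile(_ != null)
      .exists(p => p.getFileName != null && p.getFileName.toString == MetadataDir)

  // Keeps only the specified root paths that are not streaming metadata directories.
  def effectiveRootPaths(specified: Seq[Path]): Seq[Path] =
    specified.filterNot(underMetadataDir)

  def main(args: Array[String]): Unit = {
    val specified = Seq(
      Paths.get("/data/out"),
      Paths.get("/data/out/_spark_metadata"),
      Paths.get("/data/out/_spark_metadata/0"))
    effectiveRootPaths(specified).foreach(println)
  }
}
```

With the three sample paths in main, only /data/out survives the filter, since the other two are the metadata directory and a file beneath it.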
