FileIndex Contract

FileIndex is the abstraction of file indices that know the root paths and the partition schema of a relation.

FileIndex is associated with a HadoopFsRelation.
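
As a rough illustration of that association, the sketch below pulls the FileIndex out of a DataFrame's analyzed plan through HadoopFsRelation.location and prints a few of the contract's properties. It assumes a local SparkSession, a parquet source read through the file-based (V1) code path, and a throwaway temporary directory; the path and the demo column p are made up for the example. It is meant for spark-shell or a Scala script and relies on internal Spark classes that may change between versions.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.{FileIndex, HadoopFsRelation, LogicalRelation}
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("fileindex-demo").getOrCreate()

// Hypothetical demo data: a small parquet dataset partitioned by column p
val path = Files.createTempDirectory("fileindex-demo").resolve("t").toString
spark.range(10).withColumn("p", col("id") % 2).write.partitionBy("p").parquet(path)

val df = spark.read.parquet(path)

// Find the HadoopFsRelation in the analyzed plan; its location is the FileIndex
val relations = df.queryExecution.analyzed.collect { case l: LogicalRelation => l.relation }
val fileIndex: FileIndex = relations.collectFirst { case r: HadoopFsRelation => r.location }.get

println(fileIndex.rootPaths)                    // root path(s) of the relation
println(fileIndex.partitionSchema.simpleString) // schema of the partition columns
println(fileIndex.inputFiles.length)            // number of files to scan
println(fileIndex.sizeInBytes)                  // estimated size in bytes
```

Matching on the relation type rather than on the LogicalRelation constructor pattern is a deliberate choice here, since the number of LogicalRelation fields differs across Spark versions.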

Table 1. FileIndex Contract
Method Description

inputFiles

inputFiles: Array[String]

File names to read when scanning this relation

Used when:

listFiles

listFiles(
  partitionFilters: Seq[Expression],
  dataFilters: Seq[Expression]): Seq[PartitionDirectory]

Files to read, grouped into partitions when the data is partitioned (a single partition with no partition values otherwise)

Used when:

metadataOpsTimeNs

metadataOpsTimeNs: Option[Long] = None

Metadata operation time for listing files (in nanoseconds)

Used when the FileSourceScanExec leaf physical operator is requested for selectedPartitions

partitionSchema

partitionSchema: StructType

Schema of the partition columns (empty when the relation is not partitioned)

Used when:

refresh

refresh(): Unit

Refreshes cached file listings

Used when:

rootPaths

rootPaths: Seq[Path]

Root paths from which the catalog gets the files (as Hadoop Paths). There can be a single root path for the entire table (with partition directories underneath) or one root path per individual partition.

Used when:

sizeInBytes

sizeInBytes: Long

Estimated size of the data of the relation (in bytes)

Used when:
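
The listFiles and refresh methods of the contract can be exercised directly, as in the sketch below. It assumes a local SparkSession, a parquet dataset in a temporary directory, and a FileIndex obtained through HadoopFsRelation.location; the path, the demo column p, and the hand-built filter p = 1 are illustrative only. Building Catalyst expressions by hand leans on internal APIs, so treat this as an experiment for spark-shell rather than a supported usage pattern.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, EqualTo, Literal}
import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation}
import org.apache.spark.sql.functions.{col, lit}
import org.apache.spark.sql.types.IntegerType

val spark = SparkSession.builder().master("local[*]").appName("fileindex-listfiles").getOrCreate()

// Hypothetical demo data: two partition directories, p=0 and p=1
val path = Files.createTempDirectory("fileindex-listfiles").resolve("t").toString
spark.range(10).withColumn("p", col("id") % 2).write.partitionBy("p").parquet(path)

val relations = spark.read.parquet(path).queryExecution.analyzed.collect {
  case l: LogicalRelation => l.relation
}
val fileIndex = relations.collectFirst { case r: HadoopFsRelation => r.location }.get

// No filters: one PartitionDirectory per partition
val allPartitions = fileIndex.listFiles(Nil, Nil)

// Partition filter p = 1, written as a Catalyst expression over the partition column
val pIsOne = EqualTo(AttributeReference("p", IntegerType)(), Literal(1))
val pruned = fileIndex.listFiles(Seq(pIsOne), Nil)

// Add a new partition (p=2) on disk, then ask the index to re-list its files
spark.range(3).withColumn("p", lit(2)).write.mode("append").partitionBy("p").parquet(path)
fileIndex.refresh()
val afterRefresh = fileIndex.listFiles(Nil, Nil)

println(s"all: ${allPartitions.size}, pruned: ${pruned.size}, after refresh: ${afterRefresh.size}")
```

The filter references the partition column by name only; the data type given to AttributeReference is an assumption based on Spark inferring integer partition values from the directory names.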

Table 2. FileIndexes (Direct Implementations and Extensions Only)
FileIndex Description

CatalogFileIndex

PartitioningAwareFileIndex
