
CatalogFileIndex is a FileIndex that is created when:

Creating CatalogFileIndex Instance

CatalogFileIndex takes the following to be created:

CatalogFileIndex initializes the internal properties.

Partition Files — listFiles Method

listFiles(
  partitionFilters: Seq[Expression],
  dataFilters: Seq[Expression]): Seq[PartitionDirectory]

listFiles is part of the FileIndex contract.

listFiles lists the partitions that match the input partition filters (using filterPartitions) and then requests the resulting file index for the underlying partition files.

inputFiles Method

inputFiles: Array[String]

inputFiles is part of the FileIndex contract.

inputFiles lists all the partitions (using filterPartitions with no filters) and then requests the resulting file index for the input files.
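The way listFiles and inputFiles both go through a partition-filtering step can be sketched in plain Scala. This is a simplified model, not Spark's code: `DelegationSketch`, `ListedIndex`, `CatalogIndex` and the `Filter` function type are hypothetical stand-ins (Spark uses `Expression` filters and real file indexes), and data filters are left out for brevity.

```scala
object DelegationSketch {
  // Hypothetical stand-in for a partition filter (Spark uses Expression).
  type Filter = Map[String, String] => Boolean

  // Simplified stand-in for Spark's PartitionDirectory.
  final case class PartitionDirectory(values: Map[String, String], files: Seq[String])

  // Plays the role of the file index returned by the filtering step.
  final class ListedIndex(partitions: Seq[PartitionDirectory]) {
    def listFiles: Seq[PartitionDirectory] = partitions
    def inputFiles: Array[String] = partitions.flatMap(_.files).toArray
  }

  final class CatalogIndex(all: Seq[PartitionDirectory]) {
    // Keep only the partitions that satisfy every filter (no filters keeps all).
    def filterPartitions(filters: Seq[Filter]): ListedIndex =
      new ListedIndex(all.filter(p => filters.forall(f => f(p.values))))

    // Filter first, then ask the resulting index for the partition files.
    def listFiles(partitionFilters: Seq[Filter]): Seq[PartitionDirectory] =
      filterPartitions(partitionFilters).listFiles

    // No filters: every partition contributes its files.
    def inputFiles: Array[String] =
      filterPartitions(Nil).inputFiles
  }
}
```

With two partitions (`year=2023` and `year=2024`), a filter on `year == "2024"` makes `listFiles` return only the second partition's files, while `inputFiles` returns the files of both.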

rootPaths Method

rootPaths: Seq[Path]

rootPaths is part of the FileIndex contract.

rootPaths simply returns the baseLocation converted to a Hadoop Path.
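A minimal sketch of that conversion, assuming the base location is optional (so a table without a location has no root paths). To keep the example free of a Hadoop dependency, `Path` here is a hypothetical stand-in for `org.apache.hadoop.fs.Path`, and `RootPathsSketch` is an illustrative name.

```scala
import java.net.URI

object RootPathsSketch {
  // Hypothetical stand-in for org.apache.hadoop.fs.Path.
  final case class Path(uri: URI)

  // An optional base location yields zero or one root path.
  def rootPaths(baseLocation: Option[URI]): Seq[Path] =
    baseLocation.map(uri => Path(uri)).toSeq
}
```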

Listing Partitions By Given Predicate Expressions — filterPartitions Method

filterPartitions(
  filters: Seq[Expression]): InMemoryFileIndex

filterPartitions requests the CatalogTable for the partition columns.

For a partitioned table, filterPartitions records the start time, requests the SessionCatalog for the partitions that match the filters, and creates a PrunedInMemoryFileIndex (with the time spent listing the partitions).

For an unpartitioned table (no partition columns defined), filterPartitions simply returns an InMemoryFileIndex (with the rootPaths and no user-specified schema).
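The two branches can be modelled in plain Scala as follows. This is a sketch under stated assumptions rather than Spark's implementation: `Pruned` and `Unpruned` are hypothetical stand-ins for PrunedInMemoryFileIndex and InMemoryFileIndex, partitions and paths are plain strings, and `listPartitionsByFilter` is a hypothetical callback standing in for the SessionCatalog request.

```scala
object FilterPartitionsSketch {
  sealed trait FileIndexLike
  // Stand-in for PrunedInMemoryFileIndex: selected partitions plus listing time.
  final case class Pruned(partitions: Seq[String], listingTimeNs: Long) extends FileIndexLike
  // Stand-in for InMemoryFileIndex built from the root paths only.
  final case class Unpruned(rootPaths: Seq[String]) extends FileIndexLike

  def filterPartitions(
      partitionColumns: Seq[String],
      rootPaths: Seq[String],
      listPartitionsByFilter: () => Seq[String]): FileIndexLike =
    if (partitionColumns.nonEmpty) {
      // Partitioned table: time the catalog lookup and keep the elapsed time.
      val start = System.nanoTime()
      val selected = listPartitionsByFilter()
      Pruned(selected, System.nanoTime() - start)
    } else {
      // Unpartitioned table: the catalog is never consulted.
      Unpruned(rootPaths)
    }
}
```

Passing the catalog lookup as a function makes it visible that the unpartitioned branch never triggers it.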


filterPartitions is used when:

Internal Properties

Name Description


baseLocation

Base location (as a Java URI) as defined in the CatalogTable metadata (under the locationUri of the storage)

Used when CatalogFileIndex is requested to filter the partitions and for the root paths


Hadoop Configuration

Used when CatalogFileIndex is requested to filter the partitions
