CatalogFileIndex

CatalogFileIndex is a FileIndex that is created when:

Creating CatalogFileIndex Instance

CatalogFileIndex takes the following to be created:

CatalogFileIndex initializes the internal properties.

Partition Files — listFiles Method

listFiles(
  partitionFilters: Seq[Expression],
  dataFilters: Seq[Expression]): Seq[PartitionDirectory]
Note
listFiles is part of the FileIndex contract.

listFiles filters the partitions by the given partition filters (using filterPartitions) and then requests the resulting file index for the partition files, passing the data filters along.
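The two steps above (prune partitions first, then list the files of whatever survives) can be sketched in plain Scala. This is only an illustrative model, not Spark code: Partition, PartitionDir, PartitionFilter and ListFilesSketch are hypothetical names, partition filters are modelled as ordinary functions rather than Catalyst expressions, and data filters are left out.

```scala
// Hypothetical stand-ins for illustration; none of these are Spark classes.
final case class Partition(values: Map[String, String], files: Seq[String])
final case class PartitionDir(values: Map[String, String], files: Seq[String])

object ListFilesSketch {
  // A partition filter is modelled as a predicate over partition column values.
  type PartitionFilter = Map[String, String] => Boolean

  // Step 1: keep only the partitions that satisfy every filter.
  // Step 2: expose each surviving partition together with its files.
  def listFiles(
      partitions: Seq[Partition],
      partitionFilters: Seq[PartitionFilter]): Seq[PartitionDir] =
    partitions
      .filter(p => partitionFilters.forall(f => f(p.values)))
      .map(p => PartitionDir(p.values, p.files))

  // Made-up sample data: a table partitioned by a single "year" column.
  val partitions: Seq[Partition] = Seq(
    Partition(Map("year" -> "2023"), Seq("/t/year=2023/part-0.parquet")),
    Partition(
      Map("year" -> "2024"),
      Seq("/t/year=2024/part-0.parquet", "/t/year=2024/part-1.parquet")))
}
```

With an empty filter list every partition is kept, which mirrors listing the files of the whole table.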

inputFiles Method

inputFiles: Array[String]
Note
inputFiles is part of the FileIndex contract.

inputFiles filters the partitions with no filters (so all partitions are selected) and then requests the resulting file index for the input files.

rootPaths Method

rootPaths: Seq[Path]
Note
rootPaths is part of the FileIndex contract.

rootPaths simply returns the baseLocation converted to a Hadoop Path.

Listing Partitions By Given Predicate Expressions — filterPartitions Method

filterPartitions(
  filters: Seq[Expression]): InMemoryFileIndex

filterPartitions requests the CatalogTable for the partition columns.

For a partitioned table, filterPartitions records the start time, requests the SessionCatalog for the partitions that match the filters, and creates a PrunedInMemoryFileIndex (passing in the elapsed partition listing time).

For an unpartitioned table (no partition columns defined), filterPartitions simply returns an InMemoryFileIndex (with the rootPaths and no user-specified schema).
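The branching described above can be modelled with a small, self-contained Scala sketch. It is an assumption-laden illustration rather than the Spark implementation: TableMeta, PartitionInfo, PrunedIndex, RootPathIndex and FilterPartitionsSketch are hypothetical names, the catalog lookup is replaced by filtering an in-memory sequence, and predicates are plain functions instead of Catalyst expressions.

```scala
// Hypothetical stand-ins for illustration; none of these are Spark classes.
final case class TableMeta(partitionColumns: Seq[String], rootPath: String)
final case class PartitionInfo(values: Map[String, String], location: String)

sealed trait FileIndexSketch
// Models an index over pruned partitions, carrying the listing time.
final case class PrunedIndex(partitions: Seq[PartitionInfo], listingTimeNs: Long)
    extends FileIndexSketch
// Models an index built directly over the table's root paths.
final case class RootPathIndex(rootPaths: Seq[String]) extends FileIndexSketch

object FilterPartitionsSketch {
  type Predicate = Map[String, String] => Boolean

  def filterPartitions(
      table: TableMeta,
      catalogPartitions: Seq[PartitionInfo],
      filters: Seq[Predicate]): FileIndexSketch =
    if (table.partitionColumns.nonEmpty) {
      // Partitioned table: time the partition lookup and keep the matches.
      val startTime = System.nanoTime()
      val selected =
        catalogPartitions.filter(p => filters.forall(f => f(p.values)))
      PrunedIndex(selected, System.nanoTime() - startTime)
    } else {
      // Unpartitioned table: filters are ignored, index the root path only.
      RootPathIndex(Seq(table.rootPath))
    }
}
```

Returning two different index shapes keeps the unpartitioned path free of any catalog round trip, which is the point of the split in the description above.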

Note

filterPartitions is used when:

Internal Properties

Name: baseLocation
Description: Base location (as a Java URI) as defined in the CatalogTable metadata (under the locationUri of the storage). Used when CatalogFileIndex is requested to filter the partitions and for the root paths.

Name: hadoopConf
Description: Hadoop Configuration. Used when CatalogFileIndex is requested to filter the partitions.
