PartitioningAwareFileIndex

PartitioningAwareFileIndex is an extension of the FileIndex contract for indices that are aware of partitioned tables.

Table 1. PartitioningAwareFileIndex Contract (Abstract Methods Only)
Method	Description
`leafDirToChildrenFiles`	`leafDirToChildrenFiles: Map[Path, Array[FileStatus]]`. Used when `PartitioningAwareFileIndex` is requested to listFiles, allFiles, and inferPartitioning.
`leafFiles`	`leafFiles: LinkedHashMap[Path, FileStatus]`. Used when `PartitioningAwareFileIndex` is requested for all files and base paths.
`partitionSpec`	`partitionSpec(): PartitionSpec`. Partition specification (partition columns, their directories as Hadoop Paths, and partition values). Used when `PartitioningAwareFileIndex` is requested for the partition schema, files, and all files.

Table 2. PartitioningAwareFileIndexes (Direct Implementations and Extensions Only)
PartitioningAwareFileIndex	Description
InMemoryFileIndex
`MetadataLogFileIndex`	Spark Structured Streaming

Creating PartitioningAwareFileIndex Instance

PartitioningAwareFileIndex takes the following to be created:

SparkSession
Options for partition discovery
Optional user-defined schema
FileStatusCache (default: NoopCache)

PartitioningAwareFileIndex initializes the internal properties.

Note	`PartitioningAwareFileIndex` is an abstract class and cannot be created directly. It is created indirectly when one of the concrete PartitioningAwareFileIndexes is created.

`listFiles` Method

listFiles(
  partitionFilters: Seq[Expression],
  dataFilters: Seq[Expression]): Seq[PartitionDirectory]

Note	`listFiles` is part of the FileIndex contract.

listFiles…FIXME

`partitionSchema` Method

partitionSchema: StructType

Note	`partitionSchema` is part of the FileIndex contract.

partitionSchema simply returns the partition columns (as a StructType) of the partition specification.
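
The delegation described above can be sketched in plain Scala. This is a simplified model, not Spark's code: `Field`, `Schema`, `PartitionPath`, `Spec`, and `PartitionAwareIndex` are hypothetical stand-ins for `StructField`, `StructType`, `PartitionPath`, `PartitionSpec`, and `PartitioningAwareFileIndex`, and the sample partition values are made up.

```scala
// Simplified model of partitionSchema; all types here are hypothetical
// stand-ins for illustration, not the Spark classes themselves.
object PartitionSchemaSketch {
  final case class Field(name: String, dataType: String)
  final case class Schema(fields: Seq[Field])
  final case class PartitionPath(values: Seq[Any], path: String)
  final case class Spec(partitionColumns: Schema, partitions: Seq[PartitionPath])

  trait PartitionAwareIndex {
    // Abstract, as in the contract: implementations provide the specification
    def partitionSpec(): Spec

    // The partition schema is the partition columns of the specification
    def partitionSchema: Schema = partitionSpec().partitionColumns
  }

  // An example index over a table partitioned by year and month (made-up data)
  val index: PartitionAwareIndex = new PartitionAwareIndex {
    def partitionSpec(): Spec = Spec(
      Schema(Seq(Field("year", "int"), Field("month", "int"))),
      Seq(PartitionPath(Seq(2024, 1), "/data/year=2024/month=1")))
  }

  val columnNames: Seq[String] = index.partitionSchema.fields.map(_.name)
}
```

In this model, `PartitionSchemaSketch.columnNames` holds the names of the partition columns only; the data columns of the table are not part of the partition schema.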

`inputFiles` Method

inputFiles: Array[String]

Note	`inputFiles` is part of the FileIndex contract.

inputFiles simply returns the location of all the files.

`sizeInBytes` Method

sizeInBytes: Long

Note	`sizeInBytes` is part of the FileIndex contract.

sizeInBytes simply sums up the length (in bytes) of all the files.
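
Both `inputFiles` and `sizeInBytes` are derived from `allFiles`. A minimal sketch of that relationship, assuming a hypothetical `FileInfo` type in place of Hadoop's `FileStatus` and a hypothetical `FilesAwareIndex` trait in place of `PartitioningAwareFileIndex` (the file names and sizes are made up):

```scala
// Simplified model of inputFiles and sizeInBytes; FileInfo and FilesAwareIndex
// are hypothetical stand-ins, not Hadoop or Spark classes.
object FileIndexSketch {
  final case class FileInfo(path: String, length: Long)

  trait FilesAwareIndex {
    // All the files the index knows about
    def allFiles(): Seq[FileInfo]

    // The locations of all the files
    def inputFiles: Array[String] = allFiles().map(_.path).toArray

    // The sum of the lengths (in bytes) of all the files
    def sizeInBytes: Long = allFiles().map(_.length).sum
  }

  // An example index with two files (made-up data)
  val index: FilesAwareIndex = new FilesAwareIndex {
    def allFiles(): Seq[FileInfo] = Seq(
      FileInfo("/data/year=2024/part-0.parquet", 1024L),
      FileInfo("/data/year=2025/part-0.parquet", 2048L))
  }
}
```

Because both values are computed from `allFiles`, neither one is affected by partition or data filters; they describe every file under the index.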

`allFiles` Method

allFiles(): Seq[FileStatus]

allFiles…FIXME

Note	`allFiles` is used when:

`DataSource` is requested to getOrInferFileFormatSchema and resolveRelation
`PartitioningAwareFileIndex` is requested to listFiles, inputFiles, and sizeInBytes
Spark Structured Streaming’s `FileStreamSource` is used

`inferPartitioning` Method

inferPartitioning(): PartitionSpec

inferPartitioning…FIXME

Note	`inferPartitioning` is used when InMemoryFileIndex and Spark Structured Streaming’s `MetadataLogFileIndex` are requested for the partitionSpec.

`basePaths` Internal Method

basePaths: Set[Path]

basePaths…FIXME

Note	`basePaths` is used when `PartitioningAwareFileIndex` is requested to inferPartitioning.

Internal Properties

Name	Description
`hadoopConf`	Hadoop Configuration
