```scala
import org.apache.spark.sql.execution.datasources.PartitionedFile
import org.apache.spark.sql.catalyst.InternalRow

val partFile = PartitionedFile(InternalRow.empty, "fakePath0", 0, 10, Array("host0", "host1"))
```
PartitionedFile — File Block in FileFormat Data Source
PartitionedFile is a part (block) of a file that is in a sense similar to a Parquet block or an HDFS split.
PartitionedFile represents a chunk of a file to be read in a partition, together with the partition column values that are appended to every row.
|Partition column values are the values of the partition columns and are therefore part of the directory structure, not of the partitioned files themselves (which together make up the partitioned dataset).|
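To make the note above concrete, the following sketch shows how partition column values can be recovered from a Hive-style partitioned path. The helper `partitionValuesFromPath` is purely illustrative and is not part of Spark's API.

```scala
// Hypothetical helper (not Spark's API): extracts partition column values
// from a Hive-style partitioned path such as "year=2020/month=12/part-0.parquet".
// It illustrates that the values live in the directory names, not in the file.
def partitionValuesFromPath(path: String): Map[String, String] =
  path.split("/")
    .filter(_.contains("="))          // keep only "column=value" directory segments
    .map { segment =>
      val Array(column, value) = segment.split("=", 2)
      column -> value
    }
    .toMap
```

For example, `partitionValuesFromPath("year=2020/month=12/part-0.parquet")` yields `Map("year" -> "2020", "month" -> "12")`.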
PartitionedFile is created exclusively when FileSourceScanExec is requested to create the input RDD for bucketed or non-bucketed reads.
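The slicing of data files into PartitionedFiles can be sketched with a simplified, self-contained model. The names `FileChunk` and `sliceFile` are hypothetical stand-ins, and `maxSplitBytes` stands for the split-size limit that governs how large each chunk may be.

```scala
// Simplified, hypothetical model of how a data file is sliced into
// PartitionedFile-like chunks of at most maxSplitBytes bytes each.
case class FileChunk(path: String, start: Long, length: Long)

def sliceFile(path: String, fileSize: Long, maxSplitBytes: Long): Seq[FileChunk] =
  (0L until fileSize by maxSplitBytes).map { offset =>
    // The last chunk may be shorter than maxSplitBytes.
    FileChunk(path, offset, math.min(maxSplitBytes, fileSize - offset))
  }
```

For instance, a 10-byte file with a 4-byte split size produces three chunks with ranges 0-4, 4-8 and 8-10.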
PartitionedFile takes the following to be created:

Partition column values to be appended to each row (as an internal row)
Path of the file to read
Beginning (start) offset in the file (in bytes)
Number of bytes to read (length)
Locality information, i.e. a list of nodes (by their host names) that have the data (Array[String]). Default: empty
PartitionedFile uses the following text representation:
path: [filePath], range: [start]-[end], partition values: [partitionValues]
```text
scala> :type partFile
org.apache.spark.sql.execution.datasources.PartitionedFile

scala> println(partFile)
path: fakePath0, range: 0-10, partition values: [empty row]
```
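The text representation can be reproduced with a self-contained sketch. `PartitionedFileSketch` is a stand-in for the real case class (which needs Spark on the classpath), with `partitionValues` simplified to a `String`.

```scala
// Stand-in for PartitionedFile to illustrate its toString format;
// partitionValues is simplified to a String here.
case class PartitionedFileSketch(
    partitionValues: String,
    filePath: String,
    start: Long,
    length: Long) {
  // The end of the range is the start offset plus the number of bytes to read.
  override def toString: String =
    s"path: $filePath, range: $start-${start + length}, partition values: $partitionValues"
}
```

`PartitionedFileSketch("[empty row]", "fakePath0", 0, 10).toString` produces the same line as the `println(partFile)` output above.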