FileFormat

FileFormat is the contract in Spark SQL to…​FIXME

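package org.apache.spark.sql.execution.datasources

import org.apache.hadoop.fs.FileStatus
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType

trait FileFormat {
  // Only the required (abstract) methods are listed here;
  // the methods that have a default implementation follow.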
  def inferSchema(
    sparkSession: SparkSession,
    options: Map[String, String],
    files: Seq[FileStatus]): Option[StructType]
  def prepareWrite(
    sparkSession: SparkSession,
    job: Job,
    options: Map[String, String],
    dataSchema: StructType): OutputWriterFactory
}
Table 1. (Subset of) FileFormat Contract

Method       | Description
inferSchema  | Used when…
prepareWrite | Used exclusively when FileFormatWriter is requested to write a query result.
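The two required methods can be illustrated with a minimal sketch of a custom implementation. It assumes the spark-sql and Hadoop client libraries are on the classpath and that the signatures match the ones listed above. FixedSchemaFormat is a hypothetical name used only for illustration; it is not part of Spark.

```scala
import org.apache.hadoop.fs.FileStatus
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.{FileFormat, OutputWriterFactory}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical format (not part of Spark) that implements only the required methods.
class FixedSchemaFormat extends FileFormat {

  // Returns a fixed single-column schema instead of inspecting the files.
  override def inferSchema(
      sparkSession: SparkSession,
      options: Map[String, String],
      files: Seq[FileStatus]): Option[StructType] =
    Some(StructType(Seq(StructField("value", StringType))))

  // Writing is out of scope for this sketch, so fail loudly.
  override def prepareWrite(
      sparkSession: SparkSession,
      job: Job,
      options: Map[String, String],
      dataSchema: StructType): OutputWriterFactory =
    throw new UnsupportedOperationException("FixedSchemaFormat does not support writes")
}
```

The sketch overrides nothing else, so the remaining methods keep whatever defaults the trait provides. A format that is meant to read data would also have to supply a reader.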

supportBatch…​FIXME

vectorTypes…​FIXME

isSplitable…​FIXME

buildReader…​FIXME

buildReaderWithPartitionValues Method

def buildReaderWithPartitionValues(
  sparkSession: SparkSession,
  dataSchema: StructType,
  partitionSchema: StructType,
  requiredSchema: StructType,
  filters: Seq[Filter],
  options: Map[String, String],
  hadoopConf: Configuration): PartitionedFile => Iterator[InternalRow]

buildReaderWithPartitionValues…​FIXME

Note: buildReaderWithPartitionValues is used exclusively when FileSourceScanExec is requested for input RDDs.
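The return type, PartitionedFile => Iterator[InternalRow], is a reader function that is applied to one file at a time. According to the scaladoc in the Spark source, that function behaves like the one from buildReader except that it appends the file's partition values to every row. The following is a rough sketch of that idea in plain Scala. Row, PartFile, buildReader and withPartitionValues are simplified stand-ins invented for this example. They are not Spark's InternalRow, PartitionedFile or FileFormat methods.

```scala
object PartitionValuesSketch {
  // Simplified stand-ins (NOT Spark's classes): a row is a plain Seq[Any].
  type Row = Seq[Any]
  final case class PartFile(path: String, partitionValues: Row)

  // Hypothetical data-only reader: yields the rows stored for the given file path.
  def buildReader(contents: Map[String, Seq[Row]]): PartFile => Iterator[Row] =
    file => contents.getOrElse(file.path, Seq.empty).iterator

  // Wraps a data-only reader so that every row also carries the file's partition values.
  def withPartitionValues(reader: PartFile => Iterator[Row]): PartFile => Iterator[Row] =
    file => reader(file).map(row => row ++ file.partitionValues)

  // In-memory "file system" with a single file under a year=2019 partition directory.
  val contents: Map[String, Seq[Row]] =
    Map("/data/year=2019/part-0" -> Seq(Seq("a", 1), Seq("b", 2)))

  val read: PartFile => Iterator[Row] = withPartitionValues(buildReader(contents))

  // Each data row gets the partition value 2019 appended as its last column.
  val rows: List[Row] = read(PartFile("/data/year=2019/part-0", Seq(2019))).toList
}
```

Returning a function rather than an iterator lets the caller build the reader once and then apply it to many files.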
