TextFileFormat

TextFileFormat is a TextBasedFileFormat that handles the text data source format (i.e. format("text")).

spark.read.format("text").load("text-datasets")

// or the same as above using a shortcut
spark.read.text("text-datasets")

TextFileFormat uses the following text options when loading or writing a dataset.

Table 1. TextFileFormat’s Options
Option | Default Value | Description
compression | (none) | Compression codec that can be either one of the known aliases or a fully-qualified class name.
wholetext | false | Enables loading a file as a single row (i.e. not splitting by "\n")
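
The two options above can be tried out with a small round trip. This is a minimal sketch and not taken from the original notes: it assumes a local SparkSession, a throwaway temporary directory, and the gzip alias as one of the known compression aliases. The names dir, perLine and perFile are made up for illustration.

import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell the existing `spark` is reused by getOrCreate.
val spark = SparkSession.builder().master("local[*]").appName("text-options-demo").getOrCreate()
import spark.implicits._

// Hypothetical scratch location for this demo only.
val dir = Files.createTempDirectory("text-options-demo").toString

// Write two lines into a single gzip-compressed text file (compression option).
Seq("first line", "second line")
  .toDF("value")
  .coalesce(1)
  .write
  .mode("overwrite")
  .option("compression", "gzip")
  .text(s"$dir/out")

// Default behaviour: one row per line.
val perLine = spark.read.text(s"$dir/out")

// wholetext = true: one row per file, line breaks kept inside the value.
val perFile = spark.read.option("wholetext", true).text(s"$dir/out")

println(s"rows per line: ${perLine.count()}, rows per file: ${perFile.count()}")

Writing through coalesce(1) keeps the output to one data file, so the difference between the two read modes is easy to see in the row counts.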

prepareWrite Method

prepareWrite(
  sparkSession: SparkSession,
  job: Job,
  options: Map[String, String],
  dataSchema: StructType): OutputWriterFactory
Note
prepareWrite is part of the FileFormat Contract. It is used when FileFormatWriter is requested to write the result of a structured query.

prepareWrite…​FIXME
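
As a rough illustration of the write path that ends up in prepareWrite, the sketch below writes a DataFrame in text format. It does not describe what prepareWrite does internally. It relies on the documented DataFrameWriter.text requirement that the DataFrame has a single string column, and the names out, lines, twoColumns and attempt are hypothetical.

import java.nio.file.Files
import scala.util.Try
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell the existing `spark` is reused by getOrCreate.
val spark = SparkSession.builder().master("local[*]").appName("text-write-demo").getOrCreate()
import spark.implicits._

// Hypothetical scratch location for this demo only.
val out = Files.createTempDirectory("text-write-demo").toString

// A single string column is what the text format accepts on write.
val lines = Seq("hello", "world").toDF("value")
lines.write.mode("overwrite").text(s"$out/ok")

// Two columns do not fit the text format, so the write is expected to be rejected.
val twoColumns = Seq(("hello", 1)).toDF("value", "n")
val attempt = Try(twoColumns.write.mode("overwrite").text(s"$out/rejected"))

println(s"two-column write succeeded: ${attempt.isSuccess}")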

Building Partitioned Data Reader — buildReader Method

buildReader(
  sparkSession: SparkSession,
  dataSchema: StructType,
  partitionSchema: StructType,
  requiredSchema: StructType,
  filters: Seq[Filter],
  options: Map[String, String],
  hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow]
Note
buildReader is part of FileFormat Contract to…​FIXME

buildReader…​FIXME

readToUnsafeMem Internal Method

readToUnsafeMem(
  conf: Broadcast[SerializableConfiguration],
  requiredSchema: StructType,
  wholeTextMode: Boolean): (PartitionedFile) => Iterator[UnsafeRow]

readToUnsafeMem…​FIXME

Note
readToUnsafeMem is used exclusively when TextFileFormat is requested to buildReader.
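
The signature of readToUnsafeMem suggests a factory: given a wholeTextMode flag, it returns a function from a file to an iterator of rows. The sketch below is a simplified plain-Scala analogue of that shape only, not Spark's implementation: it uses java.nio paths and strings instead of PartitionedFile and UnsafeRow, and readLinesOrWhole is a hypothetical name.

import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.io.Source

// Hypothetical stand-in: the flag is fixed up front, the returned function is applied per file.
def readLinesOrWhole(wholeTextMode: Boolean): Path => Iterator[String] = { file =>
  if (wholeTextMode) {
    // Whole-text mode: the entire file content becomes a single "row".
    Iterator.single(new String(Files.readAllBytes(file), StandardCharsets.UTF_8))
  } else {
    // Line mode: one "row" per line; read eagerly so the source can be closed.
    val src = Source.fromFile(file.toFile, "UTF-8")
    try src.getLines().toList.iterator finally src.close()
  }
}

// Small sample file to apply both readers to.
val sample = Files.createTempFile("sample", ".txt")
Files.write(sample, "a\nb\n".getBytes(StandardCharsets.UTF_8))

val lineRows = readLinesOrWhole(wholeTextMode = false)(sample).toList
val wholeRows = readLinesOrWhole(wholeTextMode = true)(sample).toList

println(s"line mode rows: ${lineRows.size}, whole-text mode rows: ${wholeRows.size}")

Returning a function rather than reading immediately lets the mode be decided once while the actual reading is deferred to whichever file is handed in later.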
