
TextFileFormat

TextFileFormat is the TextBasedFileFormat for the `text` data source format.

spark.read.format("text").load("text-datasets")

// or the same as above using a shortcut
spark.read.text("text-datasets")

TextFileFormat uses the following text options when loading or writing a dataset.

Table 1. TextFileFormat’s Options
Option	Default Value	Description
`compression`		Compression codec to use when writing, given either as one of the known aliases or as a fully-qualified class name.
`wholetext`	`false`	Enables loading a file as a single row (i.e. not splitting by "\n").
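
To make the effect of `wholetext` concrete, here is a minimal sketch, assuming a local Spark session and a Spark version that supports the option. The object name `WholeTextDemo`, the helper `rowCounts` and the temporary sample file are hypothetical and only serve the illustration.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Hypothetical demo object, not part of Spark
object WholeTextDemo {

  // Returns (row count with wholetext disabled, row count with wholetext enabled)
  def rowCounts(spark: SparkSession): (Long, Long) = {
    // Assumed input: a single temporary file with three lines
    val dir = Files.createTempDirectory("text-datasets")
    Files.write(
      dir.resolve("sample.txt"),
      "one\ntwo\nthree".getBytes(StandardCharsets.UTF_8))

    // Default behaviour: every line of the file becomes a row
    val perLine = spark.read.text(dir.toString)

    // wholetext enabled: every file becomes a single row
    val perFile = spark.read.option("wholetext", true).text(dir.toString)

    (perLine.count(), perFile.count())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("wholetext-demo")
      .getOrCreate()
    try println(rowCounts(spark)) // prints (3,1) for the sample file above
    finally spark.stop()
  }
}
```

In both cases the resulting DataFrame has a single string column named `value`.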

`prepareWrite` Method

prepareWrite(
  sparkSession: SparkSession,
  job: Job,
  options: Map[String, String],
  dataSchema: StructType): OutputWriterFactory

Note	`prepareWrite` is part of the FileFormat Contract and is used when `FileFormatWriter` is requested to write the result of a structured query.

prepareWrite…FIXME
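
From the user's side, writing a DataFrame in `text` format is the kind of operation that should end up in `prepareWrite`. The following is a rough sketch under that assumption, also showing the `compression` option with the `gzip` alias. The names `TextWriteDemo` and `writeCompressed` and the output location are hypothetical. Note that the text data source accepts only a single string column when writing.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Hypothetical demo object, not part of Spark
object TextWriteDemo {

  // Writes three lines as gzip-compressed text and returns the names of the data files
  def writeCompressed(spark: SparkSession): Seq[String] = {
    import spark.implicits._

    // The output directory must not exist yet, so resolve a fresh child path
    val out = Files.createTempDirectory("text-out").resolve("result")

    // A single string column, as required by the text data source
    val lines = Seq("one", "two", "three").toDF("value")

    lines.write
      .option("compression", "gzip")
      .text(out.toString)

    // Keep only the data files (skips _SUCCESS and hidden checksum files)
    out.toFile.list().toSeq.filter(_.startsWith("part-"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("text-write-demo")
      .getOrCreate()
    try writeCompressed(spark).foreach(println)
    finally spark.stop()
  }
}
```

The data files written this way carry a `.txt.gz` extension, and `spark.read.text` can load them back without any extra option.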

Building Partitioned Data Reader — `buildReader` Method

buildReader(
  sparkSession: SparkSession,
  dataSchema: StructType,
  partitionSchema: StructType,
  requiredSchema: StructType,
  filters: Seq[Filter],
  options: Map[String, String],
  hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow]

Note	`buildReader` is part of the FileFormat Contract to…FIXME

buildReader…FIXME

`readToUnsafeMem` Internal Method

readToUnsafeMem(
  conf: Broadcast[SerializableConfiguration],
  requiredSchema: StructType,
  wholeTextMode: Boolean): (PartitionedFile) => Iterator[UnsafeRow]

readToUnsafeMem…FIXME

Note	`readToUnsafeMem` is used exclusively when `TextFileFormat` is requested to `buildReader`.
