spark.read.format("text").load("text-datasets")
// or the same as above using a shortcut
spark.read.text("text-datasets")
TextFileFormat
TextFileFormat is a TextBasedFileFormat for the text format.
TextFileFormat uses text options while loading a dataset.
Option | Default Value | Description |
---|---|---|
compression | | Compression codec that can be either one of the known aliases or a fully-qualified class name. |
wholetext | false | Enables loading a file as a single row (i.e. not splitting by "\n") |
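To make the effect of the single-row option concrete, here is a minimal sketch that can be pasted into spark-shell. It assumes the option is spelled `wholetext` on the public reader API, and the temporary directory and the `sample.txt` file are hypothetical inputs created only for this illustration.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// In spark-shell this returns the existing session; the master setting is an assumption for local runs
val spark = SparkSession.builder().master("local[*]").appName("text-options-sketch").getOrCreate()

// Hypothetical input: a temporary directory holding one three-line text file
val dir = Files.createTempDirectory("text-datasets")
Files.write(dir.resolve("sample.txt"), "one\ntwo\nthree".getBytes("UTF-8"))

// Default behaviour: every line of the file becomes a row
val perLine = spark.read.text(dir.toString)

// With wholetext enabled: the entire file becomes a single row
val perFile = spark.read.option("wholetext", true).text(dir.toString)

perLine.show()
perFile.show(truncate = false)
```

Both datasets expose a single string column named `value`; only the number of rows differs.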
prepareWrite Method
prepareWrite(
sparkSession: SparkSession,
job: Job,
options: Map[String, String],
dataSchema: StructType): OutputWriterFactory
Note: prepareWrite is part of the FileFormat Contract that is used when FileFormatWriter is requested to write the result of a structured query.
prepareWrite…FIXME
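The internals of prepareWrite are not described here, but the user-facing write path that eventually reaches it can be sketched as follows. This is only an illustration under stated assumptions: the output path is a hypothetical temporary location, `gzip` is used as one of the known compression aliases, and the ids are cast to strings because the text format writes a single string column.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// In spark-shell this returns the existing session; the master setting is an assumption for local runs
val spark = SparkSession.builder().master("local[*]").appName("text-write-sketch").getOrCreate()

// The text format expects one string column, so cast the generated ids first
val lines = spark.range(3).selectExpr("CAST(id AS STRING) AS value")

// Hypothetical output location: a path that does not exist yet under a temporary directory
val out = Files.createTempDirectory("text-write").resolve("out").toString

// Writing the query result in the text format with the compression option set
lines.write.option("compression", "gzip").text(out)

// Reading the output back; the compressed part files are decompressed transparently
val roundTrip = spark.read.text(out)
roundTrip.show()
```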
Building Partitioned Data Reader: buildReader Method
buildReader(
sparkSession: SparkSession,
dataSchema: StructType,
partitionSchema: StructType,
requiredSchema: StructType,
filters: Seq[Filter],
options: Map[String, String],
hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow]
Note: buildReader is part of the FileFormat Contract to…FIXME
buildReader…FIXME
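The return type in the signature above, `(PartitionedFile) => Iterator[InternalRow]`, means that buildReader hands back a function that is later applied to each file. The toy sketch below illustrates only that shape in plain Scala. It is not Spark's implementation: `ToyFile`, `ToyRow` and `buildToyReader` are hypothetical stand-ins invented for this example, and the `wholeText` flag merely imitates the single-row option from the table.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical stand-ins for PartitionedFile and InternalRow (not Spark classes)
final case class ToyFile(path: Path)
final case class ToyRow(value: String)

// Same shape as buildReader: configuration goes in, a per-file reader function comes out
def buildToyReader(wholeText: Boolean): ToyFile => Iterator[ToyRow] = { file =>
  val content = new String(Files.readAllBytes(file.path), StandardCharsets.UTF_8)
  if (wholeText) {
    // One row holding the entire file
    Iterator.single(ToyRow(content))
  } else {
    // One row per line
    content.split("\n").iterator.map(line => ToyRow(line))
  }
}

// Hypothetical input file created only for this illustration
val tmp = Files.createTempFile("toy", ".txt")
Files.write(tmp, "a\nb".getBytes(StandardCharsets.UTF_8))

val perLineRows = buildToyReader(wholeText = false)(ToyFile(tmp)).toList
val wholeFileRows = buildToyReader(wholeText = true)(ToyFile(tmp)).toList
```

Returning a function rather than the rows themselves lets the configuration be fixed once, while the actual reading happens separately for every file.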