spark.read.format("csv").load("csv-datasets")
// or the same as above using a shortcut
spark.read.csv("csv-datasets")
CSVFileFormat
CSVFileFormat is a TextBasedFileFormat for the csv format (i.e. it registers itself to handle files in CSV format and converts them to Spark SQL rows).
CSVFileFormat uses CSV options (that are in turn used to configure the underlying CSV parser from the uniVocity-parsers project).
Option | Default Value | Description |
---|---|---|
charset | UTF-8 | Alias of encoding |
charToEscapeQuoteEscaping | | One character to…FIXME |
codec | | Compression codec that can be either one of the known aliases or a fully-qualified class name. Alias of compression |
comment | \u0000 | |
compression | | Compression codec that can be either one of the known aliases or a fully-qualified class name. Alias of codec |
dateFormat | yyyy-MM-dd | Uses en_US locale |
delimiter | , (comma) | Alias of sep |
encoding | UTF-8 | Alias of charset |
escape | \ (backslash) | |
escapeQuotes | true | |
header | false | |
ignoreLeadingWhiteSpace | false (reading), true (writing) | |
ignoreTrailingWhiteSpace | false (reading), true (writing) | |
inferSchema | false | |
maxCharsPerColumn | -1 (unlimited) | |
maxColumns | 20480 | |
mode | PERMISSIVE | Possible values: DROPMALFORMED, PERMISSIVE (default), FAILFAST |
multiLine | false | |
nanValue | NaN | |
negativeInf | -Inf | |
nullValue | (empty string) | |
positiveInf | Inf | |
quote | " (double quote) | |
quoteAll | false | |
sep | , (comma) | Alias of delimiter |
timestampFormat | yyyy-MM-dd'T'HH:mm:ss.SSSXXX | Uses timeZone and en_US locale |
timeZone | spark.sql.session.timeZone | |
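The options above are passed in through option (or options) on DataFrameReader and DataFrameWriter. A minimal sketch of both sides (the paths and option values below are made up for illustration):

// Reading: header, inferSchema, sep and nullValue are options from the table above
val people = spark.read
  .option("header", true)       // first line is a header
  .option("inferSchema", true)  // infer column types from the data
  .option("sep", ",")           // alias of delimiter
  .option("nullValue", "NA")    // hypothetical null marker
  .csv("csv-datasets")

// Writing: compression (alias of codec) selects the compression codec
people.write
  .option("header", true)
  .option("compression", "gzip")
  .csv("csv-datasets-gzipped")  // hypothetical output directory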
Preparing Write Job — prepareWrite Method
prepareWrite(
sparkSession: SparkSession,
job: Job,
options: Map[String, String],
dataSchema: StructType): OutputWriterFactory
Note: prepareWrite is part of the FileFormat Contract to prepare a write job.
prepareWrite…FIXME
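prepareWrite is called (through the FileFormat contract) whenever a structured query is saved in csv format, so a minimal way to see it in action is to write any Dataset out as CSV (the output directory below is made up):

// Saving a Dataset in csv format goes through CSVFileFormat.prepareWrite
spark.range(3)
  .write
  .option("header", true)
  .csv("csv-out")  // hypothetical output directory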
Building Partitioned Data Reader — buildReader Method
buildReader(
sparkSession: SparkSession,
dataSchema: StructType,
partitionSchema: StructType,
requiredSchema: StructType,
filters: Seq[Filter],
options: Map[String, String],
hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow]
Note: buildReader is part of the FileFormat Contract to build a PartitionedFile reader.
buildReader…FIXME
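buildReader is exercised (again through the FileFormat contract) when csv files are scanned at query execution. In the sketch below the selected column ends up in the requiredSchema argument, while the explicit schema avoids a schema-inference pass (the path and column names are hypothetical):

import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

// Scanning csv files goes through CSVFileFormat.buildReader
val schema = StructType(Seq(
  StructField("id", LongType),
  StructField("name", StringType)))

spark.read
  .schema(schema)
  .csv("csv-datasets")  // hypothetical input directory
  .select("name")       // column pruning ends up in requiredSchema
  .show()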