CSVFileFormat

CSVFileFormat is a TextBasedFileFormat for the csv data source format, i.e. it registers itself to handle files in CSV format and converts them to Spark SQL rows.

spark.read.format("csv").load("csv-datasets")

// or the same as above using a shortcut
spark.read.csv("csv-datasets")

CSVFileFormat uses CSV options (that are in turn used to configure the underlying CSV parser from the uniVocity-parsers project).

Table 1. CSVFileFormat’s Options

| Option | Default Value | Description |
| --- | --- | --- |
| charset | UTF-8 | Alias of encoding |
| charToEscapeQuoteEscaping | \\ | One character to…FIXME |
| codec | | Compression codec that can be either one of the known aliases or a fully-qualified class name. Alias of compression |
| columnNameOfCorruptRecord | | |
| comment | \u0000 | |
| compression | | Compression codec that can be either one of the known aliases or a fully-qualified class name. Alias of codec |
| dateFormat | yyyy-MM-dd | Uses en_US locale |
| delimiter | , (comma) | Alias of sep |
| encoding | UTF-8 | Alias of charset |
| escape | \\ | |
| escapeQuotes | true | |
| header | | |
| ignoreLeadingWhiteSpace | false (for reading), true (for writing) | |
| ignoreTrailingWhiteSpace | false (for reading), true (for writing) | |
| inferSchema | | |
| maxCharsPerColumn | -1 | |
| maxColumns | 20480 | |
| mode | PERMISSIVE | Possible values: DROPMALFORMED, PERMISSIVE (default), FAILFAST |
| multiLine | false | |
| nanValue | NaN | |
| negativeInf | -Inf | |
| nullValue | (empty string) | |
| positiveInf | Inf | |
| sep | , (comma) | Alias of delimiter |
| timestampFormat | yyyy-MM-dd'T'HH:mm:ss.SSSXXX | Uses timeZone and en_US locale |
| timeZone | spark.sql.session.timeZone | |
| quote | \" | |
| quoteAll | false | |
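
Most of the options above are set on a DataFrameReader or DataFrameWriter before loading or saving a dataset in csv format. A minimal sketch (the csv-datasets and csv-datasets-piped paths are placeholders):

// Reading with a few of the options above (placeholder path)
val cities = spark.read
  .option("header", true)        // use the first line for column names
  .option("inferSchema", true)   // infer column types from the data
  .option("mode", "PERMISSIVE")  // keep malformed records (the default)
  .csv("csv-datasets")

// Writing with aliased options (sep is an alias of delimiter,
// compression is an alias of codec)
cities.write
  .option("sep", "|")
  .option("compression", "gzip")
  .csv("csv-datasets-piped")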

Preparing Write Job — prepareWrite Method

prepareWrite(
  sparkSession: SparkSession,
  job: Job,
  options: Map[String, String],
  dataSchema: StructType): OutputWriterFactory

Note: prepareWrite is part of the FileFormat Contract to prepare a write job.

prepareWrite…FIXME
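
Writing a Dataset out in csv format eventually goes through prepareWrite. A minimal sketch, assuming a local SparkSession and a placeholder output path; the compression option is expected to be picked up while the write job is prepared:

// Saving in csv format prepares the write job via CSVFileFormat
spark.range(3)
  .write
  .option("header", true)
  .option("compression", "gzip") // assumption: handled when the write job is prepared
  .csv("/tmp/csv-output")        // placeholder output path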

Building Partitioned Data Reader — buildReader Method

buildReader(
  sparkSession: SparkSession,
  dataSchema: StructType,
  partitionSchema: StructType,
  requiredSchema: StructType,
  filters: Seq[Filter],
  options: Map[String, String],
  hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow]

Note: buildReader is part of the FileFormat Contract to build a PartitionedFile reader.

buildReader…FIXME
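
Executing a csv-based structured query eventually goes through buildReader to read the partitioned files. A minimal sketch, assuming a local SparkSession and a placeholder input path:

import org.apache.spark.sql.types._

// User-specified schema (becomes the data and required schemas)
val schema = StructType(Seq(
  StructField("id", LongType),
  StructField("name", StringType)))

val people = spark.read
  .schema(schema)
  .option("mode", "FAILFAST") // fail fast on malformed records
  .csv("csv-datasets")        // placeholder input path

people.show(truncate = false) // executing the query exercises the reader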
