JsonFileFormat — Built-In Support for Files in JSON Format

JsonFileFormat is a TextBasedFileFormat for the json format, i.e. it registers itself to handle files in JSON format and converts them to Spark SQL rows.

spark.read.format("json").load("json-datasets")

// or the same as above using a shortcut
spark.read.json("json-datasets")

JsonFileFormat comes with options to further customize JSON parsing.

Note	`JsonFileFormat` uses Jackson 2.6.7 as the JSON parser library and some options map directly to Jackson’s internal options (as `JsonParser.Feature`).
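To make the mapping to Jackson more concrete, here is a minimal sketch of how string options could be translated into `JsonParser.Feature` flags on a `JsonFactory`. The helper name `configureFactory` and the sample record are hypothetical and are not taken from Spark's sources. Only three of the options are wired up, with the defaults listed in the options table.

```scala
import com.fasterxml.jackson.core.{JsonFactory, JsonParser, JsonToken}

// Hypothetical helper (not Spark's code): turns string options into Jackson parser features.
def configureFactory(options: Map[String, String]): JsonFactory = {
  // Reads a boolean option, falling back to the documented default when absent.
  def flag(name: String, default: Boolean): Boolean =
    options.get(name).map(_.toBoolean).getOrElse(default)

  val factory = new JsonFactory()
  factory.configure(JsonParser.Feature.ALLOW_COMMENTS, flag("allowComments", default = false))
  factory.configure(JsonParser.Feature.ALLOW_SINGLE_QUOTES, flag("allowSingleQuotes", default = true))
  factory.configure(JsonParser.Feature.ALLOW_UNQUOTED_FIELD_NAMES, flag("allowUnquotedFieldNames", default = false))
  factory
}

// A record that strict JSON parsing rejects: single quotes and an inline comment.
val factory = configureFactory(Map("allowComments" -> "true"))
val parser = factory.createParser("""{'city': /* note */ 'Warsaw'}""")

// Advance to the first string value.
var token = parser.nextToken()
while (token != null && token != JsonToken.VALUE_STRING) token = parser.nextToken()
val city = parser.getText
parser.close()
```

With `allowComments` left at its default, the same input would be expected to fail with a Jackson parse exception at the comment.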

| Option | Default Value | Description |
|---|---|---|
| allowBackslashEscapingAnyCharacter | false | Internally, becomes `JsonParser.Feature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER`. |
| allowComments | false | Internally, becomes `JsonParser.Feature.ALLOW_COMMENTS`. |
| allowNonNumericNumbers | true | Internally, becomes `JsonParser.Feature.ALLOW_NON_NUMERIC_NUMBERS`. |
| allowNumericLeadingZeros | false | Internally, becomes `JsonParser.Feature.ALLOW_NUMERIC_LEADING_ZEROS`. |
| allowSingleQuotes | true | Internally, becomes `JsonParser.Feature.ALLOW_SINGLE_QUOTES`. |
| allowUnquotedControlChars | false | Internally, becomes `JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS`. |
| allowUnquotedFieldNames | false | Internally, becomes `JsonParser.Feature.ALLOW_UNQUOTED_FIELD_NAMES`. |
| columnNameOfCorruptRecord | | |
| compression | | Compression codec that can be either one of the known aliases or a fully-qualified class name. |
| dateFormat | yyyy-MM-dd | Date format. Internally, converted to Apache Commons Lang’s `FastDateFormat`. |
| multiLine | false | Controls whether…FIXME |
| mode | PERMISSIVE | Case insensitive name of the parse mode: PERMISSIVE, DROPMALFORMED, FAILFAST |
| prefersDecimal | false | |
| primitivesAsString | false | |
| samplingRatio | 1.0 | |
| timestampFormat | yyyy-MM-dd'T'HH:mm:ss.SSSXXX | Timestamp format. Internally, converted to Apache Commons Lang’s `FastDateFormat`. |
| timeZone | | Java’s TimeZone |
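The options above are passed through `DataFrameReader.option` before loading. The following is a small sketch, assuming Spark 2.2 or later (where `json(Dataset[String])` is available) and a local session. The sample records, the session name and the `_bad` column name are made up for illustration, and the described role of `columnNameOfCorruptRecord` is an assumption based on Spark's public documentation rather than on this page.

```scala
import org.apache.spark.sql.{Encoders, SparkSession}

val session = SparkSession.builder()
  .master("local[*]")
  .appName("json-options-sketch")
  .getOrCreate()

// Relaxed parsing: an unquoted field name and an inline comment.
val relaxed = session.createDataset(Seq("""{id: 1, /* note */ "name": "a"}"""))(Encoders.STRING)
val relaxedColumns = session.read
  .option("allowComments", "true")
  .option("allowUnquotedFieldNames", "true")
  .json(relaxed)
  .columns

// One well-formed record and one truncated (malformed) record.
val mixed = session.createDataset(Seq("""{"id": 1}""", """{"id": 2"""))(Encoders.STRING)

// The mode name is case insensitive; DROPMALFORMED skips records that fail to parse.
val keptRows = session.read
  .option("mode", "dropmalformed")
  .json(mixed)
  .collect()
  .length

// PERMISSIVE (the default) keeps malformed records and, per Spark's documentation,
// stores their raw text in the column named by columnNameOfCorruptRecord.
val permissiveColumns = session.read
  .option("columnNameOfCorruptRecord", "_bad")
  .json(mixed)
  .columns

session.stop()
```

Using `FAILFAST` on the same input would be expected to raise an exception once the malformed record is reached.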

`isSplitable` Method

isSplitable(
  sparkSession: SparkSession,
  options: Map[String, String],
  path: Path): Boolean

Note	`isSplitable` is part of the FileFormat Contract.

isSplitable…FIXME

`inferSchema` Method

inferSchema(
  sparkSession: SparkSession,
  options: Map[String, String],
  files: Seq[FileStatus]): Option[StructType]

Note	`inferSchema` is part of the FileFormat Contract.

inferSchema…FIXME
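The internals of `inferSchema` are not described here, but its effect can be observed through the public reader API. The sketch below is an illustration only, assuming Spark 2.2 or later and a local session. It goes through `DataFrameReader`, does not call `JsonFileFormat.inferSchema` directly, and uses a made-up sample record to compare the default inference with `primitivesAsString` enabled.

```scala
import org.apache.spark.sql.{Encoders, SparkSession}
import org.apache.spark.sql.types.{DoubleType, LongType, StringType}

val session = SparkSession.builder()
  .master("local[*]")
  .appName("json-infer-sketch")
  .getOrCreate()

val records = session.createDataset(Seq("""{"id": 1, "score": 2.5}"""))(Encoders.STRING)

// Default inference: integral numbers become LongType, fractional ones DoubleType.
val inferred = session.read.json(records).schema
val idType = inferred("id").dataType
val scoreType = inferred("score").dataType

// With primitivesAsString enabled, primitive values are inferred as StringType.
val asStrings = session.read
  .option("primitivesAsString", "true")
  .json(records)
  .schema
val allStrings = asStrings.fields.forall(_.dataType == StringType)

session.stop()
```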

Building Partitioned Data Reader — `buildReader` Method

buildReader(
  sparkSession: SparkSession,
  dataSchema: StructType,
  partitionSchema: StructType,
  requiredSchema: StructType,
  filters: Seq[Filter],
  options: Map[String, String],
  hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow]

Note	`buildReader` is part of the FileFormat Contract to build a PartitionedFile reader.

buildReader…FIXME

Preparing Write Job — `prepareWrite` Method

prepareWrite(
  sparkSession: SparkSession,
  job: Job,
  options: Map[String, String],
  dataSchema: StructType): OutputWriterFactory

Note	`prepareWrite` is part of the FileFormat Contract to prepare a write job.

prepareWrite…FIXME
