# JsonFileFormat — Built-In Support for Files in JSON Format

`JsonFileFormat` is a `TextBasedFileFormat` for the `json` format, i.e. it registers itself to handle files in JSON format and convert them to Spark SQL rows.

```scala
spark.read.format("json").load("json-datasets")
// or the same as above using a shortcut
spark.read.json("json-datasets")
```

`JsonFileFormat` comes with options to further customize JSON parsing.
> **Note**: `JsonFileFormat` uses Jackson 2.6.7 as the JSON parser library, and some options map directly to Jackson's internal options (as `JsonParser.Feature`).
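For example, Jackson's `JsonParser.Feature.ALLOW_COMMENTS` surfaces as the `allowComments` option. A minimal sketch (the dataset path is the illustrative one used above):

```scala
// allowComments (default: false) maps to Jackson's ALLOW_COMMENTS feature,
// so JSON files containing // or /* */ comments can still be parsed.
val withComments = spark.read
  .option("allowComments", "true")
  .json("json-datasets")
```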
| Option | Default Value | Description |
|---|---|---|
| `compression` | (undefined) | Compression codec that can be either one of the known aliases or a fully-qualified class name. |
| `dateFormat` | `yyyy-MM-dd` | Date format |
| `multiLine` | `false` | Controls whether…FIXME |
| `mode` | `PERMISSIVE` | Case insensitive name of the parse mode |
| `timestampFormat` | `yyyy-MM-dd'T'HH:mm:ss.SSSXXX` | Timestamp format (a Java `SimpleDateFormat` pattern) |
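The options above are passed through `DataFrameReader.option` before loading; a minimal sketch (the dataset path is illustrative):

```scala
val events = spark.read
  .option("mode", "PERMISSIVE")       // keep malformed records (the default parse mode)
  .option("dateFormat", "yyyy-MM-dd") // the default date format
  .option("multiLine", "true")        // a single JSON document may span multiple lines
  .json("json-datasets")
```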
## isSplitable Method

```scala
isSplitable(
  sparkSession: SparkSession,
  options: Map[String, String],
  path: Path): Boolean
```

> **Note**: `isSplitable` is part of the `FileFormat` contract.

`isSplitable`…FIXME
## inferSchema Method

```scala
inferSchema(
  sparkSession: SparkSession,
  options: Map[String, String],
  files: Seq[FileStatus]): Option[StructType]
```

> **Note**: `inferSchema` is part of the `FileFormat` contract.

`inferSchema`…FIXME
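Schema inference is what lets `spark.read.json` work without an explicit schema. A hedged sketch of triggering it, and of skipping it with a user-defined schema (the field names are made up for illustration):

```scala
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

// Without a schema, inferSchema scans the input files to derive one.
val inferred = spark.read.json("json-datasets")
inferred.printSchema

// With an explicit schema, the inference pass is skipped entirely.
val schema = StructType(Seq(
  StructField("id", LongType),       // hypothetical field
  StructField("name", StringType)))  // hypothetical field
val typed = spark.read.schema(schema).json("json-datasets")
```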
## Building Partitioned Data Reader — buildReader Method

```scala
buildReader(
  sparkSession: SparkSession,
  dataSchema: StructType,
  partitionSchema: StructType,
  requiredSchema: StructType,
  filters: Seq[Filter],
  options: Map[String, String],
  hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow]
```

> **Note**: `buildReader` is part of the `FileFormat` contract to build a `PartitionedFile` reader.

`buildReader`…FIXME
## Preparing Write Job — prepareWrite Method

```scala
prepareWrite(
  sparkSession: SparkSession,
  job: Job,
  options: Map[String, String],
  dataSchema: StructType): OutputWriterFactory
```

> **Note**: `prepareWrite` is part of the `FileFormat` contract to prepare a write job.

`prepareWrite`…FIXME
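`prepareWrite` is exercised whenever a JSON write job is set up through `DataFrameWriter`. A minimal sketch, assuming some DataFrame `df` (the output path is illustrative):

```scala
// Writing a Dataset out as JSON goes through JsonFileFormat.prepareWrite,
// which sets up the OutputWriterFactory for the write job.
df.write
  .option("compression", "gzip") // one of the known codec aliases
  .mode("overwrite")
  .json("json-output")           // illustrative output directory
```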