spark.read.format("json").load("json-datasets")
// or the same as above using a shortcut
spark.read.json("json-datasets")
JsonFileFormat — Built-In Support for Files in JSON Format
JsonFileFormat is a TextBasedFileFormat for the json format, i.e. it registers itself to handle files in JSON format and converts them to Spark SQL rows.
JsonFileFormat comes with options to further customize JSON parsing.
Note: JsonFileFormat uses Jackson 2.6.7 as the JSON parser library, and some options map directly to Jackson’s internal options (as JsonParser.Feature).
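To show what a Jackson-level parser feature looks like, the following sketch configures Jackson's JsonFactory directly with JsonParser.Feature.ALLOW_COMMENTS. It does not go through Spark at all. The object name JacksonFeatureSketch, the helper fieldNames and the sample input are made up for illustration, and the sketch assumes jackson-core is on the classpath.

```scala
import com.fasterxml.jackson.core.{JsonFactory, JsonParser, JsonToken}
import scala.collection.mutable.ListBuffer

// Hypothetical helper, not part of Spark or Jackson
object JacksonFeatureSketch {

  // Returns the field names found in `json` (at any nesting level),
  // parsing with Jackson's ALLOW_COMMENTS feature switched on or off.
  def fieldNames(json: String, allowComments: Boolean): List[String] = {
    val factory = new JsonFactory()
    factory.configure(JsonParser.Feature.ALLOW_COMMENTS, allowComments)

    val parser = factory.createParser(json)
    val names = ListBuffer.empty[String]
    try {
      var token = parser.nextToken()
      while (token != null) {
        if (token == JsonToken.FIELD_NAME) names += parser.getCurrentName
        token = parser.nextToken()
      }
    } finally {
      parser.close()
    }
    names.toList
  }
}
```

With the feature enabled, a comment inside the JSON text is skipped. With it disabled, Jackson rejects the same input with a parse exception.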
| Option | Default Value | Description |
|---|---|---|
| compression | | Compression codec that can be either one of the known aliases or a fully-qualified class name. |
| dateFormat | | Date format |
| | | Controls whether…FIXME |
| mode | | Case insensitive name of the parse mode |
| timestampFormat | | Timestamp format |
| | | Java’s |
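Options are passed to the JSON reader through DataFrameReader.option. The sketch below is one possible way to combine a parse mode with a date format. It assumes the options are named mode and dateFormat (the names DataFrameReader accepts for JSON) and that a local Spark installation is available. The object JsonReadOptionsSketch, its method and the sample records are hypothetical.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{DateType, IntegerType, StructField, StructType}

// Hypothetical example object, not part of Spark
object JsonReadOptionsSketch {

  // Explicit schema, so the reader does not have to infer one
  val schema: StructType = StructType(Seq(
    StructField("id", IntegerType),
    StructField("day", DateType)))

  // Reads JSON files under `path`, dropping records that fail to parse
  def readDroppingMalformed(spark: SparkSession, path: String): DataFrame =
    spark.read
      .schema(schema)
      .option("mode", "dropmalformed")    // parse mode name is case insensitive
      .option("dateFormat", "yyyy-MM-dd") // pattern used to parse the day field
      .json(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("json-read-options")
      .getOrCreate()
    import spark.implicits._

    // Two well-formed records and one malformed line, written as plain text
    val dir = Files.createTempDirectory("json-datasets").resolve("in").toString
    Seq(
      """{"id": 1, "day": "2017-01-31"}""",
      """{"id": 2, "day": "2017-02-01"}""",
      "{not json"
    ).toDS.write.text(dir)

    readDroppingMalformed(spark, dir).show()
    spark.stop()
  }
}
```

The explicit schema keeps the example focused on the two options. Without it, Spark would first run schema inference over the files.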
isSplitable Method
isSplitable(
sparkSession: SparkSession,
options: Map[String, String],
path: Path): Boolean
Note: isSplitable is part of the FileFormat Contract.
isSplitable…FIXME
inferSchema Method
inferSchema(
sparkSession: SparkSession,
options: Map[String, String],
files: Seq[FileStatus]): Option[StructType]
Note: inferSchema is part of the FileFormat Contract.
inferSchema…FIXME
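From the user's side, schema inference is what happens when a JSON dataset is loaded without an explicit schema. The sketch below only shows that observable behaviour, not the internals of inferSchema. It assumes a local Spark installation; the object JsonInferSchemaSketch and the sample records are hypothetical.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType

// Hypothetical example object, not part of Spark
object JsonInferSchemaSketch {

  // No schema is given, so Spark scans the files to work one out
  def inferredSchema(spark: SparkSession, path: String): StructType =
    spark.read.json(path).schema

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("json-infer-schema")
      .getOrCreate()
    import spark.implicits._

    // Records with different sets of fields; the inferred schema merges them
    val dir = Files.createTempDirectory("json-datasets").resolve("in").toString
    Seq(
      """{"id": 1, "name": "a"}""",
      """{"id": 2, "score": 1.5}"""
    ).toDS.write.text(dir)

    inferredSchema(spark, dir).printTreeString()
    spark.stop()
  }
}
```

With default options, whole numbers are inferred as LongType and fractional numbers as DoubleType, and a field missing from some records is still included in the schema.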
Building Partitioned Data Reader — buildReader Method
buildReader(
sparkSession: SparkSession,
dataSchema: StructType,
partitionSchema: StructType,
requiredSchema: StructType,
filters: Seq[Filter],
options: Map[String, String],
hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow]
Note: buildReader is part of the FileFormat Contract to build a PartitionedFile reader.
buildReader…FIXME
Preparing Write Job — prepareWrite Method
prepareWrite(
sparkSession: SparkSession,
job: Job,
options: Map[String, String],
dataSchema: StructType): OutputWriterFactory
Note: prepareWrite is part of the FileFormat Contract to prepare a write job.
prepareWrite…FIXME
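A write job for this format is triggered from the user's side by DataFrameWriter.json. The sketch below writes a small dataset with a compression codec alias and reads it back. It assumes the option is named compression and that gzip is one of the known aliases, and it needs a local Spark installation. The object JsonWriteSketch and its method are hypothetical, and nothing here describes what prepareWrite does internally.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical example object, not part of Spark
object JsonWriteSketch {

  // Writes `df` as gzip-compressed JSON files under `path`
  def writeGzipped(df: DataFrame, path: String): Unit =
    df.write
      .option("compression", "gzip") // codec alias; a class name is also accepted
      .json(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("json-write")
      .getOrCreate()

    val out = Files.createTempDirectory("json-datasets").resolve("out").toString
    writeGzipped(spark.range(3).toDF("id"), out)

    // Compressed files are decompressed transparently on read
    spark.read.json(out).show()
    spark.stop()
  }
}
```

The codec shows up in the names of the part files that Spark writes under the output directory.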