AvroFileFormat — FileFormat For Avro-Encoded Files

```scala
// ./bin/spark-shell --packages org.apache.spark:spark-avro_2.12:2.4.0

// Writing data to Avro file(s)
spark
  .range(1)
  .write
  .format("avro") // <-- Triggers AvroFileFormat
  .save("data.avro")

// Reading Avro data from file(s)
val q = spark
  .read
  .format("avro") // <-- Triggers AvroFileFormat
  .load("data.avro")

scala> q.show
+---+
| id|
+---+
|  0|
+---+
```
AvroFileFormat is a FileFormat for Apache Avro, i.e. a data source format that can read and write Avro-encoded data in files.

AvroFileFormat is a DataSourceRegister and registers itself as the avro data source.

AvroFileFormat is splittable, i.e. FIXME
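The avro alias comes from the DataSourceRegister contract. Below is a minimal sketch with a simplified stand-in for Spark's DataSourceRegister trait (the real trait lives in org.apache.spark.sql.sources and AvroFileFormat carries many more members):

```scala
// Simplified stand-in for Spark's DataSourceRegister trait.
trait DataSourceRegister {
  def shortName(): String
}

// The real AvroFileFormat is listed in
// META-INF/services/org.apache.spark.sql.sources.DataSourceRegister,
// so Java's ServiceLoader can discover it; shortName gives the alias
// that .format("avro") resolves against.
class AvroFileFormat extends DataSourceRegister {
  override def shortName(): String = "avro"
}

object RegistrationDemo extends App {
  assert(new AvroFileFormat().shortName() == "avro")
  println("format(\"avro\") resolves to AvroFileFormat")
}
```

Spark first matches the name given to format(...) against registered shortNames and only then falls back to treating it as a fully-qualified class name.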
Building Partitioned Data Reader — buildReader Method

```scala
buildReader(
  spark: SparkSession,
  dataSchema: StructType,
  partitionSchema: StructType,
  requiredSchema: StructType,
  filters: Seq[Filter],
  options: Map[String, String],
  hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow]
```
NOTE: buildReader is part of the FileFormat Contract to build a PartitionedFile reader.
buildReader…FIXME
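The signature alone shows the shape of the result: a function from a PartitionedFile (one file split) to an iterator of rows, which Spark invokes once per scheduled split. Below is a minimal sketch using hypothetical simplifications of Spark's PartitionedFile and InternalRow types (not the real classes):

```scala
object BuildReaderSketch extends App {
  // Hypothetical simplifications of Spark's types.
  case class PartitionedFile(filePath: String, start: Long, length: Long)
  type InternalRow = Map[String, Any]

  // buildReader returns a per-split reader function; Spark calls it
  // once for every file split assigned to a partition.
  def buildReader(
      requiredColumns: Seq[String]): PartitionedFile => Iterator[InternalRow] = {
    file =>
      // A real implementation would open the Avro container file, skip to
      // the first sync marker at or after `file.start`, and decode records
      // (projected down to requiredColumns) until the split is exhausted.
      Iterator.empty
  }

  val reader = buildReader(Seq("id"))
  val rows = reader(PartitionedFile("data.avro", 0, 1024))
  assert(rows.isEmpty) // the sketch decodes nothing
  println("buildReader returned a per-split reader function")
}
```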
Inferring Schema — inferSchema Method

```scala
inferSchema(
  spark: SparkSession,
  options: Map[String, String],
  files: Seq[FileStatus]): Option[StructType]
```
NOTE: inferSchema is part of the FileFormat Contract to infer (return) the schema of the given files.
inferSchema…FIXME
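Avro is self-describing: every container file stores the writer's schema in its header, so inference can read a file header rather than scan the data. Below is a toy sketch with hypothetical stand-ins for StructType and for the Avro-to-Spark schema conversion (the real conversion lives in spark-avro's SchemaConverters):

```scala
object InferSchemaSketch extends App {
  // Hypothetical simplifications of Spark's StructType/StructField.
  case class StructField(name: String, dataType: String)
  case class StructType(fields: Seq[StructField])

  // Toy conversion: ignore the JSON and declare a single long column "id".
  // A real implementation parses the Avro schema and maps each Avro type
  // to the corresponding Spark SQL type.
  def toStructType(avroJsonSchema: String): StructType =
    StructType(Seq(StructField("id", "long")))

  // Each Avro file carries its writer schema, so one header is enough;
  // None signals that no schema could be inferred (e.g. no files).
  def inferSchema(avroJsonSchemas: Seq[String]): Option[StructType] =
    avroJsonSchemas.headOption.map(toStructType)

  val schema = inferSchema(Seq("""{"type":"record","name":"r"}"""))
  assert(schema.exists(_.fields.head.name == "id"))
  println(s"Inferred: $schema")
}
```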
Preparing Write Job — prepareWrite Method

```scala
prepareWrite(
  spark: SparkSession,
  job: Job,
  options: Map[String, String],
  dataSchema: StructType): OutputWriterFactory
```
NOTE: prepareWrite is part of the FileFormat Contract to prepare a write job.
prepareWrite…FIXME
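prepareWrite runs on the driver and returns an OutputWriterFactory that executors use to create one writer per output file. Below is a minimal sketch with hypothetical simplifications of Spark's writer-side interfaces (the real OutputWriter and OutputWriterFactory have different, richer signatures):

```scala
object PrepareWriteSketch extends App {
  // Hypothetical simplifications of Spark's writer-side abstractions.
  trait OutputWriter {
    def write(row: Map[String, Any]): Unit
    def close(): Unit
  }
  trait OutputWriterFactory {
    def getFileExtension: String
    def newInstance(path: String): OutputWriter
  }

  // Called once on the driver; the factory is serialized to executors,
  // which call newInstance for every output file they produce.
  def prepareWrite(options: Map[String, String]): OutputWriterFactory =
    new OutputWriterFactory {
      // spark-avro writes files with an .avro extension.
      def getFileExtension: String = ".avro"
      def newInstance(path: String): OutputWriter = new OutputWriter {
        def write(row: Map[String, Any]): Unit = () // encode row as an Avro record
        def close(): Unit = ()                      // flush and close the container
      }
    }

  val factory = prepareWrite(Map.empty)
  assert(factory.getFileExtension == ".avro")
  println("prepareWrite returned an OutputWriterFactory")
}
```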