AvroFileFormat — FileFormat For Avro-Encoded Files

AvroFileFormat is a FileFormat for Apache Avro, i.e. a data source format that can read and write Avro-encoded data in files.

AvroFileFormat is a DataSourceRegister and registers itself as the avro data source.

// ./bin/spark-shell --packages org.apache.spark:spark-avro_2.12:2.4.0

// Writing data to Avro file(s)
spark
  .range(1)
  .write
  .format("avro") // <-- Triggers AvroFileFormat
  .save("data.avro")

// Reading Avro data from file(s)
val q = spark
  .read
  .format("avro") // <-- Triggers AvroFileFormat
  .load("data.avro")
scala> q.show
+---+
| id|
+---+
|  0|
+---+
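
The registration under the avro alias can be checked from the same spark-shell session. The following is only a sketch: it assumes the spark-avro package from the command above is on the classpath and uses Java's ServiceLoader to list the DataSourceRegister implementations whose short name is avro. With the 2.4.0 package the list is expected to include org.apache.spark.sql.avro.AvroFileFormat; other Spark versions may list different classes.

```scala
import java.util.ServiceLoader
import scala.collection.JavaConverters._
import org.apache.spark.sql.sources.DataSourceRegister

// Discover all registered data sources and keep the ones with the "avro" alias
val avroProviders = ServiceLoader
  .load(classOf[DataSourceRegister])
  .asScala
  .filter(_.shortName() == "avro")
  .map(_.getClass.getName)
  .toList

avroProviders.foreach(println)
```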

AvroFileFormat is splittable, i.e. a single Avro file can be divided into splits that separate tasks read in parallel.
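
One way to observe this is to write a single, reasonably large Avro file and read it back with a small spark.sql.files.maxPartitionBytes. This is a minimal sketch, not a benchmark: the output path is hypothetical, the row count and the 64 KB limit are arbitrary choices for illustration, and the exact number of partitions depends on the file size and the Spark version.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the spark-shell session if there is one
val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Hypothetical output path for this sketch
val path = "/tmp/avro-splittable-demo"

// coalesce(1) makes Spark write a single Avro part file
spark.range(1000000).coalesce(1).write.format("avro").mode("overwrite").save(path)

// Lower the maximum number of bytes packed into one read partition (64 KB)
spark.conf.set("spark.sql.files.maxPartitionBytes", 64L * 1024)

val df = spark.read.format("avro").load(path)
val numPartitions = df.rdd.getNumPartitions
println(s"Read partitions for a single Avro file: $numPartitions")
```

Since there is only one data file, more than one read partition indicates that the file was split.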

Building Partitioned Data Reader — buildReader Method

buildReader(
  spark: SparkSession,
  dataSchema: StructType,
  partitionSchema: StructType,
  requiredSchema: StructType,
  filters: Seq[Filter],
  options: Map[String, String],
  hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow]

Note: buildReader is part of the FileFormat Contract to build a PartitionedFile reader.

buildReader…​FIXME
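
The signature above shows that buildReader does not read anything itself but returns a function that is later applied to each PartitionedFile. The toy model below sketches only that shape. ToyFile, ToyRow and buildToyReader are hypothetical stand-ins invented for this illustration; they are not Spark classes and do not reflect how AvroFileFormat decodes Avro data.

```scala
// Stand-in types for this sketch only (NOT Spark's PartitionedFile or InternalRow)
final case class ToyFile(path: String, lines: Seq[String])
final case class ToyRow(values: Seq[String])

// Mirrors the shape of buildReader: setup runs once per scan,
// and the returned function is then applied to every file
def buildToyReader(
    dataColumns: Seq[String],
    requiredColumns: Seq[String]): ToyFile => Iterator[ToyRow] = {
  // One-time setup: positions of the required columns within a record
  val indices = requiredColumns.map(c => dataColumns.indexOf(c))
  require(!indices.contains(-1), "required columns must exist in the data columns")

  // Per-file work: lazily turn every line into a row with the required columns only
  (file: ToyFile) =>
    file.lines.iterator.map { line =>
      val fields = line.split(",", -1)
      ToyRow(indices.map(i => fields(i)))
    }
}

val reader = buildToyReader(Seq("id", "name"), Seq("name"))
val rows = reader(ToyFile("part-0", Seq("0,zero", "1,one"))).toList
println(rows)
```

The required columns play the role of requiredSchema here: column pruning is decided once, before any file is opened.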

Inferring Schema — inferSchema Method

inferSchema(
  spark: SparkSession,
  options: Map[String, String],
  files: Seq[FileStatus]): Option[StructType]

Note: inferSchema is part of the FileFormat Contract to infer (return) the schema of the given files.

inferSchema…​FIXME
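
From the user's side, schema inference is what happens when Avro files are loaded without a user-specified schema. The sketch below assumes a session with the spark-avro package available and uses a hypothetical path. It compares the schema taken from the files with a read that supplies the schema explicitly, in which case Spark is generally expected to skip inference.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Reuses the spark-shell session if there is one
val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Hypothetical output path for this sketch
val path = "/tmp/avro-infer-schema-demo"
spark.range(1).write.format("avro").mode("overwrite").save(path)

// No schema given, so it has to come from the Avro files themselves
val inferred = spark.read.format("avro").load(path).schema
println(inferred.treeString)

// A user-specified schema is used as-is
val explicit = StructType(Seq(StructField("id", LongType)))
val withExplicit = spark.read.format("avro").schema(explicit).load(path)
println(withExplicit.schema.treeString)
```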

Preparing Write Job — prepareWrite Method

prepareWrite(
  spark: SparkSession,
  job: Job,
  options: Map[String, String],
  dataSchema: StructType): OutputWriterFactory

Note: prepareWrite is part of the FileFormat Contract to prepare a write job.

prepareWrite…​FIXME
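
The options parameter of prepareWrite carries the options given to the DataFrameWriter. As a hedged usage sketch, the write below passes recordName, recordNamespace and compression, which are write options listed in Apache Spark's Avro data source guide. The path and the record names are hypothetical, and how exactly AvroFileFormat applies each option is not covered here.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the spark-shell session if there is one
val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Hypothetical output path for this sketch
val path = "/tmp/avro-prepare-write-demo"

spark.range(3)
  .write
  .format("avro")
  .option("recordName", "IdRecord")         // name of the top-level Avro record
  .option("recordNamespace", "com.example") // namespace of the top-level Avro record
  .option("compression", "deflate")         // codec used for the written files
  .mode("overwrite")
  .save(path)

// Read the files back to confirm the write job produced readable Avro data
val roundTrip = spark.read.format("avro").load(path)
println(roundTrip.count())
```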
