MetadataLogFileIndex is a PartitioningAwareFileIndex of metadata log files (generated by FileStreamSink).

MetadataLogFileIndex is created when:

  • DataSource (Spark SQL) is requested to resolve a FileFormat relation (resolveRelation) and creates a HadoopFsRelation

  • FileStreamSource is requested to list all files using the metadata log (allFilesUsingMetadataLogFileIndex)
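
The first case can be observed from user code. The following is a minimal sketch, not a definitive recipe: it assumes a Spark 2.x or 3.x build in which the classes live in the packages imported below, default configuration, and spark-shell in :paste mode. The application name, the temporary directories and the fileIndexClasses value are hypothetical names used only for illustration. A short streaming query writes to a parquet file sink, and a batch read of the same directory is then inspected for the file index behind its HadoopFsRelation.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("MetadataLogFileIndexDemo") // hypothetical application name
  .getOrCreate()

// Temporary directories for the sink output and the checkpoint (illustrative only)
val outputDir = Files.createTempDirectory("file-sink-output").toString
val checkpointDir = Files.createTempDirectory("file-sink-checkpoint").toString

// Run a streaming query with the parquet file sink for a short while.
// The sink is expected to record the files it writes under outputDir/_spark_metadata.
val query = spark.readStream
  .format("rate")
  .load()
  .writeStream
  .format("parquet")
  .option("checkpointLocation", checkpointDir)
  .start(outputDir)
query.awaitTermination(10000) // wait up to 10 seconds, then stop the query
query.stop()

// Batch read of the directory the file sink wrote to
val df = spark.read.parquet(outputDir)

// Class names of the file indexes behind any HadoopFsRelation in the analyzed plan
val fileIndexClasses = df.queryExecution.analyzed
  .collect { case lr: LogicalRelation => lr.relation }
  .collect { case r: HadoopFsRelation => r.location.getClass.getName }

fileIndexClasses.foreach(println)
```

If at least one micro-batch was committed before the query was stopped, the printed class name is expected to end with MetadataLogFileIndex. A directory without the _spark_metadata subdirectory would be expected to resolve to a different file index.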


Enable the ALL logging level for the org.apache.spark.sql.execution.streaming.MetadataLogFileIndex logger to see what happens inside.

Creating MetadataLogFileIndex Instance

MetadataLogFileIndex takes the following to be created:

  • SparkSession

  • Hadoop’s Path

  • User-defined schema (Option[StructType])

MetadataLogFileIndex initializes the internal properties.

While being created, MetadataLogFileIndex prints out the following INFO message to the logs:

Reading streaming file log from [metadataDirectory]
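
As a hedged sketch of the creation step described above, the three listed arguments could be passed as follows. It assumes the three-argument constructor from this page; newer Spark versions take an additional parameters map, so the call may need adjusting. The path /tmp/file-sink-output and the application name are hypothetical, and the directory is assumed to have been written by a streaming file sink.

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.streaming.MetadataLogFileIndex

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("MetadataLogFileIndexDemo") // hypothetical application name
  .getOrCreate()

// Hypothetical directory that a streaming file sink has written to
val path = new Path("/tmp/file-sink-output")

// The three arguments listed above; None means no user-defined schema.
// Assumption: the three-argument constructor (other Spark versions may differ).
val fileIndex = new MetadataLogFileIndex(spark, path, None)

// Mirrors the description of the metadata directory: _spark_metadata under the path
val metadataDirectory = new Path(path, "_spark_metadata")
println(s"Metadata directory (as described above): $metadataDirectory")

// Paths of the data files recorded in the metadata log
fileIndex.inputFiles.foreach(println)
```

With INFO logging enabled for the class, creating the instance should produce the "Reading streaming file log from ..." message shown above.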

Internal Properties

Name                 Description

metadataDirectory    Metadata directory (Hadoop’s Path of the _spark_metadata directory under the path)

                     Metadata log files
