MetadataLogFileIndex

MetadataLogFileIndex is a PartitioningAwareFileIndex over the files recorded in the metadata log (the _spark_metadata directory) that FileStreamSink writes.

MetadataLogFileIndex is created when:

  • DataSource (Spark SQL) is requested to resolve a FileFormat relation (resolveRelation) for a path written by FileStreamSink (i.e. a directory with a _spark_metadata subdirectory) and creates a HadoopFsRelation

  • FileStreamSource is requested to list all the files of its source directory using the metadata log (allFilesUsingMetadataLogFileIndex)
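The decision that leads DataSource to a MetadataLogFileIndex can be sketched in plain Scala. This is a hypothetical simplification: the real check (FileStreamSink.hasMetadata) goes through a Hadoop FileSystem, whereas the sketch uses java.nio.file on the local disk, and IndexKind, MetadataLogIndex, InMemoryIndex and IndexChoice are illustrative names only. The assumed rule is: exactly one input path that is a directory containing _spark_metadata selects the metadata log; anything else falls back to regular file listing (InMemoryFileIndex).

```scala
import java.nio.file.{Files, Path, Paths}

// Hypothetical model of the two kinds of file index DataSource can pick from
sealed trait IndexKind
case object MetadataLogIndex extends IndexKind // stands in for MetadataLogFileIndex
case object InMemoryIndex extends IndexKind    // stands in for InMemoryFileIndex

object IndexChoice {
  // Name of the metadata directory FileStreamSink writes under its output path
  val MetadataDir = "_spark_metadata"

  // Assumed rule (modelled on FileStreamSink.hasMetadata): exactly one path,
  // which is a directory that contains a _spark_metadata entry
  def hasMetadata(paths: Seq[String]): Boolean = paths match {
    case Seq(single) =>
      val dir: Path = Paths.get(single)
      Files.isDirectory(dir) && Files.exists(dir.resolve(MetadataDir))
    case _ => false
  }

  // Pick the metadata log when present, otherwise plain directory listing
  def chooseIndex(paths: Seq[String]): IndexKind =
    if (hasMetadata(paths)) MetadataLogIndex else InMemoryIndex
}
```

With a single sink output directory the sketch picks the metadata log; with several paths (or a directory without _spark_metadata) it falls back to listing, since a metadata log only describes one sink's output.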

Tip

Enable ALL logging level for org.apache.spark.sql.execution.streaming.MetadataLogFileIndex to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.sql.execution.streaming.MetadataLogFileIndex=ALL

Refer to Logging.

Creating MetadataLogFileIndex Instance

MetadataLogFileIndex takes the following to be created:

  • SparkSession

  • Hadoop’s Path

  • User-defined schema (Option[StructType])

MetadataLogFileIndex initializes the internal properties.

While being created, MetadataLogFileIndex prints out the following INFO message to the logs:

Reading streaming file log from [metadataDirectory]
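The creation-time steps above (resolving the metadata directory under the path and printing the INFO message) can be modelled with a small hypothetical class. MetadataLogFileIndexSketch is an illustrative name, not Spark API: it takes a java.nio Path instead of Hadoop’s Path, omits the SparkSession and user-defined schema, and uses println in place of Spark’s logging.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical stand-in for the work MetadataLogFileIndex does while being created
final class MetadataLogFileIndexSketch(path: Path) {
  // metadataDirectory: the _spark_metadata directory under the given path
  val metadataDirectory: Path = path.resolve("_spark_metadata")

  // The INFO message printed during construction (class body runs in the constructor)
  val creationMessage: String = s"Reading streaming file log from $metadataDirectory"
  println(s"INFO $creationMessage")
}

object MetadataLogFileIndexSketch {
  // Convenience factory taking a string path
  def apply(path: String): MetadataLogFileIndexSketch =
    new MetadataLogFileIndexSketch(Paths.get(path))
}
```

For example, MetadataLogFileIndexSketch("/tmp/out") prints INFO Reading streaming file log from /tmp/out/_spark_metadata on a Unix-like file system.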

Internal Properties

  • metadataDirectory: Metadata directory (Hadoop’s Path of the _spark_metadata directory under the path). Used when…FIXME

  • metadataLog: FileStreamSinkLog of the metadataDirectory

  • allFilesFromLog: Metadata log files, i.e. the files recorded in metadataLog
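A rough sketch of how allFilesFromLog can be derived, assuming that every entry of metadataLog is converted to a file status and directory entries are dropped. SinkEntry and FileStatusLite are hypothetical simplifications (loosely modelled on SinkFileStatus and Hadoop’s FileStatus), and AllFilesFromLog is an illustrative name only.

```scala
// Hypothetical, simplified model of an entry recorded in FileStreamSink's log
final case class SinkEntry(path: String, size: Long, isDir: Boolean, modificationTime: Long)

// Hypothetical, simplified stand-in for Hadoop's FileStatus
final case class FileStatusLite(path: String, length: Long, isDirectory: Boolean, modificationTime: Long)

object AllFilesFromLog {
  // Convert one log entry to a file status
  private def toFileStatus(e: SinkEntry): FileStatusLite =
    FileStatusLite(e.path, e.size, e.isDir, e.modificationTime)

  // allFilesFromLog: all metadata log entries as file statuses, without directories
  def apply(logEntries: Seq[SinkEntry]): Seq[FileStatusLite] =
    logEntries.map(toFileStatus).filterNot(_.isDirectory)
}
```

The design point the sketch illustrates is that the index trusts the log rather than listing the output directory, so files left behind by batches that never committed to the log are not visible to readers.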
