FileStreamSourceLog

FileStreamSourceLog is a concrete CompactibleFileStreamLog (of FileEntry metadata) of FileStreamSource.

FileStreamSourceLog uses a fixed-size cache of metadata of compaction batches.

FileStreamSourceLog uses spark.sql.streaming.fileSource.log.compactInterval configuration property (default: 10) for the default compaction interval.

FileStreamSourceLog uses spark.sql.streaming.fileSource.log.cleanupDelay configuration property (default: 10 minutes) for the fileCleanupDelayMs.

FileStreamSourceLog uses spark.sql.streaming.fileSource.log.deletion configuration property (default: true) for the isDeletingExpiredLog.

Creating FileStreamSourceLog Instance

FileStreamSourceLog (like the parent CompactibleFileStreamLog) takes the following to be created:

  • Metadata version

  • SparkSession

  • Path of the metadata log directory

Storing (Adding) Metadata of Streaming Batch — add Method

add(
  batchId: Long,
  logs: Array[FileEntry]): Boolean
Note
add is part of the MetadataLog Contract to store (add) metadata of a streaming batch.

add requests the parent CompactibleFileStreamLog to store metadata (possibly compacting logs if the batch is compaction).

If so (and this is a compation batch), add adds the batch and the logs to fileEntryCache internal registry (and possibly removing the eldest entry if the size is above the cacheSize).

get Method

get(
  startId: Option[Long],
  endId: Option[Long]): Array[(Long, Array[FileEntry])]
Note
get is part of the MetadataLog Contract to…​FIXME.

get…​FIXME

Internal Properties

Name Description

cacheSize

Size of the fileEntryCache that is exactly the compact interval

Used when the fileEntryCache is requested to add a new entry in add and get a compaction batch

fileEntryCache

Metadata of a streaming batch (FileEntry) per batch ID (LinkedHashMap[Long, Array[FileEntry]]) of size configured using the cacheSize

Used when get (for a compaction batch)

results matching ""

    No results matching ""