compactLogs(logs: Seq[T]): Seq[T]
CompactibleFileStreamLog Contract — Compactible Metadata Logs
CompactibleFileStreamLog
is the extension of the HDFSMetadataLog contract for compactible metadata logs that compactLogs every compact interval.
CompactibleFileStreamLog
uses spark.sql.streaming.minBatchesToRetain configuration property (default: 100
) for deleteExpiredLog.
CompactibleFileStreamLog
uses .compact suffix for batchIdToPath, getBatchIdFromFileName, and the compactInterval.
Method | Description |
---|---|
|
|
|
Default compaction interval Used exclusively when |
|
Used exclusively when |
|
Used exclusively when |
CompactibleFileStreamLog | Description |
---|---|
|
Creating CompactibleFileStreamLog Instance
CompactibleFileStreamLog
takes the following to be created:
Note
|
CompactibleFileStreamLog is a Scala abstract class and cannot be created directly. It is created indirectly for the concrete CompactibleFileStreamLogs.
|
batchIdToPath
Method
batchIdToPath(batchId: Long): Path
Note
|
batchIdToPath is part of the HDFSMetadataLog Contract to…FIXME.
|
batchIdToPath
…FIXME
pathToBatchId
Method
pathToBatchId(path: Path): Long
Note
|
pathToBatchId is part of the HDFSMetadataLog Contract to…FIXME.
|
pathToBatchId
…FIXME
isBatchFile
Method
isBatchFile(path: Path): Boolean
Note
|
isBatchFile is part of the HDFSMetadataLog Contract to…FIXME.
|
isBatchFile
…FIXME
Serializing Metadata (Writing Metadata in Serialized Format) — serialize
Method
serialize(
logData: Array[T],
out: OutputStream): Unit
Note
|
serialize is part of the HDFSMetadataLog Contract to serialize metadata (write metadata in serialized format).
|
serialize
firstly writes the version header (v
and the metadataLogVersion) out to the given output stream (in UTF_8
).
serialize
then writes the log data (serialized using Json4s (with Jackson binding) library). Entries are separated by new lines.
Deserializing Metadata — deserialize
Method
deserialize(in: InputStream): Array[T]
Note
|
deserialize is part of the HDFSMetadataLog Contract to…FIXME.
|
deserialize
…FIXME
Storing Metadata Of Streaming Batch — add
Method
add(
batchId: Long,
logs: Array[T]): Boolean
Note
|
add is part of the HDFSMetadataLog Contract to store metadata for a batch.
|
add
…FIXME
compact
Internal Method
compact(
batchId: Long,
logs: Array[T]): Boolean
compact
getValidBatchesBeforeCompactionBatch (with the streaming batch and the compact interval).
compact
…FIXME
In the end, compact
compactLogs and requests the parent HDFSMetadataLog
to persist metadata of a streaming batch (to a metadata log file).
Note
|
compact is used exclusively when CompactibleFileStreamLog is requested to persist metadata of a streaming batch.
|
getValidBatchesBeforeCompactionBatch
Object Method
getValidBatchesBeforeCompactionBatch(
compactionBatchId: Long,
compactInterval: Int): Seq[Long]
getValidBatchesBeforeCompactionBatch
…FIXME
Note
|
getValidBatchesBeforeCompactionBatch is used exclusively when CompactibleFileStreamLog is requested to compact.
|
isCompactionBatch
Object Method
isCompactionBatch(batchId: Long, compactInterval: Int): Boolean
isCompactionBatch
…FIXME
Note
|
|
getBatchIdFromFileName
Object Method
getBatchIdFromFileName(fileName: String): Long
getBatchIdFromFileName
simply removes the .compact suffix from the given fileName
and converts the remaining part to a number.
Note
|
getBatchIdFromFileName is used when CompactibleFileStreamLog is requested to pathToBatchId, isBatchFile, and deleteExpiredLog.
|
deleteExpiredLog
Internal Method
deleteExpiredLog(
currentBatchId: Long): Unit
deleteExpiredLog
does nothing and simply returns when the current batch ID incremented (currentBatchId + 1
) is below the compact interval plus the minBatchesToRetain.
deleteExpiredLog
…FIXME
Note
|
deleteExpiredLog is used exclusively when CompactibleFileStreamLog is requested to store metadata of a streaming batch.
|