CompactibleFileStreamLog Contract — Compactible Metadata Logs

CompactibleFileStreamLog is the extension of the HDFSMetadataLog contract for compactible metadata logs that compactLogs every compact interval.

CompactibleFileStreamLog uses the spark.sql.streaming.minBatchesToRetain configuration property (default: 100) for deleteExpiredLog.

CompactibleFileStreamLog uses the .compact suffix for batchIdToPath, getBatchIdFromFileName, and the compactInterval.

| Method | Description |
|---|---|
| compactLogs | compactLogs(logs: Seq[T]): Seq[T] to compact the given metadata logs. Used when CompactibleFileStreamLog is requested to compact. |
| defaultCompactInterval | Default compaction interval. Used exclusively when CompactibleFileStreamLog is requested for the compactInterval. |
| fileCleanupDelayMs | Used exclusively when CompactibleFileStreamLog is requested to deleteExpiredLog. |
| isDeletingExpiredLog | Used exclusively when CompactibleFileStreamLog is requested to deleteExpiredLog. |

| CompactibleFileStreamLog | Description |
|---|---|
| FileStreamSinkLog | Metadata log of FileStreamSink |
| FileStreamSourceLog | Metadata log of FileStreamSource |

Creating CompactibleFileStreamLog Instance

CompactibleFileStreamLog takes the following to be created:

- Version of the metadata log (metadataLogVersion)
- SparkSession
- Path of the metadata log directory

Note: CompactibleFileStreamLog is a Scala abstract class and cannot be created directly. It is created indirectly for the concrete CompactibleFileStreamLogs.

batchIdToPath Method

batchIdToPath(batchId: Long): Path

Note: batchIdToPath is part of the HDFSMetadataLog Contract to…FIXME.

batchIdToPath…FIXME

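Given the .compact naming convention above, here is a minimal sketch of how batch IDs could map to metadata file names. The explicit isCompactionBatch flag and the metadataPath value are assumptions of the sketch, not the real signature:

```scala
import org.apache.hadoop.fs.Path

// Hypothetical metadata log directory (an assumption for this sketch)
val metadataPath = new Path("/tmp/checkpoint/_spark_metadata")

// A plausible naming scheme: compaction batches carry the ".compact" suffix,
// regular batches are named after the batch id alone.
def batchIdToPath(batchId: Long, isCompactionBatch: Boolean): Path =
  if (isCompactionBatch) new Path(metadataPath, s"$batchId.compact")
  else new Path(metadataPath, batchId.toString)

batchIdToPath(9, isCompactionBatch = true)   // .../9.compact
batchIdToPath(3, isCompactionBatch = false)  // .../3
```
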
pathToBatchId Method

pathToBatchId(path: Path): Long

Note: pathToBatchId is part of the HDFSMetadataLog Contract to…FIXME.

pathToBatchId…FIXME

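The Note under getBatchIdFromFileName below lists pathToBatchId among its callers, so a plausible sketch is a simple delegation on the file name (the stripSuffix-based helper is repeated here only to keep the snippet self-contained):

```scala
import org.apache.hadoop.fs.Path

// As described under getBatchIdFromFileName below
def getBatchIdFromFileName(fileName: String): Long =
  fileName.stripSuffix(".compact").toLong

// Assumption: pathToBatchId delegates to getBatchIdFromFileName on the file name
def pathToBatchId(path: Path): Long =
  getBatchIdFromFileName(path.getName)

pathToBatchId(new Path("/tmp/checkpoint/_spark_metadata/9.compact"))  // 9
```
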
isBatchFile Method

isBatchFile(path: Path): Boolean

Note: isBatchFile is part of the HDFSMetadataLog Contract to…FIXME.

isBatchFile…FIXME

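isBatchFile is also listed among the callers of getBatchIdFromFileName (see the Note below), so one plausible sketch, purely an assumption, treats a path as a batch file when its name parses as a (possibly compacted) batch ID:

```scala
import org.apache.hadoop.fs.Path

def getBatchIdFromFileName(fileName: String): Long =
  fileName.stripSuffix(".compact").toLong

// Assumption: a path is a batch file when its name parses as a batch id
def isBatchFile(path: Path): Boolean =
  try { getBatchIdFromFileName(path.getName); true }
  catch { case _: NumberFormatException => false }

isBatchFile(new Path("/tmp/checkpoint/_spark_metadata/9.compact"))  // true
isBatchFile(new Path("/tmp/checkpoint/_spark_metadata/metadata"))   // false
```
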
Serializing Metadata (Writing Metadata in Serialized Format) — serialize Method

serialize(
  logData: Array[T],
  out: OutputStream): Unit

Note: serialize is part of the HDFSMetadataLog Contract to serialize metadata (write metadata in serialized format).

serialize first writes the version header (v followed by the metadataLogVersion) out to the given output stream (in UTF_8).

serialize then writes the log data (serialized using the Json4s library with the Jackson binding), one entry per line (entries are separated by new lines).

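A sketch of that on-disk format (a version header line such as v1 followed by one JSON document per entry), pasteable into spark-shell. The Entry case class, the formats value, and the version value are assumptions of the sketch, not part of the real contract:

```scala
import java.io.{ByteArrayOutputStream, OutputStream}
import java.nio.charset.StandardCharsets.UTF_8
import org.json4s.NoTypeHints
import org.json4s.jackson.Serialization

// Hypothetical metadata entry (a stand-in for the real metadata type T)
case class Entry(path: String, size: Long)
implicit val formats: org.json4s.Formats = Serialization.formats(NoTypeHints)

val metadataLogVersion = 1

def serialize(logData: Array[Entry], out: OutputStream): Unit = {
  // version header, e.g. "v1", in UTF-8
  out.write(s"v$metadataLogVersion".getBytes(UTF_8))
  // one JSON-serialized entry per line
  logData.foreach { entry =>
    out.write('\n')
    out.write(Serialization.write(entry).getBytes(UTF_8))
  }
}

val out = new ByteArrayOutputStream()
serialize(Array(Entry("part-00000", 10), Entry("part-00001", 20)), out)
print(out.toString(UTF_8.name))
// v1
// {"path":"part-00000","size":10}
// {"path":"part-00001","size":20}
```
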
Deserializing Metadata — deserialize Method

deserialize(in: InputStream): Array[T]

Note: deserialize is part of the HDFSMetadataLog Contract to…FIXME.

deserialize…FIXME

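Given the format produced by serialize above, a plausible sketch of the reverse direction reads the version header line and then parses one JSON entry per line. The Entry case class and the lenient version handling are assumptions:

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.nio.charset.StandardCharsets.UTF_8
import scala.io.Source
import org.json4s.NoTypeHints
import org.json4s.jackson.Serialization

case class Entry(path: String, size: Long)
implicit val formats: org.json4s.Formats = Serialization.formats(NoTypeHints)

def deserialize(in: InputStream): Array[Entry] = {
  val lines = Source.fromInputStream(in, UTF_8.name).getLines()
  if (!lines.hasNext) sys.error("Incomplete log file: missing version header")
  val version = lines.next()  // e.g. "v1"; a real implementation would validate it
  lines.map(line => Serialization.read[Entry](line)).toArray
}

val bytes = "v1\n{\"path\":\"part-00000\",\"size\":10}".getBytes(UTF_8)
deserialize(new ByteArrayInputStream(bytes))  // Array(Entry(part-00000,10))
```
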
Storing Metadata Of Streaming Batch — add Method

add(
  batchId: Long,
  logs: Array[T]): Boolean

Note: add is part of the HDFSMetadataLog Contract to store metadata for a batch.

add…FIXME

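The Notes below state that compact and deleteExpiredLog run when CompactibleFileStreamLog is requested to persist (store) metadata of a streaming batch, so a plausible sketch of add dispatches on whether the batch is a compaction batch. All helpers here are simplified stand-ins, not the real members:

```scala
// Simplified stand-ins (assumptions of this sketch)
type Entry = String
val compactInterval = 10
def isCompactionBatch(batchId: Long, compactInterval: Int): Boolean =
  (batchId + 1) % compactInterval == 0
def compact(batchId: Long, logs: Array[Entry]): Boolean = { println(s"compact $batchId"); true }
def superAdd(batchId: Long, logs: Array[Entry]): Boolean = { println(s"HDFSMetadataLog.add $batchId"); true }

// Compaction batches are compacted; other batches go straight to the parent HDFSMetadataLog.
// The real add would also trigger deleteExpiredLog (see below) after a successful write.
def add(batchId: Long, logs: Array[Entry]): Boolean =
  if (isCompactionBatch(batchId, compactInterval)) compact(batchId, logs)
  else superAdd(batchId, logs)

add(3, Array("entry"))  // HDFSMetadataLog.add 3
add(9, Array("entry"))  // compact 9
```
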
compact Internal Method

compact(
  batchId: Long,
  logs: Array[T]): Boolean

compact gets the valid batches before the compaction batch (getValidBatchesBeforeCompactionBatch with the given streaming batch and the compact interval).

compact…FIXME

In the end, compact compactLogs and requests the parent HDFSMetadataLog to persist metadata of a streaming batch (to a metadata log file).

Note: compact is used exclusively when CompactibleFileStreamLog is requested to persist metadata of a streaming batch.

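Under those assumptions, a sketch of the whole flow: collect the metadata of the valid batches before the compaction batch, append the current logs, compactLogs the result, and hand it to the parent HDFSMetadataLog. Every helper below is a simplified stand-in; see the getValidBatchesBeforeCompactionBatch sketch in the next section for the assumed range:

```scala
// Simplified stand-ins (assumptions of this sketch)
type Entry = String
val compactInterval = 10
def getValidBatchesBeforeCompactionBatch(compactionBatchId: Long, compactInterval: Int): Seq[Long] =
  math.max(0L, compactionBatchId - compactInterval) until compactionBatchId
def getBatch(batchId: Long): Option[Array[Entry]] = Some(Array(s"entry-$batchId"))  // HDFSMetadataLog.get
def compactLogs(logs: Seq[Entry]): Seq[Entry] = logs                                // contract method
def superAdd(batchId: Long, logs: Array[Entry]): Boolean = true                     // HDFSMetadataLog.add

def compact(batchId: Long, logs: Array[Entry]): Boolean = {
  val validBatches = getValidBatchesBeforeCompactionBatch(batchId, compactInterval)
  val allLogs = validBatches.flatMap(id => getBatch(id).getOrElse(Array.empty[Entry])) ++ logs
  superAdd(batchId, compactLogs(allLogs).toArray)
}

compact(9, Array("entry-9"))  // persists entries of batches 0..8 plus batch 9 as 9.compact
```
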
getValidBatchesBeforeCompactionBatch Object Method

getValidBatchesBeforeCompactionBatch(
  compactionBatchId: Long,
  compactInterval: Int): Seq[Long]

getValidBatchesBeforeCompactionBatch…FIXME

Note: getValidBatchesBeforeCompactionBatch is used exclusively when CompactibleFileStreamLog is requested to compact.

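compact passes a compaction batch ID and the compact interval, so a plausible result, an assumption of this sketch rather than anything stated above, is the range of batch IDs written since the previous compaction batch:

```scala
// Assumption: the valid batches are the batch ids since the previous compaction batch
def getValidBatchesBeforeCompactionBatch(compactionBatchId: Long, compactInterval: Int): Seq[Long] =
  math.max(0L, compactionBatchId - compactInterval) until compactionBatchId

getValidBatchesBeforeCompactionBatch(9, 10)   // 0 to 8
getValidBatchesBeforeCompactionBatch(19, 10)  // 9 to 18
```
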
isCompactionBatch Object Method

isCompactionBatch(batchId: Long, compactInterval: Int): Boolean

isCompactionBatch…FIXME

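Given that compaction happens every compact interval (see the introduction), a plausible check, assumed here with zero-based batch IDs, is whether the batch ID ends an interval:

```scala
// Assumption: with zero-based batch ids, every compactInterval-th batch is a compaction batch
def isCompactionBatch(batchId: Long, compactInterval: Int): Boolean =
  (batchId + 1) % compactInterval == 0

(0L to 10L).filter(batchId => isCompactionBatch(batchId, compactInterval = 3))  // 2, 5, 8
```
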
getBatchIdFromFileName Object Method

getBatchIdFromFileName(fileName: String): Long

getBatchIdFromFileName simply removes the .compact suffix from the given fileName and converts the remaining part to a number.

Note: getBatchIdFromFileName is used when CompactibleFileStreamLog is requested to pathToBatchId, isBatchFile, and deleteExpiredLog.

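Since the method only strips the .compact suffix and parses the remainder, the behavior fits in a one-liner; a quick sketch:

```scala
def getBatchIdFromFileName(fileName: String): Long =
  fileName.stripSuffix(".compact").toLong

getBatchIdFromFileName("9.compact")  // 9
getBatchIdFromFileName("10")         // 10
```
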
deleteExpiredLog Internal Method

deleteExpiredLog(
  currentBatchId: Long): Unit

deleteExpiredLog does nothing (and simply returns) when the incremented current batch ID (currentBatchId + 1) is below the sum of the compact interval and the minBatchesToRetain.

deleteExpiredLog…FIXME

Note: deleteExpiredLog is used exclusively when CompactibleFileStreamLog is requested to store metadata of a streaming batch.

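A sketch of just that early-return guard: with the default spark.sql.streaming.minBatchesToRetain of 100 and an example compact interval of 10, nothing is cleaned up until the current batch ID reaches 109. The actual cleanup of expired batch files is left out:

```scala
// Example values: compactInterval is an assumption, minBatchesToRetain is the documented default
val compactInterval = 10
val minBatchesToRetain = 100  // spark.sql.streaming.minBatchesToRetain

def deleteExpiredLog(currentBatchId: Long): Unit =
  if (currentBatchId + 1 < compactInterval + minBatchesToRetain) {
    // not enough batches yet: keep every metadata log file
  } else {
    println("would delete expired batch files")  // the actual cleanup is elided
  }

deleteExpiredLog(108)  // no output: nothing to delete yet
deleteExpiredLog(109)  // would delete expired batch files
```
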