serialize(
metadata: T,
out: OutputStream): Unit
HDFSMetadataLog — Hadoop DFS-based Metadata Storage
HDFSMetadataLog is a concrete metadata storage (of type T) that uses Hadoop DFS for fault-tolerance and reliability.
HDFSMetadataLog uses the given path as the metadata directory with metadata logs. The path is immediately converted to a Hadoop Path for file management.
HDFSMetadataLog uses Json4s with the Jackson binding for metadata serialization and deserialization (to and from JSON format).
HDFSMetadataLog is further customized by the extensions.
| HDFSMetadataLog | Description |
|---|---|
| Anonymous | |
| Anonymous | |
| | Compactible metadata logs (that compact logs at regular intervals) |
Serializing Metadata (Writing Metadata in Serialized Format) — serialize Method
serialize writes the given metadata (of type T) to the given OutputStream as JSON, serialized using the Json4s library (with the Jackson binding).
Note: serialize is used exclusively when HDFSMetadataLog is requested to write metadata of a streaming batch to a file (metadata log) (when storing metadata of a streaming batch).
Deserializing Metadata (Reading Metadata from Serialized Format) — deserialize Method
deserialize(in: InputStream): T
deserialize deserializes a metadata (of type T) from a given InputStream.
Note: deserialize is used exclusively when HDFSMetadataLog is requested to retrieve metadata of a batch.
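The serialize and deserialize pair can be sketched as a round trip through a stream. This is a minimal sketch that uses only the standard library: the class `StreamCodecSketch` and its `encode` and `decode` parameters are hypothetical and stand in for the Json4s-based serialization, so the sketch shows the stream contract only, not the actual JSON format.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream, OutputStream}
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical stand-in for a metadata log of type T.
// `encode` and `decode` take the place of Json4s (an assumption made for illustration).
class StreamCodecSketch[T](encode: T => String, decode: String => T) {

  // Writes the encoded metadata to the stream; the caller owns and closes the stream.
  def serialize(metadata: T, out: OutputStream): Unit = {
    out.write(encode(metadata).getBytes(UTF_8))
    out.flush()
  }

  // Reads the stream to its end and decodes the bytes back into a T.
  def deserialize(in: InputStream): T = {
    val buffer = new ByteArrayOutputStream()
    val chunk = new Array[Byte](4096)
    var n = in.read(chunk)
    while (n != -1) {
      buffer.write(chunk, 0, n)
      n = in.read(chunk)
    }
    decode(new String(buffer.toByteArray, UTF_8))
  }

  // Round-trips a value through an in-memory stream.
  def roundTrip(value: T): T = {
    val out = new ByteArrayOutputStream()
    serialize(value, out)
    deserialize(new ByteArrayInputStream(out.toByteArray))
  }
}
```

For example, `new StreamCodecSketch[Long](_.toString, _.toLong).roundTrip(42L)` gives back the value that was written.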
Retrieving Metadata Of Streaming Batch — get Method
get(batchId: Long): Option[T]
Note: get is part of the MetadataLog Contract to get metadata of a batch.
get…FIXME
Retrieving Metadata of Range of Batches — get Method
get(
startId: Option[Long],
endId: Option[Long]): Array[(Long, T)]
Note: get is part of the MetadataLog Contract to get metadata of a range of batches.
get…FIXME
Persisting Metadata of Streaming Micro-Batch — add Method
add(
batchId: Long,
metadata: T): Boolean
Note: add is part of the MetadataLog Contract to persist metadata of a streaming batch.
add returns true when the metadata of the streaming batch was not yet available and was persisted successfully. Otherwise, add returns false.
Internally, add looks up metadata of the given streaming batch (batchId) and returns false when found.
Otherwise, when not found, add creates a metadata log file for the given batchId and writes metadata to the file. add returns true if successful.
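The lookup-then-write behaviour of add can be illustrated with a minimal sketch on the local filesystem. The class `AddSketch`, the use of `java.nio.file` in place of Hadoop DFS, and plain strings as metadata are all assumptions made for illustration.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}

// Hypothetical local-filesystem stand-in for the metadata directory.
class AddSketch(metadataDir: Path) {
  Files.createDirectories(metadataDir)

  // One file per batch, named after the batch id.
  private def batchFile(batchId: Long): Path = metadataDir.resolve(batchId.toString)

  // Looks up the metadata of a batch, if a file for it exists.
  def get(batchId: Long): Option[String] = {
    val file = batchFile(batchId)
    if (Files.exists(file)) Some(new String(Files.readAllBytes(file), UTF_8)) else None
  }

  // Returns false when the batch is already stored; otherwise writes it and returns true.
  def add(batchId: Long, metadata: String): Boolean =
    if (get(batchId).isDefined) false
    else {
      Files.write(batchFile(batchId), metadata.getBytes(UTF_8))
      true
    }
}
```

Calling `add` twice with the same batch id keeps the first metadata: the second call returns false and writes nothing.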
Latest Committed Batch Id with Metadata (When Available) — getLatest Method
getLatest(): Option[(Long, T)]
Note: getLatest is part of the MetadataLog Contract to retrieve the most recently committed batch id and the corresponding metadata, if available in the metadata storage.
getLatest requests the internal FileManager for the files in the metadata directory that match the batch file filter.
getLatest takes the batch ids that the batch files correspond to and sorts them in descending order.
getLatest returns the first (that is, the highest) batch id whose metadata can be found in the metadata storage, together with that metadata.
Note: It is possible that the batch id could be in the metadata storage, but not available for retrieval.
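The steps of getLatest can be sketched on the local filesystem. This is a minimal sketch under stated assumptions: `GetLatestSketch` is a hypothetical name, `java.nio.file` stands in for the FileManager, metadata is a plain string, and an empty file is used to model a batch that is listed but not available for retrieval.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}
import scala.util.Try

object GetLatestSketch {
  // A file counts as a batch file when its name parses as a Long.
  private def batchId(file: Path): Option[Long] =
    Try(file.getFileName.toString.toLong).toOption

  // Hypothetical reader: an unreadable or empty file yields None.
  private def read(file: Path): Option[String] =
    Try(new String(Files.readAllBytes(file), UTF_8)).toOption.filter(_.nonEmpty)

  def getLatest(metadataDir: Path): Option[(Long, String)] = {
    // listFiles returns null when the directory cannot be listed.
    val files = Option(metadataDir.toFile.listFiles()).getOrElse(Array.empty[java.io.File])
    // Keep batch files only and sort their ids in descending order.
    val batches = files.toList
      .flatMap(f => batchId(f.toPath).map(id => (id, f.toPath)))
      .sortBy(_._1)(Ordering[Long].reverse)
    // Take the first batch whose metadata can actually be read.
    batches.iterator
      .map { case (id, file) => read(file).map(meta => (id, meta)) }
      .collectFirst { case Some(found) => found }
  }
}
```

With batch files `0`, `1` and an empty `2` in the directory, the sketch skips `2` and returns batch `1` with its metadata.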
Removing Expired Metadata (Purging) — purge Method
purge(thresholdBatchId: Long): Unit
Note: purge is part of the MetadataLog Contract to…FIXME.
purge…FIXME
Creating Batch Metadata File — batchIdToPath Method
batchIdToPath(batchId: Long): Path
batchIdToPath creates a Hadoop Path for the file named after the specified batchId under the metadata directory.
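As a minimal sketch of this mapping, the following uses `java.nio.file.Path` in place of Hadoop's Path. The object name `BatchPathSketch` and the explicit `metadataDir` parameter are assumptions made for illustration.

```scala
import java.nio.file.Path

object BatchPathSketch {
  // The batch file lives directly under the metadata directory
  // and its name is the decimal batch id.
  def batchIdToPath(metadataDir: Path, batchId: Long): Path =
    metadataDir.resolve(batchId.toString)
}
```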
isBatchFile Method
isBatchFile(path: Path): Boolean
isBatchFile…FIXME
Note: isBatchFile is used exclusively when HDFSMetadataLog is requested for the PathFilter of batch files.
pathToBatchId Method
pathToBatchId(path: Path): Long
pathToBatchId…FIXME
verifyBatchIds Object Method
verifyBatchIds(
batchIds: Seq[Long],
startId: Option[Long],
endId: Option[Long]): Unit
verifyBatchIds…FIXME
Retrieving Version (From Text Line) — parseVersion Internal Method
parseVersion(
text: String,
maxSupportedVersion: Int): Int
parseVersion…FIXME
purgeAfter Method
purgeAfter(thresholdBatchId: Long): Unit
purgeAfter…FIXME
Note: purgeAfter seems to be used exclusively in tests.
Writing Batch Metadata to File (Metadata Log) — writeBatchToFile Internal Method
writeBatchToFile(
metadata: T,
path: Path): Unit
writeBatchToFile requests the CheckpointFileManager to createAtomic (for the specified path and the overwriteIfPossible flag disabled).
writeBatchToFile then serializes the metadata (to the CancellableFSDataOutputStream output stream) and closes the stream.
In case of an exception, writeBatchToFile simply requests the CancellableFSDataOutputStream output stream to cancel (so that the output file is not generated) and re-throws the exception.
Note: writeBatchToFile is used exclusively when HDFSMetadataLog is requested to store (persist) metadata of a streaming batch.
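The write-then-commit flow with cancellation on failure can be sketched with the standard library. This is not the CheckpointFileManager API: `AtomicWriteSketch` and `writeAtomically` are hypothetical names, and a temporary file that is renamed on success stands in for createAtomic and the CancellableFSDataOutputStream.

```scala
import java.io.{IOException, OutputStream}
import java.nio.file.{Files, Path}

object AtomicWriteSketch {
  // Writes to a temporary sibling file and renames it to `target` on success.
  // On failure the temporary file is removed (the "cancel" step) and the
  // exception is re-thrown, so no output file is left behind.
  def writeAtomically(target: Path)(write: OutputStream => Unit): Unit = {
    val dir = target.toAbsolutePath.getParent
    val tmp = Files.createTempFile(dir, "batch-", ".tmp")
    val out = Files.newOutputStream(tmp)
    try {
      write(out)
      out.close()
      // No REPLACE_EXISTING option: the move fails if the target already exists.
      Files.move(tmp, target)
      ()
    } catch {
      case e: Throwable =>
        // Close quietly, delete the partial file, then propagate the error.
        try out.close() catch { case _: IOException => () }
        Files.deleteIfExists(tmp)
        throw e
    }
  }
}
```

A writer that throws leaves the directory unchanged, while a successful writer produces exactly one file under the target name.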
Retrieving Ordered Batch Metadata Files — getOrderedBatchFiles Method
getOrderedBatchFiles(): Array[FileStatus]
getOrderedBatchFiles…FIXME
Note: getOrderedBatchFiles does not seem to be used at all.
Internal Properties
| Name | Description |
|---|---|
| | Hadoop’s PathFilter of batch files (with names being long numbers). Used when: |
| | Used when…FIXME |
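The batch file filter (names being long numbers) can be illustrated with a minimal sketch over plain file names. `BatchFileFilterSketch` and its methods are hypothetical names, and a `String => Boolean` check stands in for Hadoop's PathFilter.

```scala
import scala.util.Try

object BatchFileFilterSketch {
  // A name passes the filter when it parses as a Long.
  def isBatchFileName(name: String): Boolean = Try(name.toLong).isSuccess

  // Keeps only the names that look like batch files, in their original order.
  def batchFiles(names: Seq[String]): Seq[String] = names.filter(isBatchFileName)
}
```

Names with extensions or non-numeric characters, such as `12.compact` or `.12.crc`, are filtered out.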