HDFSMetadataLog — Hadoop DFS-based Metadata Storage
HDFSMetadataLog is a concrete metadata storage (of type T) that uses Hadoop DFS for fault-tolerance and reliability.
HDFSMetadataLog uses the given path as the metadata directory with metadata logs. The path is immediately converted to a Hadoop Path for file management.
HDFSMetadataLog uses Json4s with the Jackson binding for metadata serialization and deserialization (to and from JSON format).
HDFSMetadataLog is further customized by the extensions.
HDFSMetadataLog | Description |
---|---|
Anonymous | |
Anonymous | |
| Compactible metadata logs (that compact logs at regular interval) |
Serializing Metadata (Writing Metadata in Serialized Format) — serialize Method
serialize(
metadata: T,
out: OutputStream): Unit
serialize writes the metadata to the given output stream as JSON (serialized using the Json4s library with the Jackson binding).
Note
serialize is used exclusively when HDFSMetadataLog is requested to write metadata of a streaming batch to a file (metadata log), i.e. when storing metadata of a streaming batch.
Deserializing Metadata (Reading Metadata from Serialized Format) — deserialize Method
deserialize(in: InputStream): T
deserialize deserializes metadata (of type T) from a given InputStream.
Note
deserialize is used exclusively when HDFSMetadataLog is requested to retrieve metadata of a batch.
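A minimal sketch of how a serialize and deserialize pair could look with Json4s, assuming the json4s-jackson library is on the classpath. BatchMeta, JsonCodecSketch and roundTrip are hypothetical names used for illustration only; this is not the Spark source.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream, InputStreamReader, OutputStream}
import java.nio.charset.StandardCharsets.UTF_8

import org.json4s.{Formats, NoTypeHints}
import org.json4s.jackson.Serialization

// Hypothetical metadata type standing in for T
case class BatchMeta(source: String, offset: Long)

object JsonCodecSketch {
  // Plain JSON without type hints (an assumption of this sketch)
  implicit val formats: Formats = Serialization.formats(NoTypeHints)

  // Writes the metadata as UTF-8 encoded JSON; the caller owns (and closes) the stream
  def serialize(metadata: BatchMeta, out: OutputStream): Unit =
    out.write(Serialization.write(metadata).getBytes(UTF_8))

  // Reads the metadata back from UTF-8 encoded JSON
  def deserialize(in: InputStream): BatchMeta =
    Serialization.read[BatchMeta](new InputStreamReader(in, UTF_8))

  // Serializes to an in-memory buffer and reads the result back
  def roundTrip(metadata: BatchMeta): BatchMeta = {
    val out = new ByteArrayOutputStream()
    serialize(metadata, out)
    deserialize(new ByteArrayInputStream(out.toByteArray))
  }
}
```

Calling JsonCodecSketch.roundTrip with a BatchMeta value should give back an equal value, which is the property a metadata log relies on when it reads a batch file it wrote earlier.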
Retrieving Metadata Of Streaming Batch — get Method
get(batchId: Long): Option[T]
Note
get is part of the MetadataLog Contract to get metadata of a batch.
get…FIXME
Retrieving Metadata of Range of Batches — get Method
get(
startId: Option[Long],
endId: Option[Long]): Array[(Long, T)]
Note
get is part of the MetadataLog Contract to get metadata of a range of batches.
get…FIXME
Persisting Metadata of Streaming Micro-Batch — add Method
add(
batchId: Long,
metadata: T): Boolean
Note
add is part of the MetadataLog Contract to persist metadata of a streaming batch.
add returns true when the metadata of the streaming batch was not available and was persisted successfully. Otherwise, add returns false.
Internally, add looks up metadata of the given streaming batch (batchId) and returns false when found.
Otherwise, when not found, add creates a metadata log file for the given batchId and writes metadata to the file. add returns true if successful.
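The add semantics described above can be sketched on a local directory. This is a simplified illustration that swaps Hadoop DFS for java.nio.file and uses String as the metadata type; LocalMetadataLogSketch is a hypothetical class, not part of Spark.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}

// Hypothetical, simplified log that keeps one file per batch id in a local directory
class LocalMetadataLogSketch(metadataDir: Path) {
  Files.createDirectories(metadataDir)

  // The file of a batch is named after the batch id
  private def batchIdToPath(batchId: Long): Path =
    metadataDir.resolve(batchId.toString)

  // Looks up the metadata of a batch (None when no file exists for the batch id)
  def get(batchId: Long): Option[String] = {
    val file = batchIdToPath(batchId)
    if (Files.exists(file)) Some(new String(Files.readAllBytes(file), UTF_8)) else None
  }

  // Persists the metadata only when the batch has none yet
  def add(batchId: Long, metadata: String): Boolean =
    if (get(batchId).isDefined) {
      false // already persisted: the existing file is left untouched
    } else {
      Files.write(batchIdToPath(batchId), metadata.getBytes(UTF_8))
      true
    }
}
```

Adding the same batch id twice returns true and then false, and the first metadata stays in place. The sketch does not guard against concurrent writers.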
Latest Committed Batch Id with Metadata (When Available) — getLatest Method
getLatest(): Option[(Long, T)]
Note
getLatest is part of the MetadataLog Contract to retrieve the most recently committed batch id and the corresponding metadata, if available in the metadata storage.
getLatest requests the internal FileManager for the files in the metadata directory that match the batch file filter.
getLatest takes the batch ids (that the batch files correspond to) and sorts them in reverse order.
getLatest gives the first batch id with the metadata which could be found in the metadata storage.
Note
It is possible that the batch id could be in the metadata storage, but not available for retrieval.
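The three getLatest steps can be sketched as follows. The sketch assumes a local directory (java.nio.file and java.io.File in place of the FileManager), String metadata, and a batch file filter that accepts names that parse as long numbers; GetLatestSketch and its get helper are hypothetical.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}
import scala.util.Try

object GetLatestSketch {
  // Hypothetical lookup of a batch's metadata; None when the file cannot be read
  def get(metadataDir: Path, batchId: Long): Option[String] =
    Try(new String(Files.readAllBytes(metadataDir.resolve(batchId.toString)), UTF_8)).toOption

  def getLatest(metadataDir: Path): Option[(Long, String)] = {
    // listFiles returns null when the directory cannot be listed
    val files = Option(metadataDir.toFile.listFiles()).getOrElse(Array.empty[java.io.File])
    // Batch file filter: keep only the names that are long numbers
    val batchIds = files.flatMap(f => Try(f.getName.toLong).toOption)
    // Newest first; stop at the first batch id whose metadata can actually be read
    batchIds.sorted(Ordering[Long].reverse).iterator
      .map(id => get(metadataDir, id).map(metadata => (id, metadata)))
      .collectFirst { case Some(found) => found }
  }
}
```

Sorting the ids numerically (not as strings) matters here: batch 10 has to come before batch 2. Falling through to the next id when a lookup returns None mirrors the case where a batch id is listed but its metadata is not available for retrieval.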
Removing Expired Metadata (Purging) — purge Method
purge(thresholdBatchId: Long): Unit
Note
purge is part of the MetadataLog Contract to…FIXME.
purge…FIXME
Creating Batch Metadata File — batchIdToPath Method
batchIdToPath(batchId: Long): Path
batchIdToPath creates a Hadoop Path for the file named after the specified batchId under the metadata directory.
isBatchFile Method
isBatchFile(path: Path): Boolean
isBatchFile…FIXME
Note
isBatchFile is used exclusively when HDFSMetadataLog is requested for the PathFilter of batch files.
pathToBatchId Method
pathToBatchId(path: Path): Long
pathToBatchId…FIXME
verifyBatchIds Object Method
verifyBatchIds(
batchIds: Seq[Long],
startId: Option[Long],
endId: Option[Long]): Unit
verifyBatchIds…FIXME
Retrieving Version (From Text Line) — parseVersion Internal Method
parseVersion(
text: String,
maxSupportedVersion: Int): Int
parseVersion…FIXME
purgeAfter Method
purgeAfter(thresholdBatchId: Long): Unit
purgeAfter…FIXME
Note
purgeAfter seems to be used exclusively in tests.
Writing Batch Metadata to File (Metadata Log) — writeBatchToFile Internal Method
writeBatchToFile(
metadata: T,
path: Path): Unit
writeBatchToFile requests the CheckpointFileManager to createAtomic (for the specified path and the overwriteIfPossible flag disabled).
writeBatchToFile then serializes the metadata (to the CancellableFSDataOutputStream output stream) and closes the stream.
In case of an exception, writeBatchToFile requests the CancellableFSDataOutputStream output stream to cancel (so that the output file is not generated) and re-throws the exception.
Note
writeBatchToFile is used exclusively when HDFSMetadataLog is requested to store (persist) metadata of a streaming batch.
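The write, close, and cancel-on-failure flow can be illustrated on a local file system. This sketch does not use CheckpointFileManager or CancellableFSDataOutputStream; it swaps in a temporary file that is moved into place on success and deleted on failure. AtomicWriteSketch is a hypothetical name, the metadata is a plain String, and the target path is assumed to have a parent directory.

```scala
import java.io.OutputStream
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}

object AtomicWriteSketch {
  // Hypothetical stand-in for writeBatchToFile on a local file system
  def writeBatchToFile(metadata: String, path: Path)(
      serialize: (String, OutputStream) => Unit): Unit = {
    // Write to a hidden temporary file next to the target first
    val temp = Files.createTempFile(path.getParent, ".", ".tmp")
    try {
      val out = Files.newOutputStream(temp)
      try serialize(metadata, out) finally out.close()
      // Without REPLACE_EXISTING the move fails when the target already exists
      Files.move(temp, path)
    } catch {
      case e: Throwable =>
        // "Cancel": make sure no output file is left behind, then re-throw
        Files.deleteIfExists(temp)
        throw e
    }
  }
}
```

A failing serialize function leaves the directory without the target file, which is the behaviour the cancel step is there to provide. A plain Files.move is not guaranteed to be atomic on every file system, so this is only an approximation of createAtomic.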
Retrieving Ordered Batch Metadata Files — getOrderedBatchFiles Method
getOrderedBatchFiles(): Array[FileStatus]
getOrderedBatchFiles…FIXME
Note
getOrderedBatchFiles does not seem to be used at all.
Internal Properties
Name | Description |
---|---|
| Hadoop’s PathFilter of batch files (with names being long numbers) Used when: |
| Used when…FIXME |