HDFSMetadataLog — MetadataLog with Hadoop HDFS for Reliable Storage

HDFSMetadataLog is a concrete MetadataLog that uses Hadoop HDFS for reliable storage.

HDFSMetadataLog uses the given path as the root directory of metadata logs. The path is immediately converted to a Hadoop Path for file management.

HDFSMetadataLog uses Json4s with the Jackson binding for JSON parsing (serialization and deserialization).

HDFSMetadataLog is further customized by the following extensions.

Table 1. HDFSMetadataLogs

HDFSMetadataLog           Description
CommitLog                 Offset commit log of streaming query execution engines
CompactibleFileStreamLog  Compactible metadata logs
OffsetSeqLog              Write-ahead log (WAL) of streaming query execution engines

HDFSMetadataLog takes the following to be created:

  • SparkSession

  • Path of the metadata log directory

When created, HDFSMetadataLog creates the path unless it already exists.

Writing Metadata in Serialized Format — serialize Method

serialize(metadata: T, out: OutputStream): Unit

serialize…​FIXME

Note
serialize is used exclusively when HDFSMetadataLog is requested to writeBatchToFile (when requested to store metadata for a batch).

Deserializing Metadata — deserialize Method

deserialize(in: InputStream): T

deserialize deserializes metadata (of type T) from a given InputStream.

Note
deserialize is used exclusively when HDFSMetadataLog is requested to retrieve metadata for a batch.
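
The pairing of serialize and deserialize can be illustrated with a minimal sketch that has no Spark or Json4s dependency. The Offsets type, the OffsetsCodec object, and the comma-separated text encoding are assumptions made for this illustration only; they stand in for the Json4s-based encoding and are not part of HDFSMetadataLog.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream, OutputStream}
import java.nio.charset.StandardCharsets.UTF_8
import scala.io.Source

// Hypothetical metadata type used only for this illustration
final case class Offsets(values: Seq[Long])

// Hypothetical stand-in for the serialize / deserialize pair;
// a comma-separated text line replaces the Json4s encoding
object OffsetsCodec {
  def serialize(metadata: Offsets, out: OutputStream): Unit = {
    out.write(metadata.values.mkString(",").getBytes(UTF_8))
    out.flush()
  }

  def deserialize(in: InputStream): Offsets = {
    val text = Source.fromInputStream(in, UTF_8.name()).mkString.trim
    if (text.isEmpty) Offsets(Seq.empty)
    else Offsets(text.split(",").toSeq.map(_.toLong))
  }

  // Writes the metadata to an in-memory stream and reads it back
  def roundTrip(metadata: Offsets): Offsets = {
    val out = new ByteArrayOutputStream()
    serialize(metadata, out)
    deserialize(new ByteArrayInputStream(out.toByteArray))
  }
}
```

Whatever the encoding, the property that matters is that deserialize returns a value equal to the one that serialize wrote.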

createFileManager Internal Method

createFileManager(): FileManager
Caution
FIXME
Note
createFileManager is used exclusively when HDFSMetadataLog is created (and the internal FileManager is created alongside).

Retrieving Metadata For Batch — get Method

get(batchId: Long): Option[T]
Note
get is part of the MetadataLog Contract to…​FIXME.

get…​FIXME

Retrieving Metadata For Batch Id Range — get Method

get(
  startId: Option[Long],
  endId: Option[Long]): Array[(Long, T)]
Note
get is part of the MetadataLog Contract to…​FIXME.

get…​FIXME

Adding Metadata for Batch — add Method

add(
  batchId: Long,
  metadata: T): Boolean
Note
add is part of the MetadataLog Contract to add metadata for a batch.

add returns true when the metadata has just been written for the batch (that is, no file was available yet at the batchIdToPath for the given batchId).
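
The return value of add can be sketched on a local filesystem as follows. LocalMetadataLog is a hypothetical class that uses java.nio.file instead of Hadoop and plain strings instead of serialized metadata. Returning false without overwriting an existing batch file is an assumption of this sketch, inferred from the Boolean return type.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}

// Hypothetical local-filesystem log; not Spark's HDFSMetadataLog
class LocalMetadataLog(metadataPath: Path) {
  // Create the metadata directory unless it exists already
  Files.createDirectories(metadataPath)

  // The batch file is named after the batch id
  def batchIdToPath(batchId: Long): Path = metadataPath.resolve(batchId.toString)

  // Returns true only when the metadata has just been written for the batch
  def add(batchId: Long, metadata: String): Boolean = {
    val file = batchIdToPath(batchId)
    if (Files.exists(file)) {
      false // assumption: existing metadata is kept untouched
    } else {
      Files.write(file, metadata.getBytes(UTF_8))
      true
    }
  }

  // Reads the metadata back if the batch file exists
  def get(batchId: Long): Option[String] = {
    val file = batchIdToPath(batchId)
    if (Files.exists(file)) Some(new String(Files.readAllBytes(file), UTF_8)) else None
  }
}
```

With this sketch, adding batch 0 twice yields true and then false, and get(0) still returns the first value.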

Retrieving Latest Committed Batch Id with Metadata If Available — getLatest Method

getLatest(): Option[(Long, T)]
Note
getLatest is part of the MetadataLog Contract to retrieve the most recently committed batch id and the corresponding metadata if available in the metadata storage.

getLatest requests the internal FileManager for the files in the metadata directory that match the batch file filter.

getLatest takes the batch ids (that the batch files correspond to) and sorts them in descending order.

getLatest returns the first batch id (with its metadata) for which the metadata could be found in the metadata storage.

Note
It is possible that a batch id is listed in the metadata storage, but its metadata is not available for retrieval.
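
The lookup steps above can be sketched with java.nio.file in place of Hadoop's FileManager. The LatestBatch object and its read helper are hypothetical; treating an empty file as "listed but not retrievable" is an assumption made only to exercise the fallback to an older batch.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}
import scala.util.Try

object LatestBatch {
  // Batch files are the ones whose names are long numbers
  def isBatchFile(path: Path): Boolean =
    Try(path.getFileName.toString.toLong).isSuccess

  // Hypothetical reader: an empty file models metadata that is listed but not retrievable
  def read(path: Path): Option[String] = {
    val text = new String(Files.readAllBytes(path), UTF_8)
    if (text.isEmpty) None else Some(text)
  }

  def getLatest(metadataPath: Path): Option[(Long, String)] = {
    // listFiles returns null when the path is not a directory
    val files: Seq[java.io.File] =
      Option(metadataPath.toFile.listFiles()).map(_.toSeq).getOrElse(Seq.empty)
    // Keep batch files only, then sort the ids from the highest to the lowest
    val batchIds = files
      .map(_.toPath)
      .filter(isBatchFile)
      .map(_.getFileName.toString.toLong)
      .sorted(Ordering[Long].reverse)
    // Stop at the first id whose metadata can actually be read
    batchIds.iterator
      .map(id => read(metadataPath.resolve(id.toString)).map(metadata => (id, metadata)))
      .collectFirst { case Some(found) => found }
  }
}
```

The ids are sorted as numbers rather than as file names, so batch 10 is considered newer than batch 9.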

Removing Expired Metadata (Purging) — purge Method

purge(thresholdBatchId: Long): Unit
Note
purge is part of the MetadataLog Contract to…​FIXME.

purge…​FIXME

getOrderedBatchFiles Method

getOrderedBatchFiles(): Array[FileStatus]

getOrderedBatchFiles…​FIXME

Note
getOrderedBatchFiles is used when…​FIXME

Creating Batch Metadata File — batchIdToPath Method

batchIdToPath(batchId: Long): Path

batchIdToPath simply creates a Hadoop Path for the file named after the specified batchId under the metadata directory.

Note
batchIdToPath is used when…​FIXME
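
The naming scheme can be sketched as below, with java.nio.file.Path standing in for Hadoop's Path. The BatchPaths object and the extra metadataPath parameter are hypothetical, and the pathToBatchId shown here is only an assumed inverse (parsing the file name back to a number), not a description of the actual method.

```scala
import java.nio.file.{Path, Paths}

// java.nio.file.Path stands in for Hadoop's Path in this sketch
object BatchPaths {
  // The batch file lives directly under the metadata directory
  // and is named after the batch id
  def batchIdToPath(metadataPath: Path, batchId: Long): Path =
    metadataPath.resolve(batchId.toString)

  // Assumed inverse: the file name is parsed back to the batch id
  def pathToBatchId(path: Path): Long =
    path.getFileName.toString.toLong
}
```

For a metadata directory checkpoint/offsets, batch 42 maps to the file checkpoint/offsets/42.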

isBatchFile Method

isBatchFile(path: Path): Boolean

isBatchFile…​FIXME

Note
isBatchFile is used when…​FIXME

Retrieving Version (From Text Line) — parseVersion Internal Method

parseVersion(text: String, maxSupportedVersion: Int): Int

parseVersion…​FIXME

Note
parseVersion is used when…​FIXME

pathToBatchId Method

pathToBatchId(path: Path): Long

pathToBatchId…​FIXME

Note
pathToBatchId is used when…​FIXME

Writing Batch Metadata to File — writeBatchToFile Internal Method

writeBatchToFile(
  metadata: T,
  path: Path): Unit

writeBatchToFile requests the CheckpointFileManager to createAtomic (for the specified path and the overwriteIfPossible flag disabled).

writeBatchToFile then serializes the metadata (to the CancellableFSDataOutputStream output stream) and closes the stream.

In case of an exception, writeBatchToFile simply requests the CancellableFSDataOutputStream output stream to cancel (so that the output file is not generated) and re-throws the exception.

Note
writeBatchToFile is used exclusively when HDFSMetadataLog is requested to add (persist) metadata for a batch.
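
The write-then-cancel-on-failure flow can be approximated on a local filesystem with a temporary file followed by a rename. This is a sketch of that swapped-in technique, not of CheckpointFileManager itself: the AtomicWrite object and the curried serialize parameter are hypothetical, and a plain Files.move (which fails when the target already exists) stands in for the disabled overwriteIfPossible flag.

```scala
import java.io.OutputStream
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}

object AtomicWrite {
  // Hypothetical stand-in for createAtomic and its cancellable output stream:
  // bytes go to a temporary file that is renamed on success or deleted on failure
  def writeBatchToFile(path: Path)(serialize: OutputStream => Unit): Unit = {
    val temp = Files.createTempFile(path.getParent, s".${path.getFileName}.", ".tmp")
    val out = Files.newOutputStream(temp)
    try {
      serialize(out)
      out.close()
      // No REPLACE_EXISTING option: the move fails if the target file exists
      Files.move(temp, path)
    } catch {
      case e: Throwable =>
        // "Cancel": close the stream, discard the partial output, re-throw
        try out.close() catch { case _: Throwable => () }
        Files.deleteIfExists(temp)
        throw e
    }
  }
}
```

A caller would pass the serialization step as a function, for example `AtomicWrite.writeBatchToFile(target)(_.write("v1".getBytes(UTF_8)))`. When that function throws, no file appears at the target path.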

purgeAfter Method

purgeAfter(thresholdBatchId: Long): Unit

purgeAfter…​FIXME

Note
purgeAfter seems to be used exclusively in tests.

Internal Properties

Name              Description
batchFilesFilter  Hadoop HDFS’s PathFilter of batch files (with names being long numbers). Used when:
fileManager       CheckpointFileManager. Used when…​FIXME
