OffsetSeqLog — Hadoop DFS-based Metadata Storage of OffsetSeqs
OffsetSeqLog is a Hadoop DFS-based metadata storage for OffsetSeq metadata.
OffsetSeqLog uses OffsetSeq for metadata, which holds an ordered collection of offsets and optional metadata (as OffsetSeqMetadata for event-time watermark).
OffsetSeqLog is created exclusively for the write-ahead log (WAL) of offsets of stream execution engines (i.e. ContinuousExecution and MicroBatchExecution).
OffsetSeqLog uses 1 for the version when serializing and deserializing metadata.
Creating OffsetSeqLog Instance
OffsetSeqLog (like the parent HDFSMetadataLog) takes the following to be created:
Serializing Metadata (Writing Metadata in Serialized Format) — serialize Method
serialize(
offsetSeq: OffsetSeq,
out: OutputStream): Unit
Note: serialize is part of HDFSMetadataLog Contract to serialize metadata (write metadata in serialized format).
serialize first writes out the version prefixed with v on a single line (e.g. v1), followed by the optional metadata in JSON format on the next line.
serialize then writes out the offsets in JSON format, one per line.
Note: A streaming source that has no offset to write in offsetSeq is marked as - (a dash) in the log.
$ ls -tr [checkpoint-directory]/offsets
0 1 2 3 4 5 6
$ cat [checkpoint-directory]/offsets/6
v1
{"batchWatermarkMs":0,"batchTimestampMs":1502872590006,"conf":{"spark.sql.shuffle.partitions":"200","spark.sql.streaming.stateStore.providerClass":"org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider"}}
51
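The write order described above (version line, metadata line, then one offset per line) can be sketched without any Spark dependency. This is a minimal illustration, not the OffsetSeqLog source: SimpleOffsetSeq and OffsetLogSketch are hypothetical stand-ins, and offsets and metadata are assumed to be already rendered as JSON strings.

```scala
import java.io.{ByteArrayOutputStream, OutputStream}
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical stand-in for OffsetSeq: offsets and metadata are pre-rendered JSON strings.
final case class SimpleOffsetSeq(offsets: Seq[Option[String]], metadata: Option[String])

object OffsetLogSketch {
  val Version = 1      // the version the page says is used for (de)serialization
  val VoidOffset = "-" // marker for a source with no offset to write

  def serialize(offsetSeq: SimpleOffsetSeq, out: OutputStream): Unit = {
    // 1. Version line, e.g. "v1"
    out.write(s"v$Version".getBytes(UTF_8))

    // 2. Metadata line (left empty when no metadata is available)
    out.write('\n')
    out.write(offsetSeq.metadata.getOrElse("").getBytes(UTF_8))

    // 3. One line per offset; a missing offset becomes a dash
    offsetSeq.offsets.foreach { offset =>
      out.write('\n')
      out.write(offset.getOrElse(VoidOffset).getBytes(UTF_8))
    }
  }
}
```

Writing into a ByteArrayOutputStream and printing the result should give a file body shaped like the cat output above: a v1 line, a metadata line, and one line per source.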
Deserializing Metadata (Reading OffsetSeq from Serialized Format) — deserialize Method
deserialize(in: InputStream): OffsetSeq
Note: deserialize is part of HDFSMetadataLog Contract to deserialize metadata (read metadata from serialized format).
deserialize first parses the version on the first line.
deserialize then reads the optional metadata (an empty line means that no metadata is available).
deserialize creates a SerializedOffset for every remaining line.
In the end, deserialize creates an OffsetSeq for the optional metadata and the SerializedOffsets.
When there are no lines in the InputStream, deserialize throws an IllegalStateException:
Incomplete log file
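The read path can be sketched the same way. The code below is a simplified illustration under stated assumptions, not the actual implementation: ParsedOffsetSeq and OffsetLogReaderSketch are hypothetical names, offsets are kept as raw JSON strings instead of SerializedOffset instances, the version check is reduced to a plain equality test, and mapping a dash back to a missing offset is assumed from the dash marker used when writing.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.nio.charset.StandardCharsets.UTF_8
import scala.io.Source

// Hypothetical stand-in for OffsetSeq: offsets and metadata stay as raw JSON strings.
final case class ParsedOffsetSeq(offsets: Seq[Option[String]], metadata: Option[String])

object OffsetLogReaderSketch {
  val Version = 1
  val VoidOffset = "-" // assumed to mean "no offset for this source"

  def deserialize(in: InputStream): ParsedOffsetSeq = {
    val lines = Source.fromInputStream(in, UTF_8.name()).getLines()

    // An empty stream is reported as an incomplete log file
    if (!lines.hasNext) throw new IllegalStateException("Incomplete log file")

    // First line: the version, e.g. "v1" (simplified check, an assumption of this sketch)
    val versionLine = lines.next()
    require(versionLine == s"v$Version", s"Unsupported log version: $versionLine")

    // Second line: optional metadata; an empty line means none
    val metadata =
      if (lines.hasNext) Some(lines.next().trim).filter(_.nonEmpty) else None

    // Every remaining line is one offset
    val offsets = lines.map(l => if (l == VoidOffset) None else Some(l)).toList

    ParsedOffsetSeq(offsets, metadata)
  }
}
```

Feeding the sketch the bytes of "v1", an empty line, "51", and "-" should yield no metadata and two offsets, the second one missing.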