ForeachBatchSink is the streaming sink behind the DataStreamWriter.foreachBatch streaming operator.

ForeachBatchSink is created exclusively when DataStreamWriter is requested to start execution of the streaming query (with the foreachBatch sink).

ForeachBatchSink uses ForeachBatchSink as the name, which appears as the sink description in a streaming query's progress.

import org.apache.spark.sql.Dataset
val q = spark.readStream
  .format("rate")
  .load
  .writeStream
  .foreachBatch { (output: Dataset[_], batchId: Long) => // <-- creates a ForeachBatchSink
    println(s"Batch ID: $batchId")
  }
  .start

scala> println(q.lastProgress.sink.description)
ForeachBatchSink

q.stop

Creating ForeachBatchSink Instance

ForeachBatchSink takes the following when created:

  • Batch writer ((Dataset[T], Long) ⇒ Unit)

  • Encoder (ExpressionEncoder[T])
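The two input arguments suggest the following shape for the class. This is a simplified sketch only (the actual class lives in the org.apache.spark.sql.execution.streaming.sources package and may differ between Spark versions):

```scala
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.execution.streaming.Sink

// Simplified sketch of ForeachBatchSink (not the exact Spark source)
class ForeachBatchSink[T](
    batchWriter: (Dataset[T], Long) => Unit,
    encoder: ExpressionEncoder[T])
  extends Sink {

  override def addBatch(batchId: Long, data: DataFrame): Unit = {
    // turn the batch DataFrame into a Dataset[T] and hand it,
    // together with the batch ID, to the user-defined function
  }

  // the sink name reported in a streaming query's progress
  override def toString: String = "ForeachBatchSink"
}
```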

Adding Batch — addBatch Method

addBatch(batchId: Long, data: DataFrame): Unit
addBatch is part of the Sink contract to "add" a batch of data to the sink. For ForeachBatchSink, that means converting the batch DataFrame to a Dataset[T] (using the encoder) and invoking the batch writer function with the dataset and the given batchId.
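Internally, addBatch resolves the encoder against the schema of the current batch, deserializes the internal rows back into objects of type T, and invokes the batch writer. A sketch, simplified from the Spark 2.4 sources (details may differ in other versions):

```scala
override def addBatch(batchId: Long, data: DataFrame): Unit = {
  // resolve and bind the encoder to the attributes of this batch
  val resolvedEncoder = encoder.resolveAndBind(
    data.logicalPlan.output,
    data.sparkSession.sessionState.analyzer)
  // deserialize the internal rows into objects of type T
  val rdd = data.queryExecution.toRdd
    .map[T](resolvedEncoder.fromRow)(encoder.clsTag)
  val ds = data.sparkSession.createDataset(rdd)(encoder)
  // hand the batch and its ID over to the user-defined function
  batchWriter(ds, batchId)
}
```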
