FileCommitProtocol Contract

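FileCommitProtocol is the abstraction of commit protocols that define how a single Spark job commits its output files (setting up, committing, and aborting the job and its tasks).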

Table 1. FileCommitProtocol Contract
Method Description

abortJob

abortJob(jobContext: JobContext): Unit

Used when…​FIXME

abortTask

abortTask(taskContext: TaskAttemptContext): Unit

Used when…​FIXME

commitJob

commitJob(
  jobContext: JobContext,
  taskCommits: Seq[TaskCommitMessage]): Unit

Used when…​FIXME

commitTask

commitTask(taskContext: TaskAttemptContext): TaskCommitMessage

Used when…​FIXME

newTaskTempFile

newTaskTempFile(
  taskContext: TaskAttemptContext,
  dir: Option[String],
  ext: String): String

Used when…​FIXME

newTaskTempFileAbsPath

newTaskTempFileAbsPath(
  taskContext: TaskAttemptContext,
  absoluteDir: String,
  ext: String): String

Used when…​FIXME

onTaskCommit

onTaskCommit(taskCommit: TaskCommitMessage): Unit = {}

Used when…​FIXME

setupJob

setupJob(jobContext: JobContext): Unit

Used when…​FIXME

setupTask

setupTask(taskContext: TaskAttemptContext): Unit

Used when…​FIXME


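The order in which the methods of the contract are typically called can be illustrated with a minimal sketch. It does not use Spark: `SimpleCommitProtocol`, `RecordingCommitProtocol` and `CommitLifecycle` are hypothetical names, `String` stands in for Hadoop's `JobContext` and `TaskAttemptContext`, `TaskCommitMessage` is a plain case class, and `newTaskTempFileAbsPath` and `onTaskCommit` are left out. The staging path is made up for illustration.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

// Simplified stand-in (assumption): not Spark's TaskCommitMessage.
final case class TaskCommitMessage(obj: Any)

// Reduced, hypothetical mirror of the contract; String replaces the Hadoop contexts.
trait SimpleCommitProtocol {
  def setupJob(jobId: String): Unit
  def setupTask(taskId: String): Unit
  def newTaskTempFile(taskId: String, dir: Option[String], ext: String): String
  def commitTask(taskId: String): TaskCommitMessage
  def abortTask(taskId: String): Unit
  def commitJob(jobId: String, taskCommits: Seq[TaskCommitMessage]): Unit
  def abortJob(jobId: String): Unit
}

// Hypothetical implementation that only records the order of the calls.
class RecordingCommitProtocol extends SimpleCommitProtocol {
  val calls: ArrayBuffer[String] = ArrayBuffer.empty[String]

  def setupJob(jobId: String): Unit = calls += s"setupJob($jobId)"
  def setupTask(taskId: String): Unit = calls += s"setupTask($taskId)"
  def newTaskTempFile(taskId: String, dir: Option[String], ext: String): String = {
    calls += s"newTaskTempFile($taskId)"
    // Made-up staging location; a real protocol decides where temp files live.
    val parent = dir.map(d => s"/tmp/staging/$d").getOrElse("/tmp/staging")
    s"$parent/part-$taskId$ext"
  }
  def commitTask(taskId: String): TaskCommitMessage = {
    calls += s"commitTask($taskId)"
    TaskCommitMessage(taskId)
  }
  def abortTask(taskId: String): Unit = calls += s"abortTask($taskId)"
  def commitJob(jobId: String, taskCommits: Seq[TaskCommitMessage]): Unit =
    calls += s"commitJob($jobId, ${taskCommits.size})"
  def abortJob(jobId: String): Unit = calls += s"abortJob($jobId)"
}

object CommitLifecycle {
  // Runs all tasks sequentially; returns true if the job was committed.
  def runJob(committer: SimpleCommitProtocol, jobId: String, taskIds: Seq[String]): Boolean = {
    committer.setupJob(jobId)
    try {
      val commits = taskIds.map { taskId =>
        committer.setupTask(taskId)
        try {
          // A real task would write its rows to the returned path.
          committer.newTaskTempFile(taskId, None, ".parquet")
          committer.commitTask(taskId)
        } catch {
          case NonFatal(e) =>
            committer.abortTask(taskId)
            throw e
        }
      }
      committer.commitJob(jobId, commits)
      true
    } catch {
      case NonFatal(_) =>
        committer.abortJob(jobId)
        false
    }
  }
}
```

In this sketch everything runs in one thread. In Spark the job-level methods run on the driver and the task-level methods run on executors, so the task commit messages have to travel back to the driver before commitJob is called.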
Table 2. FileCommitProtocols (Direct Implementations and Extensions)
FileCommitProtocol Description

HadoopMapReduceCommitProtocol

ManifestFileCommitProtocol

Tip

Enable the ALL logging level for the org.apache.spark.internal.io.FileCommitProtocol logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.internal.io.FileCommitProtocol=ALL

Refer to Logging.

Creating FileCommitProtocol Instance (Given Class Name) — instantiate Object Method

instantiate(
  className: String,
  jobId: String,
  outputPath: String,
  dynamicPartitionOverwrite: Boolean = false): FileCommitProtocol

instantiate prints out the following DEBUG message to the logs:

Creating committer [className]; job [jobId]; output=[outputPath]; dynamic=[dynamicPartitionOverwrite]

instantiate creates an instance of FileCommitProtocol for the given fully-qualified className. It first tries the 3-argument constructor and, if the class does not define one, falls back to the 2-argument constructor. Depending on the constructor used, instantiate prints out one of the following DEBUG messages to the logs:

Using (String, String, Boolean) constructor
Falling back to (String, String) constructor
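The constructor lookup described above can be sketched with plain Java reflection. This is a minimal illustration, not Spark's code: `InstantiateSketch`, `ThreeArgCommitter` and `TwoArgCommitter` are hypothetical names, the result type is `AnyRef` rather than `FileCommitProtocol`, and `println` stands in for DEBUG logging.

```scala
// Hypothetical committers, used only to exercise both constructor variants.
class ThreeArgCommitter(
    val jobId: String,
    val path: String,
    val dynamicPartitionOverwrite: Boolean)

class TwoArgCommitter(val jobId: String, val path: String)

object InstantiateSketch {
  def instantiate(
      className: String,
      jobId: String,
      outputPath: String,
      dynamicPartitionOverwrite: Boolean = false): AnyRef = {
    val clazz = Class.forName(className)
    try {
      // Look for a (String, String, Boolean) constructor first.
      val ctor = clazz.getDeclaredConstructor(
        classOf[String], classOf[String], classOf[Boolean])
      println("Using (String, String, Boolean) constructor")
      ctor
        .newInstance(jobId, outputPath, Boolean.box(dynamicPartitionOverwrite))
        .asInstanceOf[AnyRef]
    } catch {
      case _: NoSuchMethodException =>
        // No 3-argument constructor: use the (String, String) one instead.
        println("Falling back to (String, String) constructor")
        val ctor = clazz.getDeclaredConstructor(classOf[String], classOf[String])
        ctor.newInstance(jobId, outputPath).asInstanceOf[AnyRef]
    }
  }
}
```

A possible use of the sketch, with made-up job ID and output path:

```scala
val committer = InstantiateSketch.instantiate(
  classOf[TwoArgCommitter].getName, "job-0", "/tmp/out")
```

Note that in the fallback branch of this sketch the dynamicPartitionOverwrite flag is silently dropped, since a 2-argument constructor has no way to receive it.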

Note

instantiate is used when:

  • InsertIntoHadoopFsRelationCommand logical command is executed

  • SaveAsHiveFile is requested to saveAsHiveFile

  • HadoopMapRedWriteConfigUtil and HadoopMapReduceWriteConfigUtil are requested to createCommitter

  • Spark Structured Streaming’s FileStreamSink is requested to addBatch
