ParquetReadSupport — Non-Vectorized ReadSupport in Parquet Data Source

ParquetReadSupport is a concrete ReadSupport (from Apache Parquet) of UnsafeRows.

ParquetReadSupport is created exclusively when ParquetFileFormat is requested for a data reader and Vectorized Parquet Decoding cannot be used, so that the read falls back to parquet-mr.
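
As a minimal sketch of how the non-vectorized path can be exercised, the following disables the vectorized reader through the spark.sql.parquet.enableVectorizedReader configuration property and then reads a Parquet file back. It assumes Spark is on the classpath (e.g. pasted into spark-shell, where getOrCreate returns the existing session). The temporary directory and the five-row dataset are made up for illustration.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

// In spark-shell this returns the already-running session
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("parquet-non-vectorized")
  .getOrCreate()

// Turn off Vectorized Parquet Decoding so that reads should go through parquet-mr
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")

// Hypothetical location: a fresh sub-directory of a temporary directory
val path = Files.createTempDirectory("parquet-demo").resolve("numbers").toString

// Write a tiny dataset and read it back with the vectorized reader disabled
spark.range(5).write.parquet(path)
spark.read.parquet(path).show()
```

With the ALL logging level enabled for the ParquetReadSupport logger, the read should produce log messages from ParquetReadSupport.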

The fully-qualified class name of ParquetReadSupport is registered under the parquet.read.support.class Hadoop configuration property when ParquetFileFormat is requested for a data reader.
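
The registration amounts to setting a single key in a Hadoop Configuration. The following is a sketch of that step only, not the code ParquetFileFormat actually runs. It assumes hadoop-common on the classpath (as in spark-shell), and the hadoopConf name is arbitrary.

```scala
import org.apache.hadoop.conf.Configuration

// A standalone Hadoop configuration, used here only for illustration
val hadoopConf = new Configuration()

// Register the read support by its fully-qualified class name
hadoopConf.set(
  "parquet.read.support.class",
  "org.apache.spark.sql.execution.datasources.parquet.ParquetReadSupport")

// parquet-mr looks the class up under the same key when it creates a record reader
println(hadoopConf.get("parquet.read.support.class"))
```

The key is also available as the ParquetInputFormat.READ_SUPPORT_CLASS constant in parquet-mr.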

To be created, ParquetReadSupport takes an optional Java TimeZone.

Tip

Enable ALL logging level for org.apache.spark.sql.execution.datasources.parquet.ParquetReadSupport logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.sql.execution.datasources.parquet.ParquetReadSupport=ALL

Refer to Logging.

Initializing ReadSupport — init Method

init(context: InitContext): ReadContext
Note
init is part of the ReadSupport Contract to…​FIXME.

init…​FIXME

prepareForRead Method

prepareForRead(
  conf: Configuration,
  keyValueMetaData: JMap[String, String],
  fileSchema: MessageType,
  readContext: ReadContext): RecordMaterializer[UnsafeRow]
Note
prepareForRead is part of the ReadSupport Contract to…​FIXME.

prepareForRead…​FIXME
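
To illustrate the two-step ReadSupport Contract in general (init first, prepareForRead second), here is a sketch of a much simpler ReadSupport. It is not ParquetReadSupport and says nothing about its internals: FullSchemaReadSupport is a hypothetical class that requests every column of the file and materializes records as Group objects using the example converter that ships with parquet-mr. It assumes parquet-hadoop, parquet-column and hadoop-common on the classpath.

```scala
import java.util.{Map => JMap}

import org.apache.hadoop.conf.Configuration
import org.apache.parquet.example.data.Group
import org.apache.parquet.example.data.simple.convert.GroupRecordConverter
import org.apache.parquet.hadoop.api.{InitContext, ReadSupport}
import org.apache.parquet.hadoop.api.ReadSupport.ReadContext
import org.apache.parquet.io.api.RecordMaterializer
import org.apache.parquet.schema.MessageType

// Hypothetical class for illustration only (not part of Spark)
class FullSchemaReadSupport extends ReadSupport[Group] {

  // Step 1: choose the schema to read; this sketch simply requests the whole file schema
  override def init(context: InitContext): ReadContext =
    new ReadContext(context.getFileSchema)

  // Step 2: build the materializer that assembles records of the requested schema
  override def prepareForRead(
      conf: Configuration,
      keyValueMetaData: JMap[String, String],
      fileSchema: MessageType,
      readContext: ReadContext): RecordMaterializer[Group] =
    new GroupRecordConverter(readContext.getRequestedSchema)
}
```

The ReadContext returned by init is what parquet-mr hands back to prepareForRead, which is why the requested schema is read from it rather than from fileSchema.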
