HiveFileFormat — FileFormat of Hive Tables
HiveFileFormat is a FileFormat for writing Hive tables.
HiveFileFormat is a DataSourceRegister and registers itself as the hive data source (its short name is hive).
Note: The Hive data source can only be used with tables; you cannot read or write files of the Hive data source directly. Use DataFrameReader.table to load data from a Hive table and DataFrameWriter.saveAsTable to write data to one.
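The sketch below illustrates the table-only rule from the note. It assumes a Spark build with Hive support (the spark-sql and spark-hive modules on the classpath), a local master and the default embedded metastore; the table name hive_demo and the path /tmp/hive_demo are hypothetical. Writing with format("hive") through saveAsTable should go through the Hive insert path (and so HiveFileFormat), while loading files with the hive format directly is rejected with an AnalysisException.

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}

object HiveTableRoundTrip {
  // Returns (rows read back through the table API, whether a direct file read was rejected).
  // Assumes spark-hive is on the classpath; enableHiveSupport() fails fast otherwise.
  def run(): (Long, Boolean) = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("hive-table-round-trip")
      .enableHiveSupport()
      .getOrCreate()
    try {
      // Write through the table API (DataFrameWriter.saveAsTable) as a Hive table.
      // "hive_demo" is a hypothetical table name.
      spark.range(5).write
        .format("hive")
        .mode("overwrite")
        .saveAsTable("hive_demo")

      // Read back through the table API (DataFrameReader.table).
      val rows = spark.read.table("hive_demo").count()

      // Reading files of the hive data source directly is not allowed.
      // "/tmp/hive_demo" is a hypothetical path; the check fails before the path is used.
      val directReadRejected =
        try { spark.read.format("hive").load("/tmp/hive_demo"); false }
        catch { case _: AnalysisException => true }

      (rows, directReadRejected)
    } finally spark.stop()
  }
}
```

Running HiveTableRoundTrip.run() creates a local metastore_db directory and a spark-warehouse directory in the working directory.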
HiveFileFormat is created exclusively when SaveAsHiveFile is requested to saveAsHiveFile, i.e. when the InsertIntoHiveDirCommand and InsertIntoHiveTable logical commands are executed.
HiveFileFormat throws an UnsupportedOperationException ("inferSchema is not supported for hive data source.") when requested to inferSchema.
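A minimal plain-Scala sketch of that contract follows. It does not use Spark: SchemaInferringFormat, HiveLikeFormat and InferSchemaDemo are hypothetical, simplified stand-ins (schemas are modelled as lists of column names), meant only to show that a write-only format rejects schema inference outright instead of returning None.

```scala
// Hypothetical, simplified stand-in for the schema-inference part of the
// FileFormat contract (not Spark's actual API).
trait SchemaInferringFormat {
  /** Returns the inferred column names, or None if nothing can be inferred. */
  def inferSchema(files: Seq[String]): Option[Seq[String]]
}

// Mirrors HiveFileFormat: it is only used for writing tables whose schema is
// already known, so any request to infer a schema fails.
object HiveLikeFormat extends SchemaInferringFormat {
  override def inferSchema(files: Seq[String]): Option[Seq[String]] =
    throw new UnsupportedOperationException(
      "inferSchema is not supported for hive data source.")
}

object InferSchemaDemo {
  // Returns the exception message a caller would observe.
  def attempt(): String =
    try { HiveLikeFormat.inferSchema(Seq("part-00000")); "no exception" }
    catch { case e: UnsupportedOperationException => e.getMessage }
}
```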
Preparing Write Job: prepareWrite Method
```scala
def prepareWrite(
    sparkSession: SparkSession,
    job: Job,
    options: Map[String, String],
    dataSchema: StructType): OutputWriterFactory
```
Note: prepareWrite is part of the FileFormat contract.
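prepareWrite sets the mapred.output.format.class property of the job's Hadoop configuration to the output file format class name (getOutputFileFormatClassName) of the Hive TableDesc of the FileSinkDesc that HiveFileFormat was created with.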
prepareWrite then requests the HiveTableUtil helper object to configureJobPropertiesForStorageHandler.
Next, prepareWrite requests the Hive Utilities helper object to copyTableJobPropertiesToConf.
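The configuration steps above can be modelled in plain Scala as a rough sketch. It assumes a mutable key/value map in place of the Hadoop job configuration, and SimpleTableDesc, PrepareWriteConfSketch and the storageHandlerHook parameter are hypothetical stand-ins: the hook only marks where a storage handler may adjust the table's job properties and does not reproduce what HiveTableUtil actually does.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for Hive's TableDesc: only the output
// format class name and the table-level job properties are kept.
final case class SimpleTableDesc(
    outputFileFormatClassName: String,
    jobProperties: Map[String, String])

object PrepareWriteConfSketch {
  def configure(
      conf: mutable.Map[String, String],
      table: SimpleTableDesc,
      storageHandlerHook: SimpleTableDesc => SimpleTableDesc = identity): Unit = {
    // Step 1: the output format class comes from the table descriptor.
    conf("mapred.output.format.class") = table.outputFileFormatClassName
    // Step 2: give a (hypothetical) storage handler the chance to adjust the table's job properties.
    val effective = storageHandlerHook(table)
    // Step 3: copy the table's job properties into the job configuration.
    conf ++= effective.jobProperties
  }
}
```

For example, configure(conf, SimpleTableDesc("org.example.MyOutputFormat", Map("k" -> "v"))) leaves both mapred.output.format.class and k set in conf.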
In the end, prepareWrite creates a new OutputWriterFactory that creates a new HiveOutputWriter when requested for a new OutputWriter instance.
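As a rough plain-Scala sketch of the factory pattern described above (not Spark's API: SimpleOutputWriter, SimpleOutputWriterFactory, SketchHiveOutputWriter and PrepareWriteFactorySketch are hypothetical, simplified stand-ins), prepareWrite returns a factory that captures a copy of the job configuration once, and every newInstance call creates a fresh writer for one output path. The sketch buffers rows in memory instead of writing them through a Hive record writer.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins for Spark's OutputWriter / OutputWriterFactory.
trait SimpleOutputWriter {
  def path: String
  def write(row: Seq[Any]): Unit
  def close(): Unit
}

trait SimpleOutputWriterFactory {
  def newInstance(path: String): SimpleOutputWriter
}

// Stand-in for HiveOutputWriter: buffers rows in memory for illustration only.
final class SketchHiveOutputWriter(val path: String, val jobConf: Map[String, String])
    extends SimpleOutputWriter {
  private val buffer = mutable.ArrayBuffer.empty[Seq[Any]]
  def write(row: Seq[Any]): Unit = buffer += row
  def close(): Unit = ()
  def rowsWritten: Int = buffer.size
}

object PrepareWriteFactorySketch {
  // Copies the configuration when the factory is created, so later changes
  // to the mutable map are not seen by the writers.
  def prepareWrite(conf: mutable.Map[String, String]): SimpleOutputWriterFactory = {
    val snapshot = conf.toMap
    new SimpleOutputWriterFactory {
      def newInstance(path: String): SimpleOutputWriter =
        new SketchHiveOutputWriter(path, snapshot)
    }
  }
}
```

Separating the factory from the writers mirrors the split in the text: configuration happens once in prepareWrite, while one writer is created per output file.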