CreateDataSourceTableAsSelectCommand Logical Command

CreateDataSourceTableAsSelectCommand is a logical command that creates a DataSource table with the data from a structured query (AS query).

Note
A DataSource table is a Spark SQL native table that uses any data source other than Hive (as specified by the USING clause).

CreateDataSourceTableAsSelectCommand is created when the DataSourceAnalysis post-hoc logical resolution rule is executed (and resolves a CreateTable logical operator for a Spark table with an AS query).

Note
CreateDataSourceTableCommand is used instead when a CreateTable logical operator is used with no AS query.

val ctas = """
  CREATE TABLE users
  USING csv
  COMMENT 'users table'
  LOCATION '/tmp/users'
  AS SELECT * FROM VALUES ((0, "jacek"))
"""
scala> sql(ctas)
... WARN HiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider csv. Persisting data source table `default`.`users` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.

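// sql executes commands eagerly, so running the same CTAS a second time fails:
// the users table already exists at this point
val plan = sql(ctas).queryExecution.logical.numberedTreeString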
org.apache.spark.sql.AnalysisException: Table default.users already exists. You need to drop it first.;
  at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:159)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:115)
  at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:194)
  at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:3370)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:78)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3370)
  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
  ... 49 elided
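
The exception above comes from executing the same CTAS statement twice. A minimal sketch of a re-runnable variant follows. It assumes a local SparkSession (in spark-shell, `spark` already exists) and that dropping any leftover `users` table first is acceptable. The `DROP TABLE IF EXISTS` step and the session setup are additions for illustration, not part of the original demo.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a throwaway local session; in spark-shell reuse the existing `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("ctas-demo")
  .getOrCreate()

// Drop a table left over from an earlier run so the CTAS below
// does not fail with "Table ... already exists".
spark.sql("DROP TABLE IF EXISTS users")

val ctas = """
  CREATE TABLE users
  USING csv
  COMMENT 'users table'
  LOCATION '/tmp/users'
  AS SELECT * FROM VALUES ((0, "jacek"))
"""

// sql executes commands eagerly, so the table is created by this call.
val df = spark.sql(ctas)

// The parsed logical plan remains available on the returned Dataset.
println(df.queryExecution.logical.numberedTreeString)
```

The exact shape of the printed plan depends on the Spark version, so no output is shown here.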

Creating CreateDataSourceTableAsSelectCommand Instance

CreateDataSourceTableAsSelectCommand takes the following to be created:

Executing Data-Writing Logical Command — run Method

run(
  sparkSession: SparkSession,
  child: SparkPlan): Seq[Row]

Note
run is part of the DataWritingCommand contract.

run…​FIXME

run throws an AssertionError when the tableType of the CatalogTable is VIEW or the provider is undefined.
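
The checks at the start of run can be illustrated with a small, Spark-free model. This is only a sketch under stated assumptions: `TableMeta`, `Mode`, `Outcome` and `runSketch` are hypothetical stand-ins and not Spark classes. The handling of an existing table (error, skip or append, depending on the save mode) reflects one reading of the Spark sources and may differ between versions. The real command throws an AnalysisException where the sketch uses IllegalStateException.

```scala
// Hypothetical stand-ins for CatalogTableType and SaveMode (NOT Spark's classes).
sealed trait TableType
case object Managed extends TableType
case object External extends TableType
case object View extends TableType

sealed trait Mode
case object Append extends Mode
case object Overwrite extends Mode
case object ErrorIfExists extends Mode
case object Ignore extends Mode

// Hypothetical, heavily reduced stand-in for CatalogTable.
final case class TableMeta(name: String, tableType: TableType, provider: Option[String])

sealed trait Outcome
case object Created extends Outcome   // table did not exist: write data, then register the table
case object Appended extends Outcome  // table existed: data appended
case object Skipped extends Outcome   // table existed and mode was Ignore: nothing done

def runSketch(table: TableMeta, mode: Mode, tableExists: Boolean): Outcome = {
  // Preconditions described in the text: both failures surface as AssertionError.
  assert(table.tableType != View)
  assert(table.provider.isDefined)

  if (tableExists) {
    // Assumption: with Overwrite the table is expected to have been dropped earlier.
    assert(mode != Overwrite, s"Expect the table ${table.name} has been dropped")
    mode match {
      case ErrorIfExists =>
        // Stand-in for Spark's AnalysisException.
        throw new IllegalStateException(
          s"Table ${table.name} already exists. You need to drop it first.")
      case Ignore => Skipped
      case _      => Appended
    }
  } else {
    Created
  }
}
```

For example, `runSketch(TableMeta("default.users", External, Some("csv")), ErrorIfExists, tableExists = true)` throws, which mirrors the second `sql(ctas)` call in the demo above.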

saveDataIntoTable Internal Method

saveDataIntoTable(
  session: SparkSession,
  table: CatalogTable,
  tableLocation: Option[URI],
  physicalPlan: SparkPlan,
  mode: SaveMode,
  tableExists: Boolean): BaseRelation

saveDataIntoTable creates a BaseRelation for…​FIXME

saveDataIntoTable…​FIXME

Note
saveDataIntoTable is used when CreateDataSourceTableAsSelectCommand is executed.
