StructType — Data Type for Schema Definition

StructType is a built-in data type in Spark SQL that represents a collection of StructFields, which together define a schema or part of one.

Note

StructType is a Seq[StructField], so every Seq operation is available on it. The example uses schemaTyped, which is defined further down this page.

scala> schemaTyped.foreach(println)
StructField(a,IntegerType,true)
StructField(b,StringType,true)

Read the official documentation of scala.collection.Seq.
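Because a schema is a Seq[StructField], the usual collection methods work on it directly. The following is a minimal sketch, assuming the Spark SQL types are on the classpath (as in spark-shell). The value names schema, names, stringFields and size are illustrative only.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

// Same two-field schema as schemaTyped on this page
val schema = new StructType()
  .add("a", IntegerType)
  .add("b", StringType)

// Plain Seq operations applied to the schema
val names = schema.map(_.name)                               // field names in declaration order
val stringFields = schema.filter(_.dataType == StringType)   // only the string-typed fields
val size = schema.length                                     // number of top-level fields
```

Here names holds the field names in declaration order, and stringFields keeps only the StructField for b.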

You can compare two StructType instances to see whether they are equal.

import org.apache.spark.sql.types.StructType

val schemaUntyped = new StructType()
  .add("a", "int")
  .add("b", "string")

import org.apache.spark.sql.types.{IntegerType, StringType}
val schemaTyped = new StructType()
  .add("a", IntegerType)
  .add("b", StringType)

scala> schemaUntyped == schemaTyped
res0: Boolean = true
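Equality is field by field, so it also takes nullability into account, not just names and data types. The sketch below illustrates this under the assumption that StructField compares all of its members (name, data type, nullable flag and metadata). The value names are hypothetical.

```scala
import org.apache.spark.sql.types.{IntegerType, StructType}

// nullable defaults to true when it is not given
val nullableA = new StructType().add("a", IntegerType)
// same name and type, but explicitly not nullable
val requiredA = new StructType().add("a", IntegerType, false)

// The schemas differ only in the nullable flag, which is enough to make them unequal
val sameSchema = nullableA == requiredA

// Comparing names and data types only, ignoring nullability
val sameShape =
  nullableA.map(f => (f.name, f.dataType)) == requiredA.map(f => (f.name, f.dataType))
```

If nullability should not matter for a comparison, comparing the projected (name, dataType) pairs as in sameShape is one simple option.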

StructType presents itself as struct<...> in query plans and as STRUCT<...> in SQL.

Adding Fields to Schema — add methods

You can add a new StructField to a StructType. The add method comes in several variants, each of which returns a new StructType with the field added.

add(field: StructField): StructType
add(name: String, dataType: DataType): StructType
add(name: String, dataType: DataType, nullable: Boolean): StructType
add(
  name: String,
  dataType: DataType,
  nullable: Boolean,
  metadata: Metadata): StructType
add(
  name: String,
  dataType: DataType,
  nullable: Boolean,
  comment: String): StructType
add(name: String, dataType: String): StructType
add(name: String, dataType: String, nullable: Boolean): StructType
add(
  name: String,
  dataType: String,
  nullable: Boolean,
  metadata: Metadata): StructType
add(
  name: String,
  dataType: String,
  nullable: Boolean,
  comment: String): StructType
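As a rough illustration of a few of these variants, the sketch below builds a schema step by step. The field names (id, name, age) and the comment text are made up for the example. Because every add returns a new StructType, the original value stays unchanged.

```scala
import org.apache.spark.sql.types.{IntegerType, LongType, StringType, StructField, StructType}

// add(name, dataType: DataType, nullable)
val base = new StructType().add("id", LongType, false)

val extended = base
  .add(StructField("name", StringType))      // add(field: StructField)
  .add("age", "int", true, "age in years")   // add(name, dataType: String, nullable, comment)

// base still has a single field; extended has three
val baseSize = base.length
val extendedNames = extended.fieldNames.toSeq
val ageField = extended("age")
```

The string "int" is resolved to IntegerType, and the comment can be read back from the field with getComment().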

DataType Name Conversions

simpleString: String
catalogString: String
sql: String

As a DataType, StructType appears in query plans and SQL, where it renders itself using simpleString, catalogString or sql (see DataType Contract).

scala> schemaTyped.simpleString
res0: String = struct<a:int,b:string>

scala> schemaTyped.catalogString
res1: String = struct<a:int,b:string>

scala> schemaTyped.sql
res2: String = STRUCT<`a`: INT, `b`: STRING>

Accessing StructField — apply method

apply(name: String): StructField

StructType defines its own apply method that gives you easy access to a StructField by name.

scala> schemaTyped.printTreeString
root
 |-- a: integer (nullable = true)
 |-- b: string (nullable = true)

scala> schemaTyped("a")
res4: org.apache.spark.sql.types.StructField = StructField(a,IntegerType,true)
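Looking up a name that is not in the schema fails with an exception, so it can help to check first or to use a lookup that returns an Option. The sketch below shows three possible approaches. The value names are illustrative, and wrapping the call in Try is just one choice among several.

```scala
import scala.util.Try
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val schema = new StructType()
  .add("a", IntegerType)
  .add("b", StringType)

// Option 1: check the field names up front
val hasC = schema.fieldNames.contains("c")

// Option 2: use Seq.find, which returns an Option instead of throwing
val maybeA: Option[StructField] = schema.find(_.name == "a")

// Option 3: wrap apply in Try and turn a failure into None
val maybeC: Option[StructField] = Try(schema("c")).toOption
```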

Creating StructType from Existing StructType — apply method

apply(names: Set[String]): StructType

This variant of apply lets you create a new StructType from an existing one, keeping only the fields with the given names.

scala> schemaTyped(names = Set("a"))
res0: org.apache.spark.sql.types.StructType = StructType(StructField(a,IntegerType,true))

It throws an IllegalArgumentException when a field cannot be found.

scala> schemaTyped(names = Set("a", "c"))
java.lang.IllegalArgumentException: Field c does not exist.
  at org.apache.spark.sql.types.StructType.apply(StructType.scala:275)
  ... 48 elided
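When the requested names may include fields that the schema does not have, one way to avoid the exception is to intersect them with the known field names first. This is a sketch only. The names wanted, known and projected are hypothetical.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val schema = new StructType()
  .add("a", IntegerType)
  .add("b", StringType)

// "c" is not part of the schema
val wanted = Set("a", "c")

// Keep only the names the schema actually contains
val known = wanted.intersect(schema.fieldNames.toSet)

// Safe to call apply(names: Set[String]) now
val projected: StructType = schema(known)
```

Silently dropping unknown names is a design choice. If a missing field indicates a bug, letting the exception surface is usually the better behaviour.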

Displaying Schema As Tree — printTreeString method

printTreeString(): Unit

printTreeString prints out the schema to standard output.

scala> schemaTyped.printTreeString
root
 |-- a: integer (nullable = true)
 |-- b: string (nullable = true)

Internally, it uses the treeString method to build the tree and then prints it with println.
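If you need the tree as a value, for example to write it to a log instead of standard output, you can call treeString yourself. A minimal sketch, with tree as an illustrative name:

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val schema = new StructType()
  .add("a", IntegerType)
  .add("b", StringType)

// Same text that printTreeString would print, but returned as a String
val tree: String = schema.treeString
```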
