StructType — Data Type for Schema Definition

StructType is a built-in data type that is a collection of StructFields.

StructType is used to define a schema or part of one.

You can compare two StructType instances to see whether they are equal.

import org.apache.spark.sql.types.StructType

val schemaUntyped = new StructType()
  .add("a", "int")
  .add("b", "string")

import org.apache.spark.sql.types.{IntegerType, StringType}
val schemaTyped = new StructType()
  .add("a", IntegerType)
  .add("b", StringType)

scala> schemaUntyped == schemaTyped
res0: Boolean = true
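Equality compares the fields one by one, so it is sensitive to more than names and types. The following is a minimal sketch (the value names base, strict and reordered are made up for illustration) showing that nullability and field order both take part in the comparison.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val base = new StructType()
  .add("a", IntegerType)
  .add("b", StringType)

// Same names and types, but "a" is declared non-nullable
val strict = new StructType()
  .add("a", IntegerType, nullable = false)
  .add("b", StringType)

// Same fields, declared in a different order
val reordered = new StructType()
  .add("b", StringType)
  .add("a", IntegerType)

println(base == strict)     // false: nullable differs for "a"
println(base == reordered)  // false: field order differs
```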

StructType presents itself as <struct> or STRUCT in query plans or SQL.

Note

StructType is a Seq[StructField] and therefore all things Seq apply equally here.

scala> schemaTyped.foreach(println)
StructField(a,IntegerType,true)
StructField(b,StringType,true)

Read the official documentation of Scala’s scala.collection.Seq.
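Because StructType is a Seq[StructField], the usual collection operators work on a schema directly. A short sketch, with value names chosen only for this example:

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val schema = new StructType()
  .add("a", IntegerType)
  .add("b", StringType, nullable = false)

// map over the fields to collect their names
val names = schema.map(_.name)

// filter keeps StructFields, so further Seq operators can be chained
val nullableNames = schema.filter(_.nullable).map(_.name)

// Build a name-to-type lookup table
val typeByName = schema.map(f => f.name -> f.dataType).toMap

println(names)          // List(a, b)
println(nullableNames)  // List(a)
```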

As of Spark 2.4.0, StructType can be converted to DDL format using the toDDL method.

Example: Using StructType.toDDL
// Generating a schema from a case class
// Because we're all properly lazy
case class Person(id: Long, name: String)
import org.apache.spark.sql.Encoders
val schema = Encoders.product[Person].schema
scala> println(schema.toDDL)
`id` BIGINT,`name` STRING

fromAttributes Method

fromAttributes(attributes: Seq[Attribute]): StructType

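fromAttributes creates a StructType with one StructField (name, data type, nullability and metadata) per given Attribute.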

Note
fromAttributes is used when…​FIXME

toAttributes Method

toAttributes: Seq[AttributeReference]

toAttributes converts the fields of the StructType to AttributeReferences (one per StructField).

Note
toAttributes is used when…​FIXME

Adding Fields to Schema — add Method

You can add a new StructField to your StructType. There are several variants of the add method, and each of them returns a new StructType with the field added (the StructType you call add on is left unchanged).

add(field: StructField): StructType
add(name: String, dataType: DataType): StructType
add(name: String, dataType: DataType, nullable: Boolean): StructType
add(
  name: String,
  dataType: DataType,
  nullable: Boolean,
  metadata: Metadata): StructType
add(
  name: String,
  dataType: DataType,
  nullable: Boolean,
  comment: String): StructType
add(name: String, dataType: String): StructType
add(name: String, dataType: String, nullable: Boolean): StructType
add(
  name: String,
  dataType: String,
  nullable: Boolean,
  metadata: Metadata): StructType
add(
  name: String,
  dataType: String,
  nullable: Boolean,
  comment: String): StructType
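The sketch below exercises a few of the variants listed above. The field names and the "unit" metadata key are invented for the example. Arguments are passed positionally to keep overload resolution obvious.

```scala
import org.apache.spark.sql.types.{LongType, MetadataBuilder, StringType, StructField, StructType}

// Hypothetical metadata attached to a single field
val metadata = new MetadataBuilder().putString("unit", "seconds").build()

val empty = new StructType()

val schema = empty
  .add(StructField("id", LongType, nullable = false)) // a ready-made StructField
  .add("name", StringType)                            // nullable defaults to true
  .add("age", "int", true, "age in years")            // type as a string, with a comment
  .add("duration", "bigint", false, metadata)         // type as a string, with metadata

// add returns a new StructType, so the starting point is still empty
println(empty.length)   // 0
println(schema.length)  // 4
println(schema("age").getComment())
```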

DataType Name Conversions

simpleString: String
catalogString: String
sql: String

StructType as a custom DataType is used in query plans or SQL. It can present itself using simpleString, catalogString or sql (see DataType Contract).

scala> schemaTyped.simpleString
res0: String = struct<a:int,b:string>

scala> schemaTyped.catalogString
res1: String = struct<a:int,b:string>

scala> schemaTyped.sql
res2: String = STRUCT<`a`: INT, `b`: STRING>

Accessing StructField — apply Method

apply(name: String): StructField

StructType defines its own apply method that gives you easy access to a StructField by name.

scala> schemaTyped.printTreeString
root
 |-- a: integer (nullable = true)
 |-- b: string (nullable = true)

scala> schemaTyped("a")
res4: org.apache.spark.sql.types.StructField = StructField(a,IntegerType,true)

Creating StructType from Existing StructType — apply Method

apply(names: Set[String]): StructType

This variant of apply lets you create a StructType out of an existing StructType that contains only the fields with the given names.

scala> schemaTyped(names = Set("a"))
res0: org.apache.spark.sql.types.StructType = StructType(StructField(a,IntegerType,true))

It throws an IllegalArgumentException when a field cannot be found.

scala> schemaTyped(names = Set("a", "c"))
java.lang.IllegalArgumentException: Field c does not exist.
  at org.apache.spark.sql.types.StructType.apply(StructType.scala:275)
  ... 48 elided
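When the requested names come from user input, you may want to avoid the exception. One possible approach, sketched below, relies only on StructType being a Seq[StructField]. The fieldOption helper is hypothetical and not part of Spark.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val schema = new StructType()
  .add("a", IntegerType)
  .add("b", StringType)

// Hypothetical helper (not a Spark API): look a field up without throwing
def fieldOption(schema: StructType, name: String): Option[StructField] =
  schema.find(_.name == name)

// Keep only the requested names that actually exist before calling apply
val requested = Set("a", "c")
val known = requested.filter(n => schema.fieldNames.contains(n))
val subset = schema(known)

println(fieldOption(schema, "c"))  // None
println(subset.fieldNames.toSeq)
```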

Displaying Schema As Tree — printTreeString Method

printTreeString(): Unit

printTreeString prints out the schema to standard output.

scala> schemaTyped.printTreeString
root
 |-- a: integer (nullable = true)
 |-- b: string (nullable = true)

Internally, it uses the treeString method to build the tree and then prints it out using println.
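If you need the tree as a value rather than on standard output (for logging, say), you can call treeString yourself. A small sketch, with value names picked for this example:

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val schema = new StructType()
  .add("a", IntegerType)
  .add("b", StringType)

// treeString returns the text that printTreeString would print
val tree: String = schema.treeString

// The String can be processed like any other, e.g. one line per field
val fieldLines = tree.split("\n").filter(_.contains("|--"))
fieldLines.foreach(println)
```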

Creating StructType For DDL-Formatted Text — fromDDL Object Method

fromDDL(ddl: String): StructType

fromDDL creates a StructType for the given DDL-formatted text (a comma-separated list of column definitions).

Note
fromDDL is used when…​FIXME
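As a quick illustration of the signature above, the sketch below parses a DDL string and compares the result with a schema built by hand. The column names are made up for the example.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

// Parse a comma-separated list of column definitions
val parsed = StructType.fromDDL("a INT, b STRING")

// The same schema built programmatically
val expected = new StructType()
  .add("a", IntegerType)
  .add("b", StringType)

println(parsed == expected)  // true
```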

Converting to DDL Format — toDDL Method

toDDL: String

toDDL converts all the fields to DDL format and concatenates them using the comma (,).
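The output of toDDL can be fed back to StructType.fromDDL. The sketch below checks that round trip for a schema with nullable fields of simple types only. It makes no claim about non-nullable fields, comments or metadata, which may not survive depending on the Spark version.

```scala
import org.apache.spark.sql.types.{LongType, StringType, StructType}

val schema = new StructType()
  .add("id", LongType)     // nullable by default
  .add("name", StringType)

// Render the schema as DDL text and parse it back
val ddl = schema.toDDL
val roundTripped = StructType.fromDDL(ddl)

println(ddl)
println(roundTripped == schema)  // true for this schema
```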
