import org.apache.spark.sql.types.StructType
val schemaUntyped = new StructType()
  .add("a", "int")
  .add("b", "string")
import org.apache.spark.sql.types.{IntegerType, StringType}
val schemaTyped = new StructType()
  .add("a", IntegerType)
  .add("b", StringType)
scala> schemaUntyped == schemaTyped
res0: Boolean = true
StructType — Data Type for Schema Definition
StructType is a built-in data type that is a collection of StructFields.
StructType is used to define a schema or a part of one.
You can compare two StructType instances to see whether they are equal.
StructType presents itself as struct<...> or STRUCT<...> in query plans or SQL.
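The following is a minimal sketch of what the equality check covers, assuming a spark-shell session. StructField is a case class, so nullability takes part in the comparison, and so does the order of the fields. The names intSchema, strictSchema and reordered are made up for this illustration.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

// Fields are nullable by default
val intSchema = new StructType().add("a", IntegerType).add("b", StringType)

// Same names and types, but "a" is declared non-nullable
val strictSchema = new StructType().add("a", IntegerType, nullable = false).add("b", StringType)

// Same fields in a different order
val reordered = new StructType().add("b", StringType).add("a", IntegerType)

val sameDespiteNullability = intSchema == strictSchema // nullability differs
val sameDespiteOrder = intSchema == reordered          // field order differs
```

Both comparisons are expected to be false, which is worth remembering when asserting on schemas in tests.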
Note
StructType is a Seq[StructField], so the standard Seq operations apply to it. Read the official documentation of Scala’s scala.collection.Seq.
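Because StructType extends Seq[StructField] in Spark's Scala API, the usual collection methods can be called on a schema directly. The snippet below is a small sketch for spark-shell; the schema and the value names are illustrative only.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val schema = new StructType().add("a", IntegerType).add("b", StringType)

// Standard Seq operations over the StructFields of the schema
val names = schema.map(_.name)                             // field names in schema order
val stringFields = schema.filter(_.dataType == StringType) // only the string-typed fields
val fieldCount = schema.length                             // number of top-level fields
```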
As of Spark 2.4.0, StructType can be converted to DDL format using the toDDL method.
// Generating a schema from a case class
// Because we're all properly lazy
case class Person(id: Long, name: String)
import org.apache.spark.sql.Encoders
val schema = Encoders.product[Person].schema
scala> println(schema.toDDL)
`id` BIGINT,`name` STRING
fromAttributes
Method
fromAttributes(attributes: Seq[Attribute]): StructType
fromAttributes…FIXME
Note
fromAttributes is used when…FIXME
toAttributes
Method
toAttributes: Seq[AttributeReference]
toAttributes…FIXME
Note
toAttributes is used when…FIXME
Adding Fields to Schema — add
Method
You can add a new StructField to your StructType. There are several variants of the add method, and each of them returns a new StructType with the field added.
add(field: StructField): StructType
add(name: String, dataType: DataType): StructType
add(name: String, dataType: DataType, nullable: Boolean): StructType
add(
  name: String,
  dataType: DataType,
  nullable: Boolean,
  metadata: Metadata): StructType
add(
  name: String,
  dataType: DataType,
  nullable: Boolean,
  comment: String): StructType
add(name: String, dataType: String): StructType
add(name: String, dataType: String, nullable: Boolean): StructType
add(
  name: String,
  dataType: String,
  nullable: Boolean,
  metadata: Metadata): StructType
add(
  name: String,
  dataType: String,
  nullable: Boolean,
  comment: String): StructType
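As a hedged illustration of the variants above, the sketch below mixes the DataType-based and the String-based forms and shows that add leaves its receiver untouched. It assumes a spark-shell session; the field names and the comment text are invented for the example.

```scala
import org.apache.spark.sql.types.{LongType, StructType}

val empty = new StructType()

// DataType variant with explicit nullability
val withId = empty.add("id", LongType, nullable = false)

// String variant with a comment: the type name is parsed into a DataType
val withName = withId.add("name", "string", true, "display name")

// add does not mutate its receiver, so the earlier schemas keep their sizes
val sizes = Seq(empty.length, withId.length, withName.length)
```

The comment given to the last variant can be read back with getComment on the resulting StructField.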
DataType Name Conversions
simpleString: String
catalogString: String
sql: String
StructType as a custom DataType is used in query plans or SQL. It can present itself using simpleString, catalogString or sql (see DataType Contract).
scala> schemaTyped.simpleString
res0: String = struct<a:int,b:string>
scala> schemaTyped.catalogString
res1: String = struct<a:int,b:string>
scala> schemaTyped.sql
res2: String = STRUCT<`a`: INT, `b`: STRING>
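The same three names can be requested for a nested schema. This is a sketch only, assuming spark-shell; inner and outer are hypothetical names. The exact text returned by sql (backticks, spacing) may differ between Spark versions, so it is not spelled out here.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val inner = new StructType().add("x", StringType)
val outer = new StructType().add("a", IntegerType).add("n", inner)

val simple = outer.simpleString   // nested structs are rendered recursively
val catalog = outer.catalogString // same text as simpleString for a schema this small
val sqlName = outer.sql           // upper-case STRUCT<...> form
```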
Accessing StructField — apply
Method
apply(name: String): StructField
StructType defines its own apply method that gives you easy access to a StructField by name.
scala> schemaTyped.printTreeString
root
 |-- a: integer (nullable = true)
 |-- b: string (nullable = true)
scala> schemaTyped("a")
res4: org.apache.spark.sql.types.StructField = StructField(a,IntegerType,true)
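A short sketch of name-based access, assuming spark-shell. It also uses fieldIndex, a public StructType method that is not covered in this section, and wraps a lookup of an unknown name in scala.util.Try because such a lookup throws. The value names are illustrative.

```scala
import scala.util.Try
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val schema = new StructType().add("a", IntegerType).add("b", StringType)

val fieldA = schema("a")                 // StructField looked up by name
val positionOfB = schema.fieldIndex("b") // zero-based position of the field
val missing = Try(schema("c"))           // an unknown name fails with an exception
```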
Creating StructType from Existing StructType — apply
Method
apply(names: Set[String]): StructType
This variant of apply lets you create a StructType out of an existing StructType that contains only the fields with the given names.
scala> schemaTyped(names = Set("a"))
res0: org.apache.spark.sql.types.StructType = StructType(StructField(a,IntegerType,true))
It throws an IllegalArgumentException when any of the names does not match a field.
scala> schemaTyped(names = Set("a", "c"))
java.lang.IllegalArgumentException: Field c does not exist.
  at org.apache.spark.sql.types.StructType.apply(StructType.scala:275)
  ... 48 elided
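One way to avoid the exception is to keep only the names that the schema actually has before calling apply. The sketch below assumes spark-shell; requested, known and projected are hypothetical names, and the filtering step is a suggestion rather than anything Spark does for you.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val schema = new StructType().add("a", IntegerType).add("b", StringType)

// "c" is not part of the schema, so drop it before projecting
val requested = Set("a", "c")
val known = requested.intersect(schema.fieldNames.toSet)
val projected = schema(known)

// The result follows the order of the original schema, not of the Set
val bothFields = schema(Set("b", "a"))
```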
Displaying Schema As Tree — printTreeString
Method
printTreeString(): Unit
printTreeString prints out the schema to standard output.
scala> schemaTyped.printTreeString
root
 |-- a: integer (nullable = true)
 |-- b: string (nullable = true)
Internally, it uses the treeString method to build the tree and then prints it with println.
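When the tree is needed as a value (for logging or for a test assertion) rather than on standard output, treeString can be called directly. A minimal sketch, assuming spark-shell and an illustrative schema:

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val schema = new StructType().add("a", IntegerType).add("b", StringType)

// The same text that printTreeString would print, captured as a String
val tree = schema.treeString
val treeLines = tree.split("\n").toSeq
```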
Creating StructType For DDL-Formatted Text — fromDDL
Object Method
fromDDL(ddl: String): StructType
fromDDL…FIXME
Note
fromDDL is used when…FIXME
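The description above is still open, so the following is only a usage sketch based on the signature: fromDDL is called on the StructType companion object with a comma-separated list of column definitions. It assumes spark-shell; the DDL text and the value names are made up for the example.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

// Column definitions in DDL form, parsed into a schema
val parsed = StructType.fromDDL("a INT, b STRING")

// The equivalent schema built programmatically (fields nullable by default)
val expected = new StructType().add("a", IntegerType).add("b", StringType)

val sameSchema = parsed == expected
```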
Converting to DDL Format — toDDL
Method
toDDL: String
toDDL converts all the fields to DDL format and concatenates them using the comma (,).
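The description can be checked with a small sketch, assuming spark-shell on Spark 2.4.0 or later, where StructField also offers toDDL. The exact DDL text (identifier quoting, nullability markers) may vary between Spark versions, so the example compares values instead of hard-coding a string. The value names are illustrative.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val schema = new StructType().add("a", IntegerType).add("b", StringType)

// Per-field DDL joined with a comma, mirroring the description of toDDL
val joined = schema.fields.map(_.toDDL).mkString(",")
val ddl = schema.toDDL

// Parsing the DDL text gives back an equal schema in this simple, all-nullable case
val roundTripped = StructType.fromDDL(ddl)
```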