import org.apache.spark.sql.functions.udf
val lengthUDF = udf { s: String => s.length }
scala> :type lengthUDF
org.apache.spark.sql.expressions.UserDefinedFunction
val r = lengthUDF($"name")
scala> :type r
org.apache.spark.sql.Column
UserDefinedFunction
UserDefinedFunction represents a user-defined function.
UserDefinedFunction is created when:
- udf function is executed
- UDFRegistration is requested to register a Scala function as a user-defined function (in FunctionRegistry)
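Both creation paths can be sketched as follows. This is a minimal sketch that assumes a local SparkSession; the app name and the registered name strLength are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf

// Assumption: a local session (in spark-shell, getOrCreate() returns the existing one)
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("udf-creation-sketch")
  .getOrCreate()

// Path 1: the udf function wraps a Scala function
val lengthUDF: UserDefinedFunction = udf((s: String) => s.length)

// Path 2: UDFRegistration (available as spark.udf) registers the function
// under a name (hypothetical: strLength) and returns a UserDefinedFunction
val registeredUDF: UserDefinedFunction =
  spark.udf.register("strLength", (s: String) => s.length)

// The registered name can then be referenced in SQL
val n = spark.sql("SELECT strLength('spark')").head.getInt(0)
```

The value returned by register can be applied to columns like any other UserDefinedFunction, while the name is what SQL queries use.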
UserDefinedFunction can have an optional name.
val namedLengthUDF = lengthUDF.withName("lengthUDF")
scala> namedLengthUDF($"name")
res2: org.apache.spark.sql.Column = UDF:lengthUDF(name)
UserDefinedFunction is nullable by default, but can be marked as non-nullable.
val nonNullableLengthUDF = lengthUDF.asNonNullable
assert(nonNullableLengthUDF.nullable == false)
UserDefinedFunction is deterministic by default, i.e. produces the same result for the same input. UserDefinedFunction can be changed to be non-deterministic.
assert(lengthUDF.deterministic)
val ndUDF = lengthUDF.asNondeterministic
assert(ndUDF.deterministic == false)
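The flag matters because Spark's API documentation for udf notes that, with functions assumed deterministic, the optimizer may eliminate duplicate invocations or invoke a function more times than it appears in the query. A function whose result varies between calls is therefore a natural candidate for asNondeterministic. A minimal sketch, with a hypothetical randomUDF:

```scala
import org.apache.spark.sql.functions.udf
import scala.util.Random

// Hypothetical UDF whose result differs from call to call
val randomUDF = udf(() => Random.nextInt(100))

// Still flagged deterministic until told otherwise
assert(randomUDF.deterministic)

// Mark it so the deterministic flag reflects the function's actual behaviour
val ndRandomUDF = randomUDF.asNondeterministic()
assert(!ndRandomUDF.deterministic)
```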
| Name | Description |
|---|---|
| deterministic | Flag that controls whether the function is deterministic (default: true). Used when… |
Executing UserDefinedFunction (Creating Column with ScalaUDF Expression) — apply Method
apply(exprs: Column*): Column
import org.apache.spark.sql.functions.udf
scala> val lengthUDF = udf { s: String => s.length }
lengthUDF: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,IntegerType,Some(List(StringType)))
scala> lengthUDF($"name")
res1: org.apache.spark.sql.Column = UDF(name)
Note: apply is used when…FIXME
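Since apply only builds a Column, the function itself runs when the Column is evaluated as part of a query. The following is a minimal sketch of that flow, assuming a local SparkSession and a hypothetical two-row dataset:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Assumption: a local session (in spark-shell, getOrCreate() returns the existing one)
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("udf-apply-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical input data
val names = Seq("spark", "sql").toDF("name")

val lengthUDF = udf((s: String) => s.length)

// apply builds a Column expression; nothing is computed at this point
val lengthCol = lengthUDF(col("name"))

// The UDF executes when the query is run
val lengths = names.select(lengthCol.as("len")).as[Int].collect().toSeq.sorted
```

Sorting the collected values keeps the check independent of row order.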
Marking UserDefinedFunction as NonNullable — asNonNullable Method
asNonNullable(): UserDefinedFunction
asNonNullable…FIXME
Note: asNonNullable is used when…FIXME
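Per its signature, asNonNullable returns a UserDefinedFunction, so the non-nullable variant can be kept alongside the original. The sketch below assumes a String-returning UDF (hypothetical name upperUDF), which starts out nullable:

```scala
import org.apache.spark.sql.functions.udf

// Hypothetical UDF with a String result, assumed nullable to begin with
val upperUDF = udf((s: String) => s.toUpperCase)

// asNonNullable gives back a UserDefinedFunction flagged as non-nullable
val nonNullableUpperUDF = upperUDF.asNonNullable()

assert(upperUDF.nullable)             // the original is left as it was
assert(!nonNullableUpperUDF.nullable) // the returned one is non-nullable
```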
Naming UserDefinedFunction — withName Method
withName(name: String): UserDefinedFunction
withName…FIXME
Note: withName is used when…FIXME
Creating UserDefinedFunction Instance
UserDefinedFunction takes the following when created:
- Output data type
- Input data types (if available)
UserDefinedFunction initializes the internal registries and counters.