import org.apache.spark.sql.functions.udf
val lengthUDF = udf { s: String => s.length }
scala> :type lengthUDF
org.apache.spark.sql.expressions.UserDefinedFunction
val r = lengthUDF($"name")
scala> :type r
org.apache.spark.sql.Column
UserDefinedFunction
UserDefinedFunction represents a user-defined function.
UserDefinedFunction is created when:
- udf function is executed
- UDFRegistration is requested to register a Scala function as a user-defined function (in FunctionRegistry)
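The two creation paths above can be sketched side by side. This is a minimal illustration, assuming a local SparkSession; the function name `str_len`, the view name `names`, and the sample rows are made up for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

// Local session for the sketch (in spark-shell this returns the existing session)
val spark = SparkSession.builder().master("local[*]").appName("udf-creation-sketch").getOrCreate()
import spark.implicits._

// Path 1: the udf function wraps a Scala function in a UserDefinedFunction
val lengthUDF = udf((s: String) => s.length)

// Path 2: UDFRegistration (spark.udf) registers the function under a name
// (hypothetical name "str_len") so it is also callable from SQL
val registeredUDF = spark.udf.register("str_len", (s: String) => s.length)

// Hypothetical sample data
val names = Seq("Jacek", "Agata").toDF("name")
names.createOrReplaceTempView("names")

// Column-based use of the first UDF
val viaColumn = names.select(lengthUDF($"name") as "len")
// SQL-based use of the registered UDF
val viaSql = spark.sql("SELECT str_len(name) AS len FROM names")

viaColumn.show()
viaSql.show()
```

Both values are of type UserDefinedFunction; only the registered one is resolvable by name in SQL queries.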
UserDefinedFunction can have an optional name.
val namedLengthUDF = lengthUDF.withName("lengthUDF")
scala> namedLengthUDF($"name")
res2: org.apache.spark.sql.Column = UDF:lengthUDF(name)
UserDefinedFunction is nullable by default, but can be marked as non-nullable.
val nonNullableLengthUDF = lengthUDF.asNonNullable
assert(nonNullableLengthUDF.nullable == false)
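To see what the flag affects, the following sketch compares the schema of a column produced by a nullable UDF with the schema produced by its non-nullable copy. It assumes a local SparkSession and uses a String-returning function (named `upperUDF` here purely for illustration), since a String result is the kind of value that can be null.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

val spark = SparkSession.builder().master("local[*]").appName("udf-nullable-sketch").getOrCreate()
import spark.implicits._

// Hypothetical UDF that returns a String
val upperUDF = udf((s: String) => s.toUpperCase)

// asNonNullable returns a copy of the UDF marked as non-nullable
val nonNullableUpperUDF = upperUDF.asNonNullable()

val names = Seq("Jacek", "Agata").toDF("name")

// Compare the nullable attribute of the result column in both schemas
names.select(upperUDF($"name") as "upper").printSchema()
names.select(nonNullableUpperUDF($"name") as "upper").printSchema()
```

Marking a UDF as non-nullable is a promise to Spark that the function never returns null; the sketch does not check that the promise holds.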
UserDefinedFunction is deterministic by default, i.e. it produces the same result for the same input. It can be marked as non-deterministic.
assert(lengthUDF.deterministic)
val ndUDF = lengthUDF.asNondeterministic
assert(ndUDF.deterministic == false)
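A function whose result does not depend only on its input is a natural candidate for asNondeterministic. The sketch below assumes a local SparkSession and a made-up random-number UDF (`randomUDF`); it is an illustration, not a recommendation for generating random values in Spark.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf
import scala.util.Random

val spark = SparkSession.builder().master("local[*]").appName("udf-nondeterministic-sketch").getOrCreate()
import spark.implicits._

// Hypothetical zero-argument UDF that returns a different value on every call,
// so it is explicitly marked as non-deterministic
val randomUDF = udf(() => Random.nextInt(100)).asNondeterministic()

// Values differ between runs, so no expected output is shown
spark.range(3).select($"id", randomUDF() as "r").show()
```

Without the marker, the optimizer is free to assume that repeated invocations with the same input are interchangeable, so duplicate calls may be eliminated or the function may be invoked more often than it appears in the query.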
Name | Description |
---|---|
deterministic | Flag that controls whether the function is deterministic (Default: true) |
Executing UserDefinedFunction (Creating Column with ScalaUDF Expression): apply Method
apply(exprs: Column*): Column
import org.apache.spark.sql.functions.udf
scala> val lengthUDF = udf { s: String => s.length }
lengthUDF: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,IntegerType,Some(List(StringType)))
scala> lengthUDF($"name")
res1: org.apache.spark.sql.Column = UDF(name)
Note: apply is used when…FIXME
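Because apply takes a variable number of Column arguments, a UDF over several inputs is applied the same way as a single-argument one. The following sketch assumes a local SparkSession; the two-argument `prefixUDF`, the column names, and the sample rows are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{lit, udf}

val spark = SparkSession.builder().master("local[*]").appName("udf-apply-sketch").getOrCreate()
import spark.implicits._

// Hypothetical two-argument UDF: the first n characters of a string
val prefixUDF = udf((s: String, n: Int) => s.take(n))

val people = Seq("Jacek", "Agata").toDF("name")

// apply accepts one Column per function argument and returns a new Column
val prefixCol = prefixUDF($"name", lit(3))

// The resulting Column can be used wherever a Column is expected
val withPrefix = people.withColumn("prefix", prefixCol)
withPrefix.show()
```

Nothing is computed when apply is called; the function runs only when an action such as show executes the query.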
Marking UserDefinedFunction as NonNullable: asNonNullable Method
asNonNullable(): UserDefinedFunction
asNonNullable…FIXME
Note: asNonNullable is used when…FIXME
Naming UserDefinedFunction: withName Method
withName(name: String): UserDefinedFunction
withName…FIXME
Note: withName is used when…FIXME
Creating UserDefinedFunction Instance
UserDefinedFunction takes the following when created:
- A Scala function
- Output data type
- Input data types (if available)
UserDefinedFunction initializes the internal registries and counters.