implicits Object - Implicit Conversions

The implicits object provides implicit conversions that turn Scala objects (including RDDs) into a Dataset, a DataFrame, or Columns, together with the Encoders that such conversions rely on.

Table 1. implicits API
Name Description

localSeqToDatasetHolder
Creates a DatasetHolder with the input Seq[T] converted to a Dataset[T] (using SparkSession.createDataset).
implicit def localSeqToDatasetHolder[T : Encoder](s: Seq[T]): DatasetHolder[T]

Encoders
Encoders for primitive and object types in Scala and Java (aka boxed types).

StringToColumn
Converts $"name" into a Column.
implicit class StringToColumn(val sc: StringContext)

rddToDatasetHolder
Creates a DatasetHolder with the input RDD[T] converted to a Dataset[T].
implicit def rddToDatasetHolder[T : Encoder](rdd: RDD[T]): DatasetHolder[T]

symbolToColumn
Converts a Scala Symbol into a ColumnName.
implicit def symbolToColumn(s: Symbol): ColumnName
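
The two column-related conversions can be tried out with a small sketch like the one below. It is only an illustration: the object name ColumnImplicitsSketch, the helper adultNames, the sample rows, and the local master setting are assumptions made for the example and are not part of Spark or of this page.

```scala
import org.apache.spark.sql.{Column, SparkSession}

// Hypothetical helper object, used for illustration only.
object ColumnImplicitsSketch {

  // Returns the names of people aged 18 or more, using both column conversions.
  def adultNames(spark: SparkSession): Seq[String] = {
    import spark.implicits._

    // Sample data invented for this example.
    val people = Seq(("Ann", 31), ("Bob", 17)).toDF("name", "age")

    // StringToColumn: the $ string interpolator builds a column reference.
    val nameCol: Column = $"name"

    // symbolToColumn: a Symbol is converted because a Column is expected here.
    val ageCol: Column = Symbol("age")

    people.filter(ageCol >= 18).select(nameCol).as[String].collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    // A throwaway local session; master and appName are arbitrary choices.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("column-implicits-sketch")
      .getOrCreate()
    try println(adultNames(spark)) finally spark.stop()
  }
}
```

Symbol("age") is used instead of the 'age literal because the literal syntax is deprecated in newer Scala versions; in a spark-shell running on Scala 2.12 both forms should behave the same.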

The implicits object is defined inside SparkSession, so you have to create a SparkSession instance before you can import the implicit conversions.

import org.apache.spark.sql.SparkSession
val spark: SparkSession = ...
import spark.implicits._

scala> val ds = Seq("I am a shiny Dataset!").toDS
ds: org.apache.spark.sql.Dataset[String] = [value: string]

scala> val df = Seq("I am an old grumpy DataFrame!").toDF
df: org.apache.spark.sql.DataFrame = [value: string]

scala> val df = Seq("I am an old grumpy DataFrame with text column!").toDF("text")
df: org.apache.spark.sql.DataFrame = [text: string]

val rdd = sc.parallelize(Seq("hello, I'm a very low-level RDD"))
scala> val ds = rdd.toDS
ds: org.apache.spark.sql.Dataset[String] = [value: string]
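
Outside spark-shell there is no predefined spark or sc, so the session has to be created by hand. The following is a minimal sketch of the same conversions as a standalone program, assuming a local master; the object name ImplicitsQuickstart, the helper localSession, and the appName are made up for the example.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical standalone program, used for illustration only.
object ImplicitsQuickstart {

  // Builds a local session; "local[*]" and the appName are illustrative choices.
  def localSession(): SparkSession =
    SparkSession.builder()
      .master("local[*]")
      .appName("implicits-quickstart")
      .getOrCreate()

  def main(args: Array[String]): Unit = {
    val spark = localSession()
    // The import needs a stable identifier, hence the val above.
    import spark.implicits._

    // Seq[String] to Dataset[String] via localSeqToDatasetHolder.
    val ds: Dataset[String] = Seq("I am a shiny Dataset!").toDS()

    // Seq[String] to DataFrame with an explicit column name.
    val df: DataFrame = Seq("I am an old grumpy DataFrame with text column!").toDF("text")

    // RDD[String] to Dataset[String] via rddToDatasetHolder.
    val rdd = spark.sparkContext.parallelize(Seq("hello, I'm a very low-level RDD"))
    val fromRdd: Dataset[String] = rdd.toDS()

    ds.printSchema()
    df.printSchema()
    fromRdd.show(truncate = false)

    spark.stop()
  }
}
```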

In Scala REPL-based environments, e.g. spark-shell, use :imports to see which imports are in scope.

scala> :help imports

show import history, identifying sources of names

scala> :imports
 1) import org.apache.spark.SparkContext._ (69 terms, 1 are implicit)
 2) import spark.implicits._       (1 types, 67 terms, 37 are implicit)
 3) import spark.sql               (1 terms)
 4) import org.apache.spark.sql.functions._ (354 terms)

The implicits object extends the SQLImplicits abstract class.

DatasetHolder Scala Case Class

DatasetHolder is a Scala case class that, when created, takes a Dataset[T].

DatasetHolder is created (implicitly) whenever the rddToDatasetHolder or localSeqToDatasetHolder implicit conversion is used.

DatasetHolder has toDS and toDF methods. toDS simply returns the Dataset[T] that the holder was created with, while toDF returns a DataFrame (using the Dataset.toDF operator).

toDS(): Dataset[T]
toDF(): DataFrame
toDF(colNames: String*): DataFrame
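
The three methods can be exercised by calling the conversion explicitly instead of letting the compiler insert it. This is a sketch under assumptions: the object name DatasetHolderSketch, the helper columnNames, and the sample pairs are invented for the example.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, DatasetHolder, SparkSession}

// Hypothetical helper object, used for illustration only.
object DatasetHolderSketch {

  // Returns the row count of toDS() and the column names produced by both toDF variants.
  def columnNames(spark: SparkSession): (Long, Seq[String], Seq[String]) = {
    import spark.implicits._

    // Sample data invented for this example.
    val pairs = Seq((1, "one"), (2, "two"))

    // Calling the conversion by name makes the intermediate DatasetHolder visible.
    val holder: DatasetHolder[(Int, String)] = localSeqToDatasetHolder(pairs)

    val ds: Dataset[(Int, String)] = holder.toDS()    // the wrapped Dataset
    val df: DataFrame = holder.toDF()                 // tuple fields keep their default names
    val renamed: DataFrame = holder.toDF("id", "name") // explicit column names

    (ds.count(), df.columns.toSeq, renamed.columns.toSeq)
  }

  def main(args: Array[String]): Unit = {
    // A throwaway local session; master and appName are arbitrary choices.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("dataset-holder-sketch")
      .getOrCreate()
    try println(columnNames(spark)) finally spark.stop()
  }
}
```

In everyday code the holder never appears by name: writing pairs.toDF("id", "name") lets the compiler apply localSeqToDatasetHolder and call toDF on the result.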
