Datasets vs DataFrames vs RDDs
Many may have been asking yourself why they should be using Datasets rather than the foundation of all Spark - RDDs using case classes.
This document collects advantages of
RDD[CaseClass] to answer the question Dan has asked on twitter:
"In #Spark, what is the advantage of a DataSet over an RDD[CaseClass]?"
Saving to or Writing from Data Sources
In Datasets, reading or writing boils down to using
SQLContext.write methods, appropriately.
Accessing Fields / Columns
select columns in a datasets without worrying about the positions of the columns.
In RDD, you have to do an additional hop over a case class and access fields by name.