sc.parallelize(0 to 999, 50).zipWithIndex.groupBy(_._1 / 10).collect
Learning Spark internals using groupBy (to cause shuffle)
Execute the following operation and explain transformations, actions, jobs, stages, tasks, partitions using
spark-shell and web UI.
You may also make it a little bit heavier with explaining data distribution over cluster and go over the concepts of drivers, masters, workers, and executors.