Data Sources in Spark

Spark can read data from many sources, including the Hadoop Distributed File System (HDFS), Cassandra, HBase, and S3.

Spark offers different APIs for reading data, depending on the content of the data and on where it is stored.

By content, data falls into two groups:

  • binary

  • text
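As a rough illustration of the content-based split, the sketch below picks a read call by content type. It assumes PySpark and uses two methods that exist on SparkContext, textFile and binaryFiles. The helper name read_by_content, the "text"/"binary" labels, and the example paths are hypothetical and exist only for this illustration.

```python
def read_by_content(sc, path, content):
    """Return an RDD for `path`, choosing the read call by content type.

    `sc` is expected to be a pyspark.SparkContext.
    `read_by_content` is a hypothetical helper, not part of Spark.
    """
    if content == "text":
        # textFile yields one record per line of text.
        return sc.textFile(path)
    if content == "binary":
        # binaryFiles yields (file path, file bytes) pairs, one per file.
        return sc.binaryFiles(path)
    raise ValueError("content must be 'text' or 'binary', got %r" % (content,))


# Example usage (assumes pyspark is installed; the paths are placeholders):
#
#   from pyspark import SparkContext
#   sc = SparkContext("local[*]", "data-sources-sketch")
#   lines = read_by_content(sc, "data/notes.txt", "text")
#   blobs = read_by_content(sc, "data/images", "binary")
#   sc.stop()
```

The point of the sketch is only that text and binary content go through different read calls; in practice you would usually call the appropriate method directly rather than wrap it.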

You can also group data by where it is stored:

