StateStoreAwareZipPartitionsRDD

StateStoreAwareZipPartitionsRDD is a ZippedPartitionsRDD2 with the left and right parent RDDs.

StateStoreAwareZipPartitionsRDD is created exclusively when StreamingSymmetricHashJoinExec physical operator is requested to execute and generate a recipe for a distributed computation (as an RDD[InternalRow]) (and requests StateStoreAwareZipPartitionsHelper for one).

Creating StateStoreAwareZipPartitionsRDD Instance

StateStoreAwareZipPartitionsRDD takes the following to be created:

Placement Preferences of Partition (Preferred Locations) — getPreferredLocations Method

getPreferredLocations(partition: Partition): Seq[String]
Note
getPreferredLocations is a part of the RDD Contract to specify placement preferences (aka preferred task locations), i.e. where tasks should be executed to be as close to the data as possible.

getPreferredLocations simply requests the StateStoreCoordinatorRef for the location of every state store (with the StatefulOperatorStateInfo and the partition ID) and returns unique executor IDs (so that processing a partition happens on the executor with the proper state store for the operator and the partition).

results matching ""

    No results matching ""