getPreferredLocations(partition: Partition): Seq[String]
StateStoreAwareZipPartitionsRDD
StateStoreAwareZipPartitionsRDD is created exclusively when the StreamingSymmetricHashJoinExec physical operator is requested to execute and generate a recipe for a distributed computation (as an RDD[InternalRow]). The operator requests StateStoreAwareZipPartitionsHelper for the RDD.
Creating StateStoreAwareZipPartitionsRDD Instance
StateStoreAwareZipPartitionsRDD takes the following to be created:
- Function ((Iterator[A], Iterator[B]) ⇒ Iterator[V], e.g. processPartitions)
- Names of the state stores
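To make the shape of the function argument concrete, here is a minimal sketch that does not use Spark at all. It assumes, as the RDD's name suggests, that the function receives the iterators of the same partition index from the left and the right side. ZipPartitionsSketch, zipPartitions and matchingKeys are hypothetical names for illustration only, and plain Seq values stand in for partitions.

```scala
object ZipPartitionsSketch {

  // Hypothetical stand-in for zipping two partitioned datasets:
  // each inner Seq models one partition (this is not the Spark API).
  def zipPartitions[A, B, V](left: Seq[Seq[A]], right: Seq[Seq[B]])(
      f: (Iterator[A], Iterator[B]) => Iterator[V]): Seq[Seq[V]] = {
    require(left.length == right.length,
      "both sides must have the same number of partitions")
    // Apply f to the pair of partitions that share an index.
    left.zip(right).map { case (l, r) => f(l.iterator, r.iterator).toList }
  }

  // Example function of the required shape: keep the left values
  // that also occur in the right partition.
  def matchingKeys(ls: Iterator[Int], rs: Iterator[Int]): Iterator[Int] = {
    val rightKeys = rs.toSet
    ls.filter(rightKeys.contains)
  }
}
```

The function only ever sees one pair of partitions at a time, which is why it matters that both partitions of a pair are processed on the same executor.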
Placement Preferences of Partition (Preferred Locations): getPreferredLocations Method
Note: getPreferredLocations is part of the RDD Contract to specify placement preferences (aka preferred task locations), i.e. where tasks should be executed to be as close to the data as possible.
getPreferredLocations requests the StateStoreCoordinatorRef for the location of every state store (identified by the StatefulOperatorStateInfo, the partition ID and the state store name) and returns the unique executor IDs. As a result, a partition is processed on the executor that already holds the state store for the operator and the partition.
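The lookup described above can be sketched without any Spark dependency. The types below (OperatorStateInfo, ProviderId, Coordinator) are simplified, hypothetical stand-ins for StatefulOperatorStateInfo, the state store identifier and StateStoreCoordinatorRef. The store names and executor IDs used with them are made up as well. The sketch also assumes that the coordinator may be absent and that a store may have no known location yet, in which case that store contributes nothing to the result.

```scala
// Hypothetical, simplified stand-ins for the Spark classes named in the text.
final case class OperatorStateInfo(operatorId: Long, queryRunId: String)
final case class ProviderId(info: OperatorStateInfo, partitionId: Int, storeName: String)

// Hypothetical coordinator: knows which executor hosts which state store.
final class Coordinator(locations: Map[ProviderId, String]) {
  def getLocation(id: ProviderId): Option[String] = locations.get(id)
}

object PreferredLocationsSketch {

  // For every state store name, ask the coordinator (if any) where the store
  // for this operator and partition lives, then drop duplicate executors.
  def preferredLocations(
      info: OperatorStateInfo,
      storeNames: Seq[String],
      coordinator: Option[Coordinator],
      partitionId: Int): Seq[String] =
    storeNames.flatMap { name =>
      coordinator.flatMap(_.getLocation(ProviderId(info, partitionId, name)))
    }.distinct
}
```

When all state stores of a partition are registered on the same executor, the result collapses to that single executor. When nothing is registered, the result is empty and the scheduler is free to place the task anywhere.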