apache spark - How to convert Scala RDD to Map
I have an RDD (an array of strings) of type org.apache.spark.rdd.RDD[String] = MappedRDD[18]. To convert it to a map with unique IDs I did 'val vertexMap = vertices.zipWithUniqueId', which gave me an RDD of type 'org.apache.spark.rdd.RDD[(String, Long)]', but I want a 'Map[String, Long]'. How can I convert my 'org.apache.spark.rdd.RDD[(String, Long)]' to 'Map[String, Long]'?

Thanks
There's a built-in collectAsMap function in PairRDDFunctions that delivers a map of the pair values in the RDD:
val vertexMap = vertices.zipWithUniqueId.collectAsMap

It's important to remember that an RDD is a distributed data structure: you can visualize it as 'pieces' of data spread over the cluster. When you collect, you force all those pieces to travel to the driver, and for that to work they need to fit in the driver's memory.
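To make the collectAsMap path concrete, here is a minimal, self-contained sketch assuming a local SparkContext; the object name and sample data are illustrative only:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver program sketching zipWithUniqueId + collectAsMap
object CollectAsMapSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("collect-as-map").setMaster("local[*]"))

    val vertices = sc.parallelize(Seq("vertexX", "vertexY", "vertexZ"))

    // zipWithUniqueId pairs each element with a unique Long;
    // collectAsMap pulls the whole pair RDD back to the driver
    val vertexMap: scala.collection.Map[String, Long] =
      vertices.zipWithUniqueId().collectAsMap()

    println(vertexMap)
    sc.stop()
  }
}
```

Note that collectAsMap returns a scala.collection.Map (not an immutable.Map), and this only works if the whole dataset fits on the driver.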
From the comments, it looks like in your case you need to deal with a large dataset. Making a Map out of it is not going to work, as it won't fit in the driver's memory and will cause OOM exceptions if you try.
You need to keep the dataset as an RDD. If you are creating a map in order to look up elements, use lookup on a PairRDD instead, like this:
import org.apache.spark.SparkContext._ // import the implicit conversions that provide PairRDDFunctions
val vertexMap = vertices.zipWithUniqueId
val vertixYId = vertexMap.lookup("vertexY")

scala apache-spark
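For completeness, a hedged end-to-end sketch of the lookup approach, again assuming a local SparkContext and illustrative data; note that lookup returns a Seq of all values for the key, not a single value:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver program sketching lookup on a pair RDD
object LookupSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("lookup-sketch").setMaster("local[*]"))

    val vertices  = sc.parallelize(Seq("vertexX", "vertexY", "vertexZ"))
    val vertexMap = vertices.zipWithUniqueId() // RDD[(String, Long)]

    // lookup scans the RDD for the key; the data stays distributed,
    // only the matching values come back to the driver
    val vertexYId: Seq[Long] = vertexMap.lookup("vertexY")

    println(vertexYId.headOption)
    sc.stop()
  }
}
```

Each lookup is a job over the RDD, so this suits occasional lookups on data too large to collect, not tight loops of many lookups.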