apache spark - How to convert a Scala RDD to a Map
I have an RDD (an array of String) of type org.apache.spark.rdd.RDD[String] = MappedRDD[18]. To assign unique IDs I did

val vertexMap = vertices.zipWithUniqueId

which gave me an RDD of type org.apache.spark.rdd.RDD[(String, Long)], but I want a Map[String, Long]. How can I convert an org.apache.spark.rdd.RDD[(String, Long)] to a Map[String, Long]?

Thanks
There's a built-in collectAsMap function in PairRDDFunctions that delivers a map of the pair values in the RDD:

val vertexMap = vertices.zipWithUniqueId.collectAsMap
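As a minimal, self-contained sketch (the local master setting and the sample vertex names are illustrative assumptions, not part of the question):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // implicit conversions to PairRDDFunctions

object CollectAsMapExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("collectAsMap-example").setMaster("local[*]"))
    val vertices = sc.parallelize(Seq("vertexX", "vertexY", "vertexZ"))
    // RDD[(String, Long)] -> scala.collection.Map[String, Long], materialized on the driver
    val vertexMap: scala.collection.Map[String, Long] = vertices.zipWithUniqueId.collectAsMap()
    println(vertexMap)
    sc.stop()
  }
}

Note that collectAsMap returns a scala.collection.Map, not a scala.collection.immutable.Map.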
It's important to remember that an RDD is a distributed data structure: you can visualize it as pieces of data spread over the cluster. When you collect, you force all of those pieces to go to the driver, and to be able to do that, they need to fit in the memory of the driver.

From the comments, it looks like in your case you need to deal with a big dataset. Making a Map out of it is not going to work, as it won't fit in the driver's memory, causing OOM exceptions if you try.
You need to keep the dataset as an RDD. If you are creating the map in order to look up elements, use lookup on the PairRDD instead, like this:

import org.apache.spark.SparkContext._ // import the implicit conversions backing PairRDDFunctions
val vertexMap = vertices.zipWithUniqueId
val vertexYId = vertexMap.lookup("vertexY")
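Note that lookup returns a Seq of all values for that key, so a unique key yields a one-element Seq. Wrapped into a runnable sketch (again assuming a local SparkContext and made-up vertex names):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // implicit conversions to PairRDDFunctions

object LookupExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("lookup-example").setMaster("local[*]"))
    val vertices = sc.parallelize(Seq("vertexX", "vertexY", "vertexZ"))
    val vertexMap = vertices.zipWithUniqueId // stays distributed; nothing is pulled to the driver yet
    // lookup ships back only the values matching the key, not the whole dataset
    val vertexYId: Seq[Long] = vertexMap.lookup("vertexY")
    println(vertexYId.headOption)
    sc.stop()
  }
}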
scala apache-spark