apache spark - How to convert Scala RDD to Map -

I have an RDD (array of String) of type org.apache.spark.rdd.RDD[String] = MappedRDD[18], and I want to convert it to a map with unique ids. I did 'val vertexMap = vertices.zipWithUniqueId', which gave me an RDD of type 'org.apache.spark.rdd.RDD[(String, Long)]', but I want a 'Map[String, Long]'. How can I convert 'org.apache.spark.rdd.RDD[(String, Long)]' to 'Map[String, Long]'?

Thanks

There's a built-in collectAsMap function in PairRDDFunctions that delivers a Map of the pair values in the RDD.

val vertexMap = vertices.zipWithUniqueId.collectAsMap
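For context, here is a minimal, self-contained sketch of the same flow, assuming a local Spark setup; the sample vertex names and app name are made up for illustration:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // implicit conversions for PairRDDFunctions (pre-Spark-1.3 style)

object CollectAsMapExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("collectAsMap-example").setMaster("local[*]"))

    // Small sample standing in for the question's RDD[String]
    val vertices = sc.parallelize(Seq("vertexA", "vertexB", "vertexC"))

    // zipWithUniqueId gives RDD[(String, Long)]; collectAsMap pulls it
    // onto the driver as an in-memory Map[String, Long]
    val vertexMap = vertices.zipWithUniqueId.collectAsMap()

    println(vertexMap("vertexA")) // ordinary Map access once collected

    sc.stop()
  }
}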

It's important to remember that an RDD is a distributed data structure. You can visualize it as 'pieces' of data spread over the cluster. When you collect, you force all those pieces to travel to the driver, and for that to be possible, they need to fit in the driver's memory.

From the comments, it looks like in your case you need to deal with a big dataset. Making a Map out of it is not going to work, since it won't fit in the driver's memory and will cause OOM exceptions if you try.

You need to keep the dataset as an RDD. If you are creating the map in order to look up elements, use lookup on a PairRDD instead, like this:

import org.apache.spark.SparkContext._ // import implicit conversions to support PairRDDFunctions

val vertexMap = vertices.zipWithUniqueId
val vertixYId = vertexMap.lookup("vertexY")
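Note that lookup returns a Seq[Long] rather than a single value, because the same key may appear more than once in a pair RDD; if "vertexY" is unique, vertixYId will be a one-element sequence, so you would take its head to get the Long id.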

scala apache-spark

