java - How to print first and last line of file in hadoop?

September 15, 2011

What is the best way to print the first line and the last line of an input file using Hadoop MapReduce?

For illustration, if I have a file of 10 GB and a typical block size of 128 MB, approximately 80 mappers will be invoked with the default configuration, that is, without manipulating the split size.
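The figure of roughly 80 mappers follows from dividing the file size by the split size. A minimal sketch of that arithmetic, assuming the split size equals the block size and ignoring any rounding the framework applies to the final split (the class and method names are hypothetical, not part of Hadoop):

```java
// Hypothetical helper, not part of Hadoop: estimates how many input splits
// (and therefore map tasks) a file yields when split size == block size.
class SplitEstimate {
    static long numSplits(long fileSizeBytes, long splitSizeBytes) {
        // Ceiling division: a trailing partial block still needs its own split.
        return (fileSizeBytes + splitSizeBytes - 1) / splitSizeBytes;
    }

    public static void main(String[] args) {
        long fileSize = 10L * 1024 * 1024 * 1024; // 10 GB
        long blockSize = 128L * 1024 * 1024;      // 128 MB
        System.out.println(numSplits(fileSize, blockSize)); // prints 80
    }
}
```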

So 80 mappers are invoked. How can I tell how the framework has assigned the splits, that is, the starting offset of each split or the number of the mapper?

So I can't put the logic in the map function blindly, because the same logic would be applied in all the other mappers as well.
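One way to keep the logic from being applied blindly is to make it depend on where a mapper's split sits in the file. The sketch below is plain Java and only illustrates the comparison. It assumes each mapper can learn its split's start offset and length (in Hadoop these would presumably come from the task's input split) and the total file length. The class and method names are hypothetical.

```java
// Hypothetical helper: decides whether a split can contain the first or the
// last line of a file, given byte offsets. Assumption: start, length and
// fileLength are supplied by the framework; nothing here calls Hadoop.
class SplitPosition {
    // The first line of the file starts at byte offset 0.
    static boolean holdsFirstLine(long splitStart) {
        return splitStart == 0;
    }

    // The split that reaches the end of the file is the one that sees its end.
    // Simplification: a last line that begins in the previous split and
    // straddles the boundary is not handled here.
    static boolean holdsLastLine(long splitStart, long splitLength, long fileLength) {
        return splitStart + splitLength >= fileLength;
    }

    public static void main(String[] args) {
        long block = 128L * 1024 * 1024;
        long fileLength = 10L * 1024 * 1024 * 1024;
        System.out.println(holdsFirstLine(0));                                // true
        System.out.println(holdsLastLine(79 * block, block, fileLength));     // true
        System.out.println(holdsLastLine(40 * block, block, fileLength));     // false
    }
}
```

With a check like this, only the mappers handling the first and last splits would emit anything, while all other mappers skip their input.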

One solution I can think of is to use a single mapper by keeping the block size equal to the file size. That way I can put the logic in the map function, but I won't be able to make use of parallel computing.

Is there any efficient way of doing this?

You can try the "hadoop fs" commands to fetch and store the first and last line separately, and then run your MapReduce jobs on the file. Hadoop has a specific tail command that directly gives you the end of a file (its last kilobyte).

This is what I tried:

file size: 2.2mb

First line: getting the first line is straightforward; cat the file and take head -n1.

hadoop fs -cat $file | head -n1

time taken: 4 seconds

Last line: there are two ways to do this. One is cat and tail, but since the file is big, this takes a long time.

hadoop fs -cat $file | tail -n1

time taken: 39 seconds

But luckily there is a tail command that comes to the rescue here. You can run hadoop fs -tail on the file, and the time taken is the same as for the head command. Per the documentation: "Displays last kilobyte of the file to stdout. -f option can be used as in Unix."

hadoop fs -tail $file | tail -n1

time taken: 4 seconds
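The tail approach is fast because only the end of the file is read rather than the whole stream. A rough local-file analogue in plain Java is sketched below, assuming the last line fits inside the window that is read. The class name, method name and window parameter are illustrative and not part of Hadoop.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads only the last windowBytes of a local file and
// returns its final line, similar in spirit to "tail | tail -n1".
class LastLineReader {
    static String lastLine(String path, int windowBytes) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            long length = file.length();
            long start = Math.max(0, length - windowBytes);
            byte[] buffer = new byte[(int) (length - start)];
            file.seek(start);        // jump straight to the tail of the file
            file.readFully(buffer);
            String tail = new String(buffer, StandardCharsets.UTF_8);

            // Ignore one trailing line terminator ("\n" or "\r\n"), if present.
            int end = tail.length();
            if (end > 0 && tail.charAt(end - 1) == '\n') end--;
            if (end > 0 && tail.charAt(end - 1) == '\r') end--;

            // The last line starts after the previous newline, or at the window
            // start (truncated) if the line is longer than the window.
            int lineStart = tail.lastIndexOf('\n', end - 1) + 1;
            return tail.substring(lineStart, end);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LastLineReader <file>, with a 1 KB window.
        System.out.println(lastLine(args[0], 1024));
    }
}
```

The cost depends on the window size, not the file size, which matches the timing difference between the cat-based and tail-based commands above.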

You can try this on your own file and check the time difference.

java hadoop mapreduce
