java - How to print first and last line of file in hadoop?
What is the best way to print the first and last line of an input file using Hadoop MapReduce?
For example, if I have a file of 10 GB and a typical block size of 128 MB, approximately 80 mappers are invoked under the default configuration, i.e. without manipulating the split size.
With 80 mappers invoked, how do I tell which split the framework has assigned to a given mapper, i.e. its starting offset or mapper number?
So I can't blindly put the logic in the map function, because it would be applied the same way by every other mapper.
One solution I can think of is to use a single mapper by keeping the block size equal to the file size; that way I can put the functionality in the map function, but then I won't be able to make use of parallel computing.
Is there any effective way of doing this?
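One way a mapper can tell which part of the file it was given is through the FileSplit it is processing: getStart() is the split's byte offset in the file, and comparing start plus length against the file length shows whether the split reaches the end. Below is a minimal sketch along those lines (assuming TextInputFormat and a hypothetical FirstLastLineMapper class); only the mappers whose split touches the start or the end of the file emit anything, so the remaining mappers do no extra work.

import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Hypothetical sketch: emits the first line of the split starting at offset 0
// and the last line of the split that reaches the end of the file.
public class FirstLastLineMapper extends Mapper<LongWritable, Text, Text, Text> {

    private long splitStart;
    private long splitEnd;
    private long fileLength;
    private boolean firstEmitted = false;
    private Text lastLine = new Text();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        FileSplit split = (FileSplit) context.getInputSplit();
        splitStart = split.getStart();
        splitEnd = splitStart + split.getLength();
        Path path = split.getPath();
        FileSystem fs = path.getFileSystem(context.getConfiguration());
        fileLength = fs.getFileStatus(path).getLen();
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The split starting at byte 0 sees the file's first line as its first record.
        if (splitStart == 0 && !firstEmitted) {
            context.write(new Text("first"), value);
            firstEmitted = true;
        }
        // If this split reaches the end of the file, the last record it sees
        // is the file's last line; remember it and emit it in cleanup().
        if (splitEnd >= fileLength) {
            lastLine.set(value);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        if (splitEnd >= fileLength && lastLine.getLength() > 0) {
            context.write(new Text("last"), lastLine);
        }
    }
}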
can seek "hadoop fs" commands separately store first , lastly line , run map cut down jobs on it. hadoop has specific tail command straight gives lastly n lines in file.
This is what I tried:
File size: 2.2 MB
First line: getting the first line is straightforward; cat the file and take head -n1.
hadoop fs -cat $file | head -n1
time taken: 4 seconds
Last line: there are two ways to do this. One is to cat the file and pipe it to tail, but since the whole file has to be read from the start, this takes long if the file is big.
hadoop fs -cat $file | tail -n1
time taken: 39 seconds
But luckily the tail command comes to the rescue here. You can run hadoop fs -tail on the file, and the time taken is the same as for the head command. Per the documentation: displays the last kilobyte of the file to stdout. The -f option can be used as in Unix.
hadoop fs -tail $file | tail -n1
time taken: 4 seconds
You can try this on your file and check the time difference.
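If you prefer doing the same thing from Java rather than the shell, the FileSystem API can read the first line and then seek close to the end of the file, mirroring the two commands above. A rough sketch (hypothetical FirstLastLine class, assuming the last line is shorter than one kilobyte):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FirstLastLine {
    public static void main(String[] args) throws IOException {
        Path path = new Path(args[0]);
        FileSystem fs = path.getFileSystem(new Configuration());

        // First line: read from the start of the stream, like hadoop fs -cat | head -n1.
        try (FSDataInputStream in = fs.open(path);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            System.out.println("first: " + reader.readLine());
        }

        // Last line: seek to (at most) 1 KB before the end, like hadoop fs -tail | tail -n1.
        // Assumes the last line fits in that final kilobyte.
        long len = fs.getFileStatus(path).getLen();
        try (FSDataInputStream in = fs.open(path)) {
            in.seek(Math.max(0, len - 1024));
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8));
            String line;
            String last = null;
            while ((line = reader.readLine()) != null) {
                last = line;
            }
            System.out.println("last: " + last);
        }
    }
}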
java hadoop mapreduce