python - Comparing two text files to remove duplication of the longer one -
I have two files. The first contains a list of info delimited by tabs, and the second contains a list of item IDs, one field per line. I want to compare the first field of each line in the larger file (file1) against the item IDs in the smaller file (file2). If the compared ID does not exist in the second file, I want to write the info for that item from the first file (the whole tab-separated line). I tried the code below, but I have a problem with the loops: the first loop doesn't advance while the second loops over the lines of the second file. Also, I want each item written only once, so there is a problem in the if statement as well.
    for lines in alldata:
        for lines1 in olddata:
            old_data=lines1.split('\r\n')
            dataid=old_data[0]
            data=lines.split('\t')
            photoid=data[0]
            if photoid==dataid:
                break
            else:
                #continue
                #print('matching',lines)
                #break
                w=open(head+'......................../1.txt','a')
                w.write(lines)
This is a sample of the larger file's structure:
    15463774518 2014-10-28 08:12:31 2014-10-28 13:12:31
    15628560471 2014-10-26 07:40:28 2014-10-26 12:40:28
    15444098878 2014-10-26 04:49:19 2014-10-26 09:49:19
    15437269197 2014-10-25 09:55:11 2014-10-25 15:55:11
The smaller file looks like this:
    139747955
    2417570005
    2478707302
    1808883457
    211514265
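I suggest the following pseudocode. The Pythonic way of checking whether an ID from file 1 is among the IDs of file 2 is to hold the IDs of file 2 in a list and test membership with: if id in idlist
Let's say you have read the IDs of the second file into a list, idlist_file2. Then read file one, parse each line, and check: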
    with open(file1, 'r') as f:
        for line in f:
            # parse_line is a function you write according to the info format;
            # it returns either a dict or a tuple, whatever works best
            info = parse_line(line)
            if info['id'] not in idlist_file2:
                do_something_with_this_info(info)
This should give you a starting point.
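The pseudocode above could be filled in roughly as follows. This is only a minimal sketch under a few assumptions: the smaller file holds one ID per line, the ID is the first tab-separated field of each line in the larger file, and the names load_ids, write_missing and the file paths are hypothetical, not taken from the question. It uses a set rather than a list for the IDs, since membership tests on a set stay fast as the file grows.

```python
def load_ids(path):
    """Read one ID per line from the smaller file into a set (assumed layout)."""
    with open(path, "r") as f:
        # strip() drops the trailing newline (and '\r' on Windows-style files)
        return {line.strip() for line in f if line.strip()}


def write_missing(big_path, small_path, out_path):
    """Copy each line of big_path whose first tab-separated field
    is not among the IDs in small_path. Returns the number of lines written."""
    known_ids = load_ids(small_path)
    written = 0
    # Open the output once, outside the loop, instead of reopening per line
    with open(big_path, "r") as src, open(out_path, "w") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            item_id = line.split("\t", 1)[0].strip()
            if item_id not in known_ids:
                dst.write(line)  # single pass, so each line is written at most once
                written += 1
    return written
```

Because the second file is read once up front, there is no inner loop over it, which avoids both the stalled outer loop and the repeated writes described in the question. A call would look like write_missing('file1.txt', 'file2.txt', 'new_items.txt'), with the paths replaced by your own.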
python file comparison