python - Delete every non-UTF-8 symbol from a string
I have a large number of files and a parser. I have to strip non-UTF-8 symbols and store the information in MongoDB. Right now I have code like this:
    with open(fname, "r") as fp:
        for line in fp:
            line = line.strip()
            line = line.decode('utf-8', 'ignore')
            line = line.encode('utf-8', 'ignore')
But somehow I still get an error:
bson.errors.InvalidStringData: strings in documents must be valid UTF-8: 1/b62010montecassianomcir\xe2\x86\x90ta0\xe2\x86\x90008923304320733/290066010401040101506055soccorin
I don't get it. Is there a simple way to do it?
UPD: It seems that Python and Mongo don't agree on the definition of a valid UTF-8 string.
Try the line below instead of the last 2 lines. Hope it helps:
line=line.decode('utf-8','ignore').encode("utf-8")
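The one-liner above relies on Python 2, where `str` is a byte string with a `.decode` method. For readers on Python 3, the same idea can be sketched roughly as follows, assuming the file is opened in binary mode so that decoding happens explicitly. The helper names `strip_invalid_utf8` and `clean_lines` are hypothetical and only serve the illustration; this is not necessarily what resolved the original error.

```python
def strip_invalid_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, silently dropping bytes that are not valid UTF-8."""
    return raw.decode("utf-8", errors="ignore")


def clean_lines(path):
    """Yield each line of the file at `path` as a cleaned, stripped str.

    `path` is a placeholder for whatever file the parser is reading.
    """
    # Binary mode: the bytes are decoded here, not implicitly by open().
    with open(path, "rb") as fp:
        for raw_line in fp:
            yield strip_invalid_utf8(raw_line).strip()


# Small in-memory check: \xff and \xfe are never valid in UTF-8 and are dropped,
# while \xe2\x86\x90 is a well-formed sequence (U+2190) and is kept.
sample = b"abc\xff\xfedef \xe2\x86\x90 ok\n"
cleaned = strip_invalid_utf8(sample).strip()
# cleaned == "abcdef \u2190 ok"
```

The result is a `str`, which is the type a Python 3 MongoDB driver would normally expect for text fields, so re-encoding to bytes afterwards should not be needed in that setting.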
python mongodb encode