python - Delete every non-UTF-8 symbol from a string -




I have a large number of files and a parser. I have to strip all non-UTF-8 symbols and put the data into MongoDB. Currently I have code like this:

with open(fname, "r") as fp:
    for line in fp:
        line = line.strip()
        line = line.decode('utf-8', 'ignore')
        line = line.encode('utf-8', 'ignore')
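The loop above appears to be Python 2 code: in Python 3 a file opened with `"r"` yields `str`, which has no `.decode()` method. A minimal Python 3 sketch of the same idea, assuming the input files are meant to be UTF-8 text, lets `open()` drop undecodable bytes itself via `errors="ignore"`. `clean_lines` is a hypothetical helper name, not part of any library.

```python
def clean_lines(fname):
    """Yield stripped lines from fname, silently dropping bytes that are not valid UTF-8.

    Hypothetical helper; assumes the files are intended to be UTF-8 text.
    """
    # errors="ignore" tells the UTF-8 decoder to skip invalid byte sequences
    # instead of raising UnicodeDecodeError.
    with open(fname, "r", encoding="utf-8", errors="ignore") as fp:
        for line in fp:
            yield line.strip()
```

Letting `open()` do the decoding avoids a separate decode/encode round trip for every line.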

Somehow I still get an error:

bson.errors.InvalidStringData: strings in documents must be valid UTF-8: 1/b62010montecassianomcir\xe2\x86\x90ta0\xe2\x86\x90008923304320733/290066010401040101506055soccorin

I don't get it. Is there a simple way to do it?

UPD: It seems that Python and Mongo don't agree on the definition of a valid UTF-8 string.
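To check whether a given byte string is valid UTF-8 by Python's own definition, one option is a strict decode, sketched below with a hypothetical `is_valid_utf8` helper. For example, the `\xe2\x86\x90` sequences in the error message are well-formed UTF-8 for U+2190 LEFTWARDS ARROW (←), so they pass this check; the sketch says nothing about how a particular MongoDB driver version validates strings.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 under Python 3's codec."""
    try:
        data.decode("utf-8")  # default errors="strict" raises on any invalid sequence
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(b"\xe2\x86\x90"))  # True: UTF-8 encoding of U+2190 (←)
print(is_valid_utf8(b"\xff"))          # False: byte 0xFF never occurs in UTF-8
print(is_valid_utf8(b"\xed\xa0\x80"))  # False: encoded surrogate U+D800, rejected by Python 3
```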

Try the line below instead of the last two lines. Hope it helps:

line=line.decode('utf-8','ignore').encode("utf-8")
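The one-liner decodes the raw bytes, discarding invalid sequences, and re-encodes the result, so the output bytes are valid UTF-8. A self-contained sketch of that round trip in Python 3 terms, assuming the input is `bytes` (for example, lines read from a file opened in `"rb"` mode); `strip_invalid_utf8` is a hypothetical name.

```python
def strip_invalid_utf8(data: bytes) -> bytes:
    """Drop byte sequences that are not valid UTF-8 and return clean UTF-8 bytes."""
    # decode with "ignore" skips invalid bytes; the resulting str contains no
    # lone surrogates, so the strict re-encode cannot fail.
    return data.decode("utf-8", "ignore").encode("utf-8")


print(strip_invalid_utf8(b"abc\xffdef\xe2\x86\x90"))  # b'abcdef\xe2\x86\x90'
```

With PyMongo on Python 3 it is usually simpler to store the decoded `str` directly, since the driver encodes `str` values to UTF-8 itself.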

python mongodb encode
