python - Delete every non UTF-8 symbol from a string




I have a large number of files and a parser. I have to strip all non UTF-8 symbols and put the information into MongoDB. I have code like this:

    with open(fname, "r") as fp:
        for line in fp:
            line = line.strip()
            line = line.decode('utf-8', 'ignore')
            line = line.encode('utf-8', 'ignore')

Somehow I still get an error:

    bson.errors.InvalidStringData: strings in documents must be valid UTF-8: 1/b62010montecassianomcir\xe2\x86\x90ta0\xe2\x86\x90008923304320733/290066010401040101506055soccorin

I don't get it. Is there a simple way to do it?

UPD: It seems that Python and Mongo don't agree on the definition of a valid UTF-8 string.

Try the line of code below instead of the last two lines. Hope it helps:

line=line.decode('utf-8','ignore').encode("utf-8")
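The code above is Python 2, where a line read from a file is a byte string with a `.decode` method. As a minimal sketch of the same idea in Python 3, assuming the file is opened in binary mode so that each line arrives as `bytes`, something like the following could work. The names `strip_invalid_utf8` and `clean_lines` are hypothetical helpers made up for this illustration, not part of any library.

```python
def strip_invalid_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, silently dropping byte sequences that are not valid."""
    return raw.decode("utf-8", errors="ignore")


def clean_lines(path):
    """Yield each line of the file at `path` with invalid UTF-8 bytes removed."""
    # Binary mode gives raw bytes, so no implicit decoding happens on read.
    with open(path, "rb") as fp:
        for raw_line in fp:
            # Strip surrounding whitespace first, then drop undecodable bytes.
            yield strip_invalid_utf8(raw_line.strip())
```

Note that `errors="ignore"` only drops bytes that cannot be decoded. A well-formed sequence such as `\xe2\x86\x90` (the character U+2190) is kept as-is, so this alone will not remove characters that are valid UTF-8 but unwanted for some other reason.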

python mongodb encode
