scala - Play Framework Ning WS API encoding issue with HTML pages




I'm using Play Framework 2.3 and its WS API to download and parse HTML pages. For non-English pages (e.g. Russian, Hebrew), I often get the wrong encoding.

Here's an example:

def test = Action.async { request =>
  WS.url("http://news.walla.co.il/item/2793388").get().map { response =>
    Ok(response.body)
  }
}

This returns the web page's HTML. English characters are received OK, but the Hebrew letters appear as gibberish (not just when rendering, but at the internal String level). Like so:

<title>29 ×ר×××× ××פ××ת ×ש×××× ×× ×¤××, ××× ×©×××©× ×שר×××× - ×××××! ××ש×ת</title>
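A minimal sketch of how this kind of gibberish can arise, assuming the page is served as UTF-8 and the client decodes the bytes as ISO-8859-1. That assumption is not confirmed here, and the object name and sample word are hypothetical. It uses only the JVM standard library, not the WS API:

```scala
import java.nio.charset.StandardCharsets.{ISO_8859_1, UTF_8}

object MojibakeDemo {
  // Hypothetical sample: the Hebrew word "shalom", written with Unicode
  // escapes so the source file's own encoding does not matter.
  val original: String = "\u05e9\u05dc\u05d5\u05dd"

  // The bytes as a UTF-8 server would put them on the wire.
  val wireBytes: Array[Byte] = original.getBytes(UTF_8)

  // What a client holds if it decodes those bytes as ISO-8859-1:
  // one char per byte, none of them Hebrew.
  val misdecoded: String = new String(wireBytes, ISO_8859_1)

  // ISO-8859-1 maps every byte to exactly one char, so the original
  // bytes can be recovered and decoded again with the right charset.
  val recovered: String = new String(misdecoded.getBytes(ISO_8859_1), UTF_8)
}
```

Because the wrong decoding is lossless in this case, the damaged string can still be repaired after the fact.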

Other articles from the same website can appear OK.

Using cURL on the same web page returns it fine, which makes me believe the problem is within the WS API.

Any ideas?

Edit:

I found a solution in another question.

Parsing the response as ISO-8859-1 and then converting it to UTF-8, like so:

Ok(new String(response.body.getBytes("ISO-8859-1"), response.header(CONTENT_ENCODING).getOrElse("UTF-8")))

This displays correctly. I have a working solution, but why isn't this done internally?

OK, here is the solution I ended up using in production:

def responseBody = response.header(CONTENT_TYPE)
  .filter(_.toLowerCase.contains("charset"))
  .fold(new String(response.body.getBytes("ISO-8859-1"), "UTF-8"))(_ => response.body)

Explanation:

If the response has a "Content-Type" header that specifies a charset, simply return the response body, since the WS API will use that charset to decode it correctly. Otherwise, assume the body is UTF-8 text that was decoded as ISO-8859-1, and convert it back to UTF-8.
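The same rule can be sketched without any Play dependency, which makes it easy to test. The names `CharsetFix` and `fixBody` are hypothetical: `contentType` stands in for `response.header(CONTENT_TYPE)` and `body` for `response.body`.

```scala
import java.nio.charset.StandardCharsets.{ISO_8859_1, UTF_8}

object CharsetFix {
  // Hypothetical helper mirroring the production snippet above.
  // If the header names a charset, trust the body as already decoded;
  // otherwise undo an assumed ISO-8859-1 decoding and re-read as UTF-8.
  def fixBody(contentType: Option[String], body: String): String =
    contentType
      .filter(_.toLowerCase.contains("charset"))
      .fold(new String(body.getBytes(ISO_8859_1), UTF_8))(_ => body)
}
```

Note that this is a heuristic: a page that really is ISO-8859-1 but sends no charset would be damaged by the conversion, so it fits best when the target sites are known to serve UTF-8.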

scala http character-encoding playframework-2.0
