scala - Play Framework Ning WS API encoding issue with HTML pages
I'm using Play Framework 2.3 and the WS API to download and parse HTML pages. For non-English pages (e.g. Russian, Hebrew), I get the wrong encoding.
Here's an example:
def test = Action.async { request =>
  WS.url("http://news.walla.co.il/item/2793388").get().map { response =>
    Ok(response.body)
  }
}
This returns the web page's HTML. English characters are received OK, but Hebrew letters appear as gibberish (not only when rendering, but already at the internal String level), like so:
<title>29 ×ר×××× ××פ××ת ×ש×××× ×× ×¤××, ××× ×©×××©× ×שר×××× - ×××××! ××ש×ת</title>
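The run of `×` characters is a useful clue. Every Hebrew letter (U+05D0 to U+05EA) is encoded in UTF-8 as two bytes whose first byte is 0xD7, and ISO-8859-1 maps 0xD7 to `×`. The plain-Scala sketch below (no Play dependency; the object name `MojibakeDemo` is illustrative) reproduces the symptom by decoding UTF-8 bytes as ISO-8859-1:

```scala
import java.nio.charset.StandardCharsets

object MojibakeDemo {
  // Simulates a client that receives UTF-8 bytes but decodes them as ISO-8859-1
  def misdecode(text: String): String = {
    val wireBytes = text.getBytes(StandardCharsets.UTF_8) // bytes as the server sends them
    new String(wireBytes, StandardCharsets.ISO_8859_1)    // one char per byte, wrong charset
  }

  def main(args: Array[String]): Unit = {
    val garbled = misdecode("שלום") // Hebrew "shalom", 4 letters
    // Each letter becomes two chars; the first is U+00D7 ('×'), from byte 0xD7
    val timesSigns = garbled.count(_ == 0xD7.toChar)
    println(s"length=${garbled.length}, times-signs=$timesSigns")
  }
}
```

Four Hebrew letters turn into eight characters, four of which are `×`, which matches the pattern in the title above.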
Other articles from the same website can appear OK.
Using curl, the same web page is returned fine, which makes me believe the problem is within the WS API.
Any ideas?
Edit:
I found a solution in another question: parse the response as ISO-8859-1, then convert it to UTF-8, like so:
Ok(new String(response.body.getBytes("ISO-8859-1"), "UTF-8"))
This displays correctly. So I have a working solution, but why isn't this done internally?
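The workaround works because ISO-8859-1 maps each of the 256 byte values to exactly one character (U+0000 to U+00FF), so encoding the mis-decoded string back to ISO-8859-1 recovers the original bytes unchanged; those bytes can then be decoded with the right charset. A minimal framework-independent sketch, assuming the page really is UTF-8 (`RoundTrip` and `repair` are hypothetical names):

```scala
import java.nio.charset.StandardCharsets

object RoundTrip {
  // Undo an ISO-8859-1 mis-decoding: recover the raw bytes, then decode them as UTF-8
  def repair(misdecoded: String): String =
    new String(misdecoded.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8)

  def main(args: Array[String]): Unit = {
    for (text <- Seq("שלום", "Привет", "hello")) {
      val bytes = text.getBytes(StandardCharsets.UTF_8)
      val misdecoded = new String(bytes, StandardCharsets.ISO_8859_1) // what the client produced
      println(s"$text restored: ${repair(misdecoded) == text}")
    }
  }
}
```

The trick only helps when the real charset is UTF-8; a page actually encoded in, say, windows-1255 would need that charset in the second step instead.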
OK, here's the solution I ended up using in production:
def responseBody = response.header(CONTENT_TYPE)
  .filter(_.toLowerCase.contains("charset"))
  .fold(new String(response.body.getBytes("ISO-8859-1"), "UTF-8"))(_ => response.body)
Explanation:
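If the response has a "Content-Type" header that specifies a charset, return the response body as-is, since the WS API uses that charset to decode it correctly. Otherwise, assume the body was decoded as ISO-8859-1 (the client's default when no charset is given) and convert it to UTF-8.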
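The same decision can be tested without Play by passing the Content-Type value and the already-decoded body into a pure function. This sketch mirrors the `fold` above; `ResponseBodyFix.responseBody` and its parameters are hypothetical, and, like the explanation, it assumes that pages without a declared charset are UTF-8:

```scala
import java.nio.charset.StandardCharsets

object ResponseBodyFix {
  // contentType: value of the Content-Type header, if present
  // body: the String the HTTP client already produced
  def responseBody(contentType: Option[String], body: String): String =
    contentType
      .filter(_.toLowerCase.contains("charset"))
      .fold(
        // No declared charset: undo the ISO-8859-1 decoding and re-decode as UTF-8
        new String(body.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8)
      )(_ => body) // Declared charset: trust the client's decoding

  def main(args: Array[String]): Unit = {
    val hebrew = "שלום"
    val misdecoded = new String(hebrew.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1)
    println(responseBody(None, misdecoded) == hebrew)
    println(responseBody(Some("text/html; charset=UTF-8"), hebrew) == hebrew)
  }
}
```

Matching on the substring `charset` is a heuristic: a header such as `text/html; charset=windows-1255` is trusted, while a page that declares its charset only in an HTML `<meta>` tag falls into the UTF-8 branch.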