python 3 open and read url without url name


I have gone through the relevant questions and did not find a reply to this one:

I want to open a URL and parse its contents.

When I do this on, say, google.com, there is no problem.

When I do it on a URL that doesn't have a file name, the read returns an empty string.

See the code below as an example:

import urllib.request

#urls = ["http://www.google.com", "http://www.whoscored.com", "http://www.whoscored.com/livescores"]
#urls = ["http://www.whoscored.com", "http://www.whoscored.com/livescores"]
urls = ["http://www.whoscored.com/livescores"]
print("type of urls: {0}.".format(str(type(urls))))

for url in urls:
    print("\n\n\n\n---------------------------------------------\n\nurl is: {0}.".format(url))
    sock = urllib.request.urlopen(url)
    print("i have sock: {0}.".format(sock))
    htmlsource = sock.read()
    print("i read source code...")
    # note: read() above already consumed the whole stream,
    # so this always returns an empty list
    htmlsourceline = sock.readlines()
    sock.close()
    htmlsourcestring = str(htmlsource)
    print("\n\ntype of htmlsourcestring: " + str(type(htmlsourcestring)))
    htmlsourcestring = htmlsourcestring.replace(">", ">\n")
    htmlsourcestring = htmlsourcestring.replace("\\r\\n", "\n")
    print(htmlsourcestring)
    print("\n\ni am done with url: {0}.".format(url))
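As an aside, the `sock.readlines()` call in the snippet above always returns an empty list regardless of the URL, because the preceding `sock.read()` has already consumed the entire stream. A stand-alone illustration, using `io.BytesIO` as a stand-in for the file-like object `urlopen()` returns:

```python
import io

# io.BytesIO behaves like the file-like response object from urlopen()
stream = io.BytesIO(b"<html><body>hello</body></html>")

first = stream.read()        # consumes the entire stream
second = stream.readlines()  # nothing left to read: returns []

print(len(first))  # length of the full payload
print(second)      # []
```

If you need both the raw bytes and a line-by-line view, read once and split the result, rather than reading the stream twice.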

I do not know why an empty string comes back for URLs that don't have a file name, such as "www.whoscored.com/livescores" in the example, whereas "google.com" or "www.whoscored.com" seem to work every time.

I hope my formulation is understandable...

It looks like the site is coded to explicitly reject requests from non-browser clients. You'll have to spoof being a browser: creating sessions and the like, and ensuring cookies are passed back and forth as required. The third-party requests library can help with these tasks, but the bottom line is that you are going to have to find out more about how the site operates.
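A first step, sketched below with only the standard library, is to attach a browser-like User-Agent header to the request; the requests library's `Session` object adds cookie handling on top of this. The header value here is an illustrative assumption, not something this site is known to accept:

```python
import urllib.request

# build a request that presents a browser-like User-Agent header
# (the header string is an assumption for illustration)
url = "http://www.whoscored.com/livescores"
req = urllib.request.Request(url, headers={
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
})

# uncomment to actually fetch the page:
# with urllib.request.urlopen(req) as sock:
#     htmlsource = sock.read().decode("utf-8", errors="replace")
#     print(htmlsource[:500])
```

With requests, the equivalent would be `session = requests.Session()` followed by `session.get(url, headers=...)`, which also persists cookies across calls. Either way, if the site still returns an empty body, inspect the HTTP status code and response headers to see how it is rejecting you.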

python python-3.x web-scraping

