python 3 open and read url without url name


I have gone through the relevant questions and did not find a reply to this one:

I want to open a URL and parse its contents.

When I do this on, say, google.com, there is no problem.

When I do it on a URL that doesn't have a file name, the read returns an empty string.

See the code below as an example:

import urllib.request

#urls = ["http://www.google.com", "http://www.whoscored.com", "http://www.whoscored.com/livescores"]
#urls = ["http://www.whoscored.com", "http://www.whoscored.com/livescores"]
urls = ["http://www.whoscored.com/livescores"]
print("type of urls: {0}.".format(str(type(urls))))

for url in urls:
    print("\n\n\n\n---------------------------------------------\n\nurl is: {0}.".format(url))
    sock = urllib.request.urlopen(url)
    print("i have sock: {0}.".format(sock))
    htmlsource = sock.read()
    print("i read source code...")
    # note: read() above already consumed the whole stream,
    # so this always returns an empty list
    htmlsourceline = sock.readlines()
    sock.close()
    htmlsourcestring = str(htmlsource)
    print("\n\ntype of htmlsourcestring: " + str(type(htmlsourcestring)))
    htmlsourcestring = htmlsourcestring.replace(">", ">\n")
    htmlsourcestring = htmlsourcestring.replace("\\r\\n", "\n")
    print(htmlsourcestring)
    print("\n\ni am done with url: {0}.".format(url))
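As an aside, the `sock.readlines()` call in the snippet above always returns an empty list regardless of the URL, because the preceding `sock.read()` has already consumed the entire stream. A stand-alone illustration, using `io.BytesIO` as a stand-in for the file-like object `urlopen()` returns:

```python
import io

# io.BytesIO behaves like the file-like response object from urlopen()
stream = io.BytesIO(b"<html><body>hello</body></html>")

first = stream.read()        # consumes the entire stream
second = stream.readlines()  # nothing left to read: returns []

print(len(first))  # length of the full payload
print(second)      # []
```

If you need both the raw bytes and a line-by-line view, read once and split the result, rather than reading the stream twice.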

I do not know why an empty string comes back for URLs that don't have a file name, such as "www.whoscored.com/livescores" in the example, whereas "google.com" or "www.whoscored.com" seem to work every time.

I hope my formulation is understandable...

It looks like the site is coded to explicitly reject requests from non-browser clients. You'll have to spoof being a browser: creating sessions and the like, and ensuring cookies are passed back and forth as required. The third-party requests library can help with these tasks, but the bottom line is that you are going to have to find out more about how the site operates.
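A first step, sketched below with only the standard library, is to attach a browser-like User-Agent header to the request; the requests library's `Session` object adds cookie handling on top of this. The header value here is an illustrative assumption, not something this site is known to accept:

```python
import urllib.request

# build a request that presents a browser-like User-Agent header
# (the header string is an assumption for illustration)
url = "http://www.whoscored.com/livescores"
req = urllib.request.Request(url, headers={
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
})

# uncomment to actually fetch the page:
# with urllib.request.urlopen(req) as sock:
#     htmlsource = sock.read().decode("utf-8", errors="replace")
#     print(htmlsource[:500])
```

With requests, the equivalent would be `session = requests.Session()` followed by `session.get(url, headers=...)`, which also persists cookies across calls. Either way, if the site still returns an empty body, inspect the HTTP status code and response headers to see how it is rejecting you.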

python python-3.x web-scraping

