Python 3: open and read a URL that has no file name
I have gone through the relevant questions but did not find an answer to this one:
I want to open a URL and parse its contents.
When I do this on, say, google.com, there is no problem.
But when I do it on a URL that has no file name, read() returns an empty string.
See my code below as an example:
import urllib.request

#urls = ["http://www.google.com", "http://www.whoscored.com", "http://www.whoscored.com/livescores"]
#urls = ["http://www.whoscored.com", "http://www.whoscored.com/livescores"]
urls = ["http://www.whoscored.com/livescores"]

print("type of urls: {0}.".format(str(type(urls))))

for url in urls:
    print("\n\n\n\n---------------------------------------------\n\nurl is: {0}.".format(url))
    sock = urllib.request.urlopen(url)
    print("i have sock: {0}.".format(sock))
    htmlsource = sock.read()
    print("i read the source code...")
    # note: the stream was already consumed by read() above, so this is always an empty list
    htmlsourceline = sock.readlines()
    sock.close()
    htmlsourcestring = str(htmlsource)
    print("\n\ntype of htmlsourcestring: " + str(type(htmlsourcestring)))
    htmlsourcestring = htmlsourcestring.replace(">", ">\n")
    htmlsourcestring = htmlsourcestring.replace("\\r\\n", "\n")
    print(htmlsourcestring)
    print("\n\ndone with url: {0}.".format(url))
I do not know why I get an empty string back for URLs that don't have a file name, such as "www.whoscored.com/livescores" in my example, whereas "google.com" or "www.whoscored.com" seem to work every time.
I hope my formulation of the problem is understandable.
It looks like the site is coded to explicitly reject requests from non-browser clients. You'll have to spoof a browser by creating sessions and the like, ensuring cookies are passed back and forth as required. The third-party requests library can help with these tasks, but the bottom line is that you are going to have to find out more about how the site operates.
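As a minimal first step, here is a sketch of sending a browser-like User-Agent header with the standard library, assuming the site's rejection is based on that header (the header string is just an example value; the site may also require cookies or other headers, which is where requests.Session becomes useful):

```python
import urllib.request

url = "http://www.whoscored.com/livescores"

# Build a request that identifies itself with a browser-like User-Agent
# instead of the default "Python-urllib/3.x" string many sites reject.
req = urllib.request.Request(
    url,
    headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
)

# Uncomment to actually fetch the page (requires network access):
# with urllib.request.urlopen(req) as sock:
#     htmlsource = sock.read()

# urllib normalizes header names, hence "User-agent" here.
print(req.get_header("User-agent"))
```

If the site still returns an empty body, inspect the request a real browser makes (e.g. with your browser's developer tools) and replicate the headers and cookies it sends.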
Tags: python, python-3.x, web-scraping