python - Cache Proxy Server Returning 404 with www.google.com
I have a homework assignment that involves implementing a proxy cache server in Python for web pages. Here is my implementation of it:
```python
from socket import *
import sys

def main():
    # Create server socket, bind to a port, and start listening
    tcpSerSock = socket(AF_INET, SOCK_STREAM)  # initializing socket
    tcpSerSock.bind(("", 8030))                # binding socket to port
    tcpSerSock.listen(5)                       # listening for page requests

    while True:
        # Start receiving data from the client
        print 'Ready to serve...'
        tcpCliSock, addr = tcpSerSock.accept()
        print 'Received a connection from:', addr
        message = tcpCliSock.recv(1024)
        print message

        # Extract the filename from the given message
        filename = ""
        try:
            filename = message.split()[1].partition("/")[2].replace("/", "")
        except:
            continue
        fileExist = False
        try:
            # Check whether the file exists in the cache
            f = open(filename, "r")
            outputdata = f.readlines()
            fileExist = True
            # ProxyServer finds a cache hit and generates a response message
            tcpCliSock.send("HTTP/1.0 200 OK\r\n")
            tcpCliSock.send("Content-Type:text/html\r\n")
            for data in outputdata:
                tcpCliSock.send(data)
            print 'Read from cache'
        except IOError:
            # Error handling for file not found in cache
            if fileExist == False:
                c = socket(AF_INET, SOCK_STREAM)  # create a socket on the proxy server
                try:
                    srv = getaddrinfo(filename, 80)
                    c.connect((filename, 80))
                    # https://docs.python.org/2/library/socket.html
                    # Create a temporary file on this socket and ask port 80
                    # for the file requested by the client
                    fileobj = c.makefile('r', 0)
                    fileobj.write("GET " + "http://" + filename + " HTTP/1.0\r\n")
                    # Read the response into buffer
                    buffr = fileobj.readlines()
                    # Create a new file in the cache for the requested file.
                    # Send the response in the buffer to the client socket and
                    # write the corresponding file to the cache
                    tmpFile = open(filename, "wb")
                    for data in buffr:
                        tmpFile.write(data)
                        tcpCliSock.send(data)
                except:
                    print "Illegal request"
            else:
                # File not found
                print "404: File Not Found"
        tcpCliSock.close()  # Close the client and server sockets

main()
```
I configured my browsers to use this proxy server.
But the problem is that when I run it, no matter which web page I try to access, it returns a 404 error on the initial connection and a connection reset error on subsequent connections. I have no idea why. Any help would be appreciated, thanks!
There are quite a number of issues with your code.
Your URL parser is quite cumbersome. Instead of the line
```python
filename = message.split()[1].partition("/")[2].replace("/", "")
```
I would use
```python
import re
parsed_url = re.match(r'GET\s+http://(([^/]+)(.*))\sHTTP/1.*$', message)
local_path = parsed_url.group(3)
host_name = parsed_url.group(2)
filename = parsed_url.group(1)
```
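For instance, applied to a typical proxy-form request line (the sample request below is my own example; the syntax works under both Python 2 and 3), the three groups break down like this:

```python
import re

# A typical request line as a browser would send it to a proxy (assumed example)
message = "GET http://www.google.com/index.html HTTP/1.0\r\n"

parsed_url = re.match(r'GET\s+http://(([^/]+)(.*))\sHTTP/1.*$', message)
filename = parsed_url.group(1)    # host + path, usable as a cache key
host_name = parsed_url.group(2)   # the host to connect to on port 80
local_path = parsed_url.group(3)  # the path to request from that host
```

Here `filename` is `"www.google.com/index.html"`, `host_name` is `"www.google.com"`, and `local_path` is `"/index.html"`.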
If the match fails there, you should return an error, because it is a request the proxy doesn't understand (e.g. POST).
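That error path could look like this (a sketch: the helper name and the 501 reply are my choices, not part of the assignment):

```python
import re

REQUEST_RE = re.compile(r'GET\s+http://(([^/]+)(.*))\sHTTP/1.*$')

def parse_request(message):
    """Return (filename, host, path), or None if this isn't a GET
    request the proxy understands (e.g. POST, CONNECT)."""
    m = REQUEST_RE.match(message)
    if m is None:
        return None
    return m.group(1), m.group(2), m.group(3)

# In the accept loop you would then do something like:
#   parsed = parse_request(message)
#   if parsed is None:
#       tcpCliSock.send("HTTP/1.0 501 Not Implemented\r\n\r\n")
#       tcpCliSock.close()
#       continue
```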
When you assemble the request to the destination server, use
```python
fileobj.write("GET {object} HTTP/1.0\n".format(object=local_path))
fileobj.write("Host: {host}\n\n".format(host=host_name))
```
Better yet, you should include all of the header lines from the original request, because they can make a major difference to the returned content.
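One way to do that (a sketch, with a helper name of my own choosing; it assumes `message` holds the browser's full request as a string) is to rewrite only the request line and pass the remaining header lines through unchanged:

```python
def build_upstream_request(message, local_path, host_name):
    """Rewrite the proxy-form request line to origin form and keep the
    client's original header lines, adding Host: if it was missing."""
    lines = message.split("\r\n")
    out = ["GET {} HTTP/1.0".format(local_path or "/")]
    has_host = False
    for line in lines[1:]:
        if not line:
            break  # blank line marks the end of the headers
        if line.lower().startswith("host:"):
            has_host = True
        out.append(line)
    if not has_host:
        out.append("Host: " + host_name)
    return "\r\n".join(out) + "\r\n\r\n"
```

You would then send the result of this helper to the upstream socket instead of hand-building the two lines above.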
Furthermore, since you cache the entire response including its header lines, you should not add your own headers when serving from the cache.
What you have doesn't work anyway, because there is no guarantee that you got a 200 response with text/html content. You should check the response code and only cache if you did indeed get a 200.
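A minimal version of that check, sketched as a helper (the function name is mine; it inspects only the status line of the buffered response):

```python
def should_cache(response_head):
    """Return True only if the response status line reports 200."""
    status_line = response_head.split("\r\n", 1)[0]
    parts = status_line.split()
    # Expect e.g. "HTTP/1.0 200 OK"
    return len(parts) >= 2 and parts[0].startswith("HTTP/") and parts[1] == "200"
```

In the proxy loop you would always relay the response to the client, but only open and write the cache file when `should_cache(...)` is true.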