In this post I am going to discuss the python-requests library. Python-requests is a powerful HTTP library that helps you make HTTP(s) requests very easily by writing minimal amount of code and also allows Basic HTTP Authentication out of the box. But before I write this post I want to describe the motivation behind me writing this post.
When it comes to writing software, libraries are a lifesaver. There is a library that addresses almost every problem you need to solve. That was the case for me as well. Whenever I used to face a specific problem I would look to see, if a library already existed. But I never tried to understand how they were implemented, the hard work that goes into building them, or the folks behind the libraries. Most of the libraries we use these days are open source and their source code is available somewhere. So we could, if we wished to, with a little hard work, understand the implementation.
During a related discussion with mbuf in #dgplug channel, he gave me a assignment to understand one of the libraries I have recently used and understand what data structures/algorithms are used. So I chose to look inside the source code of python-requests . Let’s begin by understanding how two nodes in a network actually communicate.
Socket Programming : The basis of all Networking Applications
Socket Programming is a way of connecting two nodes in a network and letting them communicate with each other. Usually, one node acts a server and other as a client. The server node listens to a port for an IP, while the client reaches out to make a connection. The combination of port and an IP is called a socket. The listener socket in the server listens to request from the client.
This is the basis of all Web Browsing that happens on the Internet. Let us see how a basic client-server socket program looks like
As you can see a server binds to a port where it listens to any incoming request. In our case it is listening to all network interfaces 0.0.0.0 (which is represented by an empty string) at a random port 12345. For a HTTP Server the default port is 80. The server accepts any incoming request from a client and then sends a response and closes the connection.
When a client wants to connect to a server it connects to the port the server is listening on, and sends in the request. In this case we send the request to 127.0.0.1 which is the IP of the local computer known as localhost.
This is how any client-server communication would look like. But there is obviously lot more to it. There will be more than one request coming to a server so we will need multi-threaded server to handle it. In this case I sent simple text. But there could be different types of data like images, files etc.
Most of the communication that happens over the web uses HTTP which is a protocol to handle exchange and transfer of hypertext i.e. the output of the web pages we visit. Then there is HTTPS which is the secure version of HTTP which encrypts the communication happening over the network using protocols like TLS.
Making HTTP Requests in Python
Handling HTTP/HTTPS requests in an application can be complex and so we have libraries in every programming language that make our life easier. In Python there are quite a few libraries that can be used for working with HTTP. The most basic is the http.client which is a cpython library. The http.client uses socket programs that is used to make the request. Here’s how we make a HTTP request using http.client
For making Requests that involve Authentication we have to use Authorization headers in the request header. We have used the base64 library here for generating a Base64 encoded Authorization String.
Using python-requests for making HTTP requests
The http.client library is a very basic library for making HTTP requests and its not used directly for making complex HTTP requests. Requests is a library that wraps around http.client and gives us a really friendly interface to handle all kinds of http(s) requests, simple or complex and takes care of lots of other nitty gritty, e.g., TLS security for HTTPS requests.
Requests heavily depends on urllib3 library which in turn uses the http.client library. This sample shows how requests is used for making HTTP requests
You can see making requests is much simpler using requests module. Also it gracefully handles which protocol to use by parsing the URL of the request
Let us now go over the implementation
The requests api contains method names similar to the type of request. So there is get, post, put, patch, delete, head methods.
Given below is a rough UML class diagram of the most important classes of the requests library
When we make a request using the request api the following things happen
1. Call to Session.request() method
Whenever we make a request using the requests api it calls a requests.request() method which in turn Calls the Session.request() method by creating a new session object. The request() method then creates a Request object and then prepares to make a request.
2. Create a PreparedRequest object
The request() method creates a PreparedRequest object using the Request object and prepares it for request
3. Prepare for the Request
The PreparedRequest object then makes a call to the prepare() method to prepare for the request. The prepare method makes a call to prepare_method(), prepare_url(), prepare_headers(), prepare_cookies(), prepare_body(), prepare_auth(), and prepare_hooks() methods. These methods does some pre-processing on the various request parameters
4. Send the Request
5. Get the Response
The HTTPAdapter makes a call to its send() method which gets a connection object using get_connection() which then sends the request. It then gets the Response object using the request object and the httplib response from httplib library (httplib is the python2 version of http.client)
And now onwards, How does a request actually get sent and how do we get a httplib response ?
Enter the urllib3 module
The urllib3 module is used internally by requests to send the HTTP request. When the control comes to the HTTPAdapter.send() method the following things happen
1. Get the Connection object
The HTTPAdapter gets the connection object using the get_connection() method. It returns a urllib3.ConnectionPool object. The ConnectionPool object actually makes the request.
2. Check if the request is chunked and make the request
The request is checked to see if it’s chunked or not. If it is not chunked a call to urlopen() method of ConnectionPool object is made. The urlopen() method makes the lowest level call to make the request using the httplib(http.client in python3) library. So it takes in a lot of arguments from the PreparedRequest object.
If the request is chunked a new connection object is created, this time, the HTTPConnection object of httplib. The connection object will be used to send the request body in chunks using the HTTPConnection.send() method which uses socket program to send the request.
3. Get the httplib response
The httplib response is generated using the urlopen() method if the request is not chunked and if the request is chunked it is generated using the getresponse() method of httplib. Httplib then uses socket program to get the response.
And there you have it! The most important parts of the requests workflow. There is a lot more that you can know by reading the code further.
Libraries make the life of a developer simpler by solving a specific problem and making the code shareable and widespread. There’s also a lot of hard work involved in maintaining the library. So in case you are a regular user of a library do consider reading the source code if its available and contributing to it if possible.
Also many Thanks to #dgplug friends for helping me improving this post.