Understanding python requests

In this post I am going to discuss the python-requests library. Python-requests is a powerful HTTP library that helps you make HTTP(s) requests very easily by writing minimal amount of code and also allows Basic HTTP Authentication out of the box. But before I write this post I want to describe the motivation behind me writing this post.

When it comes to writing software, libraries are a lifesaver. There is a library that addresses almost every problem you need to solve. That was the case for me as well. Whenever I used to face a specific problem I would look to see, if a library already existed. But I never tried to understand how they were implemented, the hard work that goes into building them, or the folks behind the libraries. Most of the libraries we use these days are open source and their source code is available somewhere. So we could, if we wished to, with a little hard work, understand the implementation.

During a related discussion with mbuf in #dgplug channel, he gave me a assignment to understand one of the libraries I have recently used and understand what data structures/algorithms are used. So I chose to look inside the source code of python-requests . Let’s begin by understanding how two nodes in a network actually communicate.

Socket Programming : The basis of all Networking Applications

Socket Programming is a way of connecting two nodes in a network and letting them communicate with each other. Usually, one node acts a server and other as a client. The server node listens to a port for an IP, while the client reaches out to make a connection. The combination of port and an IP is called a socket. The listener socket in the server listens to request from the client.

This is the basis of all Web Browsing that happens on the Internet. Let us see how a basic client-server socket program looks like

 

 

 

 


# A simple Client Program for making requests using socket
import socket
s = socket.socket()
# The socket listens to a specific port
port = 12345
# Connect to the server
s.connect(('127.0.0.1', port))
# Send Request to server
send_data = 'Hello'
s.send(send_data.encode('utf-8'))
# Receive response from server
recv_data = s.recv(1024)
print(recv_data.decode('utf-8'))
# close the connection
s.close()

view raw

client.py

hosted with ❤ by GitHub


# A simple server program for listening to requests via socket
import socket
s = socket.socket()
print('Socket Created')
# The socket listens to a specific port
port = 12345
# Bind the socket to a port
# Here the socket listens the request
# coming from any IP address in the network
s.bind(('', port))
print('Socket binded to port %s' % port)
# Put the socket in listening mode
s.listen(5)
print('Socket listening')
while True:
# Establish Connection with the client
c, addr = s.accept()
print('Got Connection from', addr)
# Receive request from client
recv_data = c.recv(1024)
print(recv_data.decode('utf-8'))
# Send a response to client
send_data = 'Thank you for connecting'
c.send(send_data.encode('utf-8'))
# Close the connection
c.close()

view raw

server.py

hosted with ❤ by GitHub

As you can see a server binds to a port where it listens to any incoming request. In our case it is listening to all network interfaces 0.0.0.0 (which is represented by an empty string) at a random port 12345. For a HTTP Server the default port is 80. The server accepts any incoming request from a client and then sends a response and closes the connection.

When a client wants to connect to a server it connects to the port the server is listening on, and sends in the request. In this case we send the request to 127.0.0.1 which is the IP of the local computer known as localhost.

This is how any client-server communication would look like. But there is obviously lot more to it. There will be more than one request coming to a server so we will need multi-threaded server to handle it. In this case I sent simple text. But there could be different types of data like images, files etc.

Most of the communication that happens over the web uses HTTP which is a protocol to handle exchange and transfer of hypertext i.e. the output of the web pages we visit. Then there is HTTPS which is the secure version of HTTP which encrypts the communication happening over the network using protocols like TLS.

Making HTTP Requests in Python

Handling HTTP/HTTPS requests in an application can be complex and so we have libraries in every programming language that make our life easier. In Python there are quite a few libraries that can be used for working with HTTP. The most basic is the http.client which is a cpython library. The http.client uses socket programs that is used to make the request. Here’s how we make a HTTP request using http.client

 

 

 


Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

For making Requests that involve Authentication we have to use Authorization headers in the request header. We have used the base64 library here for generating a Base64 encoded Authorization String.

Using python-requests for making HTTP requests

The http.client library is a very basic library for making HTTP requests and its not used directly for making complex HTTP requests. Requests is a library that wraps around http.client and gives us a really friendly interface to handle all kinds of http(s) requests, simple or complex and takes care of lots of other nitty gritty, e.g., TLS security for HTTPS requests.

Requests heavily depends on urllib3 library which in turn uses the http.client library. This sample shows how requests is used for making HTTP requests

 


Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

You can see making requests is much simpler using requests module. Also it gracefully handles which protocol to use by parsing the URL of the request

Let us now go over the implementation

Inspecting requests

The requests api contains method names similar to the type of request. So there is get, post, put, patch, delete, head methods.

Given below is a rough UML class diagram of the most important classes of the requests library

When we make a request using the request api the following things happen

1. Call to Session.request() method

Whenever we make a request using the requests api it calls a requests.request() method which in turn Calls the Session.request() method by creating a new session object. The request() method then creates a Request object and then prepares to make a request.

2. Create a PreparedRequest object

The request() method creates a PreparedRequest object using the Request object and prepares it for request

3. Prepare for the Request

The PreparedRequest object then makes a call to the prepare() method to prepare for the request. The prepare method makes a call to prepare_method(), prepare_url(), prepare_headers(), prepare_cookies(), prepare_body(), prepare_auth(), and prepare_hooks() methods. These methods does some pre-processing on the various request parameters

4. Send the Request

The Session object then calls the send() method to send the request. The send() method then gets the HTTPAdapter object which makes the request

5. Get the Response

The HTTPAdapter makes a call to its send() method which gets a connection object using get_connection() which then sends the request. It then gets the Response object using the request object and the httplib response from httplib library (httplib is the python2 version of http.client)

And now onwards, How does a request actually get sent and how do we get a httplib response ?

Enter the urllib3 module

The urllib3 module is used internally by requests to send the HTTP request. When the control comes to the HTTPAdapter.send() method the following things happen

1. Get the Connection object

The HTTPAdapter gets the connection object using the get_connection() method. It returns a urllib3.ConnectionPool object. The ConnectionPool object actually makes the request.

2. Check if the request is chunked and make the request

The request is checked to see if it’s chunked or not. If it is not chunked a call to urlopen() method of ConnectionPool object is made. The urlopen() method makes the lowest level call to make the request using the httplib(http.client in python3) library. So it takes in a lot of arguments from the PreparedRequest object.

If the request is chunked a new connection object is created, this time, the HTTPConnection object of httplib. The connection object will be used to send the request body in chunks using the HTTPConnection.send() method which uses socket program to send the request.

3. Get the httplib response

The httplib response is generated using the urlopen() method if the request is not chunked and if the request is chunked it is generated using the getresponse() method of httplib. Httplib then uses socket program to get the response.

And there you have it! The most important parts of the requests workflow. There is a lot more that you can know by reading the code further.

Libraries make the life of a developer simpler by solving a specific problem and making the code shareable and widespread. There’s also a lot of hard work involved in maintaining the library. So in case you are a regular user of a library do consider reading the source code if its available and contributing to it if possible.

Thanks to kennethreitz and the requests community for making our life easier with requests!

References

  1. https://www.geeksforgeeks.org/socket-programming-python/
  2. https://docs.python.org/2/howto/sockets.html
  3. https://en.wikipedia.org/wiki/HTTPS
  4. https://docs.python.org/3/library/http.client.html
  5. https://github.com/requests/requests
  6. https://github.com/urllib3/urllib3
  7. https://tutorialspoint.com/uml/uml_class_diagram.htm

Also many Thanks to #dgplug friends for helping me improving this post.

Understanding the Linux Command Line

For quite some time I was thinking of learning Bash scripting. Thanks to dgplug Summer Training I have taken the first step to Learn the Linux Command line. We are using the LYM (Linux Command line for You and Me) book in the classes to learn the basic Linux commands and understand the GNU/Linux system.

Why should you learn command line ?

  • The command line is a very powerful tool for a GNU/Linux system. You can do anything from the command line starting from editing a text file to shutting down your system.
  • Anyone who wants to understand the GNU/Linux system must have a understanding of the command line.
  • Automating day to day to day stuff becomes easier if you know command line. This is especially indispensable for Sys Admins and Network Managers.

There has been already few classes in dgplug Summer Training where we were asked to read few chapters from the book and ask doubts regarding it. I use Fedora 28 as my Linux distribution where I use the gnome-terminal for running the shell commands.

I have learnt quite many things till now. I am on the seventh chapter of the book now which is “Linux Services“. Slowly I have been sinking all the commands to my mind and I hope to make use use of them to make my life easier with my Linux machine. The book is well written for a beginner who is just starting with Linux command line on the other hand it has also references to some important topics like FHS .

Thanks to the authors for their efforts to write this. I  hope I can contribute to it at some point 🙂

Getting Aboard dgplug Summer Training 2018

This year I decided to join the dgplug Summer Training conducted by the Durgapur Linux User’s Group. I didn’t blog for a long time since last year, but last class we had our class on blogging and it really motivated me to start my blog again.

I knew dgplug from my initial days of my college since I studied at Durgapur. I got to know about this Training in 2017 when I first met Kushal in FOSSASIA Summit and searched for dgplug’s website. I have been willing to join it since then, but last year I couldn’t really make time. So after I saw the tweet about this year’s edition I was eagerly waiting to get aboard.

The First Class

It started on 17th June, 2018 at 7 pm sharp. I had my flight to Hyderabad that day on 10.30 pm and I didn’t want to miss the first class so I left early to reach the Airport before the class starts. The class would take on IRC on #dgplug channel on freenode. I just reached on time at the airport and I had to do my check in as well. So I stood in the queue connecting to IRC from my mobile using Riot app and eagerly waited for it to start People were doing countdown before it started and I was amazed to see the number of IRC handles active during the session. The session started, everyone greeted each other. Then Kushal started with some basic rules for the class.There was an interesting way to ask questions during the class. We had to type a “!” and a bot named “batul” would keep queuing those and we had to wait until our turn came and batul prompted us to ask our question. The teacher could take the questions by typing “next”. I had a slow internet connection and Riot took time to sync the messages. But I managed to follow along. Kushal said to read the FAQs for the training and ask questions in case we have any doubt. Then we were asked to introduce ourselves by raising hands using a “!”. We were then guided with the rules for every class and asked to read How to ask smart questions. I realized this is something which should actually be taught in every community so that we ask meaningful questions. Then they were open for questions. At the end of the session we were asked to read some links that were given as home work. The next session would happen the next day.

This was like the format of each session and mind it it was totally conducted on IRC which enabled really fast communication in low bandwidth. Though the entire communication was text based it really felt as if I was in a real class. There is Roll Call at the beginning and end of class, I had to raise hand for asking question, had to wait for my turn to ask my question and was given homework that was not at all over burdening. It is just the right amount given till now and I could get time to read through the reading materials provided after the class and was also able to manage it between my Office schedule.

The best part of the training so far was that we were made to think and come out of our comfort zone in order to do a task. We had to think and search even before asking a question. I believe that is a good practice for everything we have a question.

Already 2 weeks of the training session is complete and I learnt a lot of things in the way.

First Week

There was a class on Free Software Communication guidelines which was taken by Shakthi Kannan (aka mbuf) where we were taught about mailing list etiquette and other communication guidelines. The slide on mailing list etiquette and Communication will really help me in the long run.

Then there was a class on Linux Command line by Jason Braganza which taught us to get started with Basic Linux commands. The LYM (Linux for You and Me) book is a  good book for newcomers and I really find it easy to follow. It is being used in all our Linux command line sessions and we are asked to read few chapters at a time and then we have doubt sessions where we can ask questions.

On Friday, 21st June we had our first guest session by Harish Pillay. The session was very informative. He told us about the time of Internet when it had just started and was known as ARPANET. He told us the importance of contributing to Free Software/Open Source and Red Hat’s Open culture for contributing to Open Source projects. Then there were all sorts of questions he answered on Startups, open source contribution, Licenses, distros etc.

In the weekend we were  asked to watch 2 documentaries. One was  The internet’s Own Boy and Citizenfour. I could complete watching the first one. I got to know a lot more about the History of Internet and Free Software Movement. Also we were asked to read this article where I got to know how the History of Free Software movement is deeply rooted to the Freedom of Speech and Expression. I got to learn more about The “Free as in Freedom” ideology, the origin of the GNU project and the importance of Free Software. I did a bit more research on GNU/Linux Operating systems and understood what a kernel means with respect to an Operating System and that it requires a lot more things than a kernel to make a OS work.

Second Week

In the next week we had a class on Privacy and Opsec from Kushal. He told us how to do a Threat Modelling of our Risks and Good practices to ensure better security. I already installed Tor Browser and have been trying to use it as my browser now. Giving up on Google is hard but I am trying my best to use DuckDuckgo for searches now. I haven’t really switched to using password managers. It will take me some time before I switch over using a password manager since I have to go over its usage to get familiar with it.

On Wednesday, 27th June we had guest session by Nicholas H.Tollervey. He is a classically trained musician, philosophy graduate, teacher, writer and software developer.  It really inspired me to see if a person loves something he can learn anything from whichever field he/she comes from. He is one of the pioneers of the micro:bit project. He answered questions on open source contribution and best practices for writing software.

On Thurday, 28th June we had guest session from Pirate Praveen. He is a political activist who uses Free Software Principles in his work and formed the Indian Pirates organisation. He is an upstream contributor to debian and has packaged software like GitLab and Diaspora so that they can be easily setup just by running one command. The motivation for him to contribute to open source projects was because he wanted to solve a problem. He answered questions about his entry into politics, packaging and the way of solving problems.

Then we had a class of blogging taken by Jason Braganza who explained the nitty-gritty of blogging. Over the weekend I completed watching Citizenfour and Nothing to Hide which we were recommended to watch. It made me know a lot of things about Privacy and Surveillance which I  was unaware off. Privacy is a basic right and we should all be aware how to protect it.

The experience has been good so far and I am hoping to learn a lot more things in the upcoming days. Thanks to the awesome team who has been working hard to make this possible 🙂