Understanding python requests

In this post I am going to discuss the python-requests library. Python-requests is a powerful HTTP library that lets you make HTTP(S) requests with a minimal amount of code, and it supports Basic HTTP Authentication out of the box. But before I get into it, I want to describe my motivation for writing this post.

When it comes to writing software, libraries are a lifesaver. There is a library that addresses almost every problem you need to solve. That was the case for me as well. Whenever I faced a specific problem, I would look to see if a library already existed. But I never tried to understand how they were implemented, the hard work that goes into building them, or the folks behind the libraries. Most of the libraries we use these days are open source and their source code is available somewhere. So we could, if we wished to, with a little hard work, understand the implementation.

During a related discussion with mbuf in the #dgplug channel, he gave me an assignment: pick one of the libraries I had recently used and understand what data structures and algorithms it uses. So I chose to look inside the source code of python-requests. Let’s begin by understanding how two nodes in a network actually communicate.

Socket Programming: The basis of all Networking Applications

Socket Programming is a way of connecting two nodes in a network and letting them communicate with each other. Usually, one node acts as a server and the other as a client. The server node listens on a port for an IP, while the client reaches out to make a connection. The combination of a port and an IP is called a socket. The listener socket in the server listens for requests from the client.

This is the basis of all web browsing that happens on the Internet. Let us see what a basic client-server socket program looks like.

# client.py
# A simple client program for making requests using a socket
import socket

s = socket.socket()
# The port the server is listening on
port = 12345
# Connect to the server
s.connect(('127.0.0.1', port))
# Send the request to the server
send_data = 'Hello'
s.send(send_data.encode('utf-8'))
# Receive the response from the server
recv_data = s.recv(1024)
print(recv_data.decode('utf-8'))
# Close the connection
s.close()


# server.py
# A simple server program for listening to requests via a socket
import socket

s = socket.socket()
print('Socket created')
# The port the socket listens on
port = 12345
# Bind the socket to the port.
# The empty host string means the socket accepts requests
# coming in on any network interface
s.bind(('', port))
print('Socket bound to port %s' % port)
# Put the socket in listening mode
s.listen(5)
print('Socket listening')
while True:
    # Establish a connection with the client
    c, addr = s.accept()
    print('Got connection from', addr)
    # Receive the request from the client
    recv_data = c.recv(1024)
    print(recv_data.decode('utf-8'))
    # Send a response to the client
    send_data = 'Thank you for connecting'
    c.send(send_data.encode('utf-8'))
    # Close the connection
    c.close()

As you can see, a server binds to a port where it listens for any incoming request. In our case it is listening on all network interfaces, 0.0.0.0 (which is represented by an empty string), at an arbitrarily chosen port, 12345. For an HTTP server the default port is 80. The server accepts any incoming request from a client, sends a response and then closes the connection.

When a client wants to connect to a server, it connects to the port the server is listening on and sends in the request. In this case we send the request to 127.0.0.1, which is the IP of the local computer, known as localhost.

This is what any client-server communication looks like. But there is obviously a lot more to it. More than one request will be coming to a server, so we will need a multi-threaded server to handle them. In this case I sent simple text, but there could be different types of data like images, files etc.
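To make the multi-threaded point concrete, here is a minimal sketch using the standard library's socketserver module, which handles each connection in its own thread. The handler and helper names (GreetingHandler, start_server, ask) are made up for this example, and port 0 simply asks the operating system for any free port.

```python
import socket
import socketserver
import threading


class GreetingHandler(socketserver.BaseRequestHandler):
    """Handles one client connection; each connection gets its own thread."""

    def handle(self):
        data = self.request.recv(1024)
        print('Received:', data.decode('utf-8'))
        self.request.sendall('Thank you for connecting'.encode('utf-8'))


def start_server(host='127.0.0.1', port=0):
    # ThreadingTCPServer spawns a new thread per incoming connection
    server = socketserver.ThreadingTCPServer((host, port), GreetingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def ask(port, message='Hello'):
    # Same steps as client.py: connect, send, receive, close
    with socket.create_connection(('127.0.0.1', port), timeout=5) as s:
        s.sendall(message.encode('utf-8'))
        return s.recv(1024).decode('utf-8')


server = start_server()
reply = ask(server.server_address[1])
print(reply)
server.shutdown()
server.server_close()
```

The client side is unchanged from client.py; only the server differs, since it no longer has to finish with one client before accepting the next.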

Most of the communication that happens over the web uses HTTP, a protocol for the exchange and transfer of hypertext, i.e. the content of the web pages we visit. Then there is HTTPS, the secure version of HTTP, which encrypts the communication happening over the network using protocols like TLS.

Making HTTP Requests in Python

Handling HTTP/HTTPS requests in an application can be complex, so every programming language has libraries that make our lives easier. In Python there are quite a few libraries for working with HTTP. The most basic is http.client, which is part of the CPython standard library. http.client uses sockets to make the request, and it can be used directly to make an HTTP request.

 

 

 



For requests that involve authentication, we have to send an Authorization header along with the request. For HTTP Basic authentication, the base64 library can be used to generate the Base64-encoded authorization string.
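A minimal sketch of such a request with http.client might look like the following. The host, path and credentials are placeholders, and the helper names basic_auth_header and get_with_auth are made up for illustration.

```python
import base64
import http.client


def basic_auth_header(username, password):
    """Build the value of the Authorization header for HTTP Basic auth."""
    token = base64.b64encode(f'{username}:{password}'.encode('utf-8'))
    return 'Basic ' + token.decode('ascii')


def get_with_auth(host, path, username, password):
    """Send a GET request over HTTPS and return (status, body)."""
    conn = http.client.HTTPSConnection(host, timeout=10)
    try:
        headers = {'Authorization': basic_auth_header(username, password)}
        conn.request('GET', path, headers=headers)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()


# Only the header is built here; calling get_with_auth() needs a real server,
# e.g. get_with_auth('example.com', '/', 'user', 'pass')
print(basic_auth_header('user', 'pass'))
```

Note how much has to be spelled out by hand: the connection class for the scheme, the header encoding, and closing the connection.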

Using python-requests for making HTTP requests

The http.client library is a very basic library for making HTTP requests, and it’s not used directly for making complex HTTP requests. Requests is a library that wraps around http.client and gives us a really friendly interface to handle all kinds of HTTP(S) requests, simple or complex. It also takes care of lots of other nitty-gritty details, e.g. TLS security for HTTPS requests.

Requests depends heavily on the urllib3 library, which in turn uses the http.client library.

 



Making requests is much simpler with the requests module. It also gracefully handles which protocol to use by parsing the URL of the request.
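As a rough sketch, a request with Basic authentication might look like this with requests. The URL and credentials are placeholders and get_with_basic_auth is a made-up helper name. The second half builds the same request without sending it, only to show the header that requests generates from the auth tuple.

```python
import requests


def get_with_basic_auth(url, user, password, timeout=10):
    """GET `url` with HTTP Basic auth; requests builds the header for us."""
    response = requests.get(url, auth=(user, password), timeout=timeout)
    response.raise_for_status()
    return response


# Build (but do not send) the same request to inspect what would go out
prepared = requests.Request(
    'GET', 'http://example.com/data', auth=('user', 'pass')
).prepare()
print(prepared.headers['Authorization'])
```

There is no manual Base64 encoding and no explicit choice between an HTTP and an HTTPS connection class; both follow from the arguments.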

Let us now go over the implementation

Inspecting requests

The requests API contains methods named after the type of request. So there are get, post, put, patch, delete and head methods.

Given below is a rough UML class diagram of the most important classes of the requests library

When we make a request using the requests API, the following things happen:

1. Call to Session.request() method

Whenever we make a request using the requests API, it calls the requests.request() method, which in turn creates a new Session object and calls its Session.request() method. The request() method then creates a Request object and prepares to make the request.

2. Create a PreparedRequest object

The request() method creates a PreparedRequest object from the Request object and prepares it for sending.

3. Prepare for the Request

The PreparedRequest object then makes a call to the prepare() method to prepare for the request. The prepare() method makes calls to the prepare_method(), prepare_url(), prepare_headers(), prepare_cookies(), prepare_body(), prepare_auth() and prepare_hooks() methods. These methods do some pre-processing on the various request parameters.
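The preparation step can be tried by hand without sending anything over the network. This is a sketch using the public Request and Session.prepare_request() API, with a placeholder URL and a made-up X-Demo header.

```python
import requests

session = requests.Session()

# A Request only records what the caller asked for
request = requests.Request(
    'get',
    'http://example.com/search',
    params={'q': 'python requests'},
    headers={'X-Demo': '1'},
)

# prepare_request() merges session settings and runs the prepare_* steps
prepared = session.prepare_request(request)

print(prepared.method)              # method normalised by prepare_method()
print(prepared.url)                 # params encoded into the URL by prepare_url()
print(prepared.headers['X-Demo'])   # caller headers kept by prepare_headers()
print('User-Agent' in prepared.headers)  # session defaults merged in
```

The PreparedRequest is the object that carries the exact method, URL, headers and body that will later be handed to the adapter.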

4. Send the Request

The Session object then calls its send() method to send the request. The send() method gets the HTTPAdapter object, which makes the request.

5. Get the Response

The HTTPAdapter makes a call to its send() method, which gets a connection object using get_connection() and then sends the request. It then builds the Response object from the request object and the httplib response (httplib is the Python 2 name of http.client).
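How a Session picks the adapter for a URL can be seen with the public get_adapter() and mount() methods. This is a small sketch; the URLs are placeholders and the retry count is an arbitrary choice for the example.

```python
import requests
from requests.adapters import HTTPAdapter

session = requests.Session()

# A new Session already has adapters mounted for http:// and https://
default_adapter = session.get_adapter('https://example.com/data')
print(type(default_adapter).__name__)

# A custom adapter can be mounted for a more specific URL prefix
custom_adapter = HTTPAdapter(max_retries=3)
session.mount('https://example.com', custom_adapter)
chosen = session.get_adapter('https://example.com/data')
print(chosen is custom_adapter)

# Schemes without a mounted adapter are rejected
try:
    session.get_adapter('ftp://example.com/file')
    unsupported = False
except requests.exceptions.InvalidSchema:
    unsupported = True
print(unsupported)
```

Because the adapter is looked up by URL prefix, transport behaviour such as retries can be changed per host without touching the code that makes the request.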

And now, onwards: how does a request actually get sent, and how do we get an httplib response?

Enter the urllib3 module

The urllib3 module is used internally by requests to send the HTTP request. When control reaches the HTTPAdapter.send() method, the following things happen:

1. Get the Connection object

The HTTPAdapter gets the connection object using the get_connection() method. It returns a urllib3.ConnectionPool object. The ConnectionPool object actually makes the request.
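To get a feel for the pool object, urllib3's PoolManager can be used on its own. This is a sketch and not the code path inside requests itself; example.com is a placeholder, and no connection is opened until a request is actually made through the pool.

```python
import urllib3

manager = urllib3.PoolManager(num_pools=10)

# Look up (or create) the pool that serves this scheme, host and port
pool = manager.connection_from_url('http://example.com/index.html')
print(type(pool).__name__, pool.host, pool.port)

# A second URL on the same host is served by the same pool object
same_pool = manager.connection_from_url('http://example.com/other')
print(pool is same_pool)
```

Reusing one pool per host is what lets several requests to the same server share open connections instead of creating a new socket every time.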

2. Check if the request is chunked and make the request

The request is checked to see if it’s chunked or not. If it is not chunked, a call to the urlopen() method of the ConnectionPool object is made. The urlopen() method makes the lowest level call to make the request using the httplib (http.client in Python 3) library. So it takes in a lot of arguments from the PreparedRequest object.

If the request is chunked, a new connection object is created, this time the HTTPConnection object of httplib. The connection object is used to send the request body in chunks using the HTTPConnection.send() method, which uses sockets to send the request.

3. Get the httplib response

The httplib response is generated by the urlopen() method if the request is not chunked; if the request is chunked, it is generated by the getresponse() method of httplib. httplib then uses sockets to get the response.
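Whether a request counts as chunked depends on how its body was prepared. The sketch below prepares two requests without sending them: one with a generator body, whose length is unknown in advance, and one with a plain bytes body. The URL is a placeholder, and the is_chunked helper is my own restatement of the check as I read it in the adapter, not a function that requests provides.

```python
import requests


def body_chunks():
    # A generator has no known length, so it cannot get a Content-Length
    yield b'first part, '
    yield b'second part'


streamed = requests.Request(
    'POST', 'http://example.com/upload', data=body_chunks()
).prepare()
fixed = requests.Request(
    'POST', 'http://example.com/upload', data=b'all at once'
).prepare()


def is_chunked(prepared):
    """A request with a body but no Content-Length is sent in chunks."""
    return not (prepared.body is None or 'Content-Length' in prepared.headers)


print(streamed.headers.get('Transfer-Encoding'), is_chunked(streamed))
print(fixed.headers.get('Content-Length'), is_chunked(fixed))
```

So the chunked path is mostly relevant for streaming uploads; ordinary requests with a body of known size take the urlopen() path.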

And there you have it! These are the most important parts of the requests workflow. There is a lot more that you can learn by reading the code further.

Libraries make the life of a developer simpler by solving a specific problem and making the code shareable and widespread. There’s also a lot of hard work involved in maintaining a library. So if you are a regular user of a library, do consider reading the source code if it’s available, and contributing to it if possible.

Thanks to kennethreitz and the requests community for making our life easier with requests!

References

  1. https://www.geeksforgeeks.org/socket-programming-python/
  2. https://docs.python.org/2/howto/sockets.html
  3. https://en.wikipedia.org/wiki/HTTPS
  4. https://docs.python.org/3/library/http.client.html
  5. https://github.com/requests/requests
  6. https://github.com/urllib3/urllib3
  7. https://tutorialspoint.com/uml/uml_class_diagram.htm

Also, many thanks to my #dgplug friends for helping me improve this post.

GSoC 2016 Wrap-up: The End of a wonderful Journey

So GSoC 2016 comes to an end as the thirteenth week of the coding period wraps up! Actually, the journey goes back well beyond these 13 weeks: it was at the beginning of March that I started interacting with PublicLab. It has been a wonderful experience so far. This is where my actual open-source journey began, and I have learnt so much along the way.

Final Works

Here is my final report that I will submit for the Final evaluation:

GSoC 2016: Final Work Product of Expanded Q & A System for publiclab.org

This research note contains the detailed report of my work and the contributions I made.

I also want to show my contribution graph here

contribution

By the way, you can find me on GitHub with my username @ananyo2012

The design changes are merged and I also managed to make some contributions to the Rich Editor. Here is my PR #40 in the PublicLab.Editor repo, though it is not merged yet. I really need to learn Node.js before I make any significant contributions to the Rich Editor. I also made a Rich Editor update in plots2 in PR #664. I spent the week fixing small and some large bugs that crept up. I also wrote a wiki page on the Q & A system and made the final research note for my evaluation.

Also, as my mentor insisted, I created some first-timer issues in plots2 that could be taken up by contributors who are completely new to open source. This was a great move to welcome new contributors to our codebase, and applause to PublicLab for doing such a great job!

And finally it was all set to go for the Final Evaluation! Results would come by August 30th!

Tough times and Lesson learnt

There were some breaking changes after the code got deployed, caused by PR #600, which I made. I started working on it long ago and hadn’t mentioned it in any of my blog posts since I wasn’t sure how it would turn out. The work was on updating the slugs for research notes and wikis using friendly_id. Things were tough right from the beginning, since a diverse slugging system was already in place and I had to make the changes while keeping the format of the slugs intact. On deploying the code, the old slugs of the notes and wikis were updated to new slugs and the older ones were no longer available, so all previous links that pointed to those research notes failed. The old slugs were supposed to redirect to the new ones through friendly_id, which uses a friendly_id_slug table to store the old URLs. Unfortunately, due to some issues, they weren’t saved as expected. I had also missed a few test cases, so testing didn’t catch this, and it was a complete disaster. Fortunately, PublicLab had good database backups and things were back to normal in no time. Some code from the PR had to be reverted, and the issue was fixed.

Moral of the story: always write good tests and think of rigorous test cases before deploying any code. In fact, good tests are the lifeline of a good software development cycle.

Experiences and Best Moments

Well, it was a really good experience overall! I learnt a lot throughout the entire summer. The idea of working alongside so many people even when you are far apart is really amazing! We had a video call with our mentors and the other GSoC students at the end of last month and it was great! The people at PublicLab were also very helpful, giving us reviews on our research notes alongside our mentors.

Thanks to PublicLab and Google OSPO for giving me this wonderful opportunity! Hope to participate in GSoC 2017 again!

The PublicLab Rich Editor

The twelfth week marker! The beginning of the end! GSoC 2016 will soon come to an end. I am almost done with my work on the Q & A system though some work is still left as per my timeline goals. The only major work that remains is integrating the PublicLab Rich Editor for use in the Q & A system.

The PublicLab Rich Editor is a separate project in PublicLab that my mentor Jeffrey Warren has been working on. It is an editor that supports both Markdown and WYSIWYG content. It is to be integrated into the publiclab.org website for posting content very soon. As a part of the Q & A project I thought of working on it a bit and contributing as much as possible. It wasn’t included in the timeline when I first made my proposal, but I later modified my timeline to include it as it seemed really interesting.

The PublicLab Rich Editor is a general purpose, modular JavaScript/Bootstrap UI library for rich text posting, which provides an author-friendly, minimal, mobile/desktop (fluid) interface for creating blog-like content, designed for publiclab.org. It uses grunt for packaging and compilation. It has a rich text editor based on the Woofmark library and an autocomplete feature supported by the horsey library. It uses jasmine as its testing framework.

Since I am not familiar with Node.js or npm, it is a bit tough for me to understand its modular structure. I have not made any contribution to the Rich Editor yet, but I will likely make some as the program wraps up. I have to learn Node.js for this. I also have to work on fixes that come up during the last week, as things are going to be deployed to the live site.

Also, my teammates, who are working on the same website but on different projects, are doing some awesome work, and there is a big merge coming up on the Search project. So things are getting busy in plots2 in the upcoming week. Stay tuned as GSoC 2016 comes to an end!

Modified Views for publiclab.org – Expanded Q & A Project

This is the end of the eleventh week of GSoC 2016, and when I look back I am amazed to see the amount of contribution that I have made to plots2. For the last couple of weeks I have been working on designing the interface for some pages, mainly the changes due to the Q & A system. Here is what I have been working on till now:

  1. Add a Recently Answered tab in questions landing page that lists out the recently answered questions
  2. Add Q & A to user Profile that would list the questions asked and answered by any user
  3. Create a distinct sidebar for questions
  4. Add a tag based sort functionality for questions. This would enable filtering questions based on tags
  5. Add a separate question tab in tags page. Tags page contained research notes, wiki and maps earlier
  6. Make it easier to search and ask questions from the questions page by improving the Search/Ask question field that I made earlier.
  7. Finally add links for Questions page in the website header and also put links to Question page and Ask question page in various pages like the dashboard, tags page etc.

This is going to be a long PR and I am still working on it, though it is nearly complete. Just a few small design changes and modifications are needed.

Apart from these, there were some important issues that I had to take care of while making these changes. I had to distinguish between research notes and questions, since questions in plots2 are actually notes marked with a question:topic power tag. So I had to list research notes and questions separately. I added two methods, .research_notes() and .questions(), to the DrupalNode model that extract research notes and questions separately.

Apart from these, there were many small design changes that I had to make alongside. Here are some screenshots of the pages. They are likely to be changed in the future.

Here is how the questions page looks now when you go to the /questions url

questions

Here is how the content of the questions section in profile page would look. You can see this in the /profile/:username url

user_profile

And here is how the questions will be listed in the /tags/:tagname url

tags

You can find my ongoing work on plots2 PR #628

 

Email notifications using Rails ActionMailer

The eighth week of the GSoC 2016 coding period is over and I got to work on something new in Rails. This time I was building the email notification system for the Q & A system, so that when a user posts a question or an answer, the relevant people get notified by email.

Rails has an email delivery framework called ActionMailer. Though I was familiar with Rails, I hadn’t worked with ActionMailer before this. So the first thing to do was to check the ActionMailer documentation. I strongly recommend following this documentation to understand the email delivery system. The Rails guides have good documentation for each version of Rails, with the features well explained. It’s a good place to start.

I had to create a new AnswerMailer and modify the existing CommentMailer and SubscriptionMailer to send out email notifications to users.

A mailer basically contains methods that define the recipient and the subject of the mail to be sent, along with the delivery method. It inherits from ApplicationMailer, which basically contains the default sender address. The mailer view template goes under the views directory, which is the standard template directory for Rails. The view template is named after the mailer method. So for AnswerMailer, if there is a method named notify_question_author, the template would be named notify_question_author.html.erb and go in the views/answer_mailer directory. Whatever content you want to send in the email goes in this template. So the naming convention and functioning are quite similar to those of a Rails controller, though not the same.

To send a mail you need to call the mailer method from the controller as described here. Usually you need to send mail only under specific conditions. When you create a post, for example, you would want the subscribers to get specific content, while when you post an answer to a question you would want to notify the question’s author as well as its followers. So it is good to create a class or instance method in the related model to do so. I defined an instance method answer_notify() on the Answer model that notifies the question author as well as the users who liked the question. It calls two mailer methods, AnswerMailer.notify_question_author() and AnswerMailer.notify_answer_likers(), which send emails when specific conditions are met. This way you can write a custom method that sends the mail and checks the conditions, keeping the code in the controller simple and letting the model contain the logic.

You can find my entire work on this in plots2 PR #612.

GSoC 2016: The Next Phase

I finally made it through the mid-term evaluations of GSoC 2016! It makes me happy to have made it this far, but it’s not over yet. Moving on to the next phase of GSoC. This one’s going to be important, as it is longer than the first phase and I have some pretty good features planned out in my timeline. Here is what I have been working on in the past week, after the mid-term evaluations. But before I get to that, I want to show you the feedback my mentors gave me during my mid-term evaluation.

midterm

Really some words of praise! It’s things like these that really motivate me to go on with my work. And that’s the essence of open source too! If you are working with the right community, people really help you out and you get a word of appreciation.

The next feature I have been working on is an Accept button for answers. This makes it possible for the author of the question to mark an answer as accepted. The “accepted” answer is marked with a green label once it is accepted. The question author can also unaccept it by pressing the same button again. Here is a screenshot of how it would look.

accept_answer

The complete work can be found in the plots2 PR #598

 

Commenting System for Publiclab – Expanded Q & A Project

The fourth week of the GSoC coding period is also over and the mid-term evaluations are just around the corner. So, midway through the GSoC journey, here are the final tasks that I completed before the mid-term:

  1. Modify existing comment system to handle answer comments and question comments
  2. Create new views and styles for comments in the Q & A pages
  3. Create expandable text boxes for comments and a View more functionality for comments

This is a brief overview of the features implemented. Let me now come to the technical details.

The current commenting system for research notes uses a DrupalComment model for handling the comments in the database. So I used the same model for answer comments too. For answer comments I just had to introduce an aid column in the comments table and relate comments to answers. But I had to make sure that answer comments weren’t related to the question, otherwise all the answer comments would also show up in the question comments section of the page. So what I did was store a 0 in the nid field of answer comments, so that no answer comment is related to a question.

I also used the same comment controller for handling requests for answer comments, with some modifications. I had to create new JS partials for answer comments, as comments are created and deleted with Ajax requests.

The next piece of work that I completed was introducing a new comments view for the Q & A pages. One of the new features was a View more button for questions, so that more comments are shown on pressing it. Comments are listed in descending order of their creation time. So recent comments are listed at the top and older comments are shown as you click View more.

Another cool feature that I implemented was expanding text boxes for comments. Did you ever notice how the text boxes of Facebook comments expand as you type in more lines, without showing an annoying scrollbar? It’s the same feature that I implemented. I followed this article for that; I just used jQuery instead of raw JavaScript. It’s a really good feature when you want to save space as well as give a good user experience.

That’s all before the Mid-term evaluations. You can find this work in plots2 Pull request #589

Answering System for Publiclab – Expanded Q & A Project

The third week of the GSoC coding period is over, and a lot of progress has been made this week. A large chunk of code has been merged, and I am relieved that an important part of the project is complete. This week I developed the Answering system for the Q & A project, which is the next most important part after the Questioning system.

Here is the summary of the tasks that I covered in the Answering system

  1. Create a new Answer model for handling Answers to each question
  2. Develop Post, edit and delete methods for Answers
  3. Design how Answers are listed for each Question in the Question page
  4. Create Like functionality for Answers
  5. Write functional and unit tests for the features implemented

This is an overview of the features implemented. Let me now go through the technical part of the implementation.

First is the creation of the new Answer model. As already mentioned in earlier posts, all content in plots2 is handled by the DrupalNode model. Questions are just a type of node. Each question can have many answers, so the Answer model is related to the DrupalNode model. Since an answer is written by a user, it is also related to the DrupalUser model, which is the model that handles users in plots2. After setting up the relations, validations and various methods had to be written for the model. Validations refer to the set of requirements that each record in the table must follow, for example that the content of an answer can’t be blank. Rails defines a set of validation methods for a model that handle this.

Next was creating an answers controller for handling the post, edit and delete mechanisms for the answers. Answers are posted using Ajax requests in plots2, and so is the delete action. The update action uses in-place editing for answers, but it is a normal HTTP request. Working with Ajax in Rails is very simple. Rails handles Ajax requests using unobtrusive JavaScript. You just have to define a data-remote property on the links or forms for making Ajax requests, and Rails will smartly handle each Ajax call. You can read more about it here.

Now coming to the Like functionality for answers. The Like button also involves Ajax requests. It is basically a button that likes or unlikes an answer on button press. To implement the Like functionality, a new AnswerSelection model had to be introduced. Each like is related to the answer and to the user who liked it. The answer_selection table has a liking field that takes a boolean value. It is set to true when an answer is liked and to false when the answer is unliked. To deal with the Ajax calls I created a new answer_like controller, which has a likes action that updates the liking field of the record.

Finally writing unit tests for the new models and functional tests for the new controller and implemented changes is a required part of good code.

On top of that, I had a nice conversation with David Days and Ujitha of the Advanced Search team, and moved the question search functionality to a separate controller to avoid any conflicts and collaborate better with each other.

You can see the work on Answering functionality in plots2 Pull request #566.

Creating a Basic Search functionality

The second week of the GSoC coding period is over, and here is the work that I completed during this week.

This week was mostly spent on improving the design of the Questions page as proposed by my mentor, and on introducing a basic question search functionality. I also had to do some follow-up fixes for my previous changes, like adding comments for some temporary changes. The most important fix was defining a custom method for the questions URL, so that it could be used wherever required instead of writing out the full path every time.

Following the design proposed in plots2 issue #554 I modified the design for the questions page using some of the dashboard templates used in plots2.

You can now see the new questions page at https://publiclab.org/questions/ .

As proposed in the design, I made a basic search functionality for questions. Here I will walk through how I made it.

There was already a search functionality that searched for notes, wikis, maps and comments from the search box in the navbar. Autocomplete results are shown as you type in keywords, each with an icon and a title suitably linked to the item. There is also an advanced search, which you can see at the URL https://publiclab.org/search/ . There you can search selectively for any research note, wiki, map or comment by checking the checkboxes present there.

The question search is similar to the existing search except that it searches only questions. The actions for the question search go in the search controller. The search query looks for any notes whose title matches the keyword and which also have a question:foo tag. The search query for that is


# search_query.rb
# Published notes whose title contains the search keyword...
@notes = DrupalNode.where(
  'type = "note" AND node.status = 1 AND title LIKE ?',
  "%" + params[:id] + "%"
)
  # ...and that carry a question:* power tag
  .joins(:drupal_tag)
  .where('term_data.name LIKE ?', 'question:%')
  # Newest first, paginated
  .order('node.nid DESC')
  .page(params[:page])

Here the :id is the search keyword typed. You can see my previous post for the definition of nodes and tags used in the codebase.

Now that I have fetched the questions I have to implement the autocomplete feature. I did it using the jQuery typeahead function, similar to what was done before for the search feature.

The JavaScript code for typeahead goes as follows


$('#questions_searchform_input').typeahead({
  items: 15,
  minLength: 3,
  source: function (query, process) {
    return $.post('/questions_search/typeahead/' + query, {}, function (data) {
      return process(data);
    });
  },
  updater: function (item) {
    var url;
    if ($(item)[0] != undefined) url = $(item)[0].attributes['data-url'].value;
    else url = '/questions_search/' + $('#questions_searchform_input').val();
    window.location = url;
  }
});

I have an input field with the id ‘questions_searchform_input’. The items option sets the maximum number of items that can be listed in the results. The minLength option specifies the minimum number of characters that will trigger the search. The source option provides the dataset of search results. Here we have used an Ajax POST request that submits the keyword to the /search/questions_typeahead/:id URL.

The actual work of fetching the results is done in the search controller. It has two actions, questions and questions_typeahead. The questions action fetches the results matching the keyword and displays them on submit. If no match is found, it redirects the user to the post form to post a new question. The questions_typeahead action generates the autocomplete results.

You can find my work on the search functionality in the commit ceabfb0.

Lastly I wrote some functional tests for the search controller and also an integration test to test the entire search mechanism. You can find my full work in the plots2 Pull Request #555

This is just a simple search functionality. There is much more to developing an advanced search functionality. In fact, there is a parallel Advanced Search project going on in Publiclab, led by David Days. In the course of time the search code will be modified to meet the advanced search needs.