Django password security

Update (2017)

This post is over 5 years old. Django now uses PBKDF2 by default and has pluggable password hashing. See how Django stores passwords for detail.


Revision 16453 of Django improved the security of the password algorithm for the first time since the 0.90 days of years ago. This is a brief discussion on that and Django password schemes in general.

Worth its salt

Most people know that “good” passwords are at least 8 characters and contain an uppercase, lowercase, and number at least. Let’s ignore special characters for now. This yields about 47 bits of entropy. The entire set of 8 character passwords could be reversed in about 36 hours assuming ~1B hashes per second. You could just burn your new rainbow table to a DVD and break everyone’s password. Easy as pie.

Unfortunately for password crackers, this hasn’t worked for years because of salted passwords. Django uses a system of salted passwords where when a user types in their password, the user’s random salt gets prepended to the password before being compared with the salted and hashed password which is stored in the database. It works like this:

Each user gets a different random salt and this way a leaked password database cannot be easily reversed using a rainbow table. Django switched from using a 5 character salt composed of [a-f0-9] (20 bits of entropy) to a 12 character salt made up of [a-zA-Z0-9] (over 71 bits). Formerly, the salt was simply made of the first 5 characters of a sha1 hash of a call to random.random() for about a million unique possible salts. Breaking an old password database was about a million times harder than unsalted passwords which made it prohibitive but not impossible. The new system is considerably more complex.

Remaining weaknesses

If your database leaked, salted passwords will protect the entire set of hashes from being reversed. However, it will not protect a specific hash from being reversed since the salt is stored in the clear. You do not need to reverse every hash to do some damage. You can just reverse the administrator user’s password. If you look what gets stored in Django’s User table, it looks like this:

The password field stores the method of hashing (sha1), the salt and the hashed password separated by ‘$’. Given the user’s salt, we could easily check all 8 character passwords for that salt in the same 36 hours. This doesn’t change if the salt is 5 or 12 characters. Because sha1 is designed to be “fast” since it is also used for things like checksums, it doesn’t really offer much protection here. A better solution is to use a “slow” hashing algorithm that is designed specifically for password hashing like bcrypt or PBKDF2.

More generally

There have been numerous tickets (#5787, #5600, #15367) and proposals and even a project that duckpunches Django to add bcrypt. Parts of these proposals — namely using a system source of randomness where available and a longer salt — have already been implemented as part of changeset 16453. A long term solution is to make the encryption pluggable similar to the way database backends are pluggable. This makes it easy to swap out a particular encryption algorithm if weaknesses are discovered and let different installations have different algorithms based on different requirements.

Securing a Django Site in Production

Edit (2020): This is pretty outdated. Instead, probably the best resource is the Django deployment checklist.

I was setting up a Django site for somebody recently and got asked the question, “is it possible for someone to hack my site?”. The answer, of course, is yes. To some degree, this is unavoidable. If somebody is willing to expend the time, effort and money, it is almost impossible to have a complex site that is perfectly secure. Even security “experts” can get it wrong. However, this got me thinking about the steps to secure a Django site.

Django does a good job of being reasonably secure by default. Unlike some other frameworks where you have to explicitly use CSRF tokens, Django uses them unless you tell it not to. Django escapes data from your templates automatically and is generally safe from SQL injection. The framework contains the building blocks to build a secure site, but quite often the site is deployed on a shaky foundation.

Securing the admin

For maximum security, the Django admin site should probably always be deployed on a web server running HTTPS. There’s a good guide on setting up SSL for the admin. Redirecting requests for /admin to HTTPS is one way. Another way is setup the admin on a subdomain like admin.example.com and handle them like that. This is what it looks like in Nginx:

Using this, you can proxy to two different Django instances: one handles the site over HTTP and one handles just the admin over HTTPS. Depending on your exact setup, you probably also want to mark the cookie as secure.

While the admin always needs security, some sites could also benefit from security outside of the admin if they’re handling user details, email addresses or other things. As an application developer, you need to build in that security — Django doesn’t know what you need to protect. Just remember the next time you login to your Django admin screen on a wifi hotspot at Starbucks that anybody can run something like Firesheep or Wireshark and capture your credentials. It’s amazing how many notable sites get this wrong. It reminds me of the wall of sheep.

Securing the server

It is amazing how many people put out a server with an inadequate firewall. Either they leave their database port wide open, memcached port open (this is REALLY bad — see here) or in some other way greatly increase the possible attack surface. While I generally knew what Amazon Web Services (AWS) could do as far as hosting, I had never used them before recently and I was impressed by their security. AWS makes configuring the firewall super easy and by default, only port 22 is open and SSH only accepts keys not passwords. That’s fairly secure by default! It gives a simple web GUI to open select ports and only to select machines. For example, if you host your database on a different server than your web server, only the web server should be able to connect to the database, not the whole internet. Also, Amazon S3 can serve its files over HTTPS as well. It’s a rather handy feature. I expect Rackspace is fairly similar in most regards.

Django security update

There were a couple fixes and changes in Django 1.2.5, but the main change was to CSRF exceptions to AJAX requests. The decision to remove the exception — despite backwards incompatibility — was the right move considering that the assumption that XmlHttpRequests could only come from the browser is no longer true (was it ever?). However, this release makes me wonder how many site authors didn’t bother to change much and just put @csrf_exempt above their web services just to get their site working again quickly with the new version.

Note: I secured the wordpress admin using the guide here and the WordPress HTTPS plugin. It’s a self-signed cert so I’m only getting maybe 75% of the security pixie dust, but I can deal with that.

Edit (September 14, 2011): Take a look through the Django security docs which your humble blogger helped write.

Django Security Update September 2010 Edition

Yesterday, the Django team released a security update. The post basically says it all. If you are on Django 1.2.1, UPDATE NOW!

The details

The issue is a standard non-persistent cross site scripting (XSS) exploit. Django explicitly trusted the cross site request forgery token which is supposed to be a hexdigest based on your SECRET_KEY in settings.py. However, cookies are simply stored on the client filesystem and they should generally be considered untrusted user input.

Exploit howto

First, setup a simple Django project that has the admin enabled. Visit your simple website and you’ll be issued a CSRF token that is saved in your cookie. Simply edit the token (with Edit this Cookie for Chrome maybe) and enter a script tag and save it. Reload the page and the script tag you entered will get echoed back unescaped and executed.

Edit: I should note that this vulnerability affects any form that uses a CSRF token, not just the admin.

Djangocon 2010 Day Two

More live-ish blogging…

The keynote

Eric Florenzano’s talk Why Django Sucks, and How We Can Fix It (slides) builds on top of the 2008 keynote by Cal Henderson entitled Why I Hate Django by pointing out instances where Django can improve. Like Cal, Eric complained about some things — some of which may not be solvable — and hopefully like some of the things Cal complained about they’ll get fixed. The note about needing more contributors again came up. Becoming a core developer is pretty much impossible. He complained about the reusability of apps citing django-avatar as an example. By rigidly defining a model, a “reusable app” becomes somewhat locked and cannot easily store new metadata. I really liked the concept of lazy foreign keys.

I’m somewhat torn on the idea of just switching Django’s source to github. I don’t fully buy Russell’s argument that you can just checkin to github and it will trigger to subversion. While that is a true statement, by having mirrors in bitbucket, launchpad and github, the Django core has fractioned the social aspects of those services. User comments are going to be split between between those services rather than being concentrated. However, similar to how Django used to allow comments in the documentation, these comments may not be that useful. I also think removing the Django admin from Django would be a travesty.

Django Security

Adam Baldwin’s Pony Pwning (notes, slides) was a decent hundred foot overview of Django security and web security in general. I would have really liked to see more details although it looked like some security vulnerability he found was redacted from his slides to give the Django developers time to fix it. Other than that, he said that the Django community as a whole “gets” security (I generally agree) and that while Django is fairly secure by default developers still manage to make mistakes. He did point out that there is no clickjacking protection for the admin and using X-FRAME-OPTIONS would be a good addition. Also, it seems that Django’s escaping could be improved. I liked that he pushed pen testing with w3af and running a web app firewall like mod_security. While frameworks can buy a certain level of security cheaply since it’s unlikely that a single developer or group will get everything right and it’s more likely that a well thought through framework will be more secure. However, then problems in the framework are discovered and basically all sites using it are suddenly vulnerable. I think some security researcher really just needs to spend some time with Django and really push the limits of its security model. I’ve talked about doing it at work, but buyin will be tough.

Other Stuff

Andrew Godwin’s (the developer of South) talk Step Away From That Database (slides – pdf) was the hundred foot overview of the various data stores for Django. One interesting trend that I’m picking up on is that Django developers seem to dislike MySQL a lot and Postgres is preferred by far. This might have something to do with MySQL development stalling and getting forked into Drizzle.

I enjoyed, but don’t entirely agree with Malcolm Tredinnick’s Modeling challenges (data) which seemed to be more about how to model complex data into Django’s models but in reality could have very well left out Django entirely and focused on databases and data modeling. I really liked the part about modeling dates that have different precisions. However, I am less sold on his implementation of modeling sports teams and players from data retrieved from retrosheet.

While this will result in fairly normalized data form data that looks like “keara001″,”Austin Kearns” where 001 is Kearns’ first season and gets incremented yearly. Basically, this easily allows you to join data easily and find all the history of where a player was and is, but it doesn’t take into account seasons very well. I think retrosheet does it the way they do so that you can easily get a player’s stats for a given year. It’s a complicated problem but I’ve seen retrosheet’s solution in many places.

Django add-ons

I think the best talk of the day so far (it JUST finished) was Eric Holscher’s Large Problems in Django, Mostly Solved (slides) which basically gives the hundred foot overview of the add-ons available and the external packages you should be using. Some stuff is pretty clear like pip, south, celery, fabric and sphinx, but on top of that there were packages that I hadn’t heard of or knew relatively little about like Haystack for search and Gunicorn for easy or simple deployments (it also might be the right solution for an easy Pythonic Hudson replacement). I was interested that Eric sees that Piston is in competition with Tastypie and that django-tagging is being overtaken by django-taggit. At work, we setup django-tagging but since then it seems that django-taggit has emerged since then. I also loved Eric’s metric of lines of docs and lines of tests as a metric of how good a project is.

Themes
  • djangopackages.com is the new place to go for Django add-ons.
  • 1.3 will probably have some good logging stuff built-in