Djangocon 2011 Day One

Real-time Django

I really enjoyed Ben Slavin’s talk on Real-time Django. He offered good insight into what to cache and when. To summarize: cache as many things as you can, at every level where it makes sense. On top of view-level caching, you should cache partial results and anything else that keeps you from hitting your database more than you need to. I have been playing with an approach that uses this idea to cache data from multiple databases in one fast cache. I liked the concept of “continuous caching”, where an out-of-band process keeps views or data cached so that actual requests for views never hit the DB.
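
A minimal sketch of continuous caching, assuming a plain dict stands in for a real cache backend such as memcached (this is not Django’s cache API). The names expensive_query, refresh_cache, front_page_view, and run_refresher are hypothetical. A background loop recomputes the data on a timer, and the request path only ever reads the cached copy.

```python
import threading
import time

# A plain dict stands in for a shared cache such as memcached (assumption).
_cache = {}
_lock = threading.Lock()


def expensive_query():
    """Hypothetical stand-in for a slow database query."""
    time.sleep(0.01)  # simulate database latency
    return {"top_posts": ["a", "b", "c"], "generated_at": time.time()}


def refresh_cache():
    """Out-of-band job: recompute the data and overwrite the cached copy."""
    data = expensive_query()
    with _lock:
        _cache["front_page"] = data


def front_page_view():
    """Request path: read only from the cache, never from the database."""
    with _lock:
        return _cache.get("front_page")


def run_refresher(interval, stop_event):
    """Refresh the cache every `interval` seconds until stop_event is set."""
    while not stop_event.is_set():
        refresh_cache()
        stop_event.wait(interval)
```

In a real deployment the refresher would more likely be a cron job or task-queue worker writing to memcached, so that web processes only read from the cache.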

Choices

I chose to attend Alex Gaynor’s talk on PyPy at Quora rather than Frank Wiles’s talk on Postgres performance tuning, but it was a tough choice. Alex thinks one big strength of Django, which he came to appreciate while working at Quora without it, is that picking up a foreign Django codebase is easy because of the conventions that virtually all Django apps follow. If you know Django, you can easily find all the URLs for any Django app (urls.py) or all the forms (forms.py). Unfortunately, the Django admin doesn’t follow these conventions. In passing, he also mentioned a project called Johnny Cache, which I have to try. I followed some live-blogging of Frank’s talk and it looked like there were some good tidbits.

I was interested in Eric Holscher’s talk on setting up Read the Docs, and I really need to spend some time looking at their Chef recipes and learning Chef in general.

If you’re at Djangocon, say hi!

Updates on Piston

About a year ago, I wrote a little about why I’m not using Piston. Piston appears to be dead! There hasn’t been a commit since September, almost a full year ago. This project was touted as “the way” to do REST APIs in Django, and I’m sad that it doesn’t seem to be maintained. I saw some other forks of the project on GitHub, but there still doesn’t seem to be much work on them lately. Does anybody know what happened?

Django password security

Update (2017)

This post is over 5 years old. Django now uses PBKDF2 by default and has pluggable password hashing. See how Django stores passwords for details.


Revision 16453 of Django improved the security of the password algorithm for the first time since the days of Django 0.90, years ago. This is a brief discussion of that change and of Django password schemes in general.

Worth its salt

Most people know that “good” passwords are at least 8 characters long and contain at least one uppercase letter, one lowercase letter, and one number. Let’s ignore special characters for now. This yields about 47 bits of entropy. The entire set of 8-character passwords could be reversed in about 36 hours, assuming ~1B hashes per second. You could just burn your new rainbow table to a DVD and break everyone’s password. Easy as pie.
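
The arithmetic behind those figures can be checked directly. This sketch counts 8-character passwords over [a-zA-Z0-9] that contain at least one lowercase letter, one uppercase letter, and one digit (inclusion-exclusion over the three character classes). It reads “~1B hashes per second” as 2^30, which is an assumption, but it is the rate at which a 2^47 search takes the quoted 36 hours.

```python
import math


def count_policy_passwords(length):
    """Passwords of `length` over [a-zA-Z0-9] with at least one lowercase
    letter, one uppercase letter, and one digit (inclusion-exclusion)."""
    total = 62 ** length
    # Subtract strings missing one class: no upper (36), no lower (36), no digit (52).
    missing_one = 2 * 36 ** length + 52 ** length
    # Add back strings missing two classes: digits only (10), lower only, upper only (26 each).
    missing_two = 10 ** length + 2 * 26 ** length
    # Strings missing all three classes cannot exist, so nothing more to subtract.
    return total - missing_one + missing_two


keyspace = count_policy_passwords(8)
bits = math.log2(keyspace)                  # roughly 47.2 bits

HASHES_PER_SECOND = 2 ** 30                 # assumption: "~1B" read as 2**30
hours_47_bits = 2 ** 47 / HASHES_PER_SECOND / 3600
```

Trying every password in a 47-bit space at that rate takes 2^17 seconds, or about 36.4 hours.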

Unfortunately for password crackers, this hasn’t worked for years because of salted passwords. Django uses salted passwords: when a user types in their password, the user’s random salt is prepended to it, the result is hashed, and that hash is compared with the salted hash stored in the database.
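
A minimal sketch of that check, assuming sha1 over the salt followed by the password, as described. The function names salted_hash and check_password are illustrative and are not taken from Django’s source.

```python
import hashlib
import hmac


def salted_hash(salt, raw_password):
    """sha1 of the salt prepended to the password, as hex."""
    return hashlib.sha1((salt + raw_password).encode("utf-8")).hexdigest()


def check_password(raw_password, salt, stored_hash):
    """Re-hash the login attempt with the user's stored salt and compare."""
    candidate = salted_hash(salt, raw_password)
    # compare_digest avoids short-circuiting on the first mismatched character
    return hmac.compare_digest(candidate, stored_hash)
```

Because each user has a different salt, the same password produces a different stored hash for every user. A single precomputed table therefore cannot cover the whole database.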

Each user gets a different random salt, so a leaked password database cannot easily be reversed with a rainbow table. Django switched from a 5-character salt drawn from [a-f0-9] (20 bits of entropy) to a 12-character salt drawn from [a-zA-Z0-9] (over 71 bits). Formerly, the salt was simply the first 5 characters of a sha1 hash of a call to random.random(), giving about a million possible salts. Breaking an old password database was therefore about a million times harder than breaking unsalted passwords, which made it prohibitive but not impossible. The new salt space is vastly larger: 62^12 possible salts instead of 16^5.
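
The two salt schemes can be compared side by side. old_style_salt follows the description above. new_style_salt is a hypothetical function modeled on the 12-character [a-zA-Z0-9] description and on the “system source of randomness” idea, using random.SystemRandom; it is not the code from changeset 16453.

```python
import hashlib
import math
import random
import string


def old_style_salt():
    """First 5 hex characters of a sha1 of random.random(): 16**5 possible salts."""
    return hashlib.sha1(str(random.random()).encode("ascii")).hexdigest()[:5]


_system_random = random.SystemRandom()  # draws from os.urandom


def new_style_salt(length=12):
    """`length` characters from [a-zA-Z0-9], using the OS randomness source."""
    alphabet = string.ascii_letters + string.digits
    return "".join(_system_random.choice(alphabet) for _ in range(length))


old_bits = math.log2(16 ** 5)    # exactly 20 bits
new_bits = math.log2(62 ** 12)   # about 71.45 bits
```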

Remaining weaknesses

If your database is leaked, salted passwords protect the entire set of hashes from being reversed at once. However, salting will not protect a specific hash from being reversed, since the salt is stored in the clear. You do not need to reverse every hash to do some damage. You can just reverse the administrator user’s password. Consider what Django stores in the password column of its User table.

The password field stores the hashing method (sha1), the salt, and the hashed password, separated by ‘$’ (that is, sha1$salt$hash). Given the user’s salt, we could easily check all 8-character passwords against that one hash in the same 36 hours. This doesn’t change whether the salt is 5 or 12 characters. Because sha1 is designed to be “fast” (it is also used for things like checksums), it offers little protection here. A better solution is a deliberately “slow” hashing algorithm designed specifically for password hashing, such as bcrypt or PBKDF2.
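
A minimal sketch of the “slow hash” alternative using PBKDF2 from Python’s standard library (hashlib.pbkdf2_hmac). The iteration count is an assumption to be tuned per deployment. The $-separated layout loosely mirrors the field format above with an added iteration count; Django’s actual encoding differs in details (for example, it base64-encodes the hash). pbkdf2_encode and pbkdf2_check are hypothetical names.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumption: tune so one hash costs tens of ms on your hardware


def pbkdf2_encode(raw_password, salt=None, iterations=ITERATIONS):
    """Return 'pbkdf2_sha256$iterations$salt$hexdigest'."""
    if salt is None:
        salt = os.urandom(16).hex()  # random salt from the OS source
    digest = hashlib.pbkdf2_hmac(
        "sha256", raw_password.encode("utf-8"), salt.encode("ascii"), iterations
    )
    return "pbkdf2_sha256$%d$%s$%s" % (iterations, salt, digest.hex())


def pbkdf2_check(raw_password, encoded):
    """Recompute the hash with the stored salt and iteration count, then compare."""
    algorithm, iterations, salt, stored = encoded.split("$")
    if algorithm != "pbkdf2_sha256":
        raise ValueError("unsupported algorithm: %s" % algorithm)
    digest = hashlib.pbkdf2_hmac(
        "sha256", raw_password.encode("utf-8"), salt.encode("ascii"), int(iterations)
    )
    return hmac.compare_digest(digest.hex(), stored)
```

Each guess now costs `iterations` HMAC-SHA256 evaluations instead of one sha1. The 36-hour single-hash search above grows in proportion to the iteration count.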

More generally

There have been numerous tickets (#5787, #5600, #15367), proposals, and even a project that duckpunches Django to add bcrypt. Parts of these proposals, namely using a system source of randomness where available and a longer salt, have already been implemented as part of changeset 16453. A long-term solution is to make password hashing pluggable, similar to the way database backends are pluggable. This makes it easy to swap out a particular hashing algorithm if weaknesses are discovered, and it lets different installations use different algorithms based on their requirements.
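
One way pluggable hashing could look, sketched under assumptions. The hasher classes, the HASHERS list, and the helper functions are all hypothetical. Django later added a PASSWORD_HASHERS setting in this spirit, but this is not its code. The stored algorithm prefix selects the hasher, and the first entry in the list is used for new passwords.

```python
import hashlib
import hmac


class SHA1Hasher:
    """Legacy scheme: sha1 of salt + password (kept only to verify old rows)."""
    algorithm = "sha1"

    def encode(self, raw_password, salt):
        digest = hashlib.sha1((salt + raw_password).encode("utf-8")).hexdigest()
        return "%s$%s$%s" % (self.algorithm, salt, digest)


class PBKDF2SHA256Hasher:
    """Slow scheme: PBKDF2-HMAC-SHA256 with a fixed, illustrative iteration count."""
    algorithm = "pbkdf2_sha256"
    iterations = 10000  # assumption for the sketch; real deployments tune this

    def encode(self, raw_password, salt):
        digest = hashlib.pbkdf2_hmac(
            "sha256", raw_password.encode("utf-8"), salt.encode("utf-8"), self.iterations
        ).hex()
        return "%s$%s$%s" % (self.algorithm, salt, digest)


# Hypothetical registry, analogous to a list of database backends:
# the first entry hashes new passwords, the rest only verify old ones.
HASHERS = [PBKDF2SHA256Hasher(), SHA1Hasher()]
_BY_ALGORITHM = {hasher.algorithm: hasher for hasher in HASHERS}


def make_password(raw_password, salt):
    """Hash a new password with the preferred (first) hasher."""
    return HASHERS[0].encode(raw_password, salt)


def check_password(raw_password, encoded):
    """Pick the hasher named by the stored prefix and re-encode to compare."""
    algorithm, salt, _ = encoded.split("$", 2)
    hasher = _BY_ALGORITHM.get(algorithm)
    if hasher is None:
        return False  # unknown algorithm: treat as a failed login
    return hmac.compare_digest(hasher.encode(raw_password, salt), encoded)


def needs_upgrade(encoded):
    """True if the stored hash came from anything but the preferred hasher."""
    return encoded.split("$", 1)[0] != HASHERS[0].algorithm
```

With this design, replacing a weakened algorithm means putting a new hasher first in the list. Old hashes keep verifying and can be re-hashed the next time their owners log in.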

Sparklines in D3

A couple of weeks ago, Protovis, a visualization library I’d been using, was deprecated in favor of D3, and I thought I’d share some of the work I’d done porting visualizations from the old library to the new one.

One Protovis example that has no corresponding D3 tutorial is sparklines. This sparkline shows the San Diego Padres’ first 100 games of the 2011 season. Up ticks are wins and down ticks are losses; red ticks show shutouts. This is similar to the visualization in Tufte’s “Beautiful Evidence”, p. 54.
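
The charts in this post were built with D3 in JavaScript, and that code is not reproduced here. To illustrate the same win/loss encoding in Python, this sketch emits an SVG string directly. The input format (a list of (runs_for, runs_against) tuples) and the function name are hypothetical. A shutout is taken to be any game where the losing side scored zero, and ties are assumed not to occur.

```python
def win_loss_sparkline(games, tick_width=2, gap=1, height=20):
    """Build an SVG win/loss sparkline from (runs_for, runs_against) tuples.

    Wins draw a tick from the midline up to the top edge, losses a tick down
    to the bottom edge. Shutouts (losing side scored zero) are red, others gray.
    """
    mid = height / 2
    step = tick_width + gap
    lines = []
    for i, (runs_for, runs_against) in enumerate(games):
        x = i * step + tick_width / 2
        y_end = 0 if runs_for > runs_against else height  # up for a win, down for a loss
        color = "red" if min(runs_for, runs_against) == 0 else "gray"
        lines.append(
            '<line x1="%g" y1="%g" x2="%g" y2="%g" stroke="%s" stroke-width="%g"/>'
            % (x, mid, x, y_end, color, tick_width)
        )
    width = len(games) * step
    return '<svg xmlns="http://www.w3.org/2000/svg" width="%g" height="%g">%s</svg>' % (
        width, height, "".join(lines))
```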

I created another simple visualization for the National League West, showing all five NL West teams on a single graphic. It is pretty easy to adapt this code to a single sparkline. So far I have been fairly pleased with D3’s performance and ease of use.

Django logging and sentry

Django-sentry has been a fairly useful tool for me. The docs are good, but they only cover integrating with logging the old way.

Here’s how to use Sentry with logging dictionary config: