Piston Looks Good, But I’m Not Using It
Firstly, I’ve been missing in action for a few months and I apologize to you, my loyal reader, for that. Without making excuses (here come the excuses), work has been picking up, my girlfriend moved from about 15 miles away to only about 8 blocks away and Starcraft II is in beta. Regardless, I’m back in the Python action. WoooHooo!
REST interfaces & Django
This post is somewhat of a follow-up on my post on RESTful Django web services because I didn’t really talk in my previous post about Piston. Piston (sometimes django-piston) is a library for creating RESTful services in Django and it supports some of the features that I spoke about in my previous post such as good caching support with Django’s cache framework, different output formats (eg. XML & JSON) via what Piston calls emitters, and the ability but not the requirement to use Django models as REST resources. I don’t know how I missed Piston before, but people blog (*) about it and it has made the rounds on the Django User’s list. However, even after looking closely at it, I decided not to go with it. In this post I’m going to talk about what I did and did not like and why I rolled my own REST micro-framework. That almost sounds like I’m giving myself too much credit given that my micro-framework is only ~30 lines.
(*) BTW, Despite the fact that Eric updates his blog somewhat infrequently (sounds familiar) it is well worth a read.
Piston: the good
Piston ships with quite a bit of good documentation and allegedly is used to power some of BitBucket’s services — lending to its credibility. Specifically, I liked the fact that it plugged directly into Django models. You simply write a short Handler for your model explaining what fields to expose and you’re mostly done.
import re

from piston.handler import BaseHandler
from myapp.models import Blogpost


class BlogPostHandler(BaseHandler):
    # The trailing comma matters: ('GET') is just the string 'GET'
    allowed_methods = ('GET',)
    fields = ('title', 'content', ('author', ('username', 'first_name')))
    exclude = ('id', re.compile(r'^private_'))
    model = Blogpost

    def read(self, request, post_slug):
        post = Blogpost.objects.get(slug=post_slug)
        return post
It effectively wraps up your handler and does all the JSON/XML/YAML serialization for you while still giving you the ability to customize it. On top of this, it plugs in nicely with Django’s form validation and allows you to do some other nice features like throttling requests based on which user does it.
Piston: the bad & the ugly
I started to look at Piston, but because I wasn’t using throttling or OAuth, wasn’t outputting anything other than JSON and wasn’t tying resources to models, I didn’t think that Piston bought me anything. In reality, it wasn’t doing anything for me other than properly returning HttpResponseNotAllowed. My other issue is that this project involved different outputs based on HTTP headers. For example, a GET on a certain URL would return JSON formatted data (a read in the CRUD world) if an HTTP header was present and an HTML page presenting that data if it wasn’t. Piston uses different emitters based on a request parameter format (eg. /path/resource/?format=JSON). Piston gets you up and running quickly, but it didn’t fit my use case.
Also, this is a little nitpicky, but when I see something like:
return rc.FORBIDDEN # returns HTTP 401
I cringe a little bit considering that status code 403 is the correct status code for Forbidden. There’s a ticket for this already. Why did Piston define constants for returning various status codes anyway when that functionality is already built into Django? Is rc.DELETED so much easier than HttpResponse(status=204)? Perhaps it’s a little clearer and Django really should have HttpResponse subclasses for even the less common responses, but I think this definitely involves repeating yourself (and Django’s mantra is don’t repeat yourself).
The solution
I always wondered why Django didn’t allow for routing URLs based on the HTTP method: It seems like such a common use case. The developers discussed it back in 2006, but in the end it was decided that building only the simple case was best as it yielded a relatively clean urls.py. Building off of that thread, the example in the Django book (search for “method_splitter”) and another blog post, I rolled a little framework to meet my needs instead of using something like Piston.
## utils/dispatcher.py
from django.http import HttpResponseNotAllowed

# see rfc 2616 - http://www.ietf.org/rfc/rfc2616.txt s9.2 - s9.9
HTTP_METHODS = ('GET', 'POST', 'PUT', 'HEAD', 'TRACE', 'DELETE', 'OPTIONS', 'CONNECT')


def service_dispatcher(request, *args, **kwargs):
    """
    Routes requests to the correct view method based on the HTTP method
    """
    # loop over all possible HTTP methods and find the appropriate service
    allowed_methods = []
    appropriate_service = None
    for method in HTTP_METHODS:
        service_view = kwargs.pop(method, None)
        if service_view is not None:
            # store legal HTTP methods in case we need to return a 405
            allowed_methods.append(method)
            # found the correct service method
            if request.method == method:
                appropriate_service = service_view

    # if the correct service was found, call it
    # otherwise return a 405 - method not allowed - error
    if appropriate_service is not None:
        return appropriate_service(request, *args, **kwargs)
    else:
        return HttpResponseNotAllowed(allowed_methods)
## urls.py
from django.conf.urls.defaults import *

from myapp.utils.dispatcher import service_dispatcher
from myapp.blog import services

urlpatterns = patterns('',
    # no leading slash: Django strips it from the path before matching
    url(r'^myapp/blog/$', service_dispatcher,
        {'GET': services.blog_get, 'POST': services.blog_post}),
)
I found this to be much simpler and easily extensible. The argument against this is that urls.py becomes bigger, but in a lot of ways I found this to be clearer. From reading the urlpatterns, I can quickly tell exactly what gets called in each case. In addition, routing differently based on HTTP headers, cookies, the source or anything else becomes as simple as adding a parameter and a little code to service_dispatcher.
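As a rough sketch of that last point, here is one way the selection logic could branch on a header as well as on the method. Everything in it is an assumption made for illustration: FakeRequest stands in for Django's request object so the snippet runs on its own, HTTP_X_WANTS_JSON is a made-up header key, and select_service is not part of the dispatcher above.

```python
# Sketch only: FakeRequest stands in for django.http.HttpRequest so the
# example runs without Django installed.
class FakeRequest(object):
    def __init__(self, method, meta=None):
        self.method = method
        self.META = meta or {}


def select_service(request, services, json_header='HTTP_X_WANTS_JSON'):
    """
    Pick a view for this request.

    services maps an HTTP method to either a single view or a dict with
    'json' and 'html' views. json_header is a hypothetical META key.
    Returns (view, allowed_methods); view is None when the method is not
    allowed, in which case the caller would answer with a 405.
    """
    allowed_methods = sorted(services)
    entry = services.get(request.method)
    if entry is None:
        return None, allowed_methods
    if isinstance(entry, dict):
        # header present -> JSON view, header absent -> HTML view
        variant = 'json' if json_header in request.META else 'html'
        return entry[variant], allowed_methods
    return entry, allowed_methods


# Hypothetical views, named after the ones used in urls.py above
def blog_json(request):
    return 'json'


def blog_html(request):
    return 'html'


def blog_post(request):
    return 'created'


SERVICES = {'GET': {'json': blog_json, 'html': blog_html}, 'POST': blog_post}
```

In a real dispatcher the returned view would be called with the request, and a None view would turn into HttpResponseNotAllowed(allowed_methods).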
In the end, it wasn’t that I didn’t like Piston, it’s just that I didn’t need it.
Updates March 2010 Edition
This post is mainly going to be an update on what I am thinking and what I’ve been working on the past few weeks.
Work
At the beginning of this year, I took a new position (same company) in a security group. Our primary focus is to ensure that the company is shipping secure, OSS compliant, legally compliant code. However, my specific role in that is to develop tools (with Django) to help in making sure that happens. This is an exceptionally interesting project and involves pulling in vast amounts of data (terabytes) from many sources (multiple VCS, multiple databases) and presenting it in a comprehensive manner. This project and my work have led to some good problems:
Some of our databases are MSSql databases. This is a problem since we’re a Linux shop. Pyodbc works great for connecting to MSSql from Linux, but unfortunately, there are some incompatibilities with django-pyodbc. In addition, the project doesn’t seem to be that widely used so it isn’t supported or documented as well as it could be. We are considering sqlalchemy/elixir as well, but I’ve been able to patch up django-pyodbc to get it (mostly) working with the Django trunk. I also have some concerns about the django-pyodbc project as a whole. I’m considering working on this project pretty heavily.
Also, as part of my work, a coworker and I detailed a security flaw we found with urllib2. It resulted in basic authentication credentials being sent to sites that did not request them (and weren’t running SSL).
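To show why credentials leaking over a non-SSL connection matters, here is a small sketch of how HTTP Basic authentication encodes them. It does not reproduce the urllib2 flaw itself; the helper names and the example credentials are made up for illustration.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    raw = ('%s:%s' % (username, password)).encode('utf-8')
    return 'Basic ' + base64.b64encode(raw).decode('ascii')


def decode_basic_auth(header_value):
    """Recover (username, password) from a Basic Authorization header."""
    scheme, token = header_value.split(' ', 1)
    decoded = base64.b64decode(token).decode('utf-8')
    username, password = decoded.split(':', 1)
    return username, password


# Hypothetical credentials: base64 is an encoding, not encryption
header = basic_auth_header('alice', 'secret')
recovered = decode_basic_auth(header)
```

Anyone who can read the request on the wire can run the second function, which is why sending this header to a site that never asked for it is a problem.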
Future of RPC4Django
I have been considering moving RPC4Django from my personal subversion repository to Google Code or Github. I feel that there are a few advantages of this:
- It is easier for others to contribute and get involved.
- A public bug tracker that would let other people easily raise issues instead of emailing me directly. This way we have public archives and the information can be found by anyone interested in RPC4Django.
- If I were hit by a bus, someone could easily take it over.
I might make a mailing list as well. Are there any strong opinions on this?
Django Scripting and the Crontab
Sometimes, you need part of your Django application to run from the command line. These scripts can be caching jobs that run periodically to speed up performance or data collection jobs that pull information from various sources into your application. Although James Bennett has a great article on writing standalone Django scripts, I just wanted to update it with changes that have happened since 2007. I ran into this problem about the same time on both a work project and on a personal project.
The problem with standalone scripts
The basic problem is that you want your script to run in the context of your application. You want to access your databases in the normal way and you want to import modules by the same paths. In general, you can do all of this work yourself by making sure your PYTHONPATH is correct and DJANGO_SETTINGS_MODULE is set properly, but it is so much easier to just create a custom management command. This is especially convenient since it makes your script portable (Windows, Mac & Linux; crontab & scheduled tasks) as well as easily distributable with your application.
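For comparison, the do-it-yourself route looks roughly like the sketch below. The function name, the directory and the settings module are placeholders I made up; the point is only that the import path and DJANGO_SETTINGS_MODULE both have to be in place before anything from the project is imported.

```python
import os
import sys


def configure_django_environment(project_parent, settings_module,
                                 environ=os.environ, path=sys.path):
    """
    Manual setup for a standalone script (hypothetical helper).

    project_parent  - directory that contains the project package
    settings_module - dotted path such as 'myproject.settings'
    """
    # make "import myproject..." resolve the same way it does in the app
    if project_parent not in path:
        path.insert(0, project_parent)
    # do not clobber a value the caller has already exported
    environ.setdefault('DJANGO_SETTINGS_MODULE', settings_module)
    return environ['DJANGO_SETTINGS_MODULE']
```

A script would call something like configure_django_environment('/home/path/to', 'project.settings') as its very first step and only then import its models. Every standalone script needs that boilerplate, which is what a management command saves you from.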
Custom management commands
A custom management command is a command that can be run from manage.py. Essentially, Django requires that you create the following type of structure under your application:
management/
    __init__.py
    commands/
        __init__.py
        mycommand.py
Then, in mycommand.py you must subclass django.core.management.base.BaseCommand like so:
from optparse import make_option

from django.core.management.base import BaseCommand, CommandError


# Class MUST be named 'Command'
class Command(BaseCommand):
    # Displayed from 'manage.py help mycommand'
    help = "Your help message"

    # make_option requires options in optparse format
    option_list = BaseCommand.option_list + (
        make_option('--myoption', action='store',
                    dest='myoption',
                    default='default',
                    help='Option help message'),
    )

    def handle(self, *app_labels, **options):
        """
        app_labels - app labels (eg. myapp in "manage.py reset myapp")
        options - configurable command line options
        """
        # Return a success message to display to the user on success
        # or raise a CommandError as a failure condition
        if options['myoption'] == 'default':
            return 'Success!'

        raise CommandError('Only the default is supported')
This will allow you to run (mycommand is named for mycommand.py):
> python manage.py mycommand
or create a crontab entry as follows:
* * * * * cd /home/path/to/project && python manage.py mycommand
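If you are wondering how --myoption ends up in the options dictionary, the sketch below uses plain optparse, with no Django involved, to show roughly what happens to the command line. The parse_command_line helper is made up for illustration and is not how Django names things internally.

```python
from optparse import OptionParser, make_option

# Same option declaration as in the command above
option_list = (
    make_option('--myoption', action='store',
                dest='myoption',
                default='default',
                help='Option help message'),
)


def parse_command_line(argv):
    """Split argv into positional arguments and an options dictionary."""
    parser = OptionParser(option_list=option_list)
    values, args = parser.parse_args(argv)
    # optparse stores each parsed option as an attribute named by 'dest'
    return args, vars(values)


args, options = parse_command_line(['myapp', '--myoption', 'other'])
```

The positional arguments correspond to what handle receives as app_labels, and the dictionary corresponds to options.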
Conclusion
This provides the cleanest, most flexible way to build and support standalone scripts that can be used outside of the web server. It will run in exactly the same environment as your application and the same modules used in your application can be used here. The only problem I ran into was that custom management commands must be run from the same directory as manage.py.
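One way around the working directory restriction, if you launch commands from another Python script rather than from cron, is to set the working directory for the child process. This is only a sketch: run_management_command is a name I made up and the project path is whatever directory holds your manage.py.

```python
import subprocess
import sys


def run_management_command(project_dir, command, *args):
    """
    Run 'python manage.py <command> [args]' from project_dir.

    Returns the exit status of the child process.
    """
    cmd = [sys.executable, 'manage.py', command] + list(args)
    # cwd plays the same role as the 'cd' in the crontab entry
    return subprocess.call(cmd, cwd=project_dir)
```

For example, run_management_command('/home/path/to/project', 'mycommand') does the same thing as the crontab line above.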
Edit: I added thorough documentation on management commands to ticket #9170
Update: The 1.2 documentation contains the changes from my patch.
Good Documentation Makes a Good Project
Perhaps it is a platitude but it’s true that good documentation makes a project successful. Recently, Jacob Kaplan-Moss of Django fame started a series of articles on documentation, perhaps one of the most underrated aspects of software. Although I may not agree with everything (technical documentation following MLA?! Just couldn’t get away from that lit degree huh Jacob?), the series as a whole is truly readable and he puts words to some of my own vague thoughts on documentation. With all that in mind, I want to focus on what led to Django’s documentation and some concrete examples of successful and not so successful documentation.
The evolution of software documentation
Back in the dark ages, all documentation was done by hand. Considering that developers are pretty bad about documentation now, I don’t even want to imagine what it was like back then. When I first got on the programming scene, Javadoc was a pretty new development. Javadoc was great in that it allowed you to write documentation only once and at the same time you wrote the code. Then, you used your code files to generate pretty HTML. This was amazing for API reference documentation and I still think that Java has the most well documented standard library API of any programming language. After seeing that you could auto-generate documentation, a host of clones like POD and Doxygen followed. Unfortunately, too many people decided that having Javadoc means that your project is documented (I’m looking at you SNMP4J!).
The next stage in the evolution came from PHP’s excellent documentation of all places. The PHP folks started including formatted examples, suggestions and caveats with their reference API. They also started putting together guides about particular aspects of the language such as security or working with Oracle. On top of this, they had the seemingly ingenious idea of allowing users to post comments on the documentation. At first, this worked well. Years ago, I posted a couple steps I took to get PHP working with Oracle which is still a pain to this day. However, over time, the comments got clogged with snippets of useless code, cries for help and other drivel. It’s no small wonder that Django removed the ability to comment on their documentation.
Django embodies the current incarnation of documentation and although I think it is the best documented project I’ve worked with, I’m not convinced it is ideal. Django combines the PHP-style guides with tutorials. This works exceptionally well for getting new users off the ground quickly and bringing users familiar with the project up to speed in a new area. However, I feel that Django lacks somewhat in the reference API area and part of this has to do with the fact that they are writing all of the documentation by hand instead of generating it from docstrings. The method references are usually an afterthought (because they don’t happen at the same time the code is written) and don’t contain the level of detail that the PHP or Java documentation does. Without looking at the actual code, how else would you figure out that the markdown filter can accept extra parameters?
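To make the docstring argument concrete, here is a toy sketch of generating a reference entry straight from the code. The markdown_filter function is a made-up stand-in, not Django's filter, and reference_entry is a hypothetical helper; real tools such as Javadoc or pydoc do far more.

```python
import inspect


def markdown_filter(value, arg=''):
    """
    Render value as Markdown (hypothetical stand-in, not Django's filter).

    arg - optional extra parameters, written at the same time as the code
    """
    return value


def reference_entry(func):
    """Build a plain-text reference entry from a signature and docstring."""
    signature = '%s%s' % (func.__name__, inspect.signature(func))
    return '%s\n\n%s' % (signature, inspect.getdoc(func))


entry = reference_entry(markdown_filter)
```

Because the entry is derived from the function itself, the optional argument shows up in the reference without anyone having to remember to document it separately.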
A success story
Despite the trash I just talked about Django (blasphemy!), it is a success and the documentation is superb. It gets you up and running very quickly and has detailed documentation on virtually every aspect of the project. I attempt to model both my open source projects’ and my work projects’ documentation after Django’s — imitation is definitely a form of flattery. Django has maintained its documentation by enforcing some rules and good practices. The Django project maintains all the reST formatted documentation files in the code line and requires that patches include updates to the relevant documentation. This ensures that the docs are up-to-date with the code — a problem lots of projects suffer from. Django uses Sphinx to generate their documentation periodically from the code line — I don’t know how often, but fairly often — and make it available as the official Django documentation. Inaccuracies and problems are caught quickly.
What not to do
Early Linux suffered from some serious documentation fail. I remember being familiar with installing Windows and Mac system 8-9 from scratch and figured that I could install Linux. Perhaps I made the mistake of trying Slackware, but I can remember even after having a mostly successful install having to compile all sorts of packages from scratch. It was a great hobby for a CS student back in 2001, but hardly a well documented and easy to use project. The fact that the code is great and the system is stable doesn’t help if you can’t get the thing running (insert car analogy about a supercharged engine with no diagrams for putting it into an actual car).
In the office, we suffer from documentation issues because nobody wants to write it themselves and nobody can agree on a uniform way of doing it. We ended up with multiple wikis all over the place none of which are complete. Some small teams have design documents sitting in source control or in eRoom. We have requirements sitting in various Word documents and designs in Powerpoint presentations. There’s not even a tutorial telling new employees where we store our bugs, requirements, designs, or wikis or how new bugs should be filed, new requirements introduced or how wikis should be updated. In general, we have fragmented, tribal knowledge where nobody knows the whole story.
Making it better
At some point, somebody needs to lay down the law and start creating tutorials, walkthroughs and other documentation for a project. I am only one person in a huge division of an even larger corporation, but I already have a reputation for writing documentation. Django has their benevolent dictators declaring that all patches shall come with documentation. The Ubuntu community (and the Redhat and Mandrake guys before them) has taken Linux from having an arcane install process to being as easy as Windows — and it really shows.
Using Django for Intranet Applications
One thing I end up doing quite a bit of at work is developing custom web applications that sit on the company intranet. These aren’t your basic timecard application or company web portal. Instead, these customized applications do everything from reports and dashboards from our bug tracking software to providing a web service API to our test case management solution. This post is about chronicling some of our forays into intranet apps.
Every intranet app needs authentication
If your company is anything like mine, you have a huge Active Directory system and your webservers are protected by single-sign-on (SSO). Django plugs into that nicely with the RemoteUserMiddleware so that users don’t have to remember Yet Another Password for your app. My password to our off the shelf commercial bug tracking software is still “welcome” and my password to our “Agile” project and task tracker is “test”. I’m already on the corporate intranet. Why should I have to authenticate twice? With Django and RemoteUserMiddleware, your users are automatically logged in if they’re authenticated. It seems relatively trivial, but it greatly enhances the user experience to not have to remember the password to the /admin site.
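A minimal settings sketch for this, assuming a Django 1.x project and a web server that has already authenticated the user and passes the username along as REMOTE_USER. The session middleware line is included because the auth middleware depends on it; the rest of your middleware list is omitted here.

```python
# settings.py (fragment)
MIDDLEWARE_CLASSES = (
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    # must come after AuthenticationMiddleware
    'django.contrib.auth.middleware.RemoteUserMiddleware',
)

# Trust the username supplied by the web server instead of
# checking a password stored in Django
AUTHENTICATION_BACKENDS = (
    'django.contrib.auth.backends.RemoteUserBackend',
)
```

With this in place, a user who has passed SSO is logged in to the app, including /admin, without seeing a login form.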
Who needs /admin
It seems that virtually every application has some sort of need for admin functionality. We used to deploy phpMyAdmin on most of our web hosts to administer our content. It was dove hunting with a bazooka. There was little fine grained control and we ran into the same issue of re-authenticating to access the admin site. Now an admin site is not unique to Django and it exists in virtually every major web framework in every major language. This point can be taken as an overwhelming endorsement of using some framework over using no framework at all. However, Django’s admin site is easy and very customizable in case you need to pretty up your admin site because it will be accessed by a wider audience.
Intranet apps have a way of becoming internet apps
You never know when management will decide that some piece of software that was never intended to be used externally is suddenly a must have for some outside group and needs to be internationalized (into Japanese?!). Suddenly, all the code has to go through open source compliance, export compliance, code scans, and due to licensing restrictions management doesn’t want to ship with MySQL. Unfortunately, that software was written in Java and PHP with no framework and quite a few open source libraries that we couldn’t ship. The transition was much more painful than if we had just used Django from the start.
Merging and splitting apps
I’ve released and deployed a number of web applications, but the real nightmare comes when apps are merged together or one app is broken apart. This is where Python’s packaging and the Django concept of splitting your project into multiple individual apps really shine. I’ve run into this from a couple of different sides. I’ve had to take multiple PHP applications and merge them together into a single deployment. There was a huge mess with including common code and the solutions aren’t great. You’re either messing with include_path in php.ini (a huge nightmare when managing multiple deployments with different includes), or you’re stuck modifying every include statement to pick up libraries from a common location. Splitting apps up runs into similar issues. Packaging separable components into different apps used by your project really is the way to go and Django works with this very well.
I’ve developed web apps in Java, Perl and PHP and by far I’ve been happiest with Python and Django. There are other great frameworks out there for these languages and using them could definitely help alleviate some of these issues, but the Django solution fits together better than anything I’ve seen from Struts, CodeIgniter or one of the dozens of other frameworks out there. For me, there has to be a pretty compelling argument not to do new apps with Django.
