Perhaps it is a platitude but it’s true that good documentation makes a project successful. Recently, Jacob Kaplan-Moss of Django fame started a series of articles on documentation — perhaps one of the most underrated aspects of a software. Although I may not agree with everything (technical documentation following MLA?! Just couldn’t get away from that lit degree huh Jacob?), the series as a whole is truly readable and he puts words to some of my own vague thoughts on documentation. With all that in mind, I want to focus on what led to Django’s documentation and some concrete examples of successful and not so successful documentation.
The evolution of software documentation
Back in the dark ages, all documentation was done by hand. Considering that developers are pretty bad about documentation now, I don’t even want to imagine what it was like back then. When I first got on the programming scene, Javadoc was a pretty new development. Javadoc was great in that it allowed you to write documentation only once and at the same time you wrote the code. Then, you used your code files to generate pretty HTML. This was amazing for API reference documentation and I still think that Java has the most well documented standard library API of any programming language. After seeing that you could auto-generate documentation, a host of clones like POD and Doxygen followed. Unfortunately, too many people decided that having Javadoc means that your project is documented (I’m looking at you SNMP4J!).
The next stage in the evolution came from PHP’s excellent documentation of all places. The PHP folks started including formatted examples, suggestions and caveats with their reference API. They also started putting together guides about particular aspects of the language such as security or working with Oracle. On top of this, they had the seemingly ingenious idea of allowing users to post comments on the documentation. At first, this worked well. Years ago, I posted a couple steps I took to get PHP working with Oracle which is still a pain to this day. However, over time, the comments got clogged with snippets of useless code, cries for help and other drivel. It’s no small wonder that Django removed the ability to comment on their documentation.
Django embodies the current incarnation of documentation and although I think it is the best documented project I’ve worked with, I’m not convinced it is ideal. Django combines the PHP-style guides with tutorials. This works exceptionally well for getting new users off the ground quickly and bringing users familiar with the project up to speed in a new area. However, I feel that Django lacks somewhat in the reference API area and part of this has to do with the fact that they are writing all of the documentation by hand instead of generating it from docstrings. The method references are usually an afterthought (because they don’t happen at the same time the code is written) and don’t contain the level of detail that the PHP or Java documentation does. Without looking at the actual code, how else would you figure out that the markdown filter can accept extra parameters.
A success story
Despite the trash I just talked about Django (blasphemy!), it is a success and the documentation is superb. It gets you up and running very quickly and has detailed documentation on virtually every aspect of the project. I attempt to model both my open source projects’ and my work projects’ documentation after Django’s — imitation is definitely a form of flattery. Django has maintained its documentation by enforcing some rules and good practices. The Django project maintains all the reST formatted documentation files in the code line and requires that patches include updates to the relevant documentation. This ensures that the docs are up-to-date with the code — a problem lots of projects suffer from. Django uses Sphinx to generate their documentation periodically from the code line — I don’t know how often, but fairly often — and make it available as the official Django documentation. Inaccuracies and problems are caught quickly.
What not to do
Early Linux suffered from some serious documentation fail. I remember being familiar with installing Windows and Mac system 8-9 from scratch and figured that I could install Linux. Perhaps I made the mistake of trying Slackware, but I can remember even after having a mostly successful install having to compile all sorts of packages from scratch. It was a great hobby for CS student back in 2001, but hardly a well documented and easy to use project. The fact that the code is great and the system is stable doesn’t help if you can’t get the thing running (insert car analogy about a supercharged engine with no diagrams for putting it into an actual car).
In the office, we suffer from documentation issues because nobody wants to write it themselves and nobody can agree on a uniform way of doing it. We ended up with multiple wikis all over the place none of which are complete. Some small teams have design documents sitting in source control or in eRoom. We have requirements sitting in various Word documents and designs in Powerpoint presentations. There’s not even a tutorial telling new employees where we store our bugs, requirements, designs, or wikis or how new bugs should be filed, new requirements introduced or how wikis should be updated. In general, we have fragmented, tribal knowledge where nobody knows the whole story.
Making it better
At some point, somebody needs to lay down the law and start creating tutorials, walkthroughs and other documentation for a project. I am only one person in a huge division of an even larger corporation, but I already have a reputation for writing documentation. Django has their benevolent dictators declaring that all patches shall come with documentation. The Ubuntu community (and the Redhat and Mandrake guys before them) has taken Linux from having an arcane install process to being as easy as Windows — and it really shows.
As you may be aware, I released a Django application, RPC4Django, publicly a little while ago. This is what I learned.
Make It Easy
You efforts should be targeted at making adoption of your software as easy as possible. When developers look for a package that accomplishes a goal, they aren’t going to spend much time looking at any individual package. Usually a quick google search or a search in pypi and then they might spend a few minutes glancing at a package before moving on to the next one. There are a couple keys to making it easy:
- Have easy documentation on installation, configuration and the license.
This should be on both your webpage as well as on pypi. More on this later.
- Give demo code or better yet, an actual demo.
Quite often the one sentence summary on pypi (if you don’t provide a long description) is not sufficient to tell people why they should use your package.
- List the package’s compatiblity.
Considering very few packages are compatible between 2.5 and 3.0 or even 2.3 and 2.5, I’m very surprised by this. Nothing is more frustrating than thinking you’ve found what you need just to have installation (or worse, execution) fail.
Test Early and Test Often
When I released RPC4Django, it had a reasonable, but not great, unit tests with it. With the future releases, those improved significantly and a few more bugs were caught. Being able to quickly test python 2.4, 2.5 and 2.6 by having a fully built unit test suite was awesome. However, unit tests don’t catch everything. One bug that I did not catch for a while was a bug relating to how Django worked with mod_wsgi on Apache. No amount of unit testing or Django views testing (which is sort of a hybrid of unit and integration testing) would have caught this. It only got caught when the code was pushed to a production server.
Documentation usually requires keeping data in a number of places. Any project of considerable size has READMEs, a license file, HTML documentation, tutorials, a setup.py long description, class and module documentation, and more. There should be as few authoritative places for package information as possible. In the first version of RPC4Django, I rolled my own HTML documentation and and README basically told the user to read the real documentation. This solved the whole problem of duplicated documentation, but in a rather unenlightened way.
.. include:: LICENSE.txt
What this allows for and what RPC4Django does now is to enable one document to be built of many documents. In this way, I keep the license in one file, the installation in another, the changelog in another and they all are included into my README which can be used to generate my HTML documentation.
If my project was even larger, I might do what both the Python project and the Django project do. They include reST documents in their source tree. These include walkthroughs, tutorials, introductions to classes and more. Sphinx, a python package built to document the python library itself, can create attractive documentation hierarchies directly from simple text documents (which are themselves human readable!). Python uses it for all of their documentation and the Django project uses it for basically the entire documentation section of their webpage — including tutorials. You wouldn’t know it since the pages are pretty attractive but they appear to be generated directly out of subversion on a nightly basis. This keeps all the documentation together and makes sure your website, HTML documentation and source documentation stay in sync.
Don’t Get Discouraged
Just because your package is now on pypi doesn’t mean people are going to flock to it and download it. I think I’ve put together a pretty solid and useful package but I have only a few downloaders and I haven’t gotten much feedback. I intend to power through and start a new project.