Getting started with pygit2

I should preface this post by saying that I’m not a Git expert so this is based on my experimentation rather than any deep knowledge. I bet I’m not the only one who merely skimmed the internals chapter in Pro Git. This post is the result of me setting out to learn the Git internals a little better and help anybody else who is trying to use pygit2 for something awesome. With that said, corrections are welcome.


While this is obvious to some, I think it’s worth pointing out that pygit2 versions track libgit2 versions. If you have libgit2 v0.18.0 installed, then you need to use pygit2 v0.18.x. Compiler errors will flow if you don’t. The docs don’t exactly mention this (pull request coming). Other than that, just follow the docs and you should be set. On MacOS, you can brew install libgit2 (make sure you brew update first) to get the latest libgit2 followed by pip install pygit2.

The repository

The first class almost any user of pygit2 will interact with is Repository. I’ll be traversing and introspecting the Twitter bootstrap repository in my examples.

There’s quite a bit more to the repository object and I’ll show it after I introduce some other git primitives and terminology.

Git objects

There are four fundamental git objects — commits, tags, blobs and trees — which reference snapshots of the git working directory (commits), potentially annotated named references to commits (tags), chunks of data (blobs) and an organization of blobs or directory structure (trees). I’ll save blobs and trees for a later post, but here’s some examples of using commits and tags in pygit2.

Since commits and tags are user facing, most git users should be familiar with them. Essentially commits point to a version of the working copy of the repository in time. They also have various interesting bits of metadata.

One tip/issue with using the repository bracket notation (repo[hex | oid]) is that the key MUST be either a unicode string if specifying the object hash or if it is a byte string pygit2 assumes that it points to the binary version of the hash called the oid.

Tags are essentially named pointers to commits but they can contain additional metadata.

You can read all about the types of parameters that revparse_single handles at man gitrevisions or in the Git documentation under specifying revisions.

Typically, you won’t need to ever convert between hex encoded hashes and oids, but in case you do the the conversion is trivial:

Walking commits

The Repository object makes available a walk method for iterating over commits. This script walks the commit log and writes it out to JSON.

Dump repository objects

This script dumps all tags and commits in a repository to JSON. It shows how repositories are iterable and sort of puts the whole tutorial together.

  • There’s talk of changing the pygit2 API to be more Pythonic. I was using v0.18.x for this and significant things may change in the future.
  • It helps to think of a Git repository as a tree (or directed acyclic graph if you’re into that sort of thing) where the root is the latest commit. I used to think about version control where the first commit is the root, but instead it is a leaf!
  • If your repository uses annotated or signed tags, there will be longer messages or PGP signatures in the tag message.
  • I’ve glossed over huge chunks of pygit2 — pretty much anything that writes to the repository — but if I don’t leave something for later my loyal readers won’t come back to read more. =)

2 thoughts on “Getting started with pygit2

  1. The interesting thing about pygit2 as opposed to GitPython is that pygit2 doesn’t shell out to the git command line but instead relies on a C library (libgit2). Also, GitPython seems somewhat stagnant although I don’t know how much Git has changed over that time and so maybe it doesn’t need to change. Thanks for the tip on legit.

Comments are closed.