Github Timeline and Social Coding

The Github public timeline is up on bigquery and I decided I’d play around with it. I created this visualization which is a first (alright, like eighth) try at measuring “how social” a project really is. The colors correspond to different programming languages and the size of the arc is based on the number of distinct collaborators on a project.

Other attempts

I thought about looking only at Pull Requests. It does uncover some interesting projects which have a lot of pull requests. I think this penalizes projects like Linux which doesn’t really have pull requests. Also, I wasn’t sure code submissions alone were exactly what I was looking for. I also briefly looked at only merged pull requests. I ended up filtering out projects with no stated language partially because there were a number of projects that just had common names (eg. “test).

While doing this, I figured I would have heard of the most social projects, but that there would be a lot of projects I’d never heard of for a variety of reasons. Some projects can get a lot of watchers, forks, comments or issues from an entirely separate group of people from the people I follow.

Most social projects by language

The visualization has the full dataset, but here’s a taste of the data:

  • C – php-src, linux, mruby
  • C++ – mosh, mysql, fr_public
  • JavaScript – bootstrap, meteor, jquery-file-upload
  • PHP – symfony, codeigniter, foundation
  • Python – django, legit, flask
  • Ruby – sample_app, rails, first_app