bzr, git, and hg performance on the Linux tree

OK, so I just did a historical comparison of git and bzr performance using the Linux source tree. One of the comments I got was “what about Mercurial?” Fair enough. I’ve never really done much with Mercurial because Ubuntu primarily uses bzr, and git is what most of the other DVCS users I know use. However, there are a lot of projects using Mercurial, Mozilla probably being the most notable one. So, here’s a comparison of git, bzr, and hg. You may want to read my previous post for details on the steps I’m doing.

Repo Initialization:
git           bzr           hg
0m0.086s      0m0.334s      0m0.137s
1           : 3.88        : 1.59

Add 2.6.0 Linux tree:
git           bzr           hg
0m14.269s     0m4.852s      0m2.526s
5.65        : 1.92        : 1
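
The third row of each table is a ratio, with every time divided by the fastest of the three. Here is a minimal Python sketch of that arithmetic, checked against the two tables above. The helper names `parse_time` and `ratios` are made up for illustration, and the sketch assumes the times are always written as minutes and seconds like `0m14.269s`.

```python
import re

def parse_time(text):
    """Convert a duration such as '1m8.627s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

def ratios(timings):
    """Scale each tool's time so that the fastest tool is 1."""
    seconds = {tool: parse_time(t) for tool, t in timings.items()}
    fastest = min(seconds.values())
    return {tool: round(s / fastest, 2) for tool, s in seconds.items()}

# Times copied from the first two tables.
init = {"git": "0m0.086s", "bzr": "0m0.334s", "hg": "0m0.137s"}
add = {"git": "0m14.269s", "bzr": "0m4.852s", "hg": "0m2.526s"}

print(ratios(init))
print(ratios(add))
```

Normalizing to the fastest tool rather than always to git is why the 1 moves between columns from table to table.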

Commit 2.6.0 Linux tree:
git           bzr           hg
0m10.263s     0m43.968s     0m30.890s
1           : 4.28        : 3.01

Diff after copying in 2.6.25.2 Linux tree:
git           bzr           hg
0m24.425s     0m51.158s     0m37.846s
1           : 2.09        : 1.55

Committing large changes:
git           bzr           hg
0m28.468s     1m8.627s      0m47.948s
1           : 2.41        : 1.68

Diff after no changes:
git           bzr           hg
0m0.343s      0m47.448s     0m1.340s
1           : 138         : 3.91

Getting repo status after no changes:
git           bzr           hg
0m1.230s      0m4.027s      0m1.077s
1.14        : 3.74        : 1

Committing a trivial change:
git           bzr           hg
0m0.397s      0m9.010s      0m1.913s
1           : 22.7        : 4.82

Repository size (just VCS control directory):
git (gc)      bzr (pack)    hg
92 MB         112 MB        179 MB

So, Mercurial performs quite well. It generally sits somewhere between git and bzr, and it is actually the fastest of the three at adding files and getting status. Hg runs somewhere around 2.75 times slower than git in the tested operations. Bzr runs around 5 times slower, with the notable exception that bzr diff when there are no changes is 138 times slower than git and 35 times slower than hg.
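
The averages above are rough, and the post does not spell out exactly which operations went into them. As a sanity check, here is a small Python sketch that recomputes the slowdowns from the tables. Which operations to include is an assumption made for this sketch: averaging hg only over the operations where git was faster lands close to the 2.75 figure, while averaging over all eight gives a lower number.

```python
from statistics import mean

# Wall-clock seconds copied from the tables above: (git, bzr, hg).
TIMES = {
    "init": (0.086, 0.334, 0.137),
    "add": (14.269, 4.852, 2.526),
    "initial commit": (10.263, 43.968, 30.890),
    "diff, large change": (24.425, 51.158, 37.846),
    "commit, large change": (28.468, 68.627, 47.948),
    "diff, no change": (0.343, 47.448, 1.340),
    "status": (1.230, 4.027, 1.077),
    "commit, trivial change": (0.397, 9.010, 1.913),
}

hg_vs_git = {op: hg / git for op, (git, bzr, hg) in TIMES.items()}
bzr_vs_git = {op: bzr / git for op, (git, bzr, hg) in TIMES.items()}

# Assumption: count only the operations where git beat hg.
hg_git_fastest = mean([r for r in hg_vs_git.values() if r > 1])
hg_all_ops = mean(list(hg_vs_git.values()))

# Assumption: leave out the no-change diff, which the text treats as an outlier.
bzr_without_outlier = mean(
    [r for op, r in bzr_vs_git.items() if op != "diff, no change"]
)

git_s, bzr_s, hg_s = TIMES["diff, no change"]
print(f"hg/git, ops where git is faster: {hg_git_fastest:.2f}")
print(f"hg/git, all eight ops:           {hg_all_ops:.2f}")
print(f"bzr/git, without no-change diff: {bzr_without_outlier:.2f}")
print(f"bzr/hg on the no-change diff:    {bzr_s / hg_s:.1f}")
```

A plain mean is easily dominated by one slow operation, so a median or geometric mean might be a fairer summary for data like this.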

git/bzr historical performance comparison

OK, I know git vs. bzr has been beaten to death and that bzr speed seems to be often cited as its “Achilles’ heel”, but I was in #bzr the other day and somebody (a git fan, I take it) said something to the effect of “well, bzr couldn’t be used to work with the linux kernel tree, that’s what git was made for”. Now, I have no experience working on the Linux tree, but it got me thinking about whether anybody had done any benchmarking of that kind of operation.

After some googling I found an old blog post from 2006 by Jo Vermeulen where he did some basic timing of common tasks such as adding files, doing diffs, commits, and finding repo status on the Linux 2.6 kernel tree using both git and bzr. Since both git and bzr have come a long way since 2006, I thought I’d replicate Jo’s comparison (which used git 0.99.9c and bzr 0.7pre) using current (by Ubuntu 8.04 standards anyway) versions of git (1.5.4.3) and bzr (1.3.1). So, here are the results:

First we unpack a Linux 2.6.0 tarball into linux-bzr and linux.git directories, then initialize the repos:

Initialization:
git (old)     bzr (old)     git (new)     bzr (new)
0m0.161s      0m1.593s      0m0.086s      0m0.334s
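
For anyone who wants to repeat this, here is a rough Python sketch of timing a single command and printing the result in the same `0m0.086s` style used in the tables. It is an illustration, not necessarily how the numbers in this post were collected. The helper names `time_command` and `format_like_time` are made up, and the stand-in command just runs an empty Python script so the sketch works anywhere.

```python
import subprocess
import sys
import time

def time_command(cmd, cwd=None):
    """Run cmd once and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, cwd=cwd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start

def format_like_time(seconds):
    """Format seconds as minutes and seconds, e.g. 68.627 -> '1m8.627s'."""
    seconds = round(seconds, 3)  # avoid printing something like '0m60.000s'
    minutes, rest = divmod(seconds, 60)
    return f"{int(minutes)}m{rest:.3f}s"

# Stand-in command; swap in e.g. ["git", "init"] with cwd="linux.git"
# or ["bzr", "init"] with cwd="linux-bzr" to time the real thing.
elapsed = time_command([sys.executable, "-c", "pass"])
print(format_like_time(elapsed))
```

This measures one run of wall-clock time only, so disk caching and whatever else the machine is doing will move the numbers around; running each command a few times and comparing would give a better picture.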

Nothing exciting so far. Now we tell the VCSs to track the files via bzr/git add:

Adding files:
git (old)     bzr (old)     git (new)     bzr (new)
0m42.121s     0m31.870s     0m14.269s     0m4.852s

In this case bzr not only wins in terms of absolute speed, but has also gained more relative to git over time. The git:bzr ratio in 2006 was 1.32:1 and now it’s 2.94:1. Jo didn’t mention in his comparison how long it took him to then commit the initial 2.6.0 tree we added, but for me it was 0m10.263s for git and 0m43.968s for bzr, a pretty clear win for git.

Next we’ll untar the latest 2.6.x kernel into our repos. Jo used linux-2.6.15.4 and I used linux-2.6.25.2. Perhaps I should have used the same version he did, but considering we’re using entirely different hardware I don’t think our results are directly comparable anyway. OK, so now we want to see how long it takes to diff the changes:

Diffing changes:
git (old)     bzr (old)     git (new)     bzr (new)
2m26.982s     1m13.869s     0m24.425s     0m51.158s

This is one of the more fascinating results in my little experiment. The 2006 results gave a git:bzr ratio of 1.99 whereas my new results give a ratio of 0.48. Apparently the git folks have done a lot of work on speeding up diffing.

Next we commit our new 2.6.x changes:

Committing large changes:
git (old)     bzr (old)     git (new)     bzr (new)
0m54.964s     2m4.757s      0m28.468s     1m8.627s

So an old ratio of 0.44 and a new ratio of 0.41: not a lot going on there.

A really interesting test that Jo did was to do a bzr/git diff right after committing. Ideally this would take no time at all as we haven’t done anything since the commit, however:

Diffing no changes:
git (old)     bzr (old)     git (new)     bzr (new)
0m0.057s      3m51.918s     0m0.343s      0m47.448s

Back when Jo did his experiment the git:bzr ratio was 0.00025! Ouch. My results gave a ratio of 0.0072. In this case bzr has been gaining a lot of ground but it’s still rather remarkable how long it takes to diff when there are no changes.

Another thing we would often do is a bzr/git status to see what’s going on:

Getting repo status:
git (old)     bzr (old)     git (new)     bzr (new)
0m0.442s      0m19.711s     0m1.230s      0m4.027s

The original git:bzr ratio was 0.022 and the new one is 0.305, so bzr has gained by an order of magnitude but still lags a bit.

Lastly, we look at what happens if you make a minor change (let’s just add our name to MAINTAINERS for fun) and then commit:

Small commit:
git (old)     bzr (old)     git (new)     bzr (new)
0m7.364s      2m6.685s      0m0.397s      0m9.010s

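The times I got for both git and bzr are significantly faster than what Jo got in 2006. His git:bzr ratio was 0.058 and mine is 0.044, so git has actually pulled slightly further ahead here, even though bzr’s absolute time dropped from over two minutes to nine seconds.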
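
All of the git:bzr ratios quoted in this post are just the git time divided by the bzr time for the same operation, so a value below 1 means git was faster. Here is a short Python sketch that recomputes them from the tables for both the 2006 and the new numbers. The names `to_seconds` and `git_bzr_ratios` are made up for illustration.

```python
import re

def to_seconds(text):
    """Convert a duration such as '2m26.982s' to seconds."""
    minutes, seconds = re.fullmatch(r"(\d+)m([\d.]+)s", text).groups()
    return int(minutes) * 60 + float(seconds)

# Times copied from the tables: (git old, bzr old, git new, bzr new).
RESULTS = {
    "add": ("0m42.121s", "0m31.870s", "0m14.269s", "0m4.852s"),
    "diff, large change": ("2m26.982s", "1m13.869s", "0m24.425s", "0m51.158s"),
    "commit, large change": ("0m54.964s", "2m4.757s", "0m28.468s", "1m8.627s"),
    "diff, no change": ("0m0.057s", "3m51.918s", "0m0.343s", "0m47.448s"),
    "status": ("0m0.442s", "0m19.711s", "0m1.230s", "0m4.027s"),
    "small commit": ("0m7.364s", "2m6.685s", "0m0.397s", "0m9.010s"),
}

def git_bzr_ratios(row):
    """Return (old ratio, new ratio), each being git time / bzr time."""
    git_old, bzr_old, git_new, bzr_new = map(to_seconds, row)
    return git_old / bzr_old, git_new / bzr_new

for op, row in RESULTS.items():
    old, new = git_bzr_ratios(row)
    print(f"{op:22} 2006: {old:.3g}   now: {new:.3g}")
```

A ratio that moves toward 1 between the two columns means the gap between the tools narrowed; one that moves away from 1 means it widened.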

A last interesting note of comparison is the storage size that the VCS takes up. After all the operations above my .bzr directory is 112MB (or 23% of the total size of the repo+working tree) and the .git directory is 162MB (or 30% of the total size) so it seems that bzr has a bit better storage compression.

OK, so now the question is, what does it all mean? Well, I’m not entirely sure, to be honest. When it comes to my original question of “Would bzr be usable working on the Linux tree?”, I would say that, at least for common local operations, the answer is definitely yes. It’s not the fastest thing around but it’ll get the job done.

I use both git and bzr on a regular basis and both are exciting and have their own strengths and weaknesses. Git is no doubt very fast, though I think other DVCSs are starting to catch up. Bzr is very user friendly and has great plugins. It’s really a cool time for code sharing, in my opinion. Rock on!