git/bzr historical performance comparison

OK, I know git vs. bzr has been beaten to death and that bzr’s speed is often cited as its “Achilles’ heel”, but I was in #bzr the other day and somebody (a git fan, I take it) said something to the effect of “well, bzr couldn’t be used to work with the Linux kernel tree, that’s what git was made for”. Now, I have no experience of working on the Linux tree, but it got me wondering whether anybody had done any benchmarking of that kind of operation.

After some googling I found an old blog post from 2006 by Jo Vermeulen where he did some basic timing of common tasks such as adding files, doing diffs, commits, and finding repo status on the Linux 2.6 kernel tree using both git and bzr. Since both git and bzr have come a long way since 2006, I thought I’d replicate Jo’s comparison (which used git 0.99.9c and bzr 0.7pre) using current (by Ubuntu 8.04 standards anyway) versions of git (1.5.4.3) and bzr (1.3.1). So, here are the results:

First we unpack a Linux 2.6.0 tarball into linux-bzr and linux.git directories, then initialize the repos:

Initialization:
git (old)    bzr (old)    git (new)    bzr (new)
0m0.161s     0m1.593s     0m0.086s     0m0.334s
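For anyone wanting to repeat these measurements, here is a minimal sketch of a timing helper in Python. It is not how the numbers in this post were collected (the tables use the shell's `time` output format); `time_command` and `format_like_time` are hypothetical names introduced here, and the command in the demo is a harmless stand-in so the sketch runs without git or bzr installed.

```python
import subprocess
import sys
import time


def time_command(argv, cwd=None):
    """Run argv once and return the wall-clock seconds it took."""
    start = time.perf_counter()
    subprocess.run(argv, cwd=cwd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def format_like_time(seconds):
    """Format seconds in the 0m0.161s style used in the tables."""
    minutes, rest = divmod(seconds, 60)
    return f"{int(minutes)}m{rest:.3f}s"


if __name__ == "__main__":
    # Stand-in command; swap in e.g. ["git", "init"] or ["bzr", "init"]
    # with cwd pointing at the unpacked kernel tree.
    elapsed = time_command([sys.executable, "-c", "pass"])
    print(format_like_time(elapsed))
```

A single run like this is sensitive to disk caching, so running each step a few times (or in rotated order) would give a fairer picture.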

Nothing exciting so far. Now we tell the VCSs to track the files via bzr add and git add:

Adding files:
git (old)    bzr (old)    git (new)    bzr (new)
0m42.121s    0m31.870s    0m14.269s    0m4.852s

In this case bzr not only wins in terms of absolute speed, but has also improved more over time. The git:bzr ratio in 2006 was 1.32:1 and now it’s 2.94:1. Jo didn’t mention in his comparison how long it took him to then commit the initial 2.6.0 tree we added, but for me it was 0m10.263s for git and 0m43.968s for bzr, a pretty clear win for git.

Next we’ll untar the latest 2.6.x kernel into our repos. Jo used linux-2.6.15.4 and I used linux-2.6.25.2. Perhaps I should have used the same version he did but considering we’re using entirely different hardware I don’t think our results are directly comparable anyway. OK, so now we want to see how long it takes to diff the changes:

Diffing changes:
git (old)    bzr (old)    git (new)    bzr (new)
2m26.982s    1m13.869s    0m24.425s    0m51.158s

This is one of the more fascinating results in my little experiment. The 2006 results gave a git:bzr ratio of 1.99 whereas my new results give a ratio of 0.48. Apparently the git developers have done a lot of work on speeding up diffing.
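The git:bzr ratios quoted throughout are simply the git time divided by the bzr time, so a value below 1 means git was faster. A small sketch of that arithmetic, with `parse_time` and `git_bzr_ratio` as illustrative helper names of my own:

```python
import re


def parse_time(text):
    """Convert a shell `time` value such as '2m26.982s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def git_bzr_ratio(git_time, bzr_time):
    """git time divided by bzr time; below 1 means git was faster."""
    return parse_time(git_time) / parse_time(bzr_time)


# Diff timings from the table above.
old = git_bzr_ratio("2m26.982s", "1m13.869s")
new = git_bzr_ratio("0m24.425s", "0m51.158s")
print(f"old {old:.2f}, new {new:.2f}")  # old 1.99, new 0.48
```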

Next we commit our new 2.6.x changes:

Committing large changes:
git (old)    bzr (old)    git (new)    bzr (new)
0m54.964s    2m4.757s     0m28.468s    1m8.627s

So that’s an old ratio of 0.44 and a new ratio of 0.41: not a lot going on there.

A really interesting test that Jo did was to do a bzr/git diff right after committing. Ideally this would take no time at all as we haven’t done anything since the commit, however:

Diffing no changes:
git (old)    bzr (old)    git (new)    bzr (new)
0m0.057s     3m51.918s    0m0.343s     0m47.448s

Back when Jo did his experiment the git:bzr ratio was 0.00025! Ouch. My results gave a ratio of 0.0072. In this case bzr has been gaining a lot of ground but it’s still rather remarkable how long it takes to diff when there are no changes.

Another thing we often do is run bzr status or git status to see what’s going on:

Getting repo status:
git (old)    bzr (old)    git (new)    bzr (new)
0m0.442s     0m19.711s    0m1.230s     0m4.027s

The original git:bzr ratio was 0.022 and the new one is 0.305, so bzr has gained by an order of magnitude but still lags a bit.

Lastly, we look at what happens if you make a minor change (let’s just add our name to MAINTAINERS for fun) and then commit:

Small commit:
git (old)    bzr (old)    git (new)    bzr (new)
0m7.364s     2m6.685s     0m0.397s     0m9.010s

The times I got for both git and bzr are significantly faster than what Jo got in 2006. His git:bzr ratio was 0.058 and mine is 0.044, so the marginal relative gain here actually goes to git.
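To see how far each tool moved on its own, one can divide its old time by its new time for every operation. The sketch below does that with the figures from the tables above, converted to seconds by hand. Keep in mind that the two sets of runs used different hardware and different target kernel versions, so these factors are only a rough guide.

```python
# Seconds per operation, converted by hand from the tables in this post:
# (git old, bzr old, git new, bzr new)
TIMINGS = {
    "Initialization":          (0.161, 1.593, 0.086, 0.334),
    "Adding files":            (42.121, 31.870, 14.269, 4.852),
    "Diffing changes":         (146.982, 73.869, 24.425, 51.158),
    "Committing large changes": (54.964, 124.757, 28.468, 68.627),
    "Diffing no changes":      (0.057, 231.918, 0.343, 47.448),
    "Getting repo status":     (0.442, 19.711, 1.230, 4.027),
    "Small commit":            (7.364, 126.685, 0.397, 9.010),
}


def speedups(timings):
    """Return {operation: (git speedup, bzr speedup)} as old/new ratios."""
    return {
        op: (git_old / git_new, bzr_old / bzr_new)
        for op, (git_old, bzr_old, git_new, bzr_new) in timings.items()
    }


for op, (git_x, bzr_x) in speedups(TIMINGS).items():
    # A factor above 1 means the newer run was faster.
    print(f"{op:<26} git {git_x:6.2f}x   bzr {bzr_x:6.2f}x")
```

A factor below 1, as git shows for diffing with no changes and for status, means the newer run was slower in absolute terms, though both of those git times are still around a second or less.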

A last interesting note of comparison is the storage size that the VCS takes up. After all the operations above my .bzr directory is 112MB (or 23% of the total size of the repo+working tree) and the .git directory is 162MB (or 30% of the total size) so it seems that bzr has a bit better storage compression.
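For reference, a rough way to compute this kind of share yourself is sketched below. I am assuming the percentage is metadata size divided by metadata plus working tree; `dir_size` and `metadata_share` are hypothetical helper names, and the sketch sums apparent file sizes rather than disk blocks, so it will not exactly match what a tool like du reports.

```python
import os


def dir_size(path):
    """Total size in bytes of regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def metadata_share(tree, meta_name):
    """Fraction of the whole tree (metadata included) used by meta_name."""
    meta = dir_size(os.path.join(tree, meta_name))
    whole = dir_size(tree)
    return meta / whole if whole else 0.0
```

For example, `metadata_share("linux-bzr", ".bzr")` would give the bzr figure for a tree unpacked into linux-bzr.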

OK, so now the question is, what does it all mean? Well, I’m not entirely sure, to be honest. When it comes to my original question of “Would bzr be usable working on the Linux tree?” I would think, at least when it comes to common local operations, that the answer would definitely be yes. It’s not the fastest thing around but it’ll get the job done.

I use both git and bzr on a regular basis and both are exciting and have their own strengths and weaknesses. Git is no doubt very fast, though I think other DVCSs are starting to catch up. Bzr is very user friendly and has great plugins. It’s really a cool time for code sharing, in my opinion. Rock on!

22 thoughts on “git/bzr historical performance comparison”

  1. The issue is not only speed. It’s features and the methods that the majority of developers on large scale projects need.

    I’m sure most modern VCSs could handle the kernel development. Can the developers make the VCSs work for them, that is the question.

    In the end the best tool for the project is the one that the contributors can use and the leaders demand.

  2. What about large merges? This is the most important reason for git’s (and bzr I guess, though I don’t know as much about it) existence, so I’m curious as to how they compare there. It’s also probably the most computationally intense thing that this sort of software really has to do, and once you get something with the history and size of the kernel is where you should really start to see the differences.

  3. The “git fan” in #ubuntu-kernel you spoke of was me. Good comparison and thanks for the work. For something like kernel development where you do a lot of small changes in several small files, bzr is still really slow.

    Now can you please show us the source tree you did this on so that these tests are reproducible?

    Thanks again.

  4. sejeff:
    I realize it’s not a complete or comprehensive comparison, but I’m not a VCS expert either. In particular I wonder how well bzr would do with merges, as David has commented on above, as well as with remote work. In my experience, pushing and pulling remotely is where bzr’s speed becomes a problem.

    The source trees I used were:
    http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.0.tar.bz2
    and
    http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.25.2.tar.bz2

  5. @sejeff

    >For something like kernel development where you do a lot of small changes in several small files, bzr is still really slow.

    It seems that in most cases, the new bzr is faster than the old git. So either the old git was unusable for the kernel, or you want to rephrase that sentence.

    I actually think all these times are very acceptable. I don’t really see how making a very uncommon operation one second faster would change the status from ‘nice’ to ‘still really slow’.

    The most interesting point, the only real point of value here, is that all of the timings suggest O(n) scaling or better.

    _That_ is the discussion. They both scale.

  6. @David:
    Great question about git gc. When I run it, it takes the .git/ directory down to 92MB! I believe the corresponding bzr command is bzr pack. After doing that I still ended up with a 112MB .bzr/. I’ve noticed that bzr seems to be better about keeping things maintained than git in this respect. You have to remember to periodically run git gc, but when you do, the difference is pretty amazing.

  7. re bzr pack:
    bzr pack forces a complete repack; at the end of this the old packs are in .bzr/repository/obsolete-packs/. (so your repository is roughly twice its normal size). This directory is purged before every pack operation (automatic and forced); most pack operations (exponential backoff – so 10:1 roughly) only pack 10 revisions at a time, which means the typical overhead of the obsolete packs directory is tiny. That directory is not automatically cleaned to allow recovery in the event of NFS races or similar issues causing the new packs to not be fully flushed to disk.

  8. Could you also run the test for Mercurial? It would be interesting to see its performance, as it is often said that hg is faster than bzr.

  9. Pingback: bzr, git, and hg performance on the Linux tree « LaserJock

  10. @laserjock:
    >I’ve noticed that bzr seems to be better about keeping things maintained than git in this respect. You have to remember to periodically run git gc, but when you do it’s pretty amazing the difference.

    git 1.5 and newer will run gc for you periodically. You just hadn’t had the repo around long enough for one of these automatic runs to clean things up for you.

  11. @Andrew
    I was comparing to bzr and git versions from 2 years ago and I wanted to compare to versions people would actually be using.

    @Philipp
    I did a bit of work on reproducing my results while running sync between each step. I also rotated the order in which I did bzr, git, and hg with each step to see if I could spot any trouble there. The results were pretty much the same, except that bzr was just as fast as the other DVCSs when diff’ing a repo that hadn’t changed.

  12. Given your aim of assessing progress, using “whatever is in” a Debian based distro seems a bit strange; they’re never up to date. In Bazaar’s case, you used something that is two versions behind the current release.

    AfC

  13. It seems that Bazaar people tend to do benchmarks in repositories with just an imported tarball. Benchmarks which use long-history repositories and test history operations show that currently Bazaar does not scale at all. It can take a minute to get something out from “bzr log”, for example.

    As far as I know it’s currently very difficult to import the Linux kernel repository (i.e. the whole history) into Bazaar because “bzr fast-import” crashes when trying.

  14. When you made the initial commit, how did you add all the files? Did you use “git add .”, or did you use the much slower “git add *” (although that got improved a bit in git some time ago)?

  15. Pingback: In Traction » Blog Archive » Bzr vs git, the sequel

  16. No… bzr is deadly slow. Git’s focus was always speed, because Linus knew how crucial it is to a developer’s patience that a task that needs to be done multiple times per day gets done in half a second instead of several. So don’t say bzr would get the job done for the kernel; you embarrass yourself 😉

    Other than that, bzr is very cool, it’s easy to use, and it’s hackable via Python “plugins”. I’m using it for all my projects since 1.5.

  17. Pingback: Bazaar: Version Control System « Rudimentary Art of Programming & Development

  18. Well. Somebody here says that bzr sucks in long-range projects with huge history.

    Let’s see: we started using bzr 1 year ago on our project. That was pre-1.0 :). Now we have ~5000 unique revisions and ~800 in the trunk branch. I can’t say that bzr is now slower compared to an empty repository. Speed is still the same.

    We changed storage formats from pack-0.92 -> knit -> rich-root-pack (ohh.. that was not an easy conversion) -> 1.6.1-rich-root -> 1.9-rich-root. The last conversion gave a good performance gain (possibly because of b-tree indexes). Anyway – no data loss for the whole year ;).

    Take a look:
    (ipidev) mocksoul@mocksoul-home ~root % time bzr log --line | wc -l
    783
    bzr log --line 0.46s user 0.03s system 96% cpu 0.514 total
    wc -l 0.00s user 0.00s system 0% cpu 0.471 total

    (AMD Athlon 3800+ single core)
