code, code.back, code.back2… – A better way with Revision Control (svn/git/bzr/hg tutorials & comparisons)

18 06 2007

This article is very long. It covers the basics of what revision control systems (RCS) / source code management systems (SCMS) are, a basic tutorial on using Subversion for a personal repository, what distributed systems are, the basics of using git, bzr and hg for a personal repository, and my comparison of them. It’s only a basic introduction: I’ve never had to manage any large, complex projects, so advanced stuff isn’t covered (plus it’s long enough already).

If you program and don’t use some kind of RCS, you are making your life much harder than it should be. Revision control systems are great, and distributed ones are the greatest. All you need is to learn a few steps to set up a repo and somewhere to put it; anything with ssh can be used, or just the local disk.

Even for non-programmers, if you find yourself changing config files a lot, keeping them in a repository is definitely a good idea: if you botch an edit, you can always revert to the previous version and compare the two with diff.
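
As a rough sketch of that idea, here is one way it could look with git (covered further down). The temporary directory, the file name app.conf and the identity are all hypothetical placeholders, not something from my own setup:

```shell
#!/bin/sh
# Sketch only: track a config file, botch an edit, compare, then revert.
# The directory, app.conf and the identity below are hypothetical.
set -eu

confdir=$(mktemp -d)
cd "$confdir"
git init -q .
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

printf 'port = 8080\n' > app.conf
git add app.conf
git commit -q -m "Working config"

printf 'port = eighty\n' > app.conf          # the botched edit

git --no-pager diff app.conf                 # compare it with the committed version
git checkout -- app.conf                     # throw the edit away

restored=$(cat app.conf)
echo "$restored"
```

The same pattern should work for a real config directory; only the paths change.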

    Introduction to RCS

Originally, when I coded, I would intermittently run ‘cp -rf directory directory.backup’, so that if I screwed up my code I could always go back. This worked fine for my smaller projects, at least until one particularly painful uni assignment (SunRPC will segfault on anything). Eventually I reached backup.22, and I often had to go back a few revisions. That was not an easy task: I wouldn’t remember the exact number, and each backup often contained more than one change, such as adding comments to everything. Because the code had started to segfault at random, I ended up creating even more backups for things as small as a single added comment. I’m sure there was a simple memory leak, but with the deadline a few hours away I didn’t have time to hunt it down (basic gdb wasn’t working because it was the SunRPC libs that were crashing). In the end I got my assignment in (although it was probably my worst mark on an assignment yet; once again, I hate RPC).

After that I decided to try a revision control system. I had never actually thought about using one for my simple coding and just assumed they were only needed for larger projects. The only time I had encountered them was to occasionally grab some code when I needed something newer than what shipped with my Linux distribution. However, while googling for uni stuff I found a website from another student about setting up SVN for projects. I had previously used svn only for grabbing code from public projects; I had also used cvs, although it was fairly clear that cvs was a fairly outdated system.

    SVN – Subversion tutorial

SVN works rather well for me, as it is on the systems at uni and can be tunneled over ssh, so I can push/pull to/from my server at home. The basic functionality of svn lets me go back to any previous revision with ‘-r #’, code from any system that can connect to the repository (all I need to do is checkout/update), and see a ‘diff’ between revisions to see what I changed.

Unfortunately Subversion isn’t distributed (explained later), so I wouldn’t recommend it, but understanding the basics of revision control is important, so I have included instructions for it here. The same basic outline of commands is used by most revision control systems, with a few minor differences. I might use svn as a basic repository for config file edits, but any of the distributed ones would work just as well.

You can store your code on any system you have direct access to, or can reach over ssh (or http etc…); I’m using ssh here.

SourceForge (from the owners of Slashdot) provides free public svn (and cvs) hosting for open source projects, including bug tracking and basic forums. I haven’t found the site very nice to navigate, although you can just host a normal website on it.

Setting up a personal local repository is easy.

Firstly we need to make a svn folder where all the other svn projects will live:
mkdir ~/svn

Then we need to make a repository for the project:
svnadmin create ~/svn/PROJECTNAME

Next is importing the current code. You do this from the directory where your code lives, not the one svn created. Make sure you clean up any unneeded files like binaries and generated output first:
svn import . \
file:///home/USERNAME/svn/PROJECTNAME/trunk -m "Initial commit."

Notice that it is going into the sub folder trunk. This is important because later on you might need to tag code, so you might end up with /trunk/, /1.0rc1/ and /1.0/; you can just put the code in the main directory if you don’t want this kind of functionality. Make sure there are 3 /’s in the URI: normally the server name goes after the second /, but since this is local there isn’t one. You must also specify the full path to your folder. -m is the commit message that describes the changes in a revision.

You can also use svn+ssh://USERNAME@SERVER/home/USERNAME/svn/PROJECTNAME/trunk if you want to do it over ssh.

The next step is to check out your repository; even though you have a local copy, you still need the Subversion metadata (annoying URL prefix, I wish it was just ssh://):
svn checkout \

This time I’m doing it over ssh; once again, remember that it’s coming from the trunk folder. The trailing PROJECTNAME makes svn name the checked-out folder PROJECTNAME rather than trunk. co is a shorter alias of checkout if you’re excessively lazy.

That’s the hard bits done. From now on it’s very simple, as all the information about where to upload is stored in the .svn folder in your project.
Now you just edit your code, and once you’re happy with the changes you type:
svn commit -m "Description of changes."

When you create a new file that you want in the repository, you must first tell svn to add it manually; this avoids accidentally uploading compiled binaries or files output by your program:
svn add filename

To update to the version of the code in the repository (or a particular version with -r#):
svn update

To see what changed between revisions (you can also specify a particular revision with -r):
svn diff

To see the logs:
svn log

To make a tag:
svn copy \

    Distributed Revision Control Systems

SVN was a massive improvement for managing even simple personal code, and I used it for several months without issues. However, there is a new breed of RCS appearing: distributed ones. There are currently 3 main contenders:

  • Git – Made by Linus for maintaining the Linux kernel, also used by KDE.
  • Mercurial – Run with the command ‘hg’. A popular Python-based one, used by Sun (for hosting Java).
  • Bazaar-NG – Or ‘bzr’. Python again, as used by the Ubuntu community (both are made by Canonical).
  • There is also darcs (written in Haskell), GNU Arch, and monotone. There is a Wikipedia article listing various revision control software (commercial/free and central/distributed).

    Being distributed means that when you check out a central repository, you actually get your own local repository rather than just a copy of the code from it, so you can commit changes without access to the central repository. This allows for much easier experimentation, as you can quickly branch off from your local repository, and it’s useful for people with laptops who might not have an internet connection. With Subversion you can check out a repository, but then you’re stuck with the one version: you can only commit back to the main repository, and the most you could do is copy the directory or use other painful workarounds. Also, there technically isn’t a ‘central’ repository, although there will generally be an official one that everyone downloads from. These are still handy features to have even for personal use; for instance, a simple ‘svn log’ needs to talk to the central server, which can take some time if it’s a large repo and/or over a slow connection.
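
To make the offline point a bit more concrete, here is a small sketch using git. It assumes a local bare repository standing in for the ‘central’ server; all paths, file names and the identity are hypothetical:

```shell
#!/bin/sh
# Sketch only: a local clone keeps accepting commits when the "central"
# repository is unreachable. Paths and names here are hypothetical.
set -eu

work=$(mktemp -d)
cd "$work"

git init -q --bare central.git               # stands in for the server
git clone -q central.git laptop 2>/dev/null  # empty clone, warning silenced
cd laptop
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo "first" > notes.txt
git add notes.txt
git commit -q -m "Commit while online"

mv ../central.git ../central.offline         # simulate losing the connection

echo "second" >> notes.txt
git commit -q -a -m "Commit while offline"   # still works: history is local

commits=$(git rev-list --count HEAD)         # the log is read locally too
echo "local commits: $commits"
```

Nothing here needed the server after the clone; the commits only have to travel once a push is possible again.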

    Speed-wise, Git is currently the fastest for most operations, as it was designed for maintaining the massive Linux kernel. Next fastest is Mercurial and then Bazaar (which is planning to match git’s speed in its 1.0 release). However, for most simple projects speed isn’t much of a requirement; as long as it’s not tediously slow for simple changes, any of them should work fine.

    The functionality of all of these is fairly similar: you tell it who you are, init the original source directory and commit the initial repository; then you can check out from anywhere with access, branch off code, modify code, commit it, merge it back into the master branch and push it to the server. You can also review logs, see changes with diffs etc…

    Most of these can check out code from repositories of a different type, although you might need a plugin. You can also convert between systems with tailor, although you might lose some information.

    In the end it’s probably just personal choice which one you prefer, as they all offer the same basic functionality.

      DRCS Tutorials


    Firstly, there is a great talk from Linus about Git on Google Video; it’s 1hr 10min long. It might be somewhat dated, however: some of the functionality talked about might have been implemented or sped up since then (for instance, pushing in git now exists).

    Git is written in C and is currently the fastest. It is probably best suited to larger projects. However, some of Git’s more advanced features are a bit harder to use and understand (although not by much for basic usage), so it might not suit the less experienced user. Git’s speed advantage is apparently lost on Windows systems, as it relies on some specific methods of disk access (unless this has been fixed in newer versions), so Windows or multi-system developers might want to avoid it.

    If you want, you can get free public Git hosting here, although it’s only a very basic service currently.
    UPDATE: There is also GitHub, which has a free open source developer plan (100MB, no private repos).

    A nice thing about Git is that it keeps all your branches in the same folder. With bzr/hg, branching creates a separate folder for that branch; you could keep them all in one main project folder (for Bazaar you can create a repository that stores all your branches, saving space by sharing common files), but with Git everything is in the one folder by default, which makes for a much tidier feel. Branches you aren’t working on are tucked away, and you switch between them fairly painlessly with the checkout command. It might take a bit more effort to work across 2 branches at once, however.
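
A quick sketch of what ‘tucked away’ means in practice: files that only exist on one branch disappear from the folder when you switch away and come back when you switch again. The branch name experiment, the file names and the identity are made up for illustration:

```shell
#!/bin/sh
# Sketch only: one folder, two branches. All names are hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo "stable" > main.txt
git add main.txt
git commit -q -m "Initial commit"
trunk=$(git symbolic-ref --short HEAD)       # master or main, depending on setup

git branch experiment
git checkout -q experiment
echo "wip" > scratch.txt
git add scratch.txt
git commit -q -m "Experimental file"

git checkout -q "$trunk"
on_trunk=$(ls)                               # scratch.txt is tucked away here
git checkout -q experiment
on_experiment=$(ls)                          # and it is back again here

echo "$trunk:" $on_trunk
echo "experiment:" $on_experiment
```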

    Git also has nice SHA-1 ids for everything, so you can tell if things become corrupt, and it generally views all your code as one thing rather than as separate files, so it can track changes to a function even if it’s moved from one file to another.
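
As a small illustration of those ids (a sketch with made-up file names and identity, not a full explanation of git’s internals): the id of a file’s content depends only on the content, and ‘git fsck’ re-checks what is stored against its ids:

```shell
#!/bin/sh
# Sketch only: look at the ids git assigns. File names are hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo "hello" > a.txt
git add a.txt
git commit -q -m "Add a.txt"

commit_id=$(git rev-parse HEAD)              # id of the commit
blob_a=$(git hash-object a.txt)              # id derived from the file content
cp a.txt b.txt
blob_b=$(git hash-object b.txt)              # same content, so the same id

echo "commit: $commit_id"
echo "a.txt:  $blob_a"
echo "b.txt:  $blob_b"

git fsck                                     # verifies stored objects against their ids
```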

    You can ‘apt-get install git-core’ on Ubuntu/Debian; however, it’s out of date, so the instructions will vary. For a newer version, you can get the code from the site and compile from source.

    Firstly tell Git who you are (and enable pretty colours); the following is for newer versions of Git:
    git config --global user.name "YOURNAME"
    git config --global user.email "YOUREMAIL"
    git config --global color.diff auto
    git config --global color.status auto
    git config --global color.branch auto

    Note that those all use a double hyphen (--global), not a single dash; WordPress screws it up.

    To initialize the current code directory (older versions use ‘git-init-db’):
    git init

    When committing to Git, you maintain an index of the files that are to be committed, using the ‘add’ command. In svn you only need to add new files, but in git you also need to add changed files. Rather than adding changed files manually, you can use ‘commit -a’, which automatically adds changed files to the index (but not newly created ones). Since all your files are new in this initial import, you need to add them:
    git add .

    Then commit them:
    git commit
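
Here is a short sketch of that index behaviour, run in a throwaway directory with hypothetical file names and identity: ‘commit -a’ picks up the changed file but leaves the brand new one alone.

```shell
#!/bin/sh
# Sketch only: 'commit -a' stages changed tracked files, not new ones.
# File names and identity are hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo "v1" > tracked.txt
git add .                                    # new files must be added explicitly
git commit -q -m "Initial import"

echo "v2" > tracked.txt                      # change a file git already tracks
echo "new" > untracked.txt                   # create a brand new file

git commit -q -a -m "Commit changed files"   # commits tracked.txt only

status=$(git status --porcelain)
echo "$status"                               # untracked.txt is still listed as '??'
```

To get untracked.txt into the repository you would still need a ‘git add untracked.txt’ followed by another commit.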

    When you want to grab your code from a remote repository and put it in the current directory, use:
    git clone ssh://SERVER/home/USERNAME/git/PROJECTNAME

    Enter your directory, you can then make a branch for hacking on:
    git branch BRANCHNAME

    View your list of branches:
    git branch

    Then you switch to that branch:
    git checkout BRANCHNAME

    Modify some code and check it into your local BRANCHNAME branch:
    git commit -a

    Switch back to your original local branch:
    git checkout master

    Merge the changes into the master branch:
    git merge BRANCHNAME

    Delete the extra branch (-D will force it to delete if you didn’t merge it):
    git branch -d BRANCHNAME

    Push the branch to your server:
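
As a hedged sketch of what a push can look like: this assumes a remote named origin (a name not taken from the text above) and uses a local bare repository as a stand-in for the server, so over ssh the URL would be an ssh:// one like in the clone step. File names and identity are placeholders:

```shell
#!/bin/sh
# Sketch only: push a branch to a stand-in "server" (a local bare repo).
# The remote name origin and all paths are assumptions for illustration.
set -eu

work=$(mktemp -d)
cd "$work"
git init -q --bare server.git                # hypothetical stand-in for the server
git init -q project
cd project
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo "code" > main.c
git add main.c
git commit -q -m "Initial commit"
branch=$(git symbolic-ref --short HEAD)      # master or main, depending on setup

git remote add origin "$work/server.git"     # over ssh this would be an ssh:// URL
git push -q origin "$branch"

local_id=$(git rev-parse HEAD)
server_id=$(git --git-dir="$work/server.git" rev-parse "$branch")
echo "local:  $local_id"
echo "server: $server_id"
```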

    There’s some more tutorial information on Git here.

      Bazaar – (bzr)

    Bazaar, written in Python, is probably the slowest of the 3; however, the current project roadmap for 1.0 is to match the speed of git, so there might be some improvements appearing. There are benchmarks here showing much better speeds up to 0.15, but not for 0.16/0.17, which also list more performance improvements in their changelogs. I haven’t found any videos on Bazaar, but there have been three posts from Shuttleworth recently on Bazaar as a lossless RCS.

    For public Bazaar hosting there is Launchpad, which has bug tracking and such for projects, and can store personal user branches.

    Bazaar seems fairly simple to use. I haven’t needed any of the more advanced features, but it seems like advanced stuff would be simpler under Bazaar than Git; for the simple stuff there isn’t any major difference.

    Firstly set your name:
    bzr whoami "Your Name <YOUREMAIL>"

    Enter your source code directory and initialize it:
    bzr init

    Add the files to the index:
    bzr add .

    Commit the branch. This same command is also used to commit code after it’s modified; by default it will commit all changed files, like -a in git:
    bzr commit

    You can create a repository to store branches, this allows you to save space by sharing the common files between them.
    bzr init-repo REPONAME

    Now you can branch off from your remote branch into the local repository. Notice it’s sftp for ssh now, yet another scheme for the same thing, although you can use ~ for the home folder this time. There is also bzr+ssh://, which doesn’t seem to need the paramiko library, but I’m not sure of the difference between them other than that:
    bzr clone sftp://USERNAME@SERVER/~/bzr/PROJECTNAME

    In addition to ‘clone’, you can also use ‘checkout’. This means that any changes you commit will, as well as being committed to the local branch, also be committed to the branch you checked out from, if possible. This is somewhat similar to svn, except that changes are still committed to the local branch regardless of whether the remote branch is accessible (unless you use --lightweight, in which case it works just like svn and everything depends on the remote branch being reachable). You can also use checkout inside a branch to bring the latest committed version of that branch into the working directory, which is sometimes needed if you push branches, as a push transfers the .bzr directory with the revisions but not the working tree.

    You can fork off from your local branch for experimental coding, which will make a separate folder in the repository:
    bzr clone PROJECTNAME PROJECTNAME-testcode

    Then after coding, change to the main local branch directory and merge:
    bzr merge ../PROJECTNAME-testcode

    Then you can push the local branch back to your servers branch:
    bzr push sftp://USERNAME@SERVER/~/bzr/PROJECTNAME

    Also see the official Bazaar tutorial.

      Mercurial – (hg)

    Mercurial works basically the same as Bazaar. There’s a Google Video tech talk on it here (50min).

    Thus you must firstly identify thyself:
    echo -e "[ui]\nusername = YOUR NAME <YOUREMAIL>" > ~/.hgrc

    Changeth to thy source code directory and initializeth with:
    hg init

    Addeth ye new files to thy index:
    hg add

    Commiteth thy files to thy repo:
    hg commit

    Snag your remote repo to a local location:
    hg clone ssh://USERNAME@SERVER/~/hg/PROJECTNAME

    Branch off your local main to a secondary branch:
    hg clone PROJECTNAME PROJECTNAME-testcode

    Modify some code, and commit to the secondary branch with:
    hg commit

    Change back to your primary local branch and merge (this needs 2 commands):
    hg pull ../PROJECTNAME-testcode
    hg update

    Push it to your remote repo:
    hg push ssh://USERNAME@SERVER/~/hg/PROJECTNAME

    Official Mercurial Tutorial.


    After trying out all 3, I found them to be very similar to each other, and any would be suitable for most purposes; you could probably pick one at random and be happy, or choose one based on the public services that are available, such as Launchpad. I will probably end up using bzr. hg seemed to make merging a bit more of a pain, requiring an extra step, and the ‘merge’ command somehow changed from the docs to the ‘update’ command; the aesthetics of its output also weren’t as good, but that’s a bit nitpicky. Bazaar’s rapidly improving speed should see it ahead of hg if they meet their goals. I also liked git quite a lot and might use it for some stuff, but it isn’t available on the Solaris systems at uni and requires 22MB just for the basic binaries, which is too much for me to install locally (50MB directory limit). I do favor the approach of having all the branches in the one local location rather than making a whole new folder each time, though; it cuts down on the appearance of clutter.

    If you are looking for public hosting for your code with a repository of your choice, you can check this Wikipedia article, which shows a handy list of hosts and what systems they support.

    SunRPC beginner tips

    13 06 2007

    RPC (Remote Procedure Calls) is used for making client/server programs where the client can call a function on the server without having to implement its own network code. The programs link in with rpcbind, which runs on just about every system nowadays. It works by taking a simple msg.x template, running it through the code generator rpcgen, and linking the results with the client and server code.

    I had to do 2 assignments this year for my distributed computing class using Sun’s RPC on Solaris.

    RPC is extremely painful and there isn’t too much in the way of beginners’ resources. The server kept core dumping on me, and the bit causing the problem was often in the RPC libs rather than my code. Normally this is caused by broken memory allocation, but it’s very hard to track down; gdb doesn’t really help.

    A few tips I picked up while working on Sun RPC:

  • You can stop the server from backgrounding, allowing you to use standard printf’s for debugging. Just add -DRPC_SVC_FG to your server’s compile line.
  • Correct memory management is crucial; any leaks, no matter how minor, will cause crashes with RPC.
  • Don’t leave any pointers undefined, even if you haven’t put anything into them and are using a size variable of 0 (such as empty arrays/strings); NULL them. SunRPC will try to free() any pointers after receiving the struct back, which can cause your client to crash when it receives a successful response, because it tries to free the memory before putting the actual response data in there; since the pointers are undefined, it will try to free random memory and segfault.
  • Make sure this applies to both the sent and the result struct. The server should always initialize the result struct itself, as you can’t guarantee the client has passed a valid empty struct.
  • Make sure you test more than one remote function in a row; quite a few memory errors will not manifest until the server attempts to process the next request AFTER the bad function has worked. If your server is segfaulting on a function, make sure it’s not actually the previous function that is the problem.
  • It’s much easier to use one universal message struct for all the remote functions to be passed back and forward rather than making a new struct type for each function. Possibly slightly less efficient, but not by too much for simple projects.
  • It’s much easier to let the server do all the work: if the information is just getting printed to the console, pass back a string rather than a struct with the information in it. Probably not good practice for real-life usage, but much easier for learning basic RPC.
  • The Linux rpcgen seems to be fairly horrible: most of my code wouldn’t work with it, and it doesn’t seem to support generating stubs. Maybe there is an entirely different approach to programming RPC on Linux, but I couldn’t find it, or the current version might be broken. Things like enums just wouldn’t work for me (which was a problem because my 1st assignment specified them). Some sample code I downloaded worked fine, some just wouldn’t.
  • You can make your code thread safe and able to handle concurrent connections with ‘rpcgen -MA’; you will still need semaphores or some other form of concurrency control.
  • Sun’s rpcgen can generate template stubs for server and client code with -a, which is very useful. They are called msg_server.c and msg_client.c; don’t actually modify them, as they will be overwritten next time, just copy ’em. This can be combined as -aAM for thread safety.
  • Code generated by rpcgen is outdated and will give warnings when compiled, but still works. You can fix the stubs by adding int before main and such, but the templates you will probably need to live with.
  • Variable-sized arrays in the template file are declared with <> not []; you can specify a max length, i.e. <255>. [255] will do fixed-size arrays like normal.
  • RPC has a ‘string’ variable type in the template file, which is the equivalent of C’s char* (notice it’s string, not C++’s String), for example ‘string name<>;’. Your C code will see this as a normal char*.
  • An array in the RPC template file will make a struct with name.name_val and name.name_len.
  • char* for strings in the template file was causing me pain; I don’t remember why, but there is probably a reason for string.
  • Remember to NULL terminate your strings when they are passed around, otherwise RPC won’t know when to stop freeing.
  • Functions can only accept one struct, so make sure it contains everything needed.
  • Sometimes poking into the files generated by the template can help understanding some problems, such as typos in msg.x not being caught by rpcgen but causing your source to fail compiling.
  • If possible use something other than RPC (CORBA, XMLRPC, SOAP), I haven’t used them but they can’t be worse. They might have some overhead though.
  • Linkage: – SunRPC definition. – Some basic rpc examples, the way it works is a bit different to the stubs generated by Sun’s rpcgen but its fairly easy to work out the changes needed. – Same again, includes a linked list example.
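
To make the template-file tips a bit more concrete, here is a rough sketch that writes out a small msg.x. It is not one of my assignment files: the struct, the field names, the procedure name and the program number are all made up for illustration, and the file is only written to a temporary directory, not run through rpcgen.

```shell
#!/bin/sh
# Sketch only: writes a hypothetical msg.x for rpcgen. Every name and the
# program number below are invented for illustration.
set -eu

workdir=$(mktemp -d)
cat > "$workdir/msg.x" <<'EOF'
/* One universal struct passed to and from every remote function. */
struct message {
    string text<255>;   /* variable length, max 255: a plain char * in C */
    int values<>;       /* variable length: values.values_len / values.values_val */
    int fixed[4];       /* fixed-size array, like normal C */
};

program MSG_PROG {
    version MSG_VERS {
        message ECHO_MSG(message) = 1;   /* one struct in, one struct out */
    } = 1;
} = 0x20000001;         /* picked from the user-defined program number range */
EOF

echo "wrote $workdir/msg.x"
# Next step, as in the tips above: rpcgen -a msg.x (generates the sample stubs)
```

On the C side, text should show up as an ordinary char* and values as a struct with a length and a pointer, so both need to be NULLed or filled in before the struct is sent.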

    ZFS on Linux – Freedom can be so restrictive

    9 06 2007

    UPDATE2: Back in May, there was a post on Jeff Bonwick’s (lead ZFS developer) blog with pictures of him and Linus having lunch; they were linked to from Jim Grisanzio’s (another Sun employee) blog with the title “ZFS Pics”.

    There is also some work on developing a new Linux filesystem, btrfs, with many of the ZFS features: “the filesystem format has support for some advanced features that are designed to leapfrog ZFS”.

    UPDATE: There has recently been some talk on the kernel development mailing list about GPLv2, GPLv3 and Solaris, including ZFS: Linus’s post, skeptical about Sun cooperating, and the reply from Sun’s CEO saying “if it was, we wouldn’t be so interested in seeing ZFS everywhere, including Linux, with full patent indemnity.”

    ZFS is a great file system from Sun. It’s currently going to be the default file system in OS X Leopard when it’s released (apparently read-only), and it’s already in the FreeBSD kernel, and of course in Sun’s operating system, Solaris.

    The GRUB boot loader already allows booting from it.

    Sun claims it to be the last word in filesystems. Apparently, speed-wise it’s close to hard drive platter speed like XFS, it handles software RAID like LVM, and it can handle more storage capacity than anyone should ever need, like ext4. It supports compression and snapshots, and encryption is being worked on.

    There is ZFS on FUSE, which allows you to use it on Linux, but FUSE is slower than a real file system (benchmarks here), and it is much harder to have the main root partition on it, as the programs that access the hard drive must be loaded from somewhere. dpkg also requires a patch on systems using Debian’s apt.

    Unfortunately there are 2 problems with getting it into the core Linux kernel.

    Licensing and Patents.

    Currently OpenSolaris is under Sun’s CDDL, which is incompatible with the GPL license that the Linux kernel uses. Sun have been talking about GPLing Solaris with the GPLv3. Would this mean we could see ZFS in Linux? Unfortunately no: the Linux kernel is under the GPLv2, with Linus previously saying that he would probably stick to GPLv2 for the kernel, although he did recently say he was ‘pretty pleased’ with the new draft but still skeptical. The GPLv2 says that there must be no restrictions on how the software is used; the GPLv3 says you must not use it with DRM or on hardware that deliberately prevents modification of the software (i.e. TiVo). Some parts of ZFS are under the GPLv2 via GRUB, but only the very basic bits needed for booting, so probably not enough to use on a system.

    The other problem is that Sun apparently has 56 patents on the technology that goes into ZFS. Even if it’s under a license compatible with the Linux kernel, this could still prevent widespread adoption in the Linux community. It’s theoretically possible that Sun is secretly being paid by MS to get their code into the kernel and then sue ’em, although that seems a bit too tinfoil hat to me. Sun apparently won’t sue anyone using their codebase, but I’m not sure how legally binding that is. The patents also prevent reverse engineering ZFS from scratch.

    Sun have recently been making an attempt to get the Linux community involved with Solaris, recruiting ex-Debian developer Ian Murdock, whose job it is to make Solaris more appealing to Linux users with Project Indiana (mailing list), a binary-based Solaris distribution designed to be what people expect from Linux. It is possible that we could see Sun release ZFS in such a way that the Linux community can make use of it, as a show of good faith. But it’s also possible that they will keep it as bait in an attempt to sway Linux developers to their side. Sun have been fairly good to the free software community of late, releasing Java under the GPLv2 (at least the bits they could), but that might be viewed as an attempt to keep Java in play, since C# and Flash have taken a large chunk out of the area.

    We could also see a few GNU/Linux distributions switch to GNU/Solaris if/when Solaris is GPL’d; we could see Ubuntu Solaris one day, and being under the newer GPLv3 license could make it the free software OS of choice (well, maybe that would still be HURD because of its microkernel, but that doesn’t seem to be usable yet after almost 20 years of development). There already is Nexenta, which is a GNU/Solaris distribution similar to Ubuntu. I tried the version that shipped with the OpenSolaris demonstration pack (they ship ’em to you free here, like Ubuntu does here; it includes a bunch of Solaris versions on 2 DVDs, and the case smelled funny). It looked fairly nice for an alpha, although it didn’t detect my networking or sound; the newer Developer Solaris on the same disc had better hardware support, so Nexenta might just need a newer kernel version (they have already released an alpha7 and CP with ZFS boot support, but I haven’t tried them just yet).

    I hope to see ZFS in the Linux kernel. Every time it’s brought up in discussions it generally goes: ZFS is cool, I want it; there’s a FUSE version; FUSE is slow, I want it for real; Linux can’t have it because of the CDDL; it’s really the patents that are the problem.

    Hopefully someone will eventually code it, just ignoring the patent issues, for people’s personal use, and distros could start to include it when Solaris gets GPL’d, or Sun will make some statement about it, since it seems to be the most commented issue on ZFS.