This article is very long, it covers some basics of what revision control system (RCS)/ source code management systems (SCMS) are, basic tutorial of using subversion for a personal repository, what distributed ones are, basics of using git, bzr and hg for a personal repository and my comparisons on them. Its only a basic introduction, I’ve never had to manage any large complex projects so advanced stuff isn’t covered (plus its long enough).
If you program and don’t use some kind of rcs you are making your life much harder than it should be, rcs are a great, distributed ones are greatest. All you need is to learn a few steps to setup a repo, and somewhere to put it, anything with ssh can be used or just on the local disk.
Even for non-programmers, if you find yourself making changes to config files much then having a repository containing them is definitely a good idea, if you botch it up, you can always revert to the previous edit and compare the 2 with diff.
Originally when I would code, I would intermittently ‘cp -rf directory directory.backup’, that way if I screwed up my code I could always go back. This was working fine for my smaller projects, at least until one particularly painful Uni assignment (SunRPC will segfault on anything), eventually I had reached backup.22 and often I had to go back a few revisions, Not an easy task because I wouldn’t remember the exact number, and I had often done more than one change to the code like add comments to everything, which resulted in me creating more backups with things like a single comment added, because the code I had done had started to randomly segfault. I’m sure there was a simple memory leak but with the deadline a few hours away I didn’t have time to hunt it down (basic gdb wasn’t working because it was the SunRPC libs that where crashing). In the end I got my assignment in (although it was probably the worst mark on an assignment yet, once again I hate RPC).
After that I decided to try using a revision control system, previously I had never actually though about using them for my simple coding and just assumed they where only needed for larger project, the only time I had encountered them was to ocassionaly grab some code from when I needed something newer than was shipping with my Linux distribution, however while googling for stuff about uni I managed to find this website from another student about setting up SVN for projects. I had only previously used svn for grabbing code from public projects, I had also used cvs although it was fairly clear than cvs was a fairly outdated system.
SVN – Subversion tutoiral
SVN works rather well for me as it is on the systems at uni and can be tunneled over ssh so I can push/pull to/from my server at home. The basic functionality of svn allows for going back to any previous revision with ‘-r #’, coding from any system that can connect to the repository all I need to do is checkout/update it, seeing a ‘diff’ between revisions to see what I changed.
Unfortunately subversion isn’t distributed (explained later) so I wouldn’t recommend it, but understanding the basics of revision control is important, so I have instructions on using it here, the same basic outline of commands is used for most of the revision control systems around with a few minor differences. I might use svn for basic repository for editing config files but any of the distributed ones would work just as well.
You can use any system you either have direct access to or ssh (and http etc…) to store your code, I’m using ssh in this.
SourceForge (from the owners of Slashdot) provide free public svn (and cvs) hosting for open source projects, include bug tracking and basic forums however I haven’t found the site very nice to navigate, although you can just host a normal website on it.
Setting up a personal local repository is easy.
Firstly we need to make a svn folder where all the other svn projects will live:
Then we need to make a repository for the project:
svnadmin create ~/svn/PROJECTNAME
Next is importing the current code, you do this from the directory where your code lives not the svn created one, make sure you clean up any unneeded files like binaries and generated output first:
svn import . \
file:///home/USERNAME/svn/projectname/trunk -m "Initial commit."
Notice that it is going into the sub folder trunk, this is important because later on you might need to tag code so you might end up with /trunk/, /1.0rc1/ and /1.0/, you can just put the code in the main directory if you don’t want this kind of functionality. Make sure there are 3 /’s in the uri, normally the server name goes after the first / but since this is local there aren’t any. You must also specify the full path to your folder. -m is the commit message that describes the changes for revisions.
You can also use svn+ssh://USERNAME@SERVER/home/USERNAME/svn/PROJECTNAME/trunk if you want to do it over ssh.
The next set is to checkout your repository, even though you have a local copy you still need the subversion metadata (Annoying url prefix, I wish it was just ssh://):
svn checkout \
This time I’m doing it over ssh, once again remember that its coming from the trunk folder. The trailing PROJECTNAME is to make svn rename. co is a shorter alias of checkout if your excessively lazy.
Thats the hard bits done, from now on its very simple as all the information about where to upload is stored in the .svn folder in your project.
Now you just edit your code, and once your happy with the changes you type:
svn commit -m "Description of changes."
When you create a new file that you want to add to the repository you must first tell svn that you want to add it manually, this avoids accidentally uploading compiled binary files or files outputted by your program:
svn add filename
To update to the version of the code in the repository (or a particular version with -r#):
To see the difference between revisions, you can also specify a particular revision with -r:
To see the logs:
To make a tag:
svn copy \
svn+ssh://USERNAME@SERVER/home/USERNAME/svn/PROJECTNAME/trunk \ svn+ssh://USERNAME@SERVER/home/USERNAME/svn/PROJECTNAME/1.0
Git – Made by Linus for maintaining the Linux kernel, also used by KDE.
Mercurial - Run with the command ‘hg’, A popular Python based one, used by Sun (for hosting of Java).
Bazaar-NG – Or ‘bzr’ Python again, as used on launchpad.net and the Ubuntu community (there all Canonical made).
Distributed Revision Control Systems
SVN was a massive improvement to managing even simple personal code, I used it for several months without issues, however there is a new bread of RCS that are appearing, distributed ones. There are currently 3 main contenders:
There is also darcs (written in Haskell), GNU Arch, and monotone. There is a Wikipedia article listing various revision control software (commerical/free and central/distributed).
Being distributed means that when you check out a central repository, you actually have your own local repository rather than just a copy of the code from it, so you can commit changes without having access to the central repository. Allows for much easier experimentation as you can quickly branch off from your local repository and its Useful for people with laptops who might not have an internet connection. With subversion, you can checkout a repository but then your stuck with the one version, you can only commit back to the main repository, the most you could do is try to copy the directory and other painful workarounds. Also there isn’t technically a ‘central’ repository, although there will generally an official one everyone downloads from. Still handy features to have even when its just for personal use, for instance a simple ‘svn log’ needs to talk to the central server, which can take some time if its a large repo and/or is over a slow connection.
Speed wise Git is currently the fastest for most operations as it was designed for maintaining the massive Linux kernel. Next fastest is Mercurial and then Bazaar (which is planning to match git speeds in its 1.0 release). However for most simple projects speed isn’t that much of a requirement, as long as its not tediously slow for simple changes any of them should work fine.
The functionality of all of these are fairly similar, you tell it who you are, you init the original source directory, commit the initial repository, then you can checkout from anywhere with access, branch off code, modify code, commit it, merge it back into the master branch, push it to the server. Review Logs, see changes with diffs etc…
Most of these support the ability of checking out code from repositories of a different type, you might need a plugin though. You can also convert between systems with tailor, although you might loose some information.
In the end its probably just personal choice which one you prefer as they all offer the same basic functionality.
Firstly there is a great talk from Linus about Git on Google video, its 1hr 10min long. It might be somewhat dated however, some of the functionality talked about might have been implemented or speedup since then (for instance pushing in git now exists).
Git is written in c and is currently the fastest. It is probably best suited for larger projects. However some of Git is more advanced features are a bit harder to use and understand although not by too much for basic usage, so it might not be suited for the less experienced user. The speed improvements on Git are apparently lost on Windows systems as they rely on some specific methods of disk access (unless this has been fixed in newer versions). So Windows or multisystem developers might want to avoid it.
If you want, you can get free public Git hosting here, although its only a very basic service currently.
UPDATE: There is also github which has a free opensource developer plan (100mb, no private repos).
A nice thing about Git is that it keeps all your branches in the same folder, with bzr/hg when you branch of it creates a separate folder for that branch, you could keep them all in one main project folder (For bazaar you can create a repository that stores all your branches saving space by sharing common files) but with Git everything is in the one folder by default making for a much tidier feel, branches you aren’t working on are tucked away and you switch between them fairly painlessly with the checkout command. Might require a bit more effort to work between 2 branches however.
Git also has nice sha-1 ids for everything so you can tell if things become corrupt, and it generally views all your code as one thing rather than each file so it can track changes to a function even if its moved from one file to another.
You can ‘apt-get install git-core’ on Ubuntu/Debian, however its out of date so the instructions will vary. You can get the code from the site compile from source for a newer version.
Firstly tell Git who you are (and enable prwdy colours), the following for newer version of Git:
git config --global user.name "YOURNAME"
git config --global user.email EMAIL@DOMAIN.com
git config --global color.diff auto
git config --global color.status auto
git config --global color.branch auto
Note that those are all –config, not -confg, wordpress screws it up
To initialize the current code directory (older versions use ‘git-init-db’):
When committing to Git, you need to maintain an index of files that are to be committed, you can use the ‘add’ command to do this, in svn you only need to add new files to but in git you need to also add changed files, however rather than adding changed files manually you can use ‘commit -a’ which will automatically add the changed files to the index (but not newly created ones). Since all your files are new in this initial import you need to add them:
git add .
Then commit them:
When you want to grab your code from a remote repository and put in in the current directory, use:
git clone ssh://SERVER/home/USERNAME/git/PROJECTNAME
Enter your directory, you can then make a branch for hacking on:
git branch BRANCHNAME
View your list of branches:
Then you switch to that branch:
git checkout BRANCHNAME
Modify some code and check it into your local BRANCHNAME branch:
git commit -a
Switch back to your original local branch:
git checkout master
Merge the changes into the master branch:
git merge BRANCHNAME
Delete the extra branch (-D will force it to delete if you didn’t merge it):
git branch -d BRANCHNAME
Push the branch to your server:
git push ssh://USERNAME@SERVER/home/USERNAME/git/PROJECTNAME
Theres some more tutorial information on Git here.
Bazaar written in python is probably the slowest of the 3, however the current project roadmap for 1.0 is to match the speed of git, so there might be some improvements appearing. There are benchmarks here showing much better speed improvements, up to 0.15, no 0.16/0.17 which also list more performance improvements in their changelogs. I haven’t found any videos on Bazaar but there have been three, shuttleworth, posts recently on bazaar as a lossless RCS.
For public Bazaar hosting there is launchpad, which has bug tracking and such for project, and storing personal user branches.
Bazaar seems fairly simple to use, I haven’t needed any of the more advanced features but it seems like advanced stuff would be simpler under Bazaar than Git, but for the simple stuff there isn’t any major difference.
Firstly set your name:
bzr whoami "Your Name <EMAIL@DOMAIN.com>"
Enter your source code directory and initialize it:
Add the files to the index:
bzr add .
Commit the branch. this same command is also used to commit code after its modified, by default it will add all changed files to the index, like -a in git:
You can create a repository to store branches, this allows you to save space by sharing the common files between them.
bzr init-repo REPONAME
Now you can branch off from your remote branch into the local repository, notice its sftp for ssh now, a different standard for the same thing again, you can use ~ for the home folder now though, there is also bzr+ssh:// which doesn’t seem to need the paramiko library but i’m not sure of the difference between them other than that:
bzr clone sftp://USERNAME@SERVER/~/bzr/PROJECTNAME
In addition to ‘clone’, you can also use ‘checkout’, this means that any changes you commit, as well as being committed to the local branch will also be committed to the branch you checkout from, if possible. This is somewhat similar to svn, except changes are still committed to the local branch regardless of the remote branch being accessible (unless you use –lightweight, in which case it works just like svn and all everything depends on the remote branch working). You can also use checkout inside a branch to obtain the latest committed version of that branch into the working directory which is sometimes needed if you push branches as it will transfer the .bzr directory with the revisions but not the working branch.
You can fork of from your local branch for experimental coding, which will make a separate folder in the repository:
bzr clone PROJECTNAME PROJECTNAME-testcode
Then after coding, change to the main local branch directory and merge:
bzr merge ../PROJECTNAME-testcode
Then you can push the local branch back to your servers branch:
bzr push sftp://USERNAME@SERVER/~/bzr/PROJECTNAME
Also see the official Bazaar tutorial.
Mercurial works basically the same as bazaar. Theres a google video tech talk on it here (50min).
Thus you must firstly identify thyself:
echo -e "[ui]\nusername = YOUR NAME <EMAIL@DOMAIN.com>" > ~/.hgrc
Changeth to thy source code directory and initilizeth with:
Addeth ye new files to thy index:
Commiteth thy files to thy repo:
Snag your remote repo to a local location:
hg clone ssh://USERNAME@SERVER/~/hg/PROJECTNAME
Branch off your local main to a secondary branch:
hg clone PROJECTNAME PROJECTNAME-testcode
Modify some code, and commit to the secondary branch with:
Change back to your primary local branch and merge (this needs 2 commands):
hg pull ../PROJECTNAME-testcode
Push it to your remote repo:
hg push ssh://USERNAME@SERVER/~/hg/PROJECTNAME
Official Mercurial Tutorial.
After trying out all 3, I found them to be vary similar to each other and any would be suitable for most purposes, you could probally pick one at random and be happy or choose one based on the public services that are avilable such as launchpad, I will probably end up using bzr, hg seemed to make merging a bit more of a pain, requiring an extra step, and the ‘merge’ command some how changed from the docs to the ‘update’ command, also the aesthetics of the output wasn’t as good but thats a bit nitpicky. Bazaars rapidly improving speed should see it ahead of hg if they meet their goals. I also liked git quite alot and might use that for some stuff but it isn’t available on the Solaris systems at uni, and requires 22mb just for the basic binaries so to much for me to install locally (50mb directory limit), but I do favor the approach of having all the branches in the one local location rather than making a whole new one each time, cuts down on the appearance of clutter.
If you are looking for public hosting for your code with a repository of your choice, you can check this wikipedia article which shows a handy list of hosts and what systems they support.