Linux.com

Feature: Programming

Better source control for your coding projects

By Travis Snoozy on March 10, 2008 (9:00:00 PM)

The proper use of source control systems is a critical skill for programmers to have, and something that many of them have to pick up through observation, trial, and error in the workplace. For students, or people who primarily program as a hobby, the learning process can be particularly slow and painful. Here are some examples and discussion on the best practices you can use to avoid common source control pitfalls.

The basic purpose of a source control system is to allow you to work without worrying. If you break the software you're working on, decide the changes you're making aren't such a good idea, or otherwise make a mistake, source control allows you to go back to the last version you checked in. It also enables multiple people to work on the codebase at once without destroying each other's work. These two properties alone make source control critical for any software developed by more than one person.

Many source control systems, both open source and proprietary, are available for use today. Which one you use will depend on several factors, though in many situations, the decision will have already been made for you. Usually, the specific source control system in use is not terribly important; most modern systems are almost interchangeable, save for the occasional niche feature. However, if you find that your source control system is hindering rather than helping your team, you may want to evaluate alternatives. You can read more about CVS, darcs, and Subversion on their respective Web pages. Subversion has a book-length manual that is particularly good; while it is Subversion-specific, it still covers many aspects of source control use that can be applied to other systems as well.

A critical mistake that many programmers make is to not check in frequently enough. It can be difficult to fight the urge to hold off on checking in, whether because you know the code is broken, because it is in an incomplete state, or for some other reason. While these are good reasons not to check code into a main development area, they are not good reasons to forgo check-ins altogether. The main problem is that you put all of your changes at risk whenever you make further alterations to your code. The more changes you make without checking in, the more you put at risk if you make a mistake.

In order to check in without worrying about the completeness of a large set of changes, you need to understand the concept of branches. A branch lets you pretend that you have a separate source control repository set up for a specific purpose. However, branches are better than separate repositories, because everyone on your team can have access to the changes that you make in your branch, and conversely, you can have direct access to the changes that others make to the main line of development. When you are finished to the point of feeling comfortable doing a "real" check-in, you can merge your branch back into the main line of development. A branch used in this fashion is usually called a personal branch (if you are the only one checking into it) or a feature branch (if more than one person is collaborating on the work).

The darcs code management system considers every checkout to be its own branch. Just check out your code and record your changes as patches at appropriate intervals.

$ darcs get http://example.com/trunk your-local-branch

Subversion has a straightforward syntax for merging branches, but a lot of manual bookkeeping goes into determining the correct values to pass to the -r parameter. Run from a working copy of the trunk, the following merges the changes made on the branch between revisions 5 and 9:

$ svn merge -r5:9 svn://example.com/repo/branches/branch .
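
The bookkeeping behind the -r range mostly comes down to remembering where the previous merge stopped. As a rough illustration, here is a small Python sketch of that arithmetic. It is not something Subversion provides; the next_merge_range helper and its inputs are hypothetical, and it assumes you have written down the revision at which the branch was created and the end revision of each earlier merge.

```python
def next_merge_range(branch_start, previous_merge_ends, head):
    """Return the 'N:M' value for -r, or None when there is nothing new to merge.

    branch_start: revision at which the branch was created (hypothetical input).
    previous_merge_ends: the M values of earlier merges from this branch.
    head: the newest revision you want to include.
    """
    # A range N:M applies the changes made after N, up to and including M,
    # so the next merge starts where the last one stopped.
    start = max([branch_start, *previous_merge_ends])
    if start >= head:
        return None
    return f"{start}:{head}"


if __name__ == "__main__":
    # First merge from a branch that was created at revision 5.
    print(next_merge_range(5, [], 9))
    # A later merge picks up after revision 9 instead of repeating old changes.
    print(next_merge_range(5, [9], 14))
```

Merging the same range twice tends to produce spurious conflicts, which is why the earlier end points have to be tracked somewhere, often in the check-in log message of each merge.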

Branching and merging tend to be the most difficult tasks you can perform in source control systems, but they are also among the most useful. If you can master merging, you can make excellent use of the power of source control.

Another common mistake that people make with source control is making check-ins that do more than one thing. Checking in frequently helps to reduce this problem, but even a check-in that changes only two lines of code can do too many things -- if those two lines of code are unrelated to each other, and fix two different problems. Each check-in should do one, and only one, logical thing. A good rule of thumb is that each bug should usually have one check-in associated with fixing it (although some wide-reaching bugs may require more than one check-in to fix). Also, most small features should have one check-in associated with them. Medium or larger features should be done in multiple check-ins, at logical points in a branch, with a single merge back into the main development area once the feature is operational.

The major reason for having a one-to-one mapping between logical changes and check-ins is to save time when bugs arise: Quality Assurance has an easier time identifying where a bug might have been introduced if each check-in does only one thing, and that one thing is clearly explained in a single sentence in the check-in log. Having each check-in to the mainline be a complete, self-contained change also makes it easier to undo problematic check-ins if the bugs can't be fixed in a reasonable way. If a check-in had two logical changes, or (even worse) one-and-a-half logical changes, then reverting the entire check-in would undo more changes than necessary -- possibly taking out unrelated features or even bugfixes.
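
To make the revert argument concrete, here is a toy Python model. It is not how any real source control system stores history; the check-in numbers, the change descriptions, and the remaining_after_revert helper are all made up for illustration. Each check-in is treated as a set of logical changes, and reverting a check-in drops everything it carried.

```python
def remaining_after_revert(history, bad_checkin):
    """Return the logical changes still in the tree after undoing one check-in.

    history maps a check-in number to the set of logical changes it made.
    """
    remaining = set()
    for checkin, changes in history.items():
        if checkin != bad_checkin:
            remaining |= changes
    return remaining


# Hypothetical history: check-in 12 mixes a new feature with an unrelated bugfix.
mixed = {11: {"fix crash on empty input"},
         12: {"add export feature", "fix off-by-one in pager"}}

# The same work split into one logical change per check-in.
split = {11: {"fix crash on empty input"},
         12: {"add export feature"},
         13: {"fix off-by-one in pager"}}

if __name__ == "__main__":
    # Reverting the buggy export feature also throws away the pager fix...
    print(sorted(remaining_after_revert(mixed, 12)))
    # ...unless the pager fix lives in its own check-in.
    print(sorted(remaining_after_revert(split, 12)))
```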

Darcs has a check-in procedure that lets you select exactly which changes you want to check in -- even if those changes are all in the same file. This can be handy if you frequently forget to check in an old change before starting on a new one.

$ darcs record
hunk ./README 7
+
+Baz.
Shall I record this change? (1/?) [ynWsfqadjkc], or ? for help: y
hunk ./TODO 1
+foo!
Shall I record this change? (2/?) [ynWsfqadjkc], or ? for help: n
hunk ./TODO 23
+bar?
Shall I record this change? (3/?) [ynWsfqadjkc], or ? for help: y
What is the patch name? Bazbar
Do you want to add a long comment? [yn]n
Finished recording patch 'Bazbar'

Having check-ins change only one thing at a time is beneficial to developers as well, because it tends to encourage them to not keep too many modified files in their copies of the source tree. In my experience, the leading cause of build breaks tends to be devs either forgetting to check something in, or checking in something that they didn't mean to. By restricting yourself to making one change at a time, all of the modified files in your source tree can be checked in without you having to worry about breaking the build.

The last common mistake is actually from a management perspective -- specifically, about how releases and the source control system relate to one another. Almost all projects have the notion of a release: some version of the software that's considered good enough to send off to users. Most projects should also have the concept of development and stable releases; that is, a development version that gets new features, and a stable version that only has bug fixes applied to it. For folks who aren't used to releasing, the obvious way to do any kind of release is to simply make a tarball when the code in the source repository looks good. However, the source control system can (and should) be used to enhance the release process.

Rather than simply cut a tarball for every release, every release should correspond to a tag in your source control system. Tags are what the name implies -- little snippets of text attached to, in this case, a specific point in time in your repository. When the code is at the point where you want to make your tarball, you tag that exact instance before you make the tarball. This allows you to regenerate the exact same release (if you, say, accidentally delete your tarball), as well as keep track of what check-ins occurred between any two releases.

Tagging is usually a simple operation to execute; even in CVS, it's easy to make a tag if your local checkout represents the release you want to make:

$ cvs tag release_1_0_0
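
Once releases are tagged, the question of what went in between two releases becomes a simple lookup. The following Python sketch models only that idea. The log entries, the release_1_0_1 tag, and the checkins_between helper are invented for illustration, and it assumes a single repository-wide revision number (as in Subversion); a real system answers the question with its own log command.

```python
def checkins_between(log, tags, older_tag, newer_tag):
    """List check-in messages made after older_tag, up to and including newer_tag.

    log is a list of (revision, message) pairs; tags maps a tag name to the
    revision it was attached to.
    """
    low, high = tags[older_tag], tags[newer_tag]
    return [message for revision, message in log if low < revision <= high]


# Hypothetical repository history and tags.
log = [(40, "Prepare 1.0.0 release notes"),
       (41, "Fix crash when config file is missing"),
       (42, "Fix typo in --help output"),
       (43, "Start work on plugin loader")]
tags = {"release_1_0_0": 40, "release_1_0_1": 42}

if __name__ == "__main__":
    for message in checkins_between(log, tags, "release_1_0_0", "release_1_0_1"):
        print(message)
```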

Tagging is the easy part. The more difficult issue that your source control system can assist with is ensuring that only bug fixes are applied to your progressive stable releases. The proper approach here involves the judicious use of branches. When you're ready to make a release that you intend to stabilize (e.g., 1.0), you should make a new branch, and tag that branch right off the bat. Then, you can continue to add new features on the mainline, and apply only bug fixes to your stable branch, continuing to tag releases (1.0.1, 1.0.2, etc.) at appropriate points on that branch. Eventually, you'll want to start stabilizing the new features that you've written in the mainline, at which point you simply need to make a new branch (1.1) and repeat the stabilization process. In this manner, you can continue to support many different versions of your software as your project or business model requires.
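
The tag numbering on stable branches follows a simple pattern, sketched below in Python. This is only an illustration of the scheme described above, not a feature of any source control system; the next_stable_tag helper is hypothetical, and it assumes branches are named after their minor version (1.0, 1.1) and tags are plain dotted version strings.

```python
def next_stable_tag(existing_tags, branch):
    """Return the next release tag to create on a stable branch.

    A new branch is tagged with its own name right away (e.g. '1.0');
    after that, each bugfix release bumps the last number (1.0.1, 1.0.2, ...).
    """
    if branch not in existing_tags:
        return branch
    prefix = branch + "."
    # Collect the bugfix numbers already used on this branch only.
    patches = [int(tag[len(prefix):]) for tag in existing_tags
               if tag.startswith(prefix) and tag[len(prefix):].isdigit()]
    return f"{branch}.{max(patches, default=0) + 1}"


if __name__ == "__main__":
    tags = ["1.0", "1.0.1", "1.0.2", "1.1"]
    print(next_stable_tag(tags, "1.0"))  # another bugfix release on the 1.0 branch
    print(next_stable_tag(tags, "1.1"))  # first bugfix release on the 1.1 branch
    print(next_stable_tag(tags, "1.2"))  # a brand-new stable branch
```

Keeping the branch name and the tag prefix identical makes it obvious which branch a given release came from.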

Comments

on Better source control for your coding projects

Note: Comments are owned by the poster. We are not responsible for their content.

Wow. No mention of git

Posted by: Anonymous [ip: 76.27.63.88] on March 10, 2008 10:42 PM
Git is probably the safest bet in revision control today. Its user and developer base is huge. The Windows port is pretty good now (msysgit).


If you're starting a new project, or just putting a project in revision control, if you even consider cvs or svn, you're crazy. Mercurial and git (and probably bazaar) are reliable and widely used. You'll switch to one of them one day, so why not just skip the brain-damage step?


Don't let the "distributed" aspect of git or others scare you off. Git/hg/bzr is the best even for local config files and home directory backups. It's faster and better even if you don't use the distributed features at all.

#

Where is git in this article :(

Posted by: Anonymous [ip: 137.226.103.144] on March 10, 2008 10:52 PM
This is sad. When I read the headline on the frontpage I thought it might be another good article which compares the versioning systems at hand. But since git isn't mentioned at all I can not take it seriously. Sure, saying that "most of the time the choice has already been made for you" does have some truth, but this still doesn't mean that you can't use a superior SCM/content-tracker at home or as a backend to the existing system.
From sentences like "A critical mistake that many programmers make is to not check in frequently enough." and "Branching and merging tend to be the most difficult tasks you can perform in source control systems" you can already see that the author clearly hasn't gotten around to using git yet. Also it seems to me that every article which deals with CVS/SVN nowadays just has to mention that you can, in fact, "branch" and "merge" in those systems too, albeit in a complicated way.

#

I miss git too

Posted by: Anonymous [ip: 83.78.40.200] on March 11, 2008 12:52 AM
I really don't want to sound like a git-fanboy or be like linus-wrote-git-so-it-has-to-be-the-best. And I'll also honestly admit that I have never used or looked at darcs, bazaar and most other software out there, so I can't even comment on them. Here's my experience:
(1) I learned CVS. It was a pain to use and I learned and re-learned how to do branching and merging a couple of times and really never fully understood it. As a result, I never used branching.
(2) I switched to subversion. I was very happy to see something that sucks much much less than CVS and subversion really has lots of good features. Unfortunately, it did not solve my branching/merging problem. It was just as complicated to grasp. They have a few nice ideas with what they call a 3D-filesystem, but it turned out that it's only cool; not useful.
(3) I switched to git, which is so much easier to set up and use. Branching and merging is an enjoyable experience. All operations are incredibly fast, since you hardly ever go over the network. It has tons of commands, which was confusing. But once you know which ones are important and which ones you can ignore, you can learn to use git very quickly.

git is really a very fast, very nice and most importantly, very sane and easy to understand source control tool. It's definitely worth trying.

#

Better source control for your coding projects

Posted by: Anonymous [ip: 134.134.136.3] on March 11, 2008 01:21 AM
Darcs nicely supports the idea that checkins should only have one feature in them.
Its interactive checkin steps through your changes and adds only the ones you select.
With cvs/subversion, you have to move the file out of the way. Remake the changes you wanted to checkin first.
Checkin. Then move the original file back into place.

#

Better source control for your coding projects

Posted by: TravisSnoozy on March 11, 2008 03:42 AM
For the first few posts by the Git-lovers of the crowd: I will concede that the title of the article is a poor match for the information that I wanted to get across. The idea here is not to embellish specific source control systems, it's to try and point out things that you have to do regardless of the source control system that you wind up using. It doesn't matter if you use Git or (heaven forbid) Visual SourceSafe -- these tasks should map to all systems, either easily (in the case of good systems), or with great difficulty (in the case of bad systems). Knowing what you need to do is the larger part of the learning curve; learning how to do it with your source control system of choice is just a matter of syntax and/or contortionism. ;)

#

Git - Not as important

Posted by: Anonymous [ip: 124.84.168.3] on March 11, 2008 04:16 AM
I mean just because the Linux kernel source is controlled by Git you think it's worthy of being compared with SVN? What do you think this is, Linux.com?

#

Better source control for your coding projects

Posted by: TravisSnoozy on March 11, 2008 05:06 AM

To clear up any misunderstandings, I'd like to touch more specifically on why Git isn't mentioned.



Across a decade-plus of coding and 3 companies, I've never used Git. I've never been part of any open source project that used Git. I've never talked with Git users, or read about Git, save to know that it's what Linus wrote after BitKeeper was snapped out from under him, and it's distributed. All in all, I think that the Git users in the crowd would probably agree that I shouldn't be writing about things I know nothing about. ;)



None of this means that Git is any less of a totally awesome system. It just means I've never had to use it*. I have encountered Visual SourceSafe (blech), Perforce, and some ancient MS abomination (pre-VSS; gag-me-with-a-spoon awful) in the workplace; and SVN, CVS, monotone, and darcs in the OSS world. I fully admit that darcs is probably even less common than Git, but darcs got the mention because I've actually used it**, and I didn't hate it.



Again, I want to underscore that the thrust of the article is not about which source control system is best — that's a religious issue that I could never hope to argue. The purpose is to point out how you should use your source control system, whichever system that may be, and why you should use your system in these ways. As an exercise to my readers, you can feel free to come up with alternate example sections for each of the systems I mentioned, and verify that each point is equally relevant to development under Git as it is to development under these other systems. If the activities mentioned in the article don't make sense in Git, I'd be really interested to know why. :)



* I don't learn new RCSs for fun, any more than I learn new languages for fun. I learn them when I have to use them regularly, and I have to use SVN and Perforce far more regularly (read: daily) than I have to hack on Linux or X. Again, it doesn't make Git any less good, it just makes it less common.



** This was despite my moaning about it at the time. I was very pleasantly surprised by the ease of use. Unfortunately, the "stack of patches" approach and every developer having their own branch makes me nauseous from a management standpoint. For me, it's completely counter-intuitive to keeping a single, good copy on a line of development. Personal branches are very, very easy to abuse in the non-distributed world, and I don't see how that potential for abuse can be curbed in a system that actively relies on each and every checkout being an automatic personal branch. By contrast, I recall hating my encounter with monotone.

#

Better source control for your coding projects

Posted by: Anonymous [ip: 62.212.121.211] on March 11, 2008 06:32 AM
People looking for a nice and modern RCS should definitely have a look at Mercurial (hg)...

http://www.selenic.com/mercurial/




Bazaar (bzr) is worth mentioning too:

http://bazaar-vcs.org/

#

Darcs

Posted by: Anonymous [ip: 69.17.73.250] on March 11, 2008 11:47 AM

I was pleased to see darcs mentioned in an article. I've been looking forward to trying it. The appealing aspect is that it's got a solid foundation in mathematics. Which means to me that it's going to have the most power when it comes to re-merging branches or portions thereof, as is often needed when cherry picking changes to be merged back into the main line branch.



I would be interested to hear you elaborate on your qualms about personal branches from a management standpoint, particularly vis-à-vis darcs. It seems to me that if you want the developer to frequently check in, and you also want to keep some branches in an "always working" state, you're just going to have to put up with having gobs of branches. I see no alternative.


Regards,
Karl O. Pinc <kop@meme.com>

#

Re: Darcs

Posted by: TravisSnoozy on March 11, 2008 02:49 PM

In a nutshell, having gobs of branches may be unavoidable, but you should only ever have the exact number of branches that you need, not more. Each branch needs to be managed — monitored for what needs to be merged into and out of it — and that's a lot of overhead. Merges themselves tend to be awkward, and induce an overhead in figuring out how to perform them correctly (specifically, in the case of both logical and actual conflicts).



In other words, the more you branch, the more you isolate, and isolation is a double-edged sword. It means that your changes don't mess up other parts of the tree, but it also means that you don't realize fixes from other parts of the tree nearly as quickly. It's easy to get into a "right-hand doesn't know what the left-hand is doing" type scenario when there's too much isolation.



On the other hand, I can appreciate that every svn up on a working copy with modifications is roughly equivalent to a mini-merge in and of itself. Darcs (and, I assume, other distributed RCSs) just makes that explicit. However, there seems to be an extra management step insofar as needing to make sure that the changes that get recorded locally actually make it back to the main server (the "real" checkin), and don't languish in these personal branches. Furthermore, I would assume that it may be difficult to get "clean" patches if developers don't learn good branch skills in the first place -- that is, you could wind up with six interleaved patches in your local repo, which work on two features, but leave the repo in an inconsistent state unless all six are applied (e.g., the earlier patches introduce problems that the later patches fix, and the later patches also finish up the functionality). I perceive this to be less of an issue with managed feature branches, because the delta between start of the branch and end of the branch will always be the "perfect" diff, with the majority of the kinks and bugs worked out -- the issue of multiple features being introduced isn't there if you manage it properly.



Hopefully this makes sense. Feel free to poke at any holes you see in my concerns. ;)

#

Re(1): Darcs

Posted by: Anonymous [ip: 69.17.73.250] on March 12, 2008 05:27 PM

I don't see any real holes in your concerns. It appears to me that, as is often the case, the problem is between keyboard and chair. The question is how technology can best address the issue. As you imply in your article, one of the goals of using revision control is to produce "perfect diffs". Ideally each bug fix or feature enhancement is a single "perfect diff". The right way to do this is to have each be its own branch, which is a people problem. Each developer has to learn to manage multiple branches. This requires a bit of organization, but increases the likelihood that any given branch will merge trouble-free because each branch touches less code and each branch is "smaller" and so can be completed in less time. Of course sometimes there'll be mistakes, where multiple changes are made to a single branch, and sometimes there'll be legitimate conflicts. The ability to analyze, cherry pick hunks, and otherwise resolve the conflicts is where rubber meets the road when it comes to source code management. From my armchair, without having actually used darcs or a lot of the alternatives, this is where darcs's mathematical foundations should come to the fore and deliver real power into the hands of the developer. I can't say whether that power is enough to cut through the chaos of the development process. In line with your concerns, it probably depends on the developers, and therefore, indirectly, on good management personnel. I don't think that the discipline has truly developed good practice, which is partly a chicken-and-egg problem because you can't develop good practice without starting with good tools and you don't need powerful tools if you're locked into a rigid development process. Lots of branches works for me, but I've not managed a team in a while and I don't know how it would play out if I tried it with a team. I imagine the best approach would be to tailor each programmer's practice to their skill level. 
Some might be required to use a single branch and others left to manage their own work.

Someone knows the best way. It'd be nice to hear from that person.

Karl O. Pinc <kop@meme.com>

#

Better source control for your coding projects

Posted by: Anonymous [ip: 210.211.168.169] on March 11, 2008 12:03 PM
I think the objective of this article is to tell people why to use an RCS and what the good practices are for doing so. The author has tried to share his experience through it, which is appreciated. Mentioning available RCSs, even in comments, is worthwhile, but to say that a specific one should have been a part of the article doesn't make much sense. Even though new systems like Git and Darcs have simplified many things, there are still many users who find themselves more comfortable with CVS than any other. Ultimately it's a matter of one's own choice.

#

Better source control for your coding projects

Posted by: Anonymous [ip: 149.39.208.10] on March 11, 2008 04:50 PM
Thank you for the article, Travis. I thought it was helpful and clearly written. I also enjoyed the discussion it generated. It's always good to hear about new systems, although (like Travis) I also "don't learn new RCSs for fun". I hope that some of the people advocating a particular RCS will write their own article about that system someday. It would be good to see either how you can use that RCS to manage code 'the right way', or how that RCS makes common tasks easier than its competition.

#

Better source control for your coding projects

Posted by: Anonymous [ip: 209.203.70.25] on March 11, 2008 09:54 PM
Thanks Travis! I've always been a solo-coder, but I'm going to give Git a try. I can see the benefits of version management =)

#
