Monday, September 29, 2008

scm & build: Levels of configuration

Here's the first small part of what may (or may not) become a series on configuration management and build management.

Although I originally wanted to start by clarifying "task level commits", I ended up concentrating on the different levels of configuration in a build environment. Here we go...

The levels of configuration

One of the biggest differences between a one-man show and a team project is the set of configuration levels that have to be managed, and this is also where the quality of the whole build process can be heavily influenced.

Basically (unless the application-to-be is monolithic) there are four levels of abstraction: machine dependent, user dependent, purpose dependent and (last but not least) project specific configuration. Each of these has to be managed separately and consciously to avoid (too much) manual intervention. Since every such level adds a layer of indirection, let me cite (once again) David Wheeler, to whom the phrase “Any problem in computer science can be solved with another layer of indirection.” is attributed. As the often omitted second part of the quote says: “[But] this usually leads to another problem”. So let's have a look at the relative pros and cons of this fine distinction. To start off, let's examine each level a bit more closely.


By the way: of course this topic also involves at least two dimensions, run-time configuration and build-time configuration. For the sake of this argument I'll postpone that discussion to the [[build]] topic.

Purpose dependent configuration

Let's start with the purpose dependent configuration, since this concern is covered by most modern environments. The purpose I'm talking about is also known as build type, target environment or something similar. Typical purposes are “Test”, “Debug”, “Release” or, a bit less frequently, “Integration”. Depending on the purpose of the build, a number of things usually differ. For “Test” there might be a hard-wired shortcut to bypass server round-trips, a “don't really send to printer” entry or some other special behaviour that is meant to make testing easier (or even possible) without imposing side effects on already installed systems. If you're building for “Debug” (one of the most commonly differentiated purposes), you'll certainly want to include debug information in your build, something you probably don't want to ship (although that could be disputed, but that is another story). “Release”, of course, is the purpose for which you build the shippable product once all testing and QA work has been done. The need for an “Integration” purpose arises only in projects where several sub-products have to be integrated, and its configuration needs are usually rather project specific.
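To make the idea a bit more tangible, here is a minimal Python sketch of a purpose table. The purpose names are the ones from above; the individual settings (debug_info, optimize, really_print, stub_server) and the function purpose_settings are hypothetical examples, not taken from any particular build tool. “Integration” is left out deliberately, since its needs are usually project specific.

```python
# Hypothetical settings per build purpose; only the purpose names come from the text.
PURPOSES = {
    "Test": {"debug_info": True, "optimize": False, "really_print": False, "stub_server": True},
    "Debug": {"debug_info": True, "optimize": False, "really_print": True, "stub_server": False},
    "Release": {"debug_info": False, "optimize": True, "really_print": True, "stub_server": False},
}


def purpose_settings(purpose):
    """Return a copy of the settings for one purpose, failing loudly on unknown names."""
    if purpose not in PURPOSES:
        raise ValueError(f"unknown build purpose {purpose!r}, expected one of {sorted(PURPOSES)}")
    return dict(PURPOSES[purpose])  # a copy, so callers cannot alter the shared table


if __name__ == "__main__":
    print(purpose_settings("Release"))
```

Failing loudly on an unknown purpose means a misspelled build type (say “Relase”) stops the build instead of silently falling back to some default.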

Of course there are also some things (e.g. logging) that need to be configured differently for each of these purposes. But speaking of logging, we encounter another type of configuration that should not be mixed with the purpose specific configuration: the project specific configuration of components. While I'll go deeper into that in the next section, the important point for things like logging is to be aware that some things have both a project specific and a purpose specific configuration. Trying to manage both in the same way can create real nightmares (I guess everybody who has tried to keep Log4J configuration files useful for an extended period of time without that conceptual distinction knows what I'm talking about).
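A minimal sketch of keeping the two apart, using Python's standard logging.config.dictConfig as a stand-in for Log4J: the project specific part (which components log, and how their messages look) is defined once, and only the level is taken from the purpose. The logger names app.billing and app.printing, the PURPOSE_LOG_LEVEL table and the helper logging_config are assumptions for illustration.

```python
import copy
import logging
import logging.config

# Project specific part: which components log and how their messages look.
# It is the same for every purpose and lives with the source code.
PROJECT_LOGGING = {
    "version": 1,
    "formatters": {
        "plain": {"format": "%(asctime)s %(name)s %(levelname)s %(message)s"},
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "plain"},
    },
    "loggers": {
        "app.billing": {"handlers": ["console"]},
        "app.printing": {"handlers": ["console"]},
    },
}

# Purpose specific part: only the verbosity differs per purpose (hypothetical values).
PURPOSE_LOG_LEVEL = {"Test": "INFO", "Debug": "DEBUG", "Release": "WARNING"}


def logging_config(purpose):
    """Combine the project part with the log level for one purpose."""
    config = copy.deepcopy(PROJECT_LOGGING)  # never modify the shared project part
    level = PURPOSE_LOG_LEVEL[purpose]
    for logger_settings in config["loggers"].values():
        logger_settings["level"] = level
    return config


if __name__ == "__main__":
    logging.config.dictConfig(logging_config("Debug"))
    logging.getLogger("app.billing").debug("visible only in Debug builds")
```

Adding a component touches exactly one place (the project part), and changing the verbosity of a purpose touches exactly one other place.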

Project specific configuration

This is usually the first kind of configuration you come across. Almost every project nowadays uses some reusable libraries. Those of course have to be adapted to the specific needs of the project, and thus the first level of configuration indirection comes into existence.

Although these configurations apply at many levels, from information specifying a window's layout to the much-mentioned log file configurations, they at least have a clear association with the project. They are “just another kind of source code” and thus relatively easy to handle.
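One way to take “just another kind of source code” literally is to declare the project specific settings as code, so they are versioned, reviewed and diffed like everything else. The following Python sketch assumes a frozen dataclass; the class ProjectConfig and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectConfig:
    """Project specific settings, versioned and reviewed together with the code.

    All field names and defaults are hypothetical examples.
    """

    main_window_size: tuple[int, int] = (800, 600)  # layout of the main window (width, height)
    log_file_name: str = "app.log"  # where the project's components log to
    enabled_components: tuple[str, ...] = ("billing", "printing")  # reusable libraries in use


# The single project-wide instance that the rest of the code imports.
PROJECT = ProjectConfig()
```

A project that needs different values creates its own instance, e.g. ProjectConfig(log_file_name="billing.log"), and since the dataclass is frozen, nothing can quietly change these values at run-time.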

Machine dependent configuration

This one strikes as soon as there is even one more developer! The path that used to point to /usr/bin has to point to /usr/local/bin, the drive for intermediates that used to be C: has to be E:, and the monitor resolution goes from 1024x768 to 1600x1050. Consequently some things have to be configured per machine, and here we definitely need to distinguish between build-time and run-time configuration.
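A minimal sketch of machine dependent settings, keyed by host name and kept in two separate groups for build-time and run-time values. The paths, drives and resolutions are the ones from the paragraph above; the host names, the group names "build" and "run", and the function machine_config are made up for illustration.

```python
import socket

# Defaults for the "original" developer machine, split into build-time and run-time values.
MACHINE_DEFAULTS = {
    "build": {"tool_dir": "/usr/bin", "intermediate_drive": "C:"},
    "run": {"screen_resolution": (1024, 768)},
}

# Hypothetical per-host overrides; only the differences are recorded.
MACHINE_OVERRIDES = {
    "devbox-anna": {"build": {"tool_dir": "/usr/local/bin"}},
    "devbox-bert": {
        "build": {"intermediate_drive": "E:"},
        "run": {"screen_resolution": (1600, 1050)},
    },
}


def machine_config(kind, hostname=None):
    """Return the build-time ("build") or run-time ("run") settings for one host."""
    if kind not in MACHINE_DEFAULTS:
        raise ValueError(f"kind must be one of {sorted(MACHINE_DEFAULTS)}, got {kind!r}")
    host = hostname if hostname is not None else socket.gethostname()
    overrides = MACHINE_OVERRIDES.get(host, {}).get(kind, {})
    return {**MACHINE_DEFAULTS[kind], **overrides}  # host values win over defaults
```

Keeping the two groups apart means a build script only ever asks for machine_config("build"), so run-time details like the screen resolution never leak into the build.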

User dependent configuration

The distinction between user dependent and machine dependent configuration is a bit hard to make at a time when the people:machine ratio has moved from n:1 to 1:n. But even now, when lots of people have more than one computer, the real relationship is more like n:m, since some computers are still shared; build and integration machines in particular are prone to sharing. Even on the same machine, the configuration might differ in paths, desired screen resolutions and mounted network shares. So there is basically the same set of configuration information as in the machine dependent part, but it needs to be managed in a separate space.
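Following the same idea, here is a sketch of a user dependent store that uses the same kind of keys as the machine dependent part but lives in its own space, indexed by the (host, user) pair so that a shared build machine can carry different entries for different users. The store USER_OVERRIDES, the function user_config and all hosts, users, shares and paths in it are hypothetical.

```python
import getpass
import socket

# Hypothetical user store, indexed by (host, user). It uses the same kind of keys
# as the machine dependent part but is kept in a space of its own.
USER_OVERRIDES = {
    ("buildserver", "anna"): {"network_share": "//fileserver/anna"},
    ("buildserver", "ci"): {"network_share": "//fileserver/nightly", "tool_dir": "/opt/ci/bin"},
}


def user_config(hostname=None, user=None):
    """Return the settings of one user on one host (empty if nothing is recorded)."""
    host = hostname if hostname is not None else socket.gethostname()
    name = user if user is not None else getpass.getuser()
    return dict(USER_OVERRIDES.get((host, name), {}))
```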

To summarize: we have the purpose specific configuration, which is a central [[build]] topic; the project specific configuration, which correlates to source code; the machine dependent configuration, which correlates to hardware configuration management; and the user dependent configuration, which somehow correlates to profile information. All of these should be traceably connected, so that possible configuration errors can be identified.
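A sketch of what “traceable connections” could look like: the layers are merged in a fixed order and, for every key, the list of layers that set it is recorded. The precedence order used here (project, purpose, machine, user, from lowest to highest) is my assumption, not something established above; the function resolve and all keys and values are invented for illustration.

```python
def resolve(layers):
    """Merge (layer_name, settings) pairs given from lowest to highest precedence.

    Returns the merged values plus, for each key, the list of layers that set it,
    so an unexpected value can be traced back to where it came from.
    """
    values, trace = {}, {}
    for layer_name, settings in layers:
        for key, value in settings.items():
            values[key] = value  # later (higher precedence) layers win
            trace.setdefault(key, []).append(layer_name)
    return values, trace


# Hypothetical layers, lowest precedence first.
layers = [
    ("project", {"log_file_name": "app.log", "log_level": "INFO"}),
    ("purpose:Debug", {"log_level": "DEBUG", "debug_info": True}),
    ("machine:devbox-anna", {"tool_dir": "/usr/local/bin"}),
    ("user:anna", {"log_level": "WARNING"}),
]
values, trace = resolve(layers)
```

In this example values["log_level"] ends up as "WARNING" and trace["log_level"] is ["project", "purpose:Debug", "user:anna"], which shows at a glance that a user setting overrode the Debug purpose. That is exactly the kind of configuration error that is hard to find without a trace.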

Having raised all these questions, I should of course also answer them. I'll do so some time in the future and will add a follow-up link to this post...

I think that even the mere concept of having different levels of configuration enables people to create more stable build environments.



Volker Wurst said...

Good categorization! From my experience, I have some additional (or slightly different) categories, mostly due to the related management processes:

- language specific entries (i.e. entries that have to be sent to a translator)
- business / logical configurations (e.g. country specific tax calculation or legal variants)
- technical / environment specific (paths, server names, data sources etc.)

I'm looking forward to meeting you next week at AYE!

Michael said...

Hello Volker,
thanks for the input! I'll try to address these categories as well in the future.

Looking forward to our meeting at AYE too.