adrift in the sea of experience

Wednesday, May 21, 2008

Lessons from IT projects: allowing NULL foreign keys in a database can be tricky

My previous "lesson learned" was a bit general, and perhaps not very useful. Migrating old data is hard, but if the problem wasn't hard to fix we wouldn't be payed to fix it now would we? Today I'll focus on a much more simple technical dilemma: whether to allow database foreign keys to be NULL.

Consider a database table of contacts, a classic example. Each contact has a name, telephone number, email, country_id, etc. The country_id field for the contact is actually a foreign key pointing to a country record in the country table.

Consider now that often we only have a few pieces of contact information, e.g. only a name and email address. If we don't know the country, we can't set it. It may seem natural to declare the country_id column to allow for NULL values.

It's a possibility, but it has consquences. What kind of consequences?

Suppose we'd like to pull all contact data from the database and show it on the screen or in a report. We also want the country data in the same view. The natural query for this is


SELECT * FROM Contact, Country WHERE Contact.country_id = Country.id


Clean and simple. And unfortunately, broken. Your report will silently omit all contacts where country_id is NULL! Needless to say, this can be a serious bug if you are generating a billing/bookkeeping report.

To fix this, you'd have to use a "LEFT JOIN" operation:


SELECT * FROM Contact LEFT JOIN Country ON Contact.country_id = Country.id


Or you could just forbid country_id from ever being NULL like this:

CREATE TABLE [dbo].[Contact](
[id] [int] IDENTITY(1,1) NOT NULL,
[name] [nvarchar](999),
[address] [nvarchar](999),
[CNT_CountryID] [int] NOT NULL,
... more create table goodness ...

Then of course, you also need to put a special "UNKNOWN" country in your Country table. This solution has the "magic value smell" a bit, but after debugging a few problems caused by NULL foreign keys, I very much prefer to use a special unknown country over NULL. It also has the advantage that your code will need to deal less with NULL values, which can be annoying in itself. Come to think of it, those name and address fields should also be NOT NULL, just use empty strings when you don't know the values!

Tuesday, May 20, 2008

Lessons from IT projects: migrating existing data to a new system

I'm currently working on a redesign/reimplementation/integration of an administrative system that currently exists as a mix of 3 databases (with semantically vague and manually maintained links between them), two applications and lord knows how many excel spreadsheets. The goal is to have 1 database and 1 application, less room for mistakes, and drastically reduced administrative overhead.

As I approach the deadline for this project, I often find myself thinking "If only I had known that ..." or "I should remember X next time".

A first observation: Migrating old data to a new system is harder than you think. It forces you to follow stupid constraints for the new system, because otherwise the old data can't be expressed in the new system. Imagine trying to build a house in the middle of the forest without being allowed to cut down certain trees. You'll end up with alot of extra walls to build around the forbidden trees, and your shiny new house will start to look like a maze.

If at all possible, go for a green-field scenario where the data can be left behind or is migrated gradually/manually by the customer. After all, it's the customer who knows about the old data and business rules. If you absolutely need to migrate the existing data, don't underestimate the effort of having to map out the existing organically evolved "solution". That may be a project in it's own right.