adrift in the sea of experience

Wednesday, May 21, 2008

Lessons from IT projects: allowing NULL foreign keys in a database can be tricky

My previous "lesson learned" was a bit general, and perhaps not very useful. Migrating old data is hard, but if the problem wasn't hard to fix we wouldn't be payed to fix it now would we? Today I'll focus on a much more simple technical dilemma: whether to allow database foreign keys to be NULL.

Consider a database table of contacts, a classic example. Each contact has a name, telephone number, email, country_id, etc. The country_id field for the contact is actually a foreign key pointing to a country record in the country table.

Consider now that often we only have a few pieces of contact information, e.g. only a name and email address. If we don't know the country, we can't set it. It may seem natural to declare the country_id column to allow for NULL values.

It's a possibility, but it has consquences. What kind of consequences?

Suppose we'd like to pull all contact data from the database and show it on the screen or in a report. We also want the country data in the same view. The natural query for this is


SELECT * FROM Contact, Country WHERE Contact.country_id = Country.id


Clean and simple. And unfortunately, broken. Your report will silently omit all contacts where country_id is NULL! Needless to say, this can be a serious bug if you are generating a billing/bookkeeping report.

To fix this, you'd have to use a "LEFT JOIN" operation:


SELECT * FROM Contact LEFT JOIN Country ON Contact.country_id = Country.id


Or you could just forbid country_id from ever being NULL like this:

CREATE TABLE [dbo].[Contact](
[id] [int] IDENTITY(1,1) NOT NULL,
[name] [nvarchar](999),
[address] [nvarchar](999),
[CNT_CountryID] [int] NOT NULL,
... more create table goodness ...

Then of course, you also need to put a special "UNKNOWN" country in your Country table. This solution has the "magic value smell" a bit, but after debugging a few problems caused by NULL foreign keys, I very much prefer to use a special unknown country over NULL. It also has the advantage that your code will need to deal less with NULL values, which can be annoying in itself. Come to think of it, those name and address fields should also be NOT NULL, just use empty strings when you don't know the values!

No comments: