Friday, January 29, 2010

ID design

We often take the IDs that we use in databases and applications for granted. As long as we can identify a data record with a unique number, everything is great. Recently, I ran into an interesting problem in an existing application. The ID that was used has this format:
ABBBCCC

A = last digit of year
BBB = day number of the year
CCC = sequence number
For example, you could get on 2 february 2009 this ID: 9033012. This would be the twelveth data record of the day.

More digits

Sometimes, the application would generate more than 1000 data records on a day and run out of numbers. In that case the application just add digits to get something like this:
ABBBCCCDDD

A = last digit of year
BBB = day number of the year
CCCDDD = sequence number with additional digits
Example: 9003123456

Perhaps you can sense some trouble here...

Decade trouble

Obviously, you will run into problems after 10 years of using this application. It makes you wonder if the designers of this application have learned anything from the Y2K problems, is storage space really this precious that you cannot spare another 3 bytes?

Length confusion

The fact that the system adds another 3 digits is pretty confusing. Technically, the ID is still unique and valid. Normally we add digits in the front of the number, not in the back of the number. This makes it impossible to sort these IDs in a meaningful way. For example, an ID created on january 1 (9001123456) could have a higher value than an ID created on january 2 (9002001).

Number interpretation

After the year 2010 started, the IDs suddenly started with a "0", like 0003123456 and in some subsystems, this ID is interpreted as a number, which would make it 3123456. Suddenly this ID can collide with an already existing ID from the year 2003.

ID should have no content

When the engineers were trying to solve this problem, they wanted to change the ID generator to just use an Oracle sequence and do away with the year/day nonsense. Because the system has grown rather large and complex over the year, changing the ID generation may cause more problems, because no one is sure that the contents of the ID is not interpreted in same way. It could be that a part of the system is making use of the fact that the ID contains the year and day number.

Conclusion

As you can see, designing an ID can be tricky. Lessons learned:
  • if you use a year, please use the full year, do not try to save a little bit of storage, it might cost a lot of money later on
  • avoid meaningful content in the ID to prevent "clever" people from exploiting this, otherwise you can never change the way an ID is generated
  • allocate enough digits in the ID, the application may live much longer than you think
  • keep it simple, use a sequence generator

No comments: