Wednesday, November 20, 2013

Database design - Best practices

Over the years I have designed a number of databases and from that experience the following best practices have evolved.

All tables should have a primary key
Some tables might not need a primary key but even so I would recommend including one for consistency reasons and since some ORMs get really confused if tables lack primary keys.

Choose surrogate primary keys over natural primary keys
A surrogate key is a key that has no business meaning. A natural key has meaning. Social security numbers would be a typical example of a natural key. Surrogate keys never change, natural keys sometimes do change and are thus not very good candidates for a primary key. You can find endless discussions on the net about pros and cons using surrogate or natural keys. In my experience surrogate keys are the better choice in almost all situations.

Use an auto incremented identity column for all primary keys
A rather common, but in my opinion bad, pattern is to use a composite primary key in order to ensure that some combination of column values is unique, sometimes it even includes one or more foreign keys. It might also be tempting to use a composite primary key simply because the current domain logic dictates that this particular combination of columns uniquely identifies a row in the table. Even so, do not do this. If you need to make sure that some combination of column values are unique, use a unique index instead. Having a single integer as a primary key in all your tables makes a lot of things easier, especially if you are using some kind of ORM in your applications. Most database servers default to cluster tables on the primary key. This is, of course, rather useless when having identity columns as primary keys. So, instead, if you need to speed up selects, create a clustered index that matches your most frequent selects.

Name your tables in singular. 
A table containing trades should be named Trade and not Trades.

Tables and columns should have meaningful names
Be careful with abbreviations. Try to name your tables and columns so that the names actually mean something. A name should reflect purpose and not simply describe what something happens to be. For example, a column containing the name of the user who owns an order should be named owner and not user. The need for this becomes obvious as soon as you have more than one column representing, for example, users in different roles.

Foreign key names should follow a pattern
The name of a foreign key should start with the role of the key and end with Id. So, if the Position table has a foreign key relationship with the Account table there should be a foreign key in Position named AccountId pointing to the Id column in Account. Most often these foreign keys will be named <table they are pointing to>Id but not always. Note that the names should reflect the relationships meaning and in the case of positions and accounts AccountId captures that very well. However, consider a table named Project that has several foreign keys to the table User. They cannot all be named UserId. So, instead they should be named according to what the relationship represents. For example, OwnerId, CreatedById, EditedById etc.

Use CamelCase notation for tables and columns. 
A table holding email addresses should be named EmailAddress and not EMAIL_ADRESS. Even if this is mostly a matter of taste using the former integrates much better with ORM:s like the Entity Framework. A foreign key relation between the table User and EmailAddress would result in a navigation property in the User entity called EmailAddresses instead of EMAIL_ADDRESSes. Not that it matters all that much but I find it better to have property names that resemble the written language as much as possible.

1 comment:

Neil McGuigan said...

For better database compatibility, you should use lower snake_case . It's PostgreSQL's fault, as it lowercases identifiers.