Some useful, non-obvious Postgres patterns

16 October 2022

There’s nothing earth-shattering here, but I can recall that each of these principles were not obvious to me at some point in time. Perhaps some of them are not obvious to you right now.

Always define explicit ON DELETE semantics
If in doubt, use ON DELETE SET NULL
Mutually exclusive columns
Prohibit hidden nulls in jsonb columns
Declare your updated_at columns NOT NULL to make sorting easier
Use the citext extension for email addresses

Always define explicit `ON DELETE` semantics

When you create a foreign key constraint using REFERENCES, the default behaviour for ON DELETE is NO ACTION. That means deletes on the foreign table will fail if they break any referring rows. Sometimes that’s okay and what you want to happen. But plenty of times it won’t be and in those cases it tends to bite you when it’s least convenient.

For that reason, it’s prudent to always make your ON DELETE clauses explicit, even if the action is NO ACTION or RESTRICT. Doing that as a rule means there will never be occasions when you forget to declare them by accident, leading to unexpected DELETE failures down the line.

You can enforce the rule with a crude grep-based lint check in your CI pipeline:

#!/bin/sh

MISSING_ON_DELETE=$(git grep -i references -- '*.sql' | grep -iv 'on delete')

if [ "$MISSING_ON_DELETE" != "" ]; then
  echo "ON DELETE not declared on foreign keys:"
  echo "$MISSING_ON_DELETE"
  exit 1
fi

Of course, any failure here might just be a newline that was inserted before the ON DELETE clause. But it’s not too much hardship to fixup your pull requests so that ON DELETE is always on the same line as REFERENCES. The payoff is that you’ll never end up in situations where deleting data accidentally fails due to the default semantics.

If in doubt, use `ON DELETE SET NULL`

Following from the previous point, any time there’s a shred of doubt about which ON DELETE behaviour is correct, consider opting for ON DELETE SET NULL.

The case for ON DELETE CASCADE is often obvious, if the foreign row has clear ownership of referring data. Things become more uncertain when the decision is between ON DELETE SET NULL and ON DELETE RESTRICT or ON DELETE NO ACTION. The fallout from erring towards ON DELETE SET NULL is usually easier to deal with in those cases.

For instance, consider the extreme case of a GDPR request from a user to delete all their data. You have a legal obligation to ensure everything gets deleted, even if it means leaving some nulls behind in other tables. Any left-behind nulls can be handled according to your business logic. Sometimes it’s fine to leave them in place and just code round them. Other times you might want an automated process that surfaces them for human intervention, or perhaps there are other users in your system that ownership can be transferred to. Failing deletes are typically less preferable to those alternatives.

Mutually exclusive columns

It can happen in a data model that you want an either/or relationship between two (or more) columns. When that scenario arises, you should enforce it with a constraint so it’s physically impossible for clients to create conflicting data.

To give a concrete example, imagine a customisable navigation menu where each item can link to either a URL or some entity from the database. The simplified table might look like this:

CREATE TABLE menu_items (
  id uuid NOT NULL DEFAULT gen_random_uuid() PRIMARY KEY,
  position integer NOT NULL,
  entity_id uuid REFERENCES entities(id) ON DELETE SET NULL,
  url text
);

Here entity_id and url are mutually exclusive, and you want to guarantee they can’t both be set on a single row. A CHECK constraint that counts non-null values can do that:

CREATE TABLE menu_items (
  id uuid NOT NULL DEFAULT gen_random_uuid() PRIMARY KEY,
  position integer NOT NULL,
  entity_id uuid REFERENCES entities(id) ON DELETE SET NULL,
  url text,

  CONSTRAINT ck_menu_items_mutex CHECK (
    (entity_id IS NOT NULL)::integer +
    (url IS NOT NULL)::integer <= 1
  )
);

Perhaps you want to add a third type of menu item later, linking to records from a views table:

ALTER TABLE menu_items
ADD view_id uuid REFERENCES views(id) ON DELETE SET NULL;

If so, you should amend the constraint at the same time:

ALTER TABLE menu_items
ADD view_id uuid REFERENCES views(id) ON DELETE SET NULL,
DROP CONSTRAINT ck_menu_items_mutex,
ADD CONSTRAINT ck_menu_items_mutex CHECK (
  (entity_id IS NOT NULL)::integer +
  (view_id IS NOT NULL)::integer +
  (url IS NOT NULL)::integer <= 1
);

In this way you’ve guaranteed that mutual exclusivity is upheld between the relevant columns.

Prohibit hidden nulls in `jsonb` columns

When storing data as jsonb, it’s common to declare the column as NOT NULL so clients aren’t forced to perform a null check before dereferencing:

  data jsonb NOT NULL DEFAULT '{}',

However there’s a lurking footgun here because the string 'null' is valid JSON that will evaluate to null when parsed. You can preclude this from happening with a constraint:

  data jsonb NOT NULL DEFAULT '{}',

  CONSTRAINT ck_data_not_null CHECK (
    data <> 'null'
  )

Now clients are freed from the burden of null-checking data before dereferencing it.

Declare your `updated_at` columns `NOT NULL` to make sorting easier

It’s common practice to add created_at and updated_at columns to tables if you want to track when rows were last changed.

The definition of created_at is clear:

  created_at timestamptz NOT NULL DEFAULT now(),

It’s tempting to define updated_at as nullable so that clients can easily determine whether a given row has been updated since creation:

  updated_at timestamptz,

However doing that has implications on sort order, if you ever want to sort rows in order of last-updated on the frontend. For example, if you SELECT with ORDER BY updated_at DESC, the rows with a null updated_at will be sorted to the beginning of the result set. Typically that’s not what your users want to see. Adding NULLS FIRST to your SELECT doesn’t help either, because what users really want is an interleaved sort order of last-updated-or-created.

For that, you can define updated_at as NOT NULL too:

  updated_at timestamptz NOT NULL DEFAULT now(),

Now when you ORDER BY updated_at DESC, rows will be returned in an order that’s helpful to users.

But what if you also wanted to use the updated_at column to determine whether or not a row has been updated? That’s best handled with a separate, generated column:

  is_updated boolean GENERATED ALWAYS AS (
    updated_at > created_at
  ) STORED,

This leans on the fact that now() returns the time from the start of the current transaction. So when new rows are created, it’s guaranteed that the default values for created_at and updated_at will always be the same. Now clients can use the is_updated column to check whether rows have been updated and still have a user-friendly natural sort order for updated_at.

Use the `citext` extension for email addresses

Occasionally users have their CAPS LOCK key pressed without realising. So if you need to identify them using an email address that they typed, you should ignore case everywhere you make a comparison.

The traditional way to do case-insensitive string comparison in SQL is by calling LOWER() on both operands. That’s fine, but it’s easy to forget and if you do forget, it isn’t always spotted right away.

Fortunately there’s a Postgres extension that means you don’t have to remember to do explicit case-insensitive comparisons;

CREATE EXTENSION citext;

With that in place, you can declare your email address columns as type citext:

  email_address citext NOT NULL,

There’s no need to call LOWER() when working with this data now, the extension will automatically ignore casing on your behalf.