Existing by coincidence, programming deliberately
There's nothing earth-shattering here, but I can recall that each of these principles were not obvious to me at some point in time. Perhaps some of them are not obvious to you right now.
ON DELETE
semanticsON DELETE SET NULL
jsonb
columnsupdated_at
columns NOT NULL
to make sorting easiercitext
extension for email addressesON DELETE
semanticsWhen you create a foreign key constraint using REFERENCES
,
the default behaviour for ON DELETE
is NO ACTION
.
That means deletes on the foreign table
will fail if they break any referring rows.
Sometimes that's okay and what you want to happen.
But plenty of times it won't be
and in those cases it tends to bite you
when it's least convenient.
For that reason,
it's prudent to always make your ON DELETE
clauses explicit,
even if the action is NO ACTION
or RESTRICT
.
Doing that as a rule
means there will never be occasions
when you forget to declare them by accident,
leading to unexpected DELETE
failures
down the line.
You can enforce the rule with a crude grep-based lint check in your CI pipeline:
#!/bin/sh
MISSING_ON_DELETE=$(git grep -i references -- '*.sql' | grep -iv 'on delete')
if [ "$MISSING_ON_DELETE" != "" ]; then
echo "ON DELETE not declared on foreign keys:"
echo "$MISSING_ON_DELETE"
exit 1
fi
Of course,
any failure here might just be a newline
that was inserted before the ON DELETE
clause.
But it's not too much hardship to fixup your pull requests
so that ON DELETE
is always on the same line as REFERENCES
.
The payoff is that you'll never end up in situations
where deleting data accidentally fails
due to the default semantics.
ON DELETE SET NULL
Following from the previous point,
any time there's a shred of doubt
about which ON DELETE
behaviour is correct,
consider opting for ON DELETE SET NULL
.
The case for ON DELETE CASCADE
is often obvious,
if the foreign row has clear ownership of referring data.
Things become more uncertain when the decision is between
ON DELETE SET NULL
and ON DELETE RESTRICT
or ON DELETE NO ACTION
.
The fallout from erring towards ON DELETE SET NULL
is usually easier to deal with
in those cases.
For instance, consider the extreme case of a GDPR request from a user to delete all their data. You have a legal obligation to ensure everything gets deleted, even if it means leaving some nulls behind in other tables. Any left-behind nulls can be handled according to your business logic. Sometimes it's fine to leave them in place and just code round them. Other times you might want an automated process that surfaces them for human intervention, or perhaps there are other users in your system that ownership can be transferred to. Failing deletes are typically less preferable to those alternatives.
It can happen in a data model that you want an either/or relationship between two (or more) columns. When that scenario arises, you should enforce it with a constraint so it's physically impossible for clients to create conflicting data.
To give a concrete example, imagine a customisable navigation menu where each item can link to either a URL or some entity from the database. The simplified table might look like this:
CREATE TABLE menu_items (
id uuid NOT NULL DEFAULT gen_random_uuid() PRIMARY KEY,
position integer NOT NULL,
entity_id uuid REFERENCES entities(id) ON DELETE SET NULL,
url text
);
Here entity_id
and url
are mutually exclusive,
and you want to guarantee
they can't both be set on a single row.
A CHECK
constraint that counts non-null values
can do that:
CREATE TABLE menu_items (
id uuid NOT NULL DEFAULT gen_random_uuid() PRIMARY KEY,
position integer NOT NULL,
entity_id uuid REFERENCES entities(id) ON DELETE SET NULL,
url text,
CONSTRAINT ck_menu_items_mutex CHECK (
(entity_id IS NOT NULL)::integer +
(url IS NOT NULL)::integer <= 1
)
);
Perhaps you want to add a third type of menu item later,
linking to records from a views
table:
ALTER TABLE menu_items
ADD view_id uuid REFERENCES views(id) ON DELETE SET NULL;
If so, you should amend the constraint at the same time:
ALTER TABLE menu_items
ADD view_id uuid REFERENCES views(id) ON DELETE SET NULL,
DROP CONSTRAINT ck_menu_items_mutex,
ADD CONSTRAINT ck_menu_items_mutex CHECK (
(entity_id IS NOT NULL)::integer +
(view_id IS NOT NULL)::integer +
(url IS NOT NULL)::integer <= 1
);
In this way you've guaranteed that mutual exclusivity is upheld between the relevant columns.
jsonb
columnsWhen storing data as jsonb
,
it's common to declare the column as NOT NULL
so clients aren't forced to perform a null check
before dereferencing:
data jsonb NOT NULL DEFAULT '{}',
However there's a lurking footgun here
because the string 'null'
is valid JSON
that will evaluate to null when parsed.
You can preclude this from happening
with a constraint:
data jsonb NOT NULL DEFAULT '{}',
CONSTRAINT ck_data_not_null CHECK (
data <> 'null'
)
Now clients are freed from the burden
of null-checking data
before dereferencing it.
updated_at
columns NOT NULL
to make sorting easierIt's common practice
to add created_at
and updated_at
columns to tables
if you want to track when rows were last changed.
The definition of created_at
is clear:
created_at timestamptz NOT NULL DEFAULT now(),
It's tempting to define updated_at
as nullable
so that clients can easily determine
whether a given row has been updated since creation:
updated_at timestamptz,
However doing that has implications on sort order,
if you ever want to sort rows
in order of last-updated on the frontend.
For example,
if you SELECT
with ORDER BY updated_at DESC
,
the rows with a null updated_at
will be sorted to the beginning of the result set.
Typically that's not what your users want to see.
Adding NULLS FIRST
to your SELECT
doesn't help either,
because what users really want
is an interleaved sort order of last-updated-or-created.
For that, you can define updated_at
as NOT NULL
too:
updated_at timestamptz NOT NULL DEFAULT now(),
Now when you ORDER BY updated_at DESC
,
rows will be returned in an order that's helpful to users.
But what if you also wanted to use the updated_at
column
to determine whether or not a row has been updated?
That's best handled with a separate, generated column:
is_updated boolean GENERATED ALWAYS AS (
updated_at > created_at
) STORED,
This leans on the fact that now()
returns the time
from the start of the current transaction.
So when new rows are created,
it's guaranteed that the default values
for created_at
and updated_at
will always be the same.
Now clients can use the is_updated
column
to check whether rows have been updated
and still have a user-friendly natural sort order
for updated_at
.
citext
extension for email addressesOccasionally users have their CAPS LOCK
key pressed
without realising.
So if you need to identify them
using an email address that they typed,
you should ignore case
everywhere you make a comparison.
The traditional way to do case-insensitive string comparison in SQL
is by calling LOWER()
on both operands.
That's fine,
but it's easy to forget
and if you do forget,
it isn't always spotted right away.
Fortunately there's a Postgres extension that means you don't have to remember to do explicit case-insensitive comparisons;
CREATE EXTENSION citext;
With that in place,
you can declare your email address columns as type citext
:
email_address citext NOT NULL,
There's no need to call LOWER()
when working with this data now,
the extension will automatically ignore casing
on your behalf.