Indrek Ots
by Indrek Ots
3 min read

Categories

  • articles

Tags

  • postgres
  • database
  • sql
  • locks

As you introduce new features to your application, you will almost certainly need to evolve your database schema. However, in Postgres, there are a couple of things you need to understand before you start issuing ALTER TABLE commands if you want to avoid bringing your application to a grinding halt.

Table Locks

Postgres provides various modes for locking a table. When it comes to the ALTER TABLE command, it acquires the strongest table lock available: AccessExclusiveLock. This lock ensures that the holder is the only transaction accessing the table in any way. No other transaction is allowed to read from or write to the table while the ALTER TABLE operation is executing.

For tables with low traffic, this usually isn’t a problem. A simple addition of a column is typically very fast and it doesn’t hinder the application’s performance. However, things get more complicated when you need to alter a table that receives a lot of concurrent transactions. To better understand the problem, let’s look at the following example scenario.

Example

The following simulates a query that an application might issue. It starts a transaction and queries for accounts. A select query acquires an AccessShareLock on the table. Notice that the transaction isn’t immediately committed. This simulates a scenario where the application might be doing some work before committing the transaction.

begin;
select * from accounts;

Next, in another database session, let’s simulate a schema migration.

alter table accounts add column age int;

When the ALTER TABLE command is issued, you’ll see that the statement never finishes. This is because the ALTER TABLE transaction tries to acquire an AccessExclusiveLock on the accounts table but is unable to do so because there’s an existing read transaction on the table. As mentioned earlier, the AccessExclusiveLock guarantees that the holder of the lock is the only transaction working on the table. This also means that when there are existing locks on the table, the transaction has to wait until all existing locks on the table are released.

In this scenario, the accounts table receives a lot of transactions. In a new database session, let’s simulate another query from the application.

select * from accounts;

Once we execute this query, we immediately notice that it doesn’t return anything. The query just hangs. It tries to acquire an AccessShareLock lock on the table but is unable to do so because of the transaction created in the previous step.

The ALTER TABLE transaction isn’t able to proceed. It joined a wait queue. The AccessShareLock required by the final select query is incompatible with AccessExclusiveLock. Therefore the final select query also has to join the wait queue.

Diagram displaying the transactions in the wait queue
Alter table statement is blocked by the ongoing transaction. A select from the accounts table is blocked by the alter table statement.

To sum up, in a busy table, it may be very difficult to acquire an AccessExclusiveLock that’s required by ALTER TABLE commands. It’s surprisingly easy to accidentally bring your application to a grinding halt. If the ALTER TABLE transaction cannot acquire the lock immediately, you’re denying access to the table for everybody.

How to Not Mess Up

An easy way to avoid accidental service disruptions is to define a lock_timeout for the database migration. This ensures that Postgres will abort the statement if it waits longer than the specified amount of time while attempting to acquire a lock on a table.

begin;
set lock_timeout = '1s'
alter table accounts add column age int;
commit;

If the ALTER TABLE statement cannot acquire a lock on the accounts table within one second, the statement is cancelled with the following error.

ERROR:  canceling statement due to lock timeout

The value of lock_timeout should be set to something small. Keep in mind, in the worst case scenario, this is how long you’re blocking all other transactions from accessing the table.

With busy tables, however, you might find that you need to perform the migration multiple times until it executes successfully. Therefore, if you can and your tooling supports it, it makes sense to retry the migration automatically several times.

You might also want to consider when to perform the schema migration. For example, your database might be processing fewer transactions during the night. Performing the migration during low traffic periods increases the likelihood of success.

Summary

Most schema migration operations in Postgres acquire exclusive locks. In databases that process a lot of transactions, acquiring the lock can take a significant amount of time. If the lock isn’t acquired quickly, it can result in blocked transactions and even application downtime. To avoid any negative outcomes, it’s strongly advised to configure a lock_timeout.