The Architectural Tax You Didn't Know You Were Paying
Stop me if you've heard this one before: A startup begins with a clean, monolithic architecture. Within six months, they have three microservices, a Redis instance for caching, a RabbitMQ cluster for background jobs, and a managed Kafka service because someone read a blog post about how 'event-driven architectures scale.' Suddenly, the team spends more time debugging connection pool exhaustion and 'missing message' edge cases than shipping features. We have been told for a decade that dedicated brokers are a prerequisite for professional backend engineering. But for 90% of us, this is nothing more than an architectural tax that slows down development and increases operational surface area.
What if I told you that the tool already sitting at the heart of your stack—PostgreSQL—is likely a better message broker than the specialized ones you're paying for? By leveraging the Postgres SKIP LOCKED task queue pattern, you can eliminate entire categories of bugs while simplifying your infrastructure to its bare essentials.
The Magic of FOR UPDATE SKIP LOCKED
Before Postgres 9.5, building a queue in a relational database was a recipe for performance nightmares. You either dealt with heavy row-level locking that killed concurrency, or you ended up with multiple workers grabbing the same job. Then came FOR UPDATE SKIP LOCKED. This simple clause allows a worker to query for the next available job and immediately 'hide' it from other workers without waiting for them to release their locks.
The SQL is deceptively simple:
```sql
BEGIN;

SELECT id FROM job_queue
WHERE status = 'pending'
ORDER BY created_at
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- Process the job in your application code

DELETE FROM job_queue WHERE id = :id;

COMMIT;
```
This approach eliminates the 'thundering herd' problem. When ten workers request a job simultaneously, they don't block each other. Postgres simply gives each worker the next available row. It is elegant, robust, and requires zero extra infrastructure.
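For reference, a minimal table backing this pattern might look like the following. The table name, columns, and index are illustrative assumptions, not a prescribed schema:

```sql
-- A hypothetical minimal job table; adjust columns to your needs.
CREATE TABLE job_queue (
    id         bigserial   PRIMARY KEY,
    status     text        NOT NULL DEFAULT 'pending',
    payload    jsonb       NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- A partial index keeps the "next pending job" lookup cheap,
-- because finished jobs never need to be scanned.
CREATE INDEX job_queue_pending_idx
    ON job_queue (created_at)
    WHERE status = 'pending';
```

The partial index matters in practice: even as the table accumulates completed rows or bloat, the worker's polling query stays an index scan over only the pending jobs.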
The Death of the Dual-Write Problem
One of the most persistent headaches in microservices is the 'dual-write' problem. Imagine a user signs up. You need to save their profile to your database and send a 'Welcome' email via a background worker. If you use RabbitMQ or Redis, you have two separate systems. If your database transaction commits but the network blips before you can push to the broker, your user exists but the email is never sent. If you push to the broker first and the database transaction fails, you're sending emails for users that don't exist.
With a transactional message queue in Postgres, this problem vanishes. Because the job queue is just another table in the same database, you can wrap the user creation and the job insertion in a single ACID transaction. It either all succeeds or all fails. No more inconsistent state. No more manual cleanup scripts at 3:00 AM.
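To make that concrete, here is a sketch of the atomic version of the signup flow. The table and column names are assumptions for illustration; a single statement with a CTE doesn't even need an explicit transaction block, since one statement is always atomic:

```sql
-- Either the user and the welcome-email job both exist,
-- or neither does. No dual-write window.
WITH new_user AS (
    INSERT INTO users (email, name)
    VALUES ('ada@example.com', 'Ada')
    RETURNING id
)
INSERT INTO job_queue (payload)
SELECT jsonb_build_object('type', 'welcome_email', 'user_id', id)
FROM new_user;
```

If the signup involves several statements instead, the same guarantee holds by wrapping them all in one BEGIN/COMMIT.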
Is It Fast Enough? (Spoiler: Yes)
The biggest pushback against database-backed queues is performance. Critics argue that Postgres can't compete with the throughput of a dedicated broker. While technically true at the extreme high end, research in the vein of 'Postgres is the only queue you need' suggests that Postgres can comfortably handle between 10,000 and 50,000 jobs per second. Unless you are operating at the scale of Uber or Netflix, you are likely nowhere near that ceiling.
In fact, many teams find that switching away from a dedicated broker actually reduces latency. A benchmark study noted that replacing RabbitMQ with Postgres reduced p95 latency from 340ms to 210ms by eliminating the network variance inherent in jumping between different specialized systems. To squeeze even more performance out of your Postgres SKIP LOCKED task queue, you can use UNLOGGED tables. As highlighted by Vrajat's analysis, using UNLOGGED tables can reduce Write-Ahead Log (WAL) volume by approximately 30x, removing the primary I/O bottleneck for high-frequency queueing. The trade-off: an unlogged table is truncated after a crash and is not replicated, so reserve it for jobs you can afford to lose and re-enqueue.
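The unlogged variant is a one-keyword change. The schema below is the same illustrative sketch as before, not a prescribed layout:

```sql
-- UNLOGGED skips WAL writes for this table entirely. The price:
-- the table is emptied after a crash and does not replicate,
-- so only use it for jobs that are safe to lose and re-create.
CREATE UNLOGGED TABLE job_queue (
    id         bigserial   PRIMARY KEY,
    status     text        NOT NULL DEFAULT 'pending',
    payload    jsonb       NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);
```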
The Operational Dividend
Think about what you get for free when your queue is in Postgres:
- Unified Backups: Your jobs are backed up alongside your application data. No more worrying about whether your Redis RDB files are in sync with your SQL dumps.
- Standard Tooling: You don't need a specialized dashboard to see how many jobs are failing. `SELECT count(*) FROM job_queue WHERE status = 'failed'` is all the monitoring you need.
- Observability: You can JOIN your jobs table with your users table to see exactly which customers are experiencing background processing delays. Try doing that with a binary blob in a Redis list.
- Reduced Cost: You can cancel that managed RabbitMQ instance or that oversized ElastiCache cluster. Consolidating your stack onto a single database engine saves money and reduces the number of things that can break.
Addressing the Elephant in the Room: Bloat
It’s not all sunshine and rainbows. The primary trade-off of a Postgres queue is MVCC (Multi-Version Concurrency Control) bloat. Since a queue is essentially a series of rapid inserts and deletes, the table can grow physically large even if it only contains a few rows at any given time. The solution? Aggressive autovacuum tuning. By setting a lower autovacuum_vacuum_scale_factor specifically for your queue table, you can keep the table lean and performant without impacting the rest of your database.
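Per-table autovacuum settings are ordinary storage parameters. A sketch of aggressive tuning for the queue table (the specific values are starting points, not gospel):

```sql
-- Vacuum the queue table far more often than the global default:
-- scale_factor 0 plus threshold 100 means "vacuum after roughly
-- 100 dead rows", regardless of table size. Zero cost delay lets
-- the vacuum run at full speed on this small, hot table.
ALTER TABLE job_queue SET (
    autovacuum_vacuum_scale_factor = 0,
    autovacuum_vacuum_threshold    = 100,
    autovacuum_vacuum_cost_delay   = 0
);
```

Because these settings apply only to `job_queue`, the rest of your database keeps its normal, gentler autovacuum behavior.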
When Should You Actually Move to RabbitMQ?
I am not saying RabbitMQ and Kafka are useless. They are phenomenal pieces of engineering. You should consider moving away from a Postgres SKIP LOCKED task queue when you hit one of these three walls:
- Massive Scale: If you are processing 100 million events per day, the WAL overhead of Postgres will eventually become your bottleneck.
- Complex Routing: If you need intricate exchange logic, header-based routing, or complex fan-out patterns, RabbitMQ's built-in features are worth the price of admission.
- Read Replica Limitations: Tools like `LISTEN/NOTIFY`, which help workers react instantly to new jobs, don't work across read replicas. If your architecture relies heavily on distributing queue load across a cluster of replicas, a dedicated broker might be necessary.
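For context, the `LISTEN/NOTIFY` pattern mentioned above looks like this on a single primary (the channel name is an arbitrary choice):

```sql
-- Producer: enqueue and notify in one transaction. The notification
-- is only delivered to listeners if the INSERT actually commits.
BEGIN;
INSERT INTO job_queue (payload) VALUES ('{"type": "welcome_email"}');
NOTIFY job_ready;
COMMIT;

-- Worker, on its own connection: wake up when a job arrives, but
-- keep a periodic poll as a safety net for missed notifications.
LISTEN job_ready;
```

Notifications are fire-and-forget signals, not the jobs themselves; the worker still claims work with the SKIP LOCKED query, which is what keeps the pattern safe with many competing workers.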
Simplicity is a Feature
Modern web development has become a race to see how many technologies we can stack on top of each other. But every new tool is a liability. It’s a new security patch to track, a new configuration file to manage, and a new failure mode to understand. By choosing a Postgres SKIP LOCKED task queue, you are choosing 'Boring Technology' that works, scales, and stays out of your way.
Before you reach for the latest distributed message broker for your next project, ask yourself: 'Could I just use a table for this?' For 90% of us, the answer is a resounding yes. Start simple. Use Postgres. Your future self—the one who isn't waking up for a Redis connection error at midnight—will thank you.