The Complexity Tax We All Pay
Stop me if you've heard this one: a startup launches a simple CRUD app. By week three, they've added Redis for caching. By month two, they've introduced RabbitMQ because they need to process background emails and image resizing. Suddenly, a team of three developers is managing three different infrastructure components, three sets of monitoring alerts, and a nightmare of distributed consistency issues. We've been conditioned to believe that 'real' background processing requires a dedicated broker. But for 95% of applications, adding that extra layer is a mistake. You don't need a message broker; you need to use the database you already have more effectively.
The secret weapon is the Postgres Skip Locked message queue pattern. By leveraging modern SQL features, we can build a system that is not only simpler to manage but also more reliable than many distributed alternatives.
The 'Thundering Herd' and the Genius of FOR UPDATE SKIP LOCKED
In the early days of SQL-based queues, the strategy was crude. A worker would select a row, try to update a 'locked_at' column, and hope no other worker grabbed it at the same microsecond. This led to massive contention, deadlocks, and the dreaded 'thundering herd' where dozens of workers fought over the same job. Many developers walked away from SQL queues because of this experience, assuming the database simply couldn't handle the concurrency.
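For context, the crude two-step claim looked roughly like this. It's a sketch for illustration; jobs_table and its columns mirror the examples later in this article:

```sql
-- The old pattern: SELECT, then UPDATE, with a race in between.
-- Two workers can both see the same 'pending' row before either UPDATE lands.
SELECT id
FROM jobs_table
WHERE status = 'pending' AND locked_at IS NULL
ORDER BY created_at
LIMIT 1;

-- Later, in a separate statement, hope nobody beat us to it:
UPDATE jobs_table
SET status = 'processing', locked_at = NOW()
WHERE id = $1 AND locked_at IS NULL;
-- If this reports 0 rows updated, we lost the race and must retry.
```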
Everything changed with the introduction of FOR UPDATE SKIP LOCKED. When a worker queries the database with this clause, Postgres locks the selected rows and, crucially, tells any other competing worker to simply skip over those rows and find the next available ones. It transforms a bottleneck into a high-speed bypass lane. As noted by Inferable, this single feature solves the race conditions of naive SQL queues, allowing for truly parallel processing without the overhead of complex locking logic in your application code.
The Anatomy of a High-Performance Claim
A typical implementation uses a Common Table Expression (CTE). It’s elegant and atomic:
```sql
WITH claimed_jobs AS (
    SELECT id
    FROM jobs_table
    WHERE status = 'pending'
    ORDER BY priority DESC, created_at ASC
    LIMIT 10
    FOR UPDATE SKIP LOCKED
)
UPDATE jobs_table
SET status = 'processing', locked_at = NOW()
FROM claimed_jobs
WHERE jobs_table.id = claimed_jobs.id
RETURNING *;
```
This single query selects, locks, and updates the status of a batch of jobs in one round trip. It is the definition of efficiency.
Why Postgres Beats Dedicated Brokers for Mid-Scale Apps
1. Transactional Atomicity (The End of Dual-Writes)
In a traditional setup with RabbitMQ, you face a classic distributed systems problem: you update your database (e.g., creating a user) and then you must send a message to the broker (e.g., 'send welcome email'). If the database commit succeeds but the message broker is down, the email is never sent. If you reverse the order, you might send an email for a user that failed to save. This is the 'dual-write' problem.
With a Postgres Skip Locked message queue, the job and the application data are in the same database. You can wrap the user creation and the job enqueuing in a single ACID transaction. It either all succeeds or all fails. No more 'zombie' jobs or missed messages.
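A minimal sketch of that idea, assuming a hypothetical users table alongside the jobs_table used throughout this article:

```sql
-- User creation and job enqueuing commit together or not at all.
BEGIN;

WITH new_user AS (
    INSERT INTO users (email, name)
    VALUES ('ada@example.com', 'Ada')
    RETURNING id
)
INSERT INTO jobs_table (job_type, payload, status)
SELECT 'send_welcome_email',
       jsonb_build_object('user_id', id),
       'pending'
FROM new_user;

COMMIT;  -- no commit, no job; no job without the user row
```

The data-modifying CTE is already atomic as a single statement; the explicit BEGIN/COMMIT shows that the pattern extends to any number of statements.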
2. Reduced Operational Overhead
Every new piece of infrastructure is a liability. It needs to be patched, backed up, monitored, and scaled. By sticking to Postgres, you use your existing backup strategy (like WAL-G or pgBackRest) to protect your queue data, and your existing monitoring (like a Prometheus Postgres exporter) to watch queue depth. You don't need to learn the nuances of RabbitMQ exchange types or Redis persistence trade-offs (RDB vs. AOF).
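To make that concrete, queue depth is a one-line query you can feed into whatever exporter or dashboard you already run:

```sql
-- Queue depth by status: pending, processing, failed, and so on.
SELECT status, count(*) AS jobs
FROM jobs_table
GROUP BY status;
```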
3. Unprecedented Observability
When something goes wrong in a dedicated broker, you often need specialized tools to inspect the 'dead letter exchange.' In Postgres, your queue is just a table. Want to see how many jobs failed in the last hour? SELECT count(*) FROM jobs_table WHERE status = 'failed' AND failed_at > NOW() - INTERVAL '1 hour';. Want to join a stuck job against the users table to see if a specific account is causing issues? It’s a simple JOIN away, as the sketch below shows. This level of debugging speed is a superpower for backend engineers.
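As a concrete sketch (the payload column and users schema are illustrative):

```sql
-- Which user does each long-stuck job belong to?
SELECT u.id AS user_id, u.email, j.id AS job_id, j.locked_at
FROM jobs_table AS j
JOIN users AS u
  ON u.id = (j.payload->>'user_id')::bigint
WHERE j.status = 'processing'
  AND j.locked_at < NOW() - INTERVAL '15 minutes';
```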
The Performance Reality Check
Critics often claim that Postgres can't scale as a queue, so let's look at the numbers. Postgres isn't designed for the millions of events per second that Kafka handles, but its ceiling is much higher than people realize. Benchmarks for the Graphile Worker library have reported over 180,000 jobs per second in optimized environments, and even on modest hardware, a well-tuned Postgres instance can sustain thousands of jobs per second, far beyond what most applications ever enqueue.
As highlighted in Vrajat's analysis, we can push this further with UNLOGGED tables. These bypass the Write-Ahead Log (WAL), which can reduce I/O pressure dramatically (the analysis cites up to a 30x reduction). The trade-off is durability: Postgres truncates unlogged tables during crash recovery, so the queue would not survive a hard crash. For many ephemeral background tasks, that is a price worth paying for the performance boost.
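Here is a sketch of such a table, using the same columns as the claim query earlier (the name ephemeral_jobs is illustrative):

```sql
-- UNLOGGED skips the WAL: far cheaper writes, but Postgres truncates
-- this table during crash recovery, so use it only for expendable work.
CREATE UNLOGGED TABLE ephemeral_jobs (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    job_type   text        NOT NULL,
    payload    jsonb       NOT NULL DEFAULT '{}',
    status     text        NOT NULL DEFAULT 'pending',
    priority   int         NOT NULL DEFAULT 0,
    created_at timestamptz NOT NULL DEFAULT NOW(),
    locked_at  timestamptz
);
```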
Addressing the Nuances: Bloat and Connections
It wouldn't be a senior-level deep dive if we didn't discuss the pitfalls. High-churn tables—where rows are constantly inserted, updated, and deleted—are the primary cause of table bloat in Postgres. If your autovacuum settings are too conservative, the 'dead tuples' left behind by finished jobs will eventually slow down your queries.
To run a Postgres Skip Locked message queue at scale, you should:
- Aggressively tune autovacuum for your jobs table (a starting-point sketch follows this list).
- Use a connection pooler like PgBouncer. Since every worker needs a connection, a pooler prevents the overhead of managing thousands of direct database connections.
- Consider LISTEN/NOTIFY. While the queue relies on workers checking the table, you can avoid constant polling by having the database notify workers the moment a new row is inserted (see the trigger sketch after this list).
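For the autovacuum tuning above, per-table overrides usually suffice. These numbers are illustrative starting points, not universal truths:

```sql
-- Vacuum the jobs table far more eagerly than the database-wide defaults.
ALTER TABLE jobs_table SET (
    autovacuum_vacuum_scale_factor  = 0.01,  -- vacuum after ~1% dead tuples
    autovacuum_vacuum_cost_delay    = 0,     -- don't throttle vacuum on this table
    autovacuum_analyze_scale_factor = 0.05   -- keep planner statistics fresh
);
```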
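And for LISTEN/NOTIFY, a small trigger turns each insert into a wake-up call (the channel and function names here are arbitrary):

```sql
-- Fire a notification on the 'new_job' channel for every inserted row.
CREATE OR REPLACE FUNCTION notify_new_job() RETURNS trigger AS $$
BEGIN
    PERFORM pg_notify('new_job', NEW.id::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER jobs_notify_insert
AFTER INSERT ON jobs_table
FOR EACH ROW EXECUTE FUNCTION notify_new_job();
```

Workers issue LISTEN new_job and block until a notification arrives, falling back to an occasional poll, since notifications are not delivered to clients that were disconnected when NOTIFY fired.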
The Industry Shift: Solid Queue and Beyond
The 'Boring Technology' movement is gaining massive momentum. This is best exemplified by Ruby on Rails 8, which recently shifted its default background processing to Solid Queue, a database-backed system that relies on FOR UPDATE SKIP LOCKED under the hood. The Rails team recognized that the marginal performance gains of Redis weren't worth the architectural complexity for the vast majority of web applications. This is a clear signal that the industry is moving back toward consolidated, robust database patterns over fragmented, specialized infrastructure.
Conclusion
The Postgres Skip Locked message queue strategy isn't just a workaround; for many applications, it is the superior architectural choice. It eliminates the dual-write problem, slashes operational complexity, and provides a level of observability that dedicated brokers can't match. Unless you are operating at the extreme scale of companies like LinkedIn or Uber—processing hundreds of thousands of events every single second—you likely don't need the overhead of RabbitMQ or Kafka.
Next time you're tempted to helm install a new message broker, ask yourself: could a simple Postgres table and a SKIP LOCKED clause do the job better? Chances are, the answer is a resounding yes. Keep your stack boring, and your production environment will stay blissfully uneventful.


