Skip to main content

Command Palette

Search for a command to run...

PostgreSQL Autovacuum Tuning: A Practical Guide

Published
6 min read

PostgreSQL Autovacuum Tuning: A Practical Guide

Here's a question that separates experienced PostgreSQL operators from everyone else: how many dead tuples does your busiest table accumulate before autovacuum kicks in? If you're running defaults on a table with 10 million rows, the answer is 2 million. Two million dead rows sitting in your heap, bloating the table, slowing sequential scans, wasting buffer cache -- all because PostgreSQL's autovacuum is waiting for the 20% threshold to trip. The fix takes one ALTER TABLE statement, but most teams never apply it because they don't know to look.

Why the Defaults Fall Short

Autovacuum is PostgreSQL's background process for reclaiming dead tuples, updating planner statistics, and preventing transaction ID wraparound. For small tables, the defaults are perfectly reasonable. The problem is scale.

The three defaults that matter most:

  • autovacuum_vacuum_scale_factor = 0.2 -- vacuum triggers when 20% of rows are dead
  • autovacuum_vacuum_threshold = 50 -- minimum 50 dead rows before vacuum is even considered
  • autovacuum_max_workers = 3 -- only 3 workers available to process all tables

The scale factor is a percentage of n_live_tup. For a 1,000-row lookup table, 20% means 200 dead tuples -- barely noticeable. For a 10-million-row transaction table, 20% means 2 million dead tuples before vacuum runs. At that point, bloat is already significant, queries are slower, and the vacuum itself takes much longer and generates more I/O than a series of smaller, earlier vacuums would have.

The second failure mode is worker starvation. Three workers sounds adequate until you consider a production database with 50+ tables receiving updates. Workers process one table at a time. If three large tables are being vacuumed simultaneously, every other table waits, accumulating dead tuples the entire time.

A common anti-pattern makes things worse: disabling autovacuum on "problematic" tables because vacuum appears to use too much I/O. This is like unplugging a smoke detector because it's noisy. The dead tuples don't disappear -- they accumulate faster, and the eventual manual vacuum (or emergency wraparound vacuum) is far more disruptive.

Detecting the Problem

Dead Tuple Accumulation

-- Check autovacuum global settings
SELECT name, setting, unit, short_desc
FROM pg_settings
WHERE name LIKE 'autovacuum%'
ORDER BY name;

-- Find tables where dead tuples are accumulating
SELECT
    schemaname,
    relname AS table_name,
    n_dead_tup AS dead_tuples,
    n_live_tup AS live_tuples,
    last_autovacuum,
    last_autoanalyze,
    CASE WHEN n_live_tup > 0
        THEN round(100.0 * n_dead_tup / n_live_tup, 1)
        ELSE 0
    END AS dead_tuple_ratio
FROM pg_stat_user_tables
WHERE n_dead_tup > 500
ORDER BY n_dead_tup DESC
LIMIT 15;

Tables with a high dead_tuple_ratio and a stale last_autovacuum timestamp are falling behind. Compare ratios across tables -- if certain tables consistently show higher dead tuple ratios, they need per-table tuning, not global changes.

Worker Saturation

-- Are all workers busy? If count = autovacuum_max_workers, you're starving
SELECT count(*) AS active_autovacuum_workers
FROM pg_stat_activity
WHERE query LIKE 'autovacuum:%';

-- What are the workers doing right now?
SELECT pid, query, now() - query_start AS duration
FROM pg_stat_activity
WHERE query LIKE 'autovacuum:%'
ORDER BY query_start;

If you consistently see all 3 workers occupied, tables are queuing for vacuum. Every minute a table waits is more dead tuples accumulating.

Hidden Overrides

Check whether someone previously disabled autovacuum on specific tables:

SELECT relname, reloptions
FROM pg_class
WHERE reloptions IS NOT NULL AND relkind = 'r';

Continuous Monitoring

Point-in-time queries show the current state, but effective autovacuum tuning requires trend data. Monitor these metrics over time:

  • Dead tuple counts per table -- are they growing between vacuum runs?
  • Vacuum frequency and duration -- is the same table taking longer to vacuum each time? That's a sign of compounding bloat.
  • Worker utilization -- how often are all workers occupied?
  • I/O throughput -- is vacuum being throttled by conservative cost settings?

Vacuum completion log showing autovacuum events with timestamps, durations, tuples removed, and I/O rates

PostgreSQL logs every completed autovacuum with duration, tuples removed, pages processed, and buffer statistics. Parse these logs systematically -- they contain everything you need to identify tables where vacuum is falling behind.

Health check scoring across domains highlighting Storage and Vacuum scores

A health-check approach that scores vacuum health independently from other domains (indexes, security, WAL) helps you isolate autovacuum as the specific area needing attention rather than chasing general performance issues.

Fixing It: Per-Table Tuning

The most impactful fix is per-table autovacuum settings. High-churn tables need aggressive configuration:

-- Aggressive autovacuum for a high-update table
ALTER TABLE sim_events SET (
    autovacuum_vacuum_scale_factor = 0.02,    -- trigger at 2% dead (default: 20%)
    autovacuum_vacuum_threshold = 500,         -- minimum dead tuples before vacuum
    autovacuum_vacuum_cost_delay = 2,          -- less throttling (default: 2ms in PG14+)
    autovacuum_vacuum_cost_limit = 1000        -- higher I/O budget per cycle (default: 200)
);

-- Verify the per-table settings were applied
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'sim_events';

Dropping scale_factor from 0.2 to 0.02 means vacuum triggers 10x earlier. Dead tuples stay under control. Bloat doesn't compound. The cost settings (cost_delay and cost_limit) control I/O throttling -- a higher limit lets vacuum finish faster at the expense of more I/O bandwidth.

More Workers

If all 3 workers are consistently busy:

autovacuum_max_workers = 5   -- requires restart

Five workers is a good starting point for databases with 20+ actively updated tables. Important: each worker shares the global autovacuum_vacuum_cost_limit by default, so more workers without a higher cost limit may not help. Per-table cost limits (as shown above) give each worker an independent I/O budget.

HOT Updates with fillfactor

For tables receiving thousands of updates per second, combine aggressive autovacuum with fillfactor:

ALTER TABLE sim_events SET (fillfactor = 80);

With fillfactor = 80, PostgreSQL reserves 20% free space per page. When an update doesn't change indexed columns, PostgreSQL writes the new tuple version in the same page (a HOT update). This skips index maintenance entirely and makes vacuum significantly cheaper.

Preventing Future Problems

Build a tiered tuning strategy based on update frequency:

TierUpdate PatternScale FactorCost Limit
High-churnHundreds of updates/second0.01 - 0.051000
Medium-churnPeriodic batch updates0.10Default (200)
Low-churnMostly inserts, rare updates0.20 (default)Default

This categorization isn't static. Review it after schema changes, new feature launches, or workload shifts. A table that was low-churn can become high-churn overnight -- and the autovacuum settings that worked yesterday become inadequate.

Make autovacuum monitoring part of your deployment checklist. After every major release, check:

  1. Dead tuple trends on your busiest tables
  2. Vacuum duration -- is it increasing for the same tables?
  3. Worker utilization -- are all workers frequently occupied?
  4. Any new tables that have become high-churn

Catching a newly problematic table in the first week is straightforward. Catching it after months of accumulated bloat requires a VACUUM FULL and a maintenance window.


Originally published at mydba.dev/blog/autovacuum-tuning