Skip to main content

Command Palette

Search for a command to run...

PostgreSQL Covering Indexes: Eliminate Heap Fetches with INCLUDE

A practical guide for PostgreSQL and TimescaleDB

Published
4 min read
PostgreSQL Covering Indexes: Eliminate Heap Fetches with INCLUDE

PostgreSQL Covering Indexes: Eliminate Heap Fetches with INCLUDE

I was profiling a dashboard that loaded in 3 seconds. The main query filtered by customer_id and selected customer_name, email, and last_order_date. PostgreSQL found 2,000 matching rows instantly via the B-tree index -- then spent 95% of execution time fetching each row from the heap to get the three display columns. Two thousand random I/O operations. Adding a covering index with INCLUDE (customer_name, email, last_order_date) dropped the query from 280ms to 4ms. The heap was never touched.

The Two-Step Bottleneck

Every standard B-tree index lookup works in two steps:

  1. Index scan: walk the index to find matching row pointers (TIDs)
  2. Heap fetch: retrieve actual row data from the table

Step 1 is fast -- the index is compact and ordered. Step 2 is the bottleneck -- each heap fetch is a random I/O operation on a different page of the table. For queries returning hundreds or thousands of rows, heap fetches dominate the execution time.

A covering index contains all columns the query needs. When every column is in the index, PostgreSQL uses an index-only scan -- reading everything from the index without ever touching the heap. No random I/O, no wasted reads.

INCLUDE: The Right Way (PostgreSQL 11+)

Before INCLUDE, covering indexes required composite indexes:

-- Old approach: sorts on all three columns
CREATE INDEX ON customers (customer_id, customer_name, customer_email);

This works but wastes CPU maintaining sort order on columns nobody searches by.

INCLUDE adds columns to the index leaf pages without including them in the sort key:

-- New approach: sorts only on customer_id
CREATE INDEX CONCURRENTLY idx_customers_covering
    ON customers (customer_id)
    INCLUDE (customer_name, customer_email);

Key columns are for searching (WHERE, ORDER BY, joins). Included columns are for retrieval only -- stored alongside the key but not sorted. Smaller index, better semantics.

Finding Covering Index Opportunities

Check EXPLAIN output for index scans with heap fetches:

EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_name, customer_email
FROM customers
WHERE customer_id BETWEEN 1000 AND 2000;

Red flags:

  • Index Scan using idx_customers_id (not Index Only Scan)
  • Heap Fetches: 1000
  • High buffer counts from random heap access

Goal: Index Only Scan with Heap Fetches: 0.

Find system-wide candidates:

SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan > 100
ORDER BY idx_tup_fetch DESC
LIMIT 20;

Tables with high idx_tup_fetch relative to idx_scan are prime candidates. Cross-reference with your most frequent queries.

Real-World Example: Recent Orders Dashboard

-- The query
SELECT order_id, customer_name, order_total, order_status
FROM orders
WHERE created_at >= now() - interval '7 days'
ORDER BY created_at DESC
LIMIT 50;

-- The covering index
CREATE INDEX CONCURRENTLY idx_orders_recent_covering
    ON orders (created_at DESC)
    INCLUDE (order_id, customer_name, order_total, order_status);

One index handles WHERE, ORDER BY, LIMIT, and all selected columns. Entirely from the index. Zero heap access.

INCLUDE vs Composite: When to Use Which

-- INCLUDE: extra columns only appear in SELECT
CREATE INDEX ON orders (order_date)
    INCLUDE (customer_name, order_total);

-- Composite: columns appear in WHERE or ORDER BY
CREATE INDEX ON orders (customer_id, order_date);

Use INCLUDE when additional columns are purely for retrieval. Use composite when multiple columns participate in filtering or sorting.

The Critical Vacuum Dependency

Index-only scans work only on pages marked "all-visible" in the visibility map. Vacuum maintains this map. If vacuum falls behind:

  • Pages lose their all-visible flag
  • The planner reverts to regular index scans with heap fetches
  • Your covering index delivers zero benefit
SELECT
    relname,
    n_live_tup,
    n_dead_tup,
    last_autovacuum,
    last_vacuum
FROM pg_stat_user_tables
WHERE relname = 'customers';

Stale last_autovacuum with accumulating dead tuples = covering index regression. Tune autovacuum on tables where covering indexes matter.

Scale Makes It Dramatic

The gap between index scan + heap fetches and index-only scan widens with table size:

  • 1M rows: maybe 5x improvement
  • 100M rows: 10-100x improvement

On a 100M-row table, eliminating heap fetches can reduce query time from hundreds of milliseconds to single-digit milliseconds. The investment in a covering index pays back more the larger the table grows.

Keeping It Working

Keep covering indexes focused -- include only columns your frequent queries select, not every column in the table.

Monitor idx_tup_fetch on important indexes. A sudden increase means index-only scans have regressed.

Review when query patterns change. Adding a column to a SELECT clause silently breaks the index-only scan -- the query still works, but falls back to heap fetches.


Originally published at mydba.dev/blog/postgres-covering-index