Skip to main content

Command Palette

Search for a command to run...

PostgreSQL JSONB Indexing: GIN, Expression & Partial Index Strategies

A practical guide for PostgreSQL and TimescaleDB

Published
4 min read
PostgreSQL JSONB Indexing: GIN, Expression & Partial Index Strategies

PostgreSQL JSONB Indexing: GIN, Expression & Partial Index Strategies

Last week I reviewed an application where a 5-million-row events table with a JSONB metadata column had a GIN index, a 250ms query SLA, and sequential scans on every single JSONB query. The developer had done the "right thing" -- added a GIN index -- but every query used WHERE metadata->>'status' = 'active', which the GIN index cannot accelerate. The operator and the index type were mismatched, and nobody realized it because the query still returned correct results. It was just catastrophically slow. This mismatch is the single most common JSONB performance mistake in PostgreSQL.

The Operator-Index Mismatch

JSONB supports multiple index types, and each supports different operators:

  • GIN (jsonb_ops): supports @>, ?, ?|, ?& -- NOT ->> equality
  • GIN (jsonb_path_ops): supports only @> -- smaller and faster, but less flexible
  • Expression index (B-tree): supports =, <, >, BETWEEN on a specific extracted key
  • Partial index: combines any of the above with a WHERE clause to index a subset of rows

The most common mistake: creating a GIN index and writing queries with ->>. They don't work together.

Finding Unindexed JSONB Queries

Identify JSONB columns causing sequential scans:

SELECT
    t.schemaname,
    t.relname AS table_name,
    a.attname AS column_name,
    pg_size_pretty(pg_relation_size(t.relid)) AS table_size,
    t.seq_scan,
    t.seq_tup_read
FROM pg_stat_user_tables t
JOIN pg_attribute a ON a.attrelid = t.relid
JOIN pg_type ty ON ty.oid = a.atttypid
WHERE ty.typname = 'jsonb'
  AND a.attnum > 0
  AND NOT a.attisdropped
  AND t.seq_scan > 100
ORDER BY t.seq_tup_read DESC;

Confirm the mismatch with EXPLAIN:

-- This will NOT use a GIN index
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM events WHERE metadata->>'status' = 'active';

-- This WILL use a GIN index
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM events WHERE metadata @> '{"status": "active"}';

See Seq Scan on the first query despite having a GIN index? That confirms the operator mismatch.

Three Indexing Strategies

GIN for Containment Queries

When you query with @> containment, GIN is the natural fit:

-- Default: supports @>, ?, ?|, ?&
CREATE INDEX CONCURRENTLY idx_events_metadata_gin
    ON events USING gin (metadata);

-- jsonb_path_ops: only @>, but 2-3x smaller
CREATE INDEX CONCURRENTLY idx_events_metadata_pathops
    ON events USING gin (metadata jsonb_path_ops);

Prefer jsonb_path_ops when you only need containment. It hashes full paths instead of indexing every key and value separately. On tables with complex documents, the size difference is 3-4x.

To use GIN with queries that currently use ->>, rewrite the query:

-- Before (sequential scan)
SELECT * FROM events WHERE metadata->>'status' = 'active';

-- After (uses GIN index)
SELECT * FROM events WHERE metadata @> '{"status": "active"}';

Expression Index for Known Key Lookups

When you repeatedly query a single, known key, a B-tree expression index is more efficient than GIN:

CREATE INDEX CONCURRENTLY idx_events_status
    ON events ((metadata->>'status'));

This indexes one value per row (not the entire document), supports range queries and ORDER BY, and is cheaper to maintain on writes. It's the right choice when you know exactly which keys your application queries.

For nested keys:

CREATE INDEX ON events ((metadata->'address'->>'city'));

For JSONB arrays:

CREATE INDEX CONCURRENTLY idx_events_tags_gin
    ON events USING gin ((metadata->'tags'));

SELECT * FROM events WHERE metadata->'tags' @> '"important"';

Partial Index for Selective Queries

When only a small fraction of rows match your typical query, index only those rows:

-- Index only active events (skip the 90% that are archived)
CREATE INDEX CONCURRENTLY idx_events_active_metadata
    ON events USING gin (metadata jsonb_path_ops)
    WHERE metadata->>'status' = 'active';

-- Expression index on user_id, only for purchase events
CREATE INDEX CONCURRENTLY idx_events_purchase_user
    ON events ((metadata->>'user_id'))
    WHERE metadata->>'type' = 'purchase';

Partial indexes are dramatically smaller and cheaper to maintain. Writes to non-matching rows skip the index entirely.

Choosing the Right Strategy

Query PatternBest IndexExample
@> containmentGIN (jsonb_path_ops)WHERE data @> '{"k": "v"}'
->> equality on known keyExpression (B-tree)WHERE data->>'status' = 'x'
Key existence (?)GIN (jsonb_ops)WHERE data ? 'email'
Range on extracted valueExpression (B-tree)WHERE (data->>'score')::int > 90
Array containmentGIN on sub-pathWHERE data->'tags' @> '"x"'

Watch the Write Cost

GIN maintenance is expensive. Every INSERT or UPDATE touching the JSONB column decomposes the entire document to update the index. On write-heavy tables with large documents, this can reduce insert throughput by 30-50%.

If you query only 2-3 keys, expression indexes on those keys are dramatically cheaper. Track insert latency before and after adding GIN indexes -- a significant slowdown means you should switch to targeted expression or partial indexes.

Prevention

Plan your JSONB indexing strategy when you add the column, not after performance degrades. Document which keys will be queried and how.

Monitor GIN index size relative to table size. If the index approaches the table size, you're over-indexing.

Review JSONB query patterns periodically. As features evolve, the keys your application queries change. Unused indexes waste space and slow writes.

After creating any JSONB index, always verify with EXPLAIN (ANALYZE, BUFFERS). Look for Bitmap Index Scan or Index Scan replacing Seq Scan, and confirm buffer reads dropped significantly.


Originally published at mydba.dev/blog/postgres-jsonb-indexing