Skip to main content

Command Palette

Search for a command to run...

PostgreSQL GIN Indexes: JSONB, Arrays & Full-Text Search

A practical guide for PostgreSQL and TimescaleDB

Published
5 min read
PostgreSQL GIN Indexes: JSONB, Arrays & Full-Text Search

PostgreSQL GIN Indexes: JSONB, Arrays & Full-Text Search

Here's a pattern I see constantly: a team stores flexible data in a JSONB column, writes queries that filter on specific keys, and everything is fast in development. Six months later, with millions of rows in production, those JSONB queries take 3 seconds. They start thinking about moving to Elasticsearch or MongoDB. The actual fix is a single CREATE INDEX statement they've never seen before.

GIN indexes are PostgreSQL's answer to searching inside composite data types. If you work with JSONB, arrays, or full-text search, they're not optional.

What GIN Does Differently

A B-tree index stores one value per row and supports equality, range, and ordering. A GIN (Generalized Inverted Index) decomposes a column value into its constituent parts -- JSON keys, array elements, text lexemes -- and creates an index entry for each part. Think of it like a book's index: instead of one entry per page, you get entries for every term that appears on each page.

This is the only way to index operations like:

  • WHERE metadata @> '{"status": "active"}' (JSONB containment)
  • WHERE tags @> ARRAY['postgresql'] (array containment)
  • WHERE search_vector @@ to_tsquery('performance') (full-text search)
  • WHERE name LIKE '%smith%' (trigram matching)

Without a GIN index, every one of these is a sequential scan.

JSONB Indexes: Choose Your Operator Class

This is where people stumble. There are two GIN operator classes for JSONB:

Default (jsonb_ops) -- supports @>, ?, ?|, ?&:

CREATE INDEX CONCURRENTLY idx_products_attributes_gin
    ON products USING gin (attributes);

-- All of these use the index:
SELECT * FROM products WHERE attributes @> '{"color": "blue"}';
SELECT * FROM products WHERE attributes ? 'warranty';
SELECT * FROM products WHERE attributes ?| array['color', 'size'];

jsonb_path_ops -- supports only @> but is 2-3x smaller and faster:

CREATE INDEX CONCURRENTLY idx_products_attributes_path
    ON products USING gin (attributes jsonb_path_ops);

-- Only containment queries use this index
SELECT * FROM products WHERE attributes @> '{"color": "blue"}';

The trap: you create a jsonb_path_ops index, then write a query using ? (key existence). The index is silently ignored. The query does a seq scan. You spend an hour debugging.

Rule: if you only use @>, choose jsonb_path_ops. If you need ? or ?|, use the default.

Array Indexes

Straightforward -- create the GIN index and all array operators are covered:

CREATE INDEX CONCURRENTLY idx_articles_tags_gin
    ON articles USING gin (tags);

-- Contains all specified elements (@>)
SELECT * FROM articles WHERE tags @> ARRAY['postgresql', 'performance'];

-- Contains any specified element (&&)
SELECT * FROM articles WHERE tags && ARRAY['postgresql', 'mysql'];

-- Is contained by (<@)
SELECT * FROM articles WHERE tags <@ ARRAY['postgresql', 'performance', 'indexing'];

Create a generated tsvector column with weighted sections, then index it:

-- PG12+ generated column
ALTER TABLE articles ADD COLUMN search_vector tsvector
    GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(body, '')), 'B')
    ) STORED;

CREATE INDEX CONCURRENTLY idx_articles_search_gin
    ON articles USING gin (search_vector);

-- Search with ranking
SELECT title, ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'postgresql & performance') query
WHERE search_vector @@ query
ORDER BY rank DESC;

Title matches rank higher than body matches thanks to the weights.

Trigram Indexes: The LIKE '%pattern%' Solution

This is one of the most useful tricks in PostgreSQL. B-tree indexes need a fixed prefix, so LIKE '%smith%' always does a seq scan. The pg_trgm extension fixes this:

CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX CONCURRENTLY idx_customers_name_trgm
    ON customers USING gin (customer_name gin_trgm_ops);

-- Now these use the index:
SELECT * FROM customers WHERE customer_name LIKE '%smith%';
SELECT * FROM customers WHERE customer_name ILIKE '%john%';

-- Similarity/fuzzy matching too:
SELECT * FROM customers
WHERE customer_name % 'Jon Smith'
ORDER BY similarity(customer_name, 'Jon Smith') DESC;

Finding Tables That Need GIN Indexes

-- Tables with JSONB/array/tsvector columns and high seq scan counts
SELECT
    t.relname AS table_name,
    a.attname AS column_name,
    pg_catalog.format_type(a.atttypid, a.atttypmod) AS data_type,
    t.seq_scan,
    pg_size_pretty(pg_relation_size(t.relid)) AS table_size
FROM pg_stat_user_tables t
JOIN pg_attribute a ON a.attrelid = t.relid
WHERE a.atttypid IN ('jsonb'::regtype, 'json'::regtype)
   OR pg_catalog.format_type(a.atttypid, a.atttypmod) LIKE '%[]'
   OR a.atttypid = 'tsvector'::regtype
ORDER BY t.seq_tup_read DESC;

Confirm with EXPLAIN:

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM products
WHERE attributes @> '{"color": "blue", "size": "large"}';
-- "Seq Scan" with "Filter" = needs GIN index
-- "Bitmap Index Scan" on GIN index = already optimized

Tuning fastupdate

GIN's pending list batches insertions for efficiency, but can cause unpredictable query latency during flushes:

-- Disable for consistent query latency
ALTER INDEX idx_products_attributes_gin SET (fastupdate = off);

-- Or tune the pending list size
ALTER INDEX idx_products_attributes_gin SET (gin_pending_list_limit = 256);

Tradeoff: disabling fastupdate means slower inserts but consistently fast queries.

GIN Index Size Considerations

GIN indexes are larger than B-tree indexes and grow with the number of distinct indexed values. A JSONB column with highly varied key structures produces a larger GIN index. Monitor sizes over time:

  • Consider jsonb_path_ops for containment-only queries (2-3x smaller)
  • An unused GIN index slows every INSERT and UPDATE with zero query benefit
  • Periodically check that your GIN indexes are actually being used

The Checklist

When adding a column to your schema:

Column TypeQuery PatternIndex Needed
JSONB@>, ?GIN (default or jsonb_path_ops)
Array@>, &&GIN
tsvector@@GIN
Text (wildcard)LIKE '%x%'GIN with gin_trgm_ops

Add the GIN index in the same migration that creates the column. Don't wait for production complaints.