🟢The Backend

Lesson 72 of 117

25 min

Indexes & Performance

B-tree, GIN, EXPLAIN ANALYZE, N+1, the cost of writes.

Your app feels fast with 50 rows and grinds to a halt at 50,000. The difference between "fast" and "slow" is almost always indexes. An index is a data structure that lets Postgres find rows without scanning the whole table. Get the right ones and queries run in microseconds. Get them wrong (or skip them) and your customers time out. Let's learn what they are, when they help, and the cost of having too many.

What is an index, really?

Picture the index at the back of a textbook. Without it, finding the chapter that mentions "polymorphism" means flipping through every page. With it, you jump to P, scan three entries, get a page number, go straight there.

A database index is the same idea. Postgres's default index type is a B-tree- a balanced tree that lets you find any value in O(log n) time. For a million rows, that's about 20 comparisons instead of a million.

sql

-- The PRIMARY KEY automatically gets a B-tree index
CREATE TABLE users (
  id    UUID PRIMARY KEY,
  email TEXT NOT NULL UNIQUE,    -- UNIQUE also creates an index
  name  TEXT NOT NULL
);

-- Explicit index on a non-unique column
CREATE INDEX users_name_idx ON users (name);

-- Composite (multi-column) index
CREATE INDEX orders_user_status_idx ON orders (user_id, status);

-- Partial index: only index rows matching a condition
CREATE INDEX orders_pending_idx ON orders (created_at)
  WHERE status = 'pending';

When indexes help

B-tree indexes shine for:

Equality: WHERE email = 'a@b.com'
Range: WHERE created_at > now() - INTERVAL '7 days'
Sorting: ORDER BY created_at DESC LIMIT 20
Prefix matching: WHERE name LIKE 'Ada%'(suffix wildcards don't help)
Joins on indexed columns - the join becomes an index lookup instead of a hash table.

Indexes that won't help

WHERE name LIKE '%Ada%' - leading wildcard kills the B-tree (need a GIN index for full-text).
WHERE LOWER(email) = '...' - function on the column. Either index the expression (LOWER(email)), or normalize on write.
Very low-cardinality columns (e.g. a boolean) - Postgres often picks a sequential scan anyway because the index doesn't filter enough rows.

Composite index column order matters

A composite index on (a, b, c) works like a phone book sorted by last name, then first name, then middle initial. You can use it to find:

WHERE a = ? - yes
WHERE a = ? AND b = ? - yes
WHERE a = ? AND b = ? AND c = ? - yes
WHERE b = ? alone - no (sequential scan)
WHERE a = ? AND c = ? - partially (uses a, must scan within)

Rule: put the most-filtered, equality-checked column first. Range columns (created_at > ...) should come last because the index stops being useful for further columns after a range.

sql

-- Good: equality first, range last
CREATE INDEX orders_user_created_idx ON orders (user_id, created_at DESC);

-- This query uses the index fully:
SELECT * FROM orders
WHERE user_id = '...'
ORDER BY created_at DESC
LIMIT 20;

-- Bad: range first
CREATE INDEX orders_created_user_idx ON orders (created_at, user_id);

-- This query can't use the user_id portion of the index:
SELECT * FROM orders
WHERE user_id = '...'
  AND created_at > now() - INTERVAL '7 days';

Every write pays an index cost

Indexes are not free. Every INSERT, UPDATE (of an indexed column), and DELETE must update every relevant index in addition to the table itself. Disk space goes up. Write throughput goes down.

Practical rule:

1-3 carefully chosen indexes per table: usually a win.
10+ indexes per table: you're probably indexing things nobody queries. Audit and drop the unused ones.

sql

-- Find indexes that have never been used (in Postgres)
SELECT schemaname, relname AS table, indexrelname AS index, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY relname;

EXPLAIN ANALYZE: reading a query plan

Postgres tells you exactly how it plans to run a query. Prefix any query with EXPLAIN ANALYZE to see the plan plus actual execution times.

sql

EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = '7a2c-...'
ORDER BY created_at DESC
LIMIT 20;

A bad plan (no index):

text

Limit  (cost=85423.91..85423.96 rows=20 width=78) (actual time=412.5..412.6 rows=20 loops=1)
  ->  Sort  (cost=85423.91..86173.91 rows=300000 width=78) (actual time=412.5..412.5 rows=20 loops=1)
        Sort Key: created_at DESC
        ->  Seq Scan on orders  (cost=0.00..76423.00 rows=300000 width=78) (actual time=0.5..210.1 rows=300000 loops=1)
              Filter: (user_id = '7a2c-...')
              Rows Removed by Filter: 1700000
Planning Time: 0.5 ms
Execution Time: 413.1 ms

Translation: scanned 2 million rows, threw away 1.7 million, sorted what was left, then took 20. 413ms for one request.

Now add the right index:

sql

CREATE INDEX orders_user_created_idx ON orders (user_id, created_at DESC);

text

Limit  (cost=0.42..1.12 rows=20 width=78) (actual time=0.04..0.16 rows=20 loops=1)
  ->  Index Scan using orders_user_created_idx on orders (cost=0.42..1041.42 rows=300000 width=78)
        Index Cond: (user_id = '7a2c-...')
Planning Time: 0.3 ms
Execution Time: 0.2 ms

413ms to 0.2ms. Same query, same data, different access path. This is the entire game.

What to look for in a plan

Seq Scan on a big table = probably needs an index.
Index Scan or Bitmap Index Scan = good, the index is being used.
Rows Removed by Filter = how many extra rows Postgres read just to throw away. High number = wrong index.
actual time in the outermost line tells you the real wall-clock cost.

The N+1 problem

The most common database performance bug isn't about indexes - it's about how many queries you fire. Picture this code:

// 1 query to fetch users
const users = await db.query("SELECT * FROM users LIMIT 20");

// N more queries (one per user) to fetch their latest order
for (const u of users) {
  u.latestOrder = await db.query(
    "SELECT * FROM orders WHERE user_id = $1 ORDER BY created_at DESC LIMIT 1",
    [u.id]
  );
}
// Total: 1 + 20 = 21 queries

Each query has network latency, planning overhead, and a round-trip. 21 queries for what could be 1 is the "N+1" problem. Picture it:

text

Without fix (N+1):
[ users query   ]----------------- 5ms
                [ orders for u1 ]- 3ms
                                [ orders for u2 ]- 3ms
                                                [ ... ] x 20
Total wall time: ~65ms

With JOIN/IN:
[ users JOIN orders            ]- 7ms
Total wall time: ~7ms

Fix it with a single join or a single WHERE user_id = ANY(...):

sql

-- One query, with the latest order per user via window function
SELECT u.*, o.id AS order_id, o.amount_cents
FROM users u
LEFT JOIN LATERAL (
  SELECT id, amount_cents
  FROM orders
  WHERE user_id = u.id
  ORDER BY created_at DESC
  LIMIT 1
) o ON true
LIMIT 20;

ORMs hide N+1 from you

Prisma, Drizzle, and every other ORM make it easy to write a loop that triggers N+1 silently. Always check your DB query logs in dev. Tools like pino-pretty + DB logging or Prisma Studio + the query metrics view will save you.

Quick wins checklist

Index every foreign key column you join or filter on.
Index columns used in ORDER BY for pagination.
For composite indexes: equality columns first, ranges last.
Use partial indexes (WHERE status = 'pending') when you only query a subset.
Audit unused indexes quarterly via pg_stat_user_indexes.
Always look at EXPLAIN ANALYZE before assuming a query is slow because of the database.

Quiz

Quiz1 / 4

Why doesn't WHERE name LIKE '%Ada%' use a B-tree index?

Recap

A B-tree index turns table scans into O(log n) lookups. Postgres auto-indexes PKs and UNIQUE columns.
Index columns you filter, join, or sort on. Use composite indexes with equality first, ranges last.
Every write pays a cost on every index. Don't hoard them - audit pg_stat_user_indexes for unused ones.
EXPLAIN ANALYZE is your microscope. Look for Seq Scan on big tables and high Rows Removed by Filter values.
N+1 is the #1 perf bug. Replace loops of single-row fetches with one JOIN or one IN.