🟢The Backend

Lesson 71 of 117

30 min

Joins, Aggregations, Transactions

INNER/LEFT joins, GROUP BY, COUNT, BEGIN/COMMIT.

Knowing single-table SELECTs is like knowing how to drive on an empty parking lot. The real road is multi-table queries: counting orders per user, finding customers with no orders, summing revenue by month, and doing all of it atomically. That's joins, aggregations, and transactions. Let's drive.

The setup

We'll reuse the users and orders tables from the previous lesson:

sql

-- users(id, email, name, created_at)
-- orders(id, user_id, amount_cents, status, created_at)

-- Picture this data:
-- users
-- ┌──────┬───────────────────┬─────────┐
-- │ id   │ email             │ name    │
-- ├──────┼───────────────────┼─────────┤
-- │ u1   │ ada@example.com   │ Ada     │
-- │ u2   │ grace@example.com │ Grace   │
-- │ u3   │ alan@example.com  │ Alan    │  -- no orders
-- └──────┴───────────────────┴─────────┘
--
-- orders
-- ┌──────┬─────────┬──────────────┬─────────┐
-- │ id   │ user_id │ amount_cents │ status  │
-- ├──────┼─────────┼──────────────┼─────────┤
-- │ o1   │ u1      │ 1000         │ paid    │
-- │ o2   │ u1      │ 2500         │ pending │
-- │ o3   │ u2      │ 4999         │ paid    │
-- │ o4   │ NULL    │ 800          │ paid    │  -- orphan guest order
-- └──────┴─────────┴──────────────┴─────────┘

The four joins, demystified

A join glues two tables together row-by-row on a matching condition. The four flavors differ in which rows survive when the match fails on one side.

INNER JOIN: keep only matches on both sides

sql

SELECT u.name, o.amount_cents, o.status
FROM users u
INNER JOIN orders o ON o.user_id = u.id;

-- Result (Alan dropped, guest order dropped):
-- name  | amount_cents | status
-- Ada   | 1000         | paid
-- Ada   | 2500         | pending
-- Grace | 4999         | paid

LEFT JOIN: keep every row from the left table

sql

SELECT u.name, o.amount_cents, o.status
FROM users u
LEFT JOIN orders o ON o.user_id = u.id;

-- Result (Alan now appears with NULLs):
-- name  | amount_cents | status
-- Ada   | 1000         | paid
-- Ada   | 2500         | pending
-- Grace | 4999         | paid
-- Alan  | NULL         | NULL

RIGHT JOIN: keep every row from the right table

sql

SELECT u.name, o.amount_cents
FROM users u
RIGHT JOIN orders o ON o.user_id = u.id;

-- Result (guest order appears with NULL user):
-- name  | amount_cents
-- Ada   | 1000
-- Ada   | 2500
-- Grace | 4999
-- NULL  | 800

FULL OUTER JOIN: keep everything from both sides

sql

SELECT u.name, o.amount_cents
FROM users u
FULL OUTER JOIN orders o ON o.user_id = u.id;

-- Result (Alan AND guest both appear):
-- name  | amount_cents
-- Ada   | 1000
-- Ada   | 2500
-- Grace | 4999
-- Alan  | NULL
-- NULL  | 800

The mental shortcut

LEFT JOIN= "every row from this table, plus matches from the other one if any." You'll use LEFT JOIN 80% of the time. RIGHT JOIN is just a LEFT JOIN with the tables swapped - most teams ban it for readability. FULL OUTER is rare.

Aggregations: COUNT, SUM, AVG, MIN, MAX

An aggregate function collapses many rows into one. Without GROUP BY, it collapses the whole table.

sql

SELECT COUNT(*) FROM users;        -- 3
SELECT SUM(amount_cents) FROM orders;  -- 9299
SELECT AVG(amount_cents)::int FROM orders WHERE status = 'paid';
SELECT MIN(created_at), MAX(created_at) FROM orders;

GROUP BY: aggregate per bucket

GROUP BY collapses rows that share the same value(s) into one row, with aggregates applied per group.

sql

-- Orders per user
SELECT user_id, COUNT(*) AS order_count, SUM(amount_cents) AS total
FROM orders
GROUP BY user_id;

-- user_id | order_count | total
-- u1      | 2           | 3500
-- u2      | 1           | 4999
-- NULL    | 1           | 800

Combine with a join to get user names instead of IDs:

sql

SELECT u.name, COUNT(o.id) AS order_count, COALESCE(SUM(o.amount_cents), 0) AS total
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name
ORDER BY total DESC;

-- name  | order_count | total
-- Grace | 1           | 4999
-- Ada   | 2           | 3500
-- Alan  | 0           | 0       -- LEFT JOIN keeps Alan even with no orders

COALESCE handles NULLs

SUM of zero rows returns NULL. COALESCE(x, 0) returns x unless x is NULL, in which case it returns the fallback. Use it whenever a NULL would surprise a downstream consumer.

HAVING vs WHERE

WHERE filters rows before grouping. HAVING filters groups after. You can't use an aggregate in WHEREbecause the groups don't exist yet.

sql

-- Users who have at least 2 paid orders
SELECT u.name, COUNT(*) AS paid_orders
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE o.status = 'paid'    -- per-row filter, BEFORE grouping
GROUP BY u.id, u.name
HAVING COUNT(*) >= 2;      -- per-group filter, AFTER grouping

SQL execution order (it's not top-to-bottom)

Even though you write SELECT ... FROM ... WHERE ... GROUP BY ... HAVING ... ORDER BY ... LIMIT, Postgres logically runs the clauses in a different order:

FROM (load tables)
JOIN (combine them)
WHERE (filter rows)
GROUP BY (bucket rows)
HAVING (filter buckets)
SELECT (pick columns)
ORDER BY (sort)
LIMIT (slice)

That's why a column alias from SELECTisn't available in WHERE but is available in ORDER BY.

Transactions: all or nothing

Some operations only make sense as a single atomic unit. Transferring money is the classic example: subtract from one account, add to another, both or neither. Postgres gives you BEGIN, COMMIT, and ROLLBACK.

sql

BEGIN;

UPDATE accounts SET balance = balance - 1000 WHERE id = 'a1';
UPDATE accounts SET balance = balance + 1000 WHERE id = 'a2';

-- If everything looks right:
COMMIT;

-- If anything failed or looked wrong:
ROLLBACK;

Between BEGIN and COMMIT, your writes are invisible to other connections. They become visible atomically at the moment of COMMIT.

transaction-in-node.ts

import { Pool } from "pg";
const pool = new Pool();

async function transfer(from: string, to: string, cents: number) {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    await client.query(
      "UPDATE accounts SET balance = balance - $1 WHERE id = $2",
      [cents, from]
    );
    await client.query(
      "UPDATE accounts SET balance = balance + $1 WHERE id = $2",
      [cents, to]
    );
    await client.query("COMMIT");
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  } finally {
    client.release();
  }
}

Always release the client

With a connection pool, leaking connections is a top-5 production Postgres issue. Wrap every transaction in try / finally and client.release() in the finally block.

UPSERT with ON CONFLICT

Need to "insert if missing, otherwise update"? Postgres has INSERT ... ON CONFLICT DO UPDATE, also known as an UPSERT.

sql

INSERT INTO users (email, name)
VALUES ('ada@example.com', 'Ada King')
ON CONFLICT (email) DO UPDATE
  SET name = EXCLUDED.name,
      updated_at = now()
RETURNING id, name;

EXCLUDEDis a virtual row holding the values that would have been inserted - use it to reference "the new value" inside the update.

sql

-- "Idempotent insert": do nothing if the email already exists
INSERT INTO users (email, name)
VALUES ('ada@example.com', 'Ada')
ON CONFLICT (email) DO NOTHING;

The conflict target needs an index

ON CONFLICT (email) only works if email has a unique constraint or unique index. Otherwise Postgres has no way to detect the collision.

Quiz

Quiz1 / 4

You want all users in the result even if they have zero orders. Which join?

Recap

INNER JOIN = matches only. LEFT JOIN = left rows + matches. RIGHT/FULL = the symmetric versions.
Aggregate with COUNT, SUM, AVG, MIN, MAX. Bucket with GROUP BY.
WHERE filters rows before grouping; HAVING filters groups after.
Use BEGIN / COMMIT / ROLLBACK for atomic multi-statement writes. Always release pool clients.
INSERT ... ON CONFLICT gives you UPSERT (update on conflict) and idempotent inserts (do nothing on conflict).