webdev.complete
🔤 Data Formats & Regex
🛠️Dev Toolbelt
Lesson 52 of 117
20 min

JSON, YAML, TOML

When to use which. The Norway problem. JSON pitfalls.

Configuration files and data exchange. Three popular formats, each with its own personality and traps. By the end of this lesson you'll know which to pick for which job, and where each one will bite you.

JSON: the lingua franca

JSON is what every HTTP API speaks. It's a strict subset of JS object syntax: objects, arrays, strings, numbers, booleans, null. No functions, no dates, no undefined.

user.json
{
  "name": "Ada",
  "age": 36,
  "isAdmin": true,
  "projects": ["compiler", "analytical-engine"],
  "address": null
}

The unforgiving rules

  • Keys must be double-quoted strings. {name: "Ada"} is invalid JSON.
  • No trailing commas. [1, 2, 3,] breaks the parser.
  • No comments. Not even // or /* */.
  • No single quotes. Only "double".
  • Numbers are IEEE 754 doubles. Anything over 2^53loses precision. There's no BigInt.
Don't roundtrip Date objects through JSON
JSON.stringify(new Date()) produces an ISO string like "2024-05-24T14:00:00.000Z". After JSON.parse, you get a string back, not a Date. You have to deserialize manually.

JSON gotchas in JS

js
// large numbers lose precision
JSON.parse('{"id":9007199254740993}')
// { id: 9007199254740992 }   <-- not the 993 you sent!

// undefined drops silently; functions disappear
JSON.stringify({ a: undefined, b: () => 1, c: 2 })
// '{"c":2}'

// Map, Set, BigInt all break
JSON.stringify(new Map([["a", 1]]))     // "{}"
JSON.stringify(new Set([1, 2]))         // "{}"
JSON.stringify(10n)                     // throws TypeError

// circular references throw
const x = {};
x.self = x;
JSON.stringify(x)        // TypeError: Converting circular structure

JSON5 and JSONC: the "just let me have comments" siblings

Because everyone hated JSON's rules for config files, two variants emerged:

  • JSONC = JSON with Comments. Adds // and /* */. VS Code's settings.json and tsconfig.json use it.
  • JSON5 = JSONC plus unquoted keys, single quotes, trailing commas, hex numbers, multi-line strings. Closer to actual JS.
tsconfig.json (JSONC)
{
  // Compile target - bump when you drop old browsers
  "compilerOptions": {
    "target": "ES2022",
    "strict": true,
    /* multi-line
       block comment */
    "moduleResolution": "bundler"
  }
}

These are tooling-only. Don't send JSONC over the wire to an HTTP client expecting JSON. Anywhere strict JSON is required (REST APIs, JSON.parse at runtime), you must strip comments first.

YAML: human-friendly, footgun-rich

YAML uses indentation instead of braces. It's the format for Docker Compose, GitHub Actions, Kubernetes, Ansible, and a thousand CI configs. It's easy to read, easy to write, and easy to get wrong.

github-actions.yml
name: CI
on:
  push:
    branches: [main]
  pull_request: {}

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install
        run: npm ci
      - name: Test
        run: npm test
        env:
          NODE_ENV: test

The Norway problem

YAML 1.1 (which a lot of tools still use) has a famous wart. The following config:

yaml
country_codes:
  - US
  - UK
  - NO
  - SE

Looks fine, right? Wrong. YAML 1.1 parses unquoted NO as the boolean false. Also: YES, NO, ON, OFF, Y, N. So your list of country codes silently becomes ["US", "UK", false, "SE"]. This has crashed real production systems.

yaml
# fix: always quote strings that could look booleany
country_codes:
  - "US"
  - "UK"
  - "NO"
  - "SE"
Other YAML traps
  • Tabs are illegal. Mix tabs and spaces and the parser explodes.
  • 1.0 is a number. "1.0" is a string. Version strings need quotes.
  • null, ~, empty value, and the key with no colon mean different things in different YAML versions.
  • Anchors (&name) and aliases (*name) allow includes that can blow up exponentially. Real CVEs have exploited this.

TOML: configuration, done right

TOML was designed specifically to be a config language: less ambiguous than YAML, more readable than JSON, with proper types. Rust's Cargo.toml, Python's pyproject.toml, and more recently some Node tooling use it.

pyproject.toml
[project]
name = "myapp"
version = "1.0.0"
authors = ["Ada <ada@example.com>"]
license = "MIT"

[project.dependencies]
fastapi = "^0.115.0"
pydantic = "^2.0"

[tool.ruff]
line-length = 100
target-version = "py312"

# arrays and inline tables
keywords = ["http", "api", "framework"]
contact = { name = "Ada", email = "ada@example.com" }

# dates and times are first-class
created = 2024-05-24T14:30:00Z

Compared to YAML, TOML's syntax is closer to INI files. No indentation games, comments are #, types are explicit, and dates work without quotes.

CSV: simpler than it looks, more dangerous than it seems

Comma-separated values. The format everyone thinks they understand until they hit a field that contains a comma.

users.csv
name,email,city
Ada Lovelace,ada@example.com,London
"Smith, John",john@example.com,New York
"O'Connor","sean@example.com",Dublin
"He said ""hi""","quote@example.com","Quoted, City"
  • Fields containing commas, quotes, or newlines must be wrapped in double quotes.
  • Literal double quotes inside a quoted field are doubled: "".
  • Different programs use different separators (, in the US, ; in much of Europe because , is the decimal separator).
  • No types. Everything is a string. Excel will silently turn 2024-05-24 into a date, and gene names like MARCH1 into March 1 of the current year. Real bug, real published genomics papers.
Don't write a CSV parser
Just use a library: csv-parse for Node, pandas for Python. The edge cases will eat you.

Which to use when

  • HTTP API payloads → JSON. No exceptions, this is what the world speaks.
  • App / library config → JSONC if your tool supports it (TypeScript, VS Code). TOML for richer needs (project metadata, packaging).
  • CI/CD, infrastructure → YAML, because everyone else uses it. Quote your strings.
  • Tabular data exchange with non-devs → CSV, with a robust parser on the other side.

Side-by-side: the same data, four ways

JSON
{
  "name": "myapp",
  "version": "1.0.0",
  "deps": {
    "react": "^18.0.0",
    "next": "^15.0.0"
  }
}
YAML
name: myapp
version: "1.0.0"     # quoted to keep it a string
deps:
  react: "^18.0.0"
  next: "^15.0.0"
TOML
name = "myapp"
version = "1.0.0"

[deps]
react = "^18.0.0"
next  = "^15.0.0"

Quick quiz

Quiz1 / 4

Which of these is valid JSON?

Recap

  • JSON: strict, no comments, no trailing commas, no BigInt. Universal for APIs.
  • JSONC / JSON5: JSON for humans. Use in config files where the tool supports it.
  • YAML: readable but trap-laden. Quote your strings. Beware the Norway problem.
  • TOML: explicit, typed, designed for config. Great for project metadata.
  • CSV: simpler than it looks until quotes, commas, and Excel get involved. Always use a parser.
Built with Next.js, Tailwind & Sandpack.
Learn. Build. Ship.