JSON, YAML, TOML
When to use which. The Norway problem. JSON pitfalls.
Configuration files and data exchange. Three popular formats, each with its own personality and traps. By the end of this lesson you'll know which to pick for which job, and where each one will bite you.
JSON: the lingua franca
JSON is what every HTTP API speaks. It's a strict subset of JS object syntax: objects, arrays, strings, numbers, booleans, null. No functions, no dates, no undefined.
{
"name": "Ada",
"age": 36,
"isAdmin": true,
"projects": ["compiler", "analytical-engine"],
"address": null
}The unforgiving rules
- Keys must be double-quoted strings.
{name: "Ada"}is invalid JSON. - No trailing commas.
[1, 2, 3,]breaks the parser. - No comments. Not even
//or/* */. - No single quotes. Only
"double". - Numbers are IEEE 754 doubles. Anything over
2^53loses precision. There's no BigInt.
JSON.stringify(new Date()) produces an ISO string like "2024-05-24T14:00:00.000Z". After JSON.parse, you get a string back, not a Date. You have to deserialize manually.JSON gotchas in JS
// large numbers lose precision
JSON.parse('{"id":9007199254740993}')
// { id: 9007199254740992 } <-- not the 993 you sent!
// undefined drops silently; functions disappear
JSON.stringify({ a: undefined, b: () => 1, c: 2 })
// '{"c":2}'
// Map, Set, BigInt all break
JSON.stringify(new Map([["a", 1]])) // "{}"
JSON.stringify(new Set([1, 2])) // "{}"
JSON.stringify(10n) // throws TypeError
// circular references throw
const x = {};
x.self = x;
JSON.stringify(x) // TypeError: Converting circular structureJSON5 and JSONC: the "just let me have comments" siblings
Because everyone hated JSON's rules for config files, two variants emerged:
- JSONC = JSON with Comments. Adds
//and/* */. VS Code'ssettings.jsonandtsconfig.jsonuse it. - JSON5 = JSONC plus unquoted keys, single quotes, trailing commas, hex numbers, multi-line strings. Closer to actual JS.
{
// Compile target - bump when you drop old browsers
"compilerOptions": {
"target": "ES2022",
"strict": true,
/* multi-line
block comment */
"moduleResolution": "bundler"
}
}These are tooling-only. Don't send JSONC over the wire to an HTTP client expecting JSON. Anywhere strict JSON is required (REST APIs, JSON.parse at runtime), you must strip comments first.
YAML: human-friendly, footgun-rich
YAML uses indentation instead of braces. It's the format for Docker Compose, GitHub Actions, Kubernetes, Ansible, and a thousand CI configs. It's easy to read, easy to write, and easy to get wrong.
name: CI
on:
push:
branches: [main]
pull_request: {}
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install
run: npm ci
- name: Test
run: npm test
env:
NODE_ENV: testThe Norway problem
YAML 1.1 (which a lot of tools still use) has a famous wart. The following config:
country_codes:
- US
- UK
- NO
- SELooks fine, right? Wrong. YAML 1.1 parses unquoted NO as the boolean false. Also: YES, NO, ON, OFF, Y, N. So your list of country codes silently becomes ["US", "UK", false, "SE"]. This has crashed real production systems.
# fix: always quote strings that could look booleany
country_codes:
- "US"
- "UK"
- "NO"
- "SE"- Tabs are illegal. Mix tabs and spaces and the parser explodes.
1.0is a number."1.0"is a string. Version strings need quotes.null,~, empty value, and the key with no colon mean different things in different YAML versions.- Anchors (
&name) and aliases (*name) allow includes that can blow up exponentially. Real CVEs have exploited this.
TOML: configuration, done right
TOML was designed specifically to be a config language: less ambiguous than YAML, more readable than JSON, with proper types. Rust's Cargo.toml, Python's pyproject.toml, and more recently some Node tooling use it.
[project]
name = "myapp"
version = "1.0.0"
authors = ["Ada <ada@example.com>"]
license = "MIT"
[project.dependencies]
fastapi = "^0.115.0"
pydantic = "^2.0"
[tool.ruff]
line-length = 100
target-version = "py312"
# arrays and inline tables
keywords = ["http", "api", "framework"]
contact = { name = "Ada", email = "ada@example.com" }
# dates and times are first-class
created = 2024-05-24T14:30:00ZCompared to YAML, TOML's syntax is closer to INI files. No indentation games, comments are #, types are explicit, and dates work without quotes.
CSV: simpler than it looks, more dangerous than it seems
Comma-separated values. The format everyone thinks they understand until they hit a field that contains a comma.
name,email,city
Ada Lovelace,ada@example.com,London
"Smith, John",john@example.com,New York
"O'Connor","sean@example.com",Dublin
"He said ""hi""","quote@example.com","Quoted, City"- Fields containing commas, quotes, or newlines must be wrapped in double quotes.
- Literal double quotes inside a quoted field are doubled:
"". - Different programs use different separators (
,in the US,;in much of Europe because,is the decimal separator). - No types. Everything is a string. Excel will silently turn
2024-05-24into a date, and gene names likeMARCH1into March 1 of the current year. Real bug, real published genomics papers.
csv-parse for Node, pandas for Python. The edge cases will eat you.Which to use when
- HTTP API payloads → JSON. No exceptions, this is what the world speaks.
- App / library config → JSONC if your tool supports it (TypeScript, VS Code). TOML for richer needs (project metadata, packaging).
- CI/CD, infrastructure → YAML, because everyone else uses it. Quote your strings.
- Tabular data exchange with non-devs → CSV, with a robust parser on the other side.
Side-by-side: the same data, four ways
{
"name": "myapp",
"version": "1.0.0",
"deps": {
"react": "^18.0.0",
"next": "^15.0.0"
}
}name: myapp
version: "1.0.0" # quoted to keep it a string
deps:
react: "^18.0.0"
next: "^15.0.0"name = "myapp"
version = "1.0.0"
[deps]
react = "^18.0.0"
next = "^15.0.0"Quick quiz
Which of these is valid JSON?
Recap
- JSON: strict, no comments, no trailing commas, no BigInt. Universal for APIs.
- JSONC / JSON5: JSON for humans. Use in config files where the tool supports it.
- YAML: readable but trap-laden. Quote your strings. Beware the Norway problem.
- TOML: explicit, typed, designed for config. Great for project metadata.
- CSV: simpler than it looks until quotes, commas, and Excel get involved. Always use a parser.