Headers, MIME Types, Caching

Content-Type, Cache-Control, ETag, the negotiation dance.

Headers are the metadata of HTTP. The status code tells you what happened. The body is the thing itself. Headers are everything else: who you are, what format you can read, how long to cache, what version you have. Caching alone, once you know its headers, can make a slow app feel instant. Let's read the manual.

Content-Type and MIME types

Every response (and most requests with a body) carries a Content-Type header telling the receiver what format the bytes are in. The values are MIME types, which look like type/subtype:

  • text/html; charset=utf-8 - a web page
  • application/json - JSON payload
  • text/javascript - a JS file (application/javascript is the older, now obsolete spelling you will still see)
  • text/css - a stylesheet
  • image/png, image/jpeg, image/webp - images
  • application/octet-stream - "arbitrary bytes, treat as a download"
  • multipart/form-data; boundary=... - file uploads from a form
Wrong Content-Type breaks things silently
Send JSON without Content-Type: application/json and many clients will refuse to parse it. Send an image with text/plain and the browser will literally render the bytes as text. Always set it.
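
To make the type/subtype plus parameters shape concrete, here is a minimal sketch that splits a Content-Type value using Python's standard library. The helper name parse_content_type is made up for this lesson, not part of any framework.

```python
from email.message import Message


def parse_content_type(value: str) -> tuple[str, dict[str, str]]:
    """Split a Content-Type value into (mime_type, parameters).

    Hypothetical helper for illustration; real frameworks ship their own parser.
    """
    msg = Message()
    msg["Content-Type"] = value
    # get_params() returns [(mime_type, ""), (param_name, param_value), ...],
    # so everything after the first entry is a parameter.
    params = dict(msg.get_params()[1:])
    return msg.get_content_type(), params


print(parse_content_type("text/html; charset=utf-8"))
print(parse_content_type("multipart/form-data; boundary=abc123"))
print(parse_content_type("application/json"))
```

The MIME type says what the bytes are. Parameters like charset and boundary say how to read them.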

Accept and content negotiation

The client tells the server what formats it can read with the Accept header. The server picks the best match for the response.

bash
# Browser says: prefer HTML or XHTML, then XML, then anything
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

# A REST client might say: only JSON, please
Accept: application/json

The q values are quality preferences from 0 to 1; a missing q means 1. If the server can't produce any listed format, it can answer 406 Not Acceptable, though many servers just send their default format instead.

Authorization

The Authorization header carries credentials. The most common formats:

bash
# Bearer token - JWT, session token, anything string-shaped
Authorization: Bearer eyJhbGciOiJIUzI1NiIs...

# Basic auth - base64-encoded user:password. Use only over HTTPS.
Authorization: Basic dXNlcjpwYXNzd29yZA==

# API key in a header (convention, not standardized)
X-API-Key: sk_live_abc123

We'll go deep on auth schemes in the auth chapter. For now, remember: never put credentials in URLs. URLs get logged everywhere.
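
A small sketch shows why Basic auth needs HTTPS: base64 is an encoding, not encryption, so anyone who sees the header can reverse it. The helper names below are made up for illustration.

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an Authorization header for Basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def decode_basic_auth(header_value: str) -> tuple[str, str]:
    """Recover (user, password) from a Basic auth value. No secret needed."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic auth header")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


header = basic_auth_header("user", "password")
print(header)
print(decode_basic_auth(header))
```

The first line printed is the same value as the Basic example above, and the second recovers the username and password from it with no key at all.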

Caching: the most underrated speedup on the web

Every HTTP request you don't have to make is free. Caching is how you don't make them. The main lever is Cache-Control.

Cache-Control directives

  • max-age=SECONDS - how long the browser can use this response without re-checking. max-age=3600 = 1 hour.
  • s-maxage=SECONDS - same but only for shared caches (CDNs). Browsers ignore it. Lets you tell the CDN to cache for longer than browsers.
  • public - any cache can store this, including intermediaries (CDNs, proxies).
  • private - only the user's browser can cache it. Use this for personalized responses.
  • no-cache - confusingly named. Does cache, but must revalidate with the server before reuse (using ETag, see below).
  • no-store - do not store anywhere, ever. For sensitive data.
  • immutable - "this URL will never change its content." Tells the browser not to revalidate while the response is still fresh, even when the user reloads the page. Perfect for hashed asset URLs like app.a8f3.js.
  • stale-while-revalidate=SECONDS - "if the cached copy is fresh, use it. If it expired within this window, use it anyway and fetch a new one in the background." Keeps things fast even right after expiration.
Two common caching recipes
For static hashed assets (CSS/JS bundles with a hash in the filename): Cache-Control: public, max-age=31536000, immutable. That is one year, which is effectively forever.
For HTML pages: Cache-Control: public, max-age=60, stale-while-revalidate=600. Fresh for a minute, usable for ten more while we refresh.
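
To see how these directives combine, here is a simplified sketch of the decision a cache makes for a stored response of a given age. It is a teaching model, not a full cache: it treats a missing max-age as zero (real caches may guess a lifetime), and the names parse_cache_control and cache_state are invented for this example.

```python
def parse_cache_control(value: str) -> dict:
    """Parse 'public, max-age=60' into {'public': True, 'max-age': '60'}."""
    directives = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip().strip('"') if sep else True
    return directives


def _seconds(directives: dict, name: str) -> int:
    """Read a numeric directive, treating missing or malformed values as 0."""
    raw = directives.get(name)
    return int(raw) if isinstance(raw, str) and raw.isdigit() else 0


def cache_state(header: str, age: int, shared_cache: bool = False) -> str:
    """Classify a stored response that is `age` seconds old."""
    d = parse_cache_control(header)
    if "no-store" in d or (shared_cache and "private" in d):
        return "do-not-store"
    if "no-cache" in d:
        return "revalidate"  # stored, but must be checked before every reuse
    # Shared caches (CDNs) prefer s-maxage; browsers ignore it.
    if shared_cache and "s-maxage" in d:
        lifetime = _seconds(d, "s-maxage")
    else:
        lifetime = _seconds(d, "max-age")
    if age < lifetime:
        return "fresh"
    if age < lifetime + _seconds(d, "stale-while-revalidate"):
        return "stale-while-revalidate"  # serve now, refresh in the background
    return "revalidate"


html_recipe = "public, max-age=60, stale-while-revalidate=600"
for age in (30, 120, 700):
    print(age, cache_state(html_recipe, age))
```

For the HTML recipe, a 30 second old copy is fresh, a 120 second old copy is served stale while a refresh happens in the background, and a 700 second old copy has to be revalidated first.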

ETag and If-None-Match: the conditional GET

Sometimes a response is too dynamic to cache by time, but it's still the same most of the time. ETags solve this. An ETag is an opaque token, set by the server, that identifies a specific version of a resource. The client stores it. Next time, it sends it back.

first request
GET /api/articles/42
# server response:
HTTP/1.1 200 OK
ETag: "v17"
Cache-Control: no-cache
Content-Type: application/json

{"title":"Hello","views":1500}
second request
GET /api/articles/42
If-None-Match: "v17"

# if still v17:
HTTP/1.1 304 Not Modified
ETag: "v17"
# (no body - the client reuses its cached copy)

# if changed:
HTTP/1.1 200 OK
ETag: "v18"
Content-Type: application/json

{"title":"Hello","views":1501}

The 304 response is tiny (just headers) and means "your cached copy is current." Big wins on big payloads.

ETags can be strong (byte-for-byte identical) or weak (semantically equivalent, written as W/"v17"). Weak is fine for most APIs. Strong is needed for byte-range requests.

A real curl -v session

curl -v prints the full request and response, headers included. Read this top to bottom. Lines starting with * are curl's own notes about the connection. Lines starting with > are what curl sent. Lines starting with < are what the server sent back.

bash
$ curl -v https://api.github.com/repos/microsoft/vscode

*   Trying 140.82.114.6:443...
* Connected to api.github.com (140.82.114.6) port 443
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
> GET /repos/microsoft/vscode HTTP/2
> Host: api.github.com
> user-agent: curl/8.4.0
> accept: */*
>
< HTTP/2 200
< content-type: application/json; charset=utf-8
< cache-control: public, max-age=60, s-maxage=60
< etag: W/"7bca...19c2"
< x-ratelimit-limit: 60
< x-ratelimit-remaining: 58
< vary: Accept, Accept-Encoding
<
{"id":41881900,"name":"vscode","full_name":"microsoft/vscode",...}

Things to notice:

  • HTTP/2 was negotiated during the TLS handshake.
  • The response has an etag. The next curl can send If-None-Match to potentially get a 304.
  • cache-control: public, max-age=60 tells us this can be cached for a minute.
  • vary: Accept, Accept-Encoding tells caches that the response depends on those request headers, so they must store a separate copy per combination.
  • Rate limit headers show you have 58 of 60 requests left this hour.
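
The vary point is easier to see as a cache key. This is a minimal sketch of how a cache might build one, assuming a plain dict of request headers. The cache_key function is invented for this example, and it skips the special case Vary: *, which means "never reuse".

```python
def cache_key(method: str, url: str, request_headers: dict, vary: str) -> tuple:
    """Build a cache key that includes every request header named in Vary."""
    # Header names are case-insensitive, so normalize before looking them up.
    lookup = {name.lower(): value.strip() for name, value in request_headers.items()}
    varied = []
    for name in vary.split(","):
        name = name.strip().lower()
        if name:
            varied.append((name, lookup.get(name, "")))
    return (method.upper(), url, tuple(sorted(varied)))


url = "https://api.github.com/repos/microsoft/vscode"
vary = "Accept, Accept-Encoding"

gzip_key = cache_key("GET", url, {"Accept": "*/*", "Accept-Encoding": "gzip"}, vary)
br_key = cache_key("GET", url, {"accept": "*/*", "accept-encoding": "br"}, vary)

print(gzip_key == br_key)
```

The two requests hit the same URL but ask for different encodings, so they get different keys and the cache keeps a separate copy for each.
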
Try it yourself
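Run curl -v https://api.github.com/repos/microsoft/vscode in your terminal. Note the etag on the response. Then run it again with -H 'If-None-Match: ...', pasting the full etag value in place of the dots (keep the W/ prefix and the quotes), and watch for the 304. Pick an endpoint whose content stays stable between requests; one that returns something different every time will never give you a 304.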

Recap

  • Content-Type declares the format. Accept declares what the client can read.
  • Authorization carries credentials. Never put them in URLs.
  • Cache-Control is the main caching dial. max-age, s-maxage, public/private, no-cache (revalidate), no-store (don't cache), immutable (never changes), stale-while-revalidate (use stale while refreshing).
  • ETag + If-None-Match gives you the cheap 304 Not Modified roundtrip for conditional fetches.
  • curl -v is your microscope. Use it.