mteq.pro

April 03, 2026

Caching and CDNs: Five Layers Between the Request and the Database

Redundant servers reading the same data on every request still bottleneck the database. Caching stores expensive results across five distinct layers so repeated requests pay almost nothing.

Redundancy solved the failure problem. Three web servers now share the load. A Virtual IP floats between the two load balancers: the primary holds it and accepts traffic; the standby monitors heartbeats and acquires it on failure. No state passes through the VIP itself. The database has a replica standing by.

But redundancy introduced a new cost. All three web servers serve the same popular pages. When 1,000 users request the homepage, the load balancer spreads those requests across the three web servers, but every request still triggers the same database query. The database answers that identical query 1,000 times, regardless of how many web servers are running in parallel.

Redundancy scales availability. It does not reduce database work. Caching does.

A cache stores the result of an expensive operation so that subsequent identical requests can skip the operation entirely. The first request pays the full cost. Every request after that pays almost nothing.

What a Cache Does

A cache is a key-value store. The key is a string that uniquely identifies a request, typically a URL or a query fingerprint. The value is the result of whatever operation that request triggered.

Two outcomes are possible on every cache lookup.

A cache hit occurs when the requested key exists in the cache. The cache returns the stored value immediately. No database query runs.

A cache miss occurs when the key is absent. The application queries the database, stores the result in the cache under that key, and returns the result to the client. The next identical request finds a hit.

The ratio of hits to total requests is the hit rate. A hit rate of 90% means nine out of ten requests never reach the database. That compression is what makes caching effective at scale.

Cache Layers

Caching is not a single component. It is a set of overlapping layers, each positioned differently in the request path.

Browser cache. The client stores HTTP responses locally. The server controls caching behavior through the Cache-Control and Expires response headers. A Cache-Control: max-age=3600 header tells the browser to reuse the stored response for one hour without contacting the server. A cache hit at this layer means zero network traffic. Nothing leaves the user's device. The limitation is irreversibility: there is no mechanism to reach into a user's browser and delete a stale entry. The entry persists until the browser expires it or the user clears the cache.
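The server side of this contract is just a response header. As a sketch, a hypothetical helper that builds it (the function name and the `private` flag are illustrative, not from any particular framework):

```python
def browser_cache_headers(max_age_seconds, private=False):
    # "private" restricts storage to the user's own browser;
    # "public" also allows shared caches (proxies, CDNs) to store it.
    scope = "private" if private else "public"
    return {"Cache-Control": f"{scope}, max-age={max_age_seconds}"}
```

`browser_cache_headers(3600)` produces `Cache-Control: public, max-age=3600`, the one-hour directive described above.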

Edge cache (CDN). A CDN (Content Delivery Network) is a global network of edge servers positioned geographically close to users. When a user in Tokyo requests a JavaScript file that originates from a server in Frankfurt, the CDN serves that file from the nearest edge node rather than routing the request halfway around the world. Static assets (images, CSS, JavaScript bundles, fonts) are the natural candidates. A CDN cache hit eliminates the full round-trip to the origin server and offloads request volume from the application entirely. A popular image served to a million users hits the CDN a million times and the origin server zero times.

HTTP reverse proxy cache (Varnish). An HTTP reverse proxy cache sits between the network and the application tier, intercepting HTTP requests before they reach any web server. The concept is transparent HTTP caching: the client sends a normal request, and the proxy returns a cached response if one exists, or forwards the request to the origin and caches the response for subsequent requests. Varnish is the standard open-source implementation. It sits between the CDN and the load balancer tier. A cache hit returns a full HTTP response from in-memory storage without executing any application code. Cache rules come from standard HTTP headers: Cache-Control and Expires tell Varnish what to store and for how long. VCL (Varnish Configuration Language) overrides those defaults when header-driven behavior is insufficient. One important default: Varnish does not cache any request carrying a Cookie header. Authenticated clients send a Cookie header on every request. Without explicit configuration, Varnish skips caching entirely for logged-in traffic.

Application cache (Redis / Memcached). An in-memory key-value store that sits alongside the web server tier. Before every database query, the web server checks the application cache. A hit returns the stored value in microseconds. A miss queries the database, stores the result in the cache, and returns it. Redis is the production standard. Memcached is a simpler alternative with a smaller feature set. This is the cache layer most commonly discussed in system design contexts because it sits directly between the application logic and the database and is the most configurable layer in the stack.
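The key for an application-cache entry is typically a query fingerprint: a deterministic string derived from the query and its parameters, so identical queries always map to the same cache key. A minimal sketch (the `q:` prefix and the 16-character digest length are arbitrary choices for illustration):

```python
import hashlib
import json

def query_fingerprint(sql, params):
    # Serialize deterministically (sorted keys) so identical queries
    # always produce identical keys, then hash to keep keys short.
    raw = json.dumps({"sql": sql, "params": params}, sort_keys=True)
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    return "q:" + digest[:16]
```

Two requests running `SELECT * FROM posts WHERE id = ?` with the same parameters produce the same key and therefore share one cache entry; different parameters produce a different key.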

Database query cache. Some databases, including older versions of MySQL, include a built-in cache that stores the results of identical SQL queries. This layer is generally deprecated in favor of application-layer caches because it is harder to control, cannot be shared across multiple database nodes, and is less flexible than Redis. It is worth naming; it is not worth building around.

Cache Invalidation

Storing a result is the easy half of caching. The hard half is deciding when that stored result is no longer valid.

Cache invalidation is the process of removing or replacing stale entries. Phil Karlton's observation captures the difficulty: "There are only two hard things in computer science: cache invalidation and naming things." The difficulty is not technical. It is that the cache and the database must stay consistent, and they can diverge the moment a database write occurs while a cached result still exists.

Three strategies address this problem, each with a different trade-off.

TTL (Time To Live). Every cache entry carries an expiry duration. When that duration elapses, the cache deletes the entry automatically. The next request finds a miss and refreshes the cache from the database. TTL is simple and requires no coordination between the cache and the database. The trade-off is a staleness window: the cache holds an outdated result for up to the full TTL duration after a database write. For content that changes infrequently, this trade-off is acceptable.
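A TTL cache can be sketched with an expiry timestamp per entry. The `now` parameter below is injected for testability; a real implementation would read the clock directly.

```python
import time

class TTLCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.data = {}  # key -> (value, expiry timestamp)

    def set(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        self.data[key] = (value, now + self.ttl)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self.data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            del self.data[key]  # expired: treat as a miss
            return None
        return value
```

An entry written at t=0 with a 60-second TTL is served until t=60; any database write during those 60 seconds is invisible to readers, which is exactly the staleness window described above.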

Write-through. Every database write simultaneously updates the cache. The application writes to both targets before returning. The cache always holds the current value. The trade-off is latency: every write now pays the cost of two operations. For write-heavy workloads, this cost accumulates.

Write-back (write-behind). The application writes to the cache first and returns immediately. A background process flushes the cached write to the database asynchronously. Writes are fast; the database is not in the critical path. The trade-off is durability risk: if the cache node fails before the flush completes, the write is lost. Write-back is appropriate for workloads where write throughput matters more than strict durability guarantees.
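The two write strategies can be contrasted in a sketch. Plain dicts stand in for the cache and the database; a real write-back implementation would flush from a background process rather than an explicit `flush()` call.

```python
class WriteThrough:
    def __init__(self):
        self.cache, self.db = {}, {}

    def write(self, key, value):
        self.db[key] = value     # durable write first
        self.cache[key] = value  # cache updated before returning:
                                 # both targets are paid on every write

class WriteBack:
    def __init__(self):
        self.cache, self.db = {}, {}
        self.pending = []        # writes not yet flushed to the database

    def write(self, key, value):
        self.cache[key] = value  # fast path: return without touching the db
        self.pending.append((key, value))

    def flush(self):
        # Normally runs asynchronously. If the cache node fails
        # before this runs, the pending writes are lost.
        for key, value in self.pending:
            self.db[key] = value
        self.pending.clear()
```

The durability gap is visible in the sketch: after a `WriteBack.write`, the value exists only in the cache until `flush()` runs.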

When Caching Creates Problems

A cache introduces three failure modes that do not exist in uncached systems.

Stale data. The cache holds an answer that the database has since updated. A user changes a profile name. The database reflects the change. The cache returns the old name to the next thousand requests until the TTL expires or the cache is explicitly invalidated. TTL bounds how stale the data can be. Write-through eliminates staleness entirely by keeping the cache and database synchronized.

Cache stampede, also called thundering herd. A popular cache entry expires. Multiple requests arrive simultaneously and all find a miss at the same moment. All of them query the database concurrently. The database receives a spike of identical queries that the cache existed to prevent. The stampede is worst when the entry was popular, because popularity means high concurrency, and high concurrency means more simultaneous misses on expiry. Two mitigations exist. Probabilistic early expiration refreshes the entry slightly before the TTL expires rather than exactly at it, spreading the refresh across time. Request coalescing gives one request the exclusive right to query the database on a miss; all other concurrent requests wait for that result. Both approaches reduce the spike without requiring TTL changes.
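Request coalescing can be sketched with a per-key lock: the first thread to miss acquires the lock and queries; concurrent threads block on the same lock and find the value already filled in when they wake. This is a minimal sketch, not production-ready; a real implementation also needs TTLs and cleanup of the lock table.

```python
import threading

class CoalescingCache:
    def __init__(self, compute):
        self.compute = compute         # the expensive operation
        self.cache = {}
        self.key_locks = {}
        self.guard = threading.Lock()  # protects the lock table

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        with self.guard:
            lock = self.key_locks.setdefault(key, threading.Lock())
        with lock:
            # Re-check: another thread may have computed the value
            # while this one was waiting on the lock.
            if key not in self.cache:
                self.cache[key] = self.compute(key)
        return self.cache[key]
```

Eight concurrent misses on the same key produce exactly one call to `compute`; the other seven requests wait on the lock and return the freshly cached value.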

Memory pressure and eviction. A cache is finite. Redis runs in RAM. When the cache fills, new entries displace old ones. The eviction policy determines which entry is removed. The standard policy is LRU (Least Recently Used): the entry that has not been accessed for the longest time is evicted first. LRU approximates the entries least likely to be requested again. For most workloads it is the correct default.
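LRU maps cleanly onto an ordered dictionary: every access moves the key to the back, and eviction pops from the front. A minimal sketch:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # insertion order = recency order

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
```

With capacity 2, inserting a third entry evicts whichever of the first two was touched least recently, not whichever was inserted first.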

Takeaways

  • A cache stores the result of an expensive operation. Subsequent identical requests return the cached value and skip the operation entirely.
  • Five layers exist: browser cache (zero network traffic on hit, not remotely invalidatable), edge cache/CDN (static assets near users, offloads origin volume), HTTP reverse proxy cache (Varnish: full HTTP responses cached transparently, no application code required), application cache (Redis: arbitrary data objects, requires explicit application integration), database query cache (deprecated, Redis is the replacement).
  • Cache invalidation is the hard problem. Three strategies: TTL (simple, stale within the window), write-through (always fresh, write latency cost), write-back (fast writes, data-loss risk on failure).
  • Cache stampede occurs when a popular entry expires and concurrent misses flood the database simultaneously. Probabilistic early expiration and request coalescing are the standard mitigations.
  • Memory is finite. LRU eviction removes the least-recently-accessed entry when the cache fills.

Caching compresses database load by an order of magnitude or more. The redundant architecture can now absorb large request volumes without every request reaching the database. The next question is how layers of this system communicate with each other. The interface that defines that communication is the API.