Architecture · 11 min read

Caching Strategy for SaaS: Redis, Memcached, or CDN First?

Most SaaS apps cache wrong. They reach for Redis on day one and skip the CDN that would have served 80 percent of their traffic for free. Here is the layered caching strategy I recommend after auditing 30+ production systems.

Krishan K Agarwal

Senior System Architect & Fractional CTO

Updated May 2026 · Published Mar 2026

Every SaaS performance audit I run starts with the same finding: the team spent three weekends wiring up Redis and zero minutes configuring CDN cache headers. They are paying $50 a month to cache 200 MB of data that their CDN would have served for free, and their P99 still spikes during traffic peaks because nothing is cached at the edge.

Caching is not a tool choice. It is a layered strategy with four tiers — CDN, in-process, distributed (Redis/Memcached), and database — and the right call depends on what you are caching, who reads it, and how often it changes. This guide is the framework I use after auditing more than 30 production SaaS systems.

The four cache layers, in order of cost

There are four meaningful cache layers in a modern SaaS stack, and they get progressively more expensive in both dollars and complexity. The rule is simple: cache as close to the user as possible, and only move down a layer when the one above cannot answer the request.

Layer 1 — CDN edge cache (Cloudflare, Vercel, Fastly). Closest to the user, cheapest, often free. P99 typically 20 to 80 ms globally.
Layer 2 — In-process cache (lru-cache, cachetools, Caffeine). Per-instance, sub-millisecond, zero network. Lost on restart.
Layer 3 — Distributed cache (Redis, Memcached, KeyDB, Dragonfly). Shared across instances. P99 typically 1 to 5 ms in-region.
Layer 4 — Database query cache or materialized views. Last resort before hitting cold storage.

Most teams skip layer 1 and over-invest in layer 3. That is the single biggest caching mistake I see at the seed-to-Series-A stage. If you have not read my breakdown of the cheapest observability stack that works, pair this with that one — you cannot cache what you cannot measure.

When CDN caching wins (and how much it actually saves)

CDN caching wins for any GET request where the response is the same for many users or stable for at least a few seconds. That includes marketing pages, blog content, public API responses, OG images, dashboard widgets that read public data, and anything you can key by URL plus a small set of headers.

Concrete numbers from a recent audit: a B2B SaaS serving 4M requests/day moved their public API responses to Cloudflare with a 60-second cache and stale-while-revalidate. Origin requests dropped 82 percent, P99 latency went from 340 ms to 47 ms globally, and their monthly compute bill dropped from $1,100 to $190. Cloudflare cost: $0 — they were already on the free plan.
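
A minimal sketch of the response header behind a setup like that, assuming the origin steers the edge cache with standard Cache-Control directives. The helper name and the 300-second stale window are illustrative assumptions; the audit above only fixes the 60-second cache. Whether a given CDN honors stale-while-revalidate, and how, is worth confirming in its own documentation.

```python
def public_cache_control(edge_ttl_s: int = 60, stale_window_s: int = 300) -> str:
    """Build a Cache-Control value for a public GET that is keyed by URL.

    edge_ttl_s:     how long shared caches (the CDN) may treat the response as fresh.
    stale_window_s: how long a stale copy may still be served while the cache
                    revalidates in the background (assumed value, tune per route).
    """
    if edge_ttl_s < 0 or stale_window_s < 0:
        raise ValueError("TTL values must be non-negative")
    directives = [
        "public",                                    # safe to store in shared caches
        f"s-maxage={edge_ttl_s}",                    # freshness lifetime for shared caches
        f"stale-while-revalidate={stale_window_s}",  # serve stale during refresh
    ]
    return ", ".join(directives)


header_value = public_cache_control()
```

Attach the resulting value to the Cache-Control response header of routes that are identical for every caller; anything personalized needs a different policy.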

When in-process cache is the right answer

In-process caching wins when you have hot reads inside a single instance and you do not yet need shared state. Think feature flag evaluation, parsed config, JWT public keys, GeoIP lookups, and schema introspection. These are sub-millisecond reads and the working set is tiny, usually less than 10 MB.

The trap is using in-process cache for anything that needs to be consistent across instances. The day your second container starts, you have two views of reality. User A updates their email, hits container 1, sees the new value. User B (their teammate) hits container 2, still sees the old one for the next 60 seconds. That bug ships to production and you spend a week debugging it.

The cleanest pattern: in-process cache only for data that is either immutable for the process lifetime (config, public keys) or where short staleness is genuinely fine. For everything else, go to layer 3.
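
To make the pattern concrete, here is a small standard-library sketch of a per-process cache with a size cap and a TTL. It is not the API of lru-cache, cachetools, or Caffeine; the class name and parameters are assumptions for illustration, and the injectable clock exists only so the behavior can be tested without sleeping.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable


class TTLLRUCache:
    """Per-process cache with LRU eviction and one TTL for all entries."""

    def __init__(self, max_items: int, ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_items = max_items
        self._ttl_s = ttl_s
        self._clock = clock
        self._data = OrderedDict()  # key -> (expires_at, value), oldest first

    def get(self, key: Hashable, default: Any = None) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]      # expired: drop it and report a miss
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._data[key] = (self._clock() + self._ttl_s, value)
        self._data.move_to_end(key)
        while len(self._data) > self._max_items:
            self._data.popitem(last=False)  # evict the least recently used entry
```

The cache lives and dies with the process, which is exactly why it suits config and public keys and not user-editable data.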

Redis vs Memcached vs Dragonfly: the honest comparison

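For new projects in 2026, default to Redis or a Redis-compatible server such as Dragonfly. Memcached is still excellent at pure key/value but has been functionally obsoleted by Redis for almost every SaaS workload. Dragonfly is not a fork of Redis; it is an independent, multi-threaded implementation of the Redis API that aims to be a drop-in replacement with better multi-core scaling. It is worth considering if you are pushing more than 100K ops/sec on a single instance, once you have verified that the commands you depend on are supported.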

Layer	P99 latency	Cost (MVP scale)	Best for
Cloudflare CDN cache	20-80 ms globally	$0 (free plan)	Public GETs, marketing, OG images, public API
In-process LRU	0.01-0.5 ms	$0	Config, JWT keys, parsed schemas, single-instance hot reads
Redis Cloud / Upstash	1-5 ms in-region	$10-50/mo (256 MB to 1 GB)	Sessions, rate limiting, shared cache, queues, locks
Memcached (ElastiCache)	1-3 ms in-region	$15-40/mo	Pure key/value, no persistence needs, legacy stacks
Dragonfly Cloud	0.5-3 ms in-region	$25-60/mo	Redis-compatible, high throughput (>100K ops/sec)
Postgres materialized view	5-30 ms	Included in DB	Aggregations, reports, slow joins refreshed on schedule

Cache layer comparison: typical P99, cost, and best use case for SaaS workloads in 2026

The Redis-vs-Memcached debate is largely settled. Redis gives you sorted sets (perfect for leaderboards and rate limiting), pub/sub (event fan-out), Lua scripting (atomic multi-step ops), persistence (optional but useful), and TTLs that just work. Memcached gives you slightly simpler ops and that is it. Managed Redis pricing is identical or cheaper.

Cache invalidation: the only hard problem

Three invalidation patterns work: TTLs, versioned keys, and invalidation events. Two supporting rules cover multi-region setups and negative caching. One pattern does not work: 'we will remember to invalidate it.' Every team that ships that pattern eventually serves stale data to a paying customer.

Set a TTL on every cache write. No exceptions. A 60-second TTL with no other invalidation logic is better than a perfect invalidation system you forget to call.
Version your cache keys. Instead of users:42, write users:42:v7 where v7 increments on schema change. Old keys age out via TTL — no migration needed.
Emit invalidation events on writes. After a successful database write, publish a message (Redis pub/sub, Postgres NOTIFY, SNS) that triggers cache deletion across instances.
For multi-region setups, use a write-through pattern with a short TTL backstop. Never rely on cross-region pub/sub alone.
Cache the negative case. If a user does not exist, cache that fact for 30 seconds. Otherwise a hot 404 hammers your database.
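
Two of the rules above, a TTL on every write and caching the negative case, can be sketched as a read-through helper. Everything here is illustrative: TTLStore is an in-memory stand-in for a real cache, and get_user, load_from_db, and the 60-second hit TTL are hypothetical (only the 30-second negative TTL comes from the list). With a real Redis the sentinel would have to be a serializable marker value instead of a Python object.

```python
import time
from typing import Any, Callable, Optional

_NOT_FOUND = object()  # sentinel cached to mean "this user does not exist"


class TTLStore:
    """Tiny in-memory stand-in for a cache that refuses writes without a TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Any:
        entry = self._data.get(key)
        if entry is None or self._clock() >= entry[0]:
            self._data.pop(key, None)  # missing or expired
            return None
        return entry[1]

    def set(self, key: str, value: Any, ttl_s: float) -> None:
        if ttl_s is None or ttl_s <= 0:
            raise ValueError("every cache write needs a positive TTL")
        self._data[key] = (self._clock() + ttl_s, value)


def get_user(store: TTLStore, user_id: int,
             load_from_db: Callable[[int], Optional[dict]],
             hit_ttl_s: float = 60, miss_ttl_s: float = 30) -> Optional[dict]:
    key = f"users:profile:{user_id}"
    cached = store.get(key)
    if cached is _NOT_FOUND:
        return None                  # negative hit: skip the database entirely
    if cached is not None:
        return cached
    user = load_from_db(user_id)
    if user is None:
        store.set(key, _NOT_FOUND, miss_ttl_s)  # remember the 404 briefly
        return None
    store.set(key, user, hit_ttl_s)
    return user
```

The short negative TTL bounds how long a newly created user can look missing, while still absorbing a burst of lookups for an ID that does not exist.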

The versioned-key pattern is underrated. When your User model changes shape, bump the version once in code and every old cache entry becomes immediately ignorable. No flush, no migration, no 'why are users seeing the old field' incident at 2am. Reads under the new version start as misses, so hot keys still need the stampede protection described in the next section.
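
A minimal sketch of versioned keys, using the users:42:v7 shape from the list above. The constant and function names are hypothetical, and a plain dict stands in for the cache.

```python
USER_CACHE_VERSION = 7  # hypothetical constant; bump it when the cached User shape changes


def versioned_key(namespace: str, entity_id: int, version: int) -> str:
    """Build keys like users:42:v7 so a version bump orphans every older entry."""
    return f"{namespace}:{entity_id}:v{version}"


# A dict stands in for the cache. The entry written under v7 is never read
# once the code starts asking for v8; it just ages out via its TTL.
cache = {versioned_key("users", 42, USER_CACHE_VERSION): {"email": "a@example.com"}}
old_hit = cache.get(versioned_key("users", 42, 7))
new_miss = cache.get(versioned_key("users", 42, 8))
```

Keeping the version in one constant next to the model definition makes the bump a one-line change in the same commit that alters the schema.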

Cache stampede prevention

A stampede happens when a popular cache key expires under load and 5,000 requests simultaneously try to regenerate it, hammering your database. I have seen this take down production three times in the last year alone — it is an underrated outage cause.

Two mechanisms, used together, eliminate it. First, a short Redis lock (SET key value NX PX 5000) around the regeneration step so exactly one process rebuilds the cache while others either wait or serve stale. Second, probabilistic early refresh — when a key is in the last 10 to 20 percent of its TTL, a single request triggers a background regeneration while the others continue serving the cached value.
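
The two mechanisms can be sketched together as follows. This is an illustration under stated assumptions, not a production implementation: FakeCache is an in-memory stand-in whose acquire_lock mimics the semantics of SET key value NX PX, the 0.8 threshold corresponds to the last 20 percent of the TTL, and the refresh runs inline in the winning request to keep the sketch single-threaded where a real service would hand it to a background task. All names are hypothetical.

```python
import time
from typing import Any, Callable


class RebuildInProgress(Exception):
    """Cold key and another caller holds the lock; wait briefly and retry."""


class FakeCache:
    """In-memory stand-in for Redis: values with TTLs plus an NX/PX style lock."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self._values = {}  # key -> (value, stored_at, ttl_s)
        self._locks = {}   # lock key -> expiry time

    def get(self, key: str):
        entry = self._values.get(key)
        if entry is None:
            return None
        if self.clock() - entry[1] >= entry[2]:
            del self._values[key]  # fully expired
            return None
        return entry

    def set(self, key: str, value: Any, ttl_s: float) -> None:
        self._values[key] = (value, self.clock(), ttl_s)

    def acquire_lock(self, key: str, px_ms: int) -> bool:
        """Succeeds only if no unexpired lock exists, like SET NX PX."""
        now = self.clock()
        if self._locks.get(key, 0.0) > now:
            return False
        self._locks[key] = now + px_ms / 1000.0
        return True

    def release_lock(self, key: str) -> None:
        self._locks.pop(key, None)


def get_or_rebuild(cache: FakeCache, key: str, ttl_s: float,
                   rebuild: Callable[[], Any],
                   refresh_at: float = 0.8, lock_ms: int = 5000) -> Any:
    lock_key = f"lock:{key}"
    entry = cache.get(key)
    if entry is not None:
        value, stored_at, entry_ttl = entry
        if (cache.clock() - stored_at) / entry_ttl < refresh_at:
            return value  # comfortably fresh
        # Last stretch of the TTL: one caller refreshes, the rest keep serving.
        if not cache.acquire_lock(lock_key, lock_ms):
            return value
    elif not cache.acquire_lock(lock_key, lock_ms):
        raise RebuildInProgress(key)  # nothing stale to serve for a cold key
    try:
        fresh = rebuild()
        cache.set(key, fresh, ttl_s)
        return fresh
    finally:
        cache.release_lock(lock_key)
```

The lock expiry matters as much as the lock itself: if the rebuilding process dies, the 5-second PX timeout lets another caller take over instead of leaving the key unrefreshable.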

Anti-patterns I see every audit

After 30+ SaaS architecture audits, the same caching mistakes show up again and again. If your codebase has any of these, fix them before adding more cache.

Caching everything by default. Cached writes, cached mutations, cached personalized data with no user key. Result: users see each other's data. Cache only safe GETs.
No invalidation strategy. Cache writes go in with TTL=null and the team prays. Eventually a customer reports stale data and the only fix is FLUSHALL in production.
Distributed cache for a single-region single-instance app. You added a network hop and a $40/mo bill for nothing. In-process cache with a TTL is faster and free.
Cache key collisions. Using user_id as a key without namespacing means your sessions module and your users module fight over the same slot. Always namespace: users:profile:42, sessions:42.
Caching authenticated responses at the CDN. Suddenly user A gets user B's dashboard. At the CDN, cache only public responses, or responses whose cache key includes everything that makes them user-specific, and use Vary headers correctly.
Treating Redis as a database. If losing a key means losing user data, it does not belong in Redis. Cache is rebuildable. Source of truth is not.
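
The first and fifth anti-patterns above come down to one guard: decide per request whether a response may be stored in a shared cache at all. A conservative sketch, assuming that any request carrying an Authorization or Cookie header should be treated as user-specific. The function name, the personalized flag, and that header rule are illustrative assumptions; some apps can safely cache cookie-bearing requests for truly public routes.

```python
from typing import Mapping


def cdn_cache_control(method: str, request_headers: Mapping[str, str],
                      personalized: bool, edge_ttl_s: int = 60) -> str:
    """Return a Cache-Control value that keeps unsafe responses out of shared caches."""
    header_names = {name.lower() for name in request_headers}
    # Assumption: credentials on the request mean the response may be user-specific.
    authenticated = "authorization" in header_names or "cookie" in header_names
    safe_method = method.upper() in ("GET", "HEAD")
    if not safe_method or authenticated or personalized:
        return "private, no-store"          # never store at the edge
    return f"public, s-maxage={edge_ttl_s}"  # identical for every caller
```

Defaulting to the uncacheable answer means a forgotten flag costs latency, not a data leak.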

A pragmatic caching playbook for SaaS at each stage

What you should actually do, by stage, based on what I have seen work across dozens of teams.

Pre-revenue / MVP

CDN cache headers on every public GET (free). In-process LRU for config and hot reads ($0). Skip Redis until you have a reason. Total caching infrastructure cost: $0 to $5/month. This is enough to handle 100K daily requests on a single small server.

Seed to Series A ($1M-$10M ARR)

Add managed Redis (Upstash or Redis Cloud, $10 to $30/month) for sessions, rate limiting, and shared cache. Keep CDN as your first line. Introduce stale-while-revalidate on every dashboard read where 30-second staleness is acceptable. Total: under $50/month.

Series A+ / multi-region

Per-region Redis with read replicas, CDN with custom cache rules per route, materialized views for analytics queries. This is where you start paying real money — $300 to $2,000/month — but only because the alternative (database overload) costs more.

Where to go from here

Caching is one of three architectural decisions that compound or punish you for years. The other two are how you handle multi-tenant data isolation and how you structure rate limiting on your API. If you got value from this post, the multi-tenant SaaS architecture guide and the rate limiting patterns post are the natural follow-ups.

If you want a senior pair of eyes on your specific stack — what to cache, what to skip, where you are leaving 5x latency on the table — that is what my Architecture Audit is for. Three to five days, fixed price, written report with prioritized fixes.

Frequently asked questions

Should a small SaaS use Redis or just an in-memory cache?

Stay in-memory until you run more than one application instance. A Node or Python process with an LRU cache (lru-cache, cachetools) gives you sub-millisecond reads with zero ops. The day you scale horizontally and two instances start serving stale data to the same user, you graduate to Redis. Most early-stage SaaS hit that wall around 200 to 500 paying users.

Is Memcached still worth using in 2026?

Only for pure key/value workloads where you do not need persistence, pub/sub, sorted sets, or Lua scripting. Redis has eaten Memcached's lunch for almost every use case, and managed Redis (Upstash, Redis Cloud) is the same price. Pick Memcached only if you already run it and it works.

How do I prevent cache stampedes?

Two patterns, used together. First, a short distributed lock around the regeneration step so only one process refreshes a hot key. Second, early refresh — when a key is 80 percent through its TTL, regenerate it in the background while still serving the old value. This eliminates the thundering herd that happens when a popular key expires under load.

What is stale-while-revalidate and when should I use it?

SWR serves a cached value past its freshness window while triggering a background refresh. Cloudflare, Vercel, and Next.js all support it natively. Use it for any read where 'a few seconds stale' is acceptable: marketing pages, public API responses, dashboard reads. Do not use it for account balances, inventory counts, or anything a user expects to be live.

How big should my Redis cache actually be?

Smaller than you think. Most SaaS workloads have a working set of 100 to 500 MB even at $1M ARR. A $10/month managed Redis with 256 MB and a sane eviction policy (allkeys-lru) covers more apps than founders expect. Buy more memory only when your hit rate drops below 90 percent and the eviction count climbs.
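
That sizing rule can be checked from two samples of the Redis INFO stats section. The sketch below assumes the keyspace_hits, keyspace_misses, and evicted_keys counters have already been read into plain dicts; the function names and the sampling approach are illustrative.

```python
from typing import Mapping


def hit_rate(hits: int, misses: int) -> float:
    """Fraction of lookups served from cache; 1.0 when there was no traffic."""
    total = hits + misses
    return hits / total if total else 1.0


def needs_more_memory(prev: Mapping[str, int], curr: Mapping[str, int],
                      min_hit_rate: float = 0.90) -> bool:
    """Compare two INFO samples: low hit rate AND evictions climbing in between."""
    hits = curr["keyspace_hits"] - prev["keyspace_hits"]
    misses = curr["keyspace_misses"] - prev["keyspace_misses"]
    evictions = curr["evicted_keys"] - prev["evicted_keys"]
    return hit_rate(hits, misses) < min_hit_rate and evictions > 0


earlier = {"keyspace_hits": 1000, "keyspace_misses": 100, "evicted_keys": 0}
later = {"keyspace_hits": 1800, "keyspace_misses": 300, "evicted_keys": 50}
upgrade = needs_more_memory(earlier, later)
```

Working on the difference between samples matters because the counters are cumulative since the server started; a lifetime average can hide a recent drop in hit rate.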
