Scaling · 13 min read

How to Scale a SaaS to 1 Million Users Without a Rewrite

Scaling to a million users is not a technology problem. It is a seven-decision problem. Get these right early and you grow without rewrites. Get them wrong and every quarter is a fire.

Senior System Architect & Fractional CTO

Almost every 'we need to rewrite to scale' story I have seen is actually 'we made the same handful of decisions wrong and never went back to fix them.' Real scaling problems are concentrated. There are roughly seven architectural decisions that decide whether you grow gracefully from 10K to 1M users or whether every quarter feels like an outage.

I have audited and scaled SaaS products from a few hundred users into the millions. The shape of the journey is similar enough across companies to write down. This post is the playbook I run with founders who already have product-market fit and want to make sure their stack survives the next 100x.

The good news: in 2026, you do not need exotic infrastructure. Postgres, Redis, a queue, a CDN, and Sentry will get you to a million users on a six-figure cloud bill. The bad news: the order in which you adopt them, and the discipline you maintain, decides everything.

Decision 1: Database read scaling — replicas before sharding

Almost everything breaks at the database first. The first signal is read latency creeping up under load. The wrong response is to start sharding. The right response is to add a read replica.

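On Postgres, set up a streaming replica with a connection pooler in front (PgBouncer or Supabase's built-in pooler). Route analytical and read-heavy queries (search, list endpoints, dashboards) to the replica. Keep transactional writes on the primary, along with any read that must see a write the user just made: a streaming replica applies changes slightly after the primary, so a read routed there can briefly return stale data. This single move buys you 3 to 5x headroom for the cost of one extra database node.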
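A minimal TypeScript sketch of that routing rule. `Queryable`, `DbRouter`, and the stub nodes are hypothetical stand-ins for a real client such as a `pg` pool behind PgBouncer. The sketch assumes asynchronous streaming replication, where a replica can trail the primary by a moment, so reads that must see a write the same user just made go to the primary together with the writes.

```typescript
// Hypothetical stand-in for a real database client (e.g. a pg pool).
interface Queryable {
  name: string;
  query(sql: string, params?: unknown[]): Promise<unknown[]>;
}

// "read-your-write" marks reads that must see a write the same user just made.
type QueryKind = "write" | "read" | "read-your-write";

class DbRouter {
  private readonly primary: Queryable;
  private readonly replica: Queryable;

  constructor(primary: Queryable, replica: Queryable) {
    this.primary = primary;
    this.replica = replica;
  }

  // Replicas apply changes slightly after the primary, so only plain reads
  // go there; writes and must-be-current reads stay on the primary.
  pick(kind: QueryKind): Queryable {
    return kind === "read" ? this.replica : this.primary;
  }

  run(kind: QueryKind, sql: string, params: unknown[] = []): Promise<unknown[]> {
    return this.pick(kind).query(sql, params);
  }
}

// Usage with stub nodes; a real setup would point these at the pooler.
const stubNode = (name: string): Queryable => ({
  name,
  query: async () => [],
});

const db = new DbRouter(stubNode("primary"), stubNode("replica"));
// db.run("read", "SELECT ...") hits the replica; db.run("write", ...) the primary.
```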

Sharding (splitting data across multiple databases by tenant ID or shard key) is a year of engineering work. It is correct only when you have exhausted vertical scaling on the primary AND you have a tenant model that shards naturally. For most SaaS, that point arrives at 5 to 10 million users, not 1 million.

Decision 2: Caching strategy — at the right layer

Caching is where most teams either over-engineer or under-engineer, and rarely land in the middle. The right approach is layered:

  • Edge cache (Cloudflare, Fastly, or Vercel) for anonymous content. Marketing pages, public profiles, blog posts: all served from the CDN edge with Cache-Control headers, reaching your origin only on a cache miss.
  • In-process cache (a Map, lru-cache, or framework-provided memoization) for hot reads inside a single request. Cheapest cache there is.
  • Redis for cross-request hot keys, session data, rate limit counters, and computed values that are expensive to derive but small to store.
  • Postgres materialized views for slow analytical queries that refresh hourly or daily. They cost almost nothing and avoid Redis invalidation hell.
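The in-process layer is the one teams most often skip. A minimal sketch, assuming a hypothetical async loader such as a database lookup: a `Map` that lives for one request and returns the same promise for repeated keys, so the lookup runs once no matter how many components ask for it. Because the cache is thrown away when the request ends, there is nothing to invalidate.

```typescript
// Request-scoped cache: create one per incoming request, never share it.
class RequestCache {
  private entries = new Map<string, Promise<unknown>>();

  // Return the cached promise for `key`, calling `load` only on first use.
  // Caching the promise (not the value) also dedupes concurrent callers.
  // Note: a rejected load stays cached for the rest of this request.
  get<T>(key: string, load: () => Promise<T>): Promise<T> {
    let hit = this.entries.get(key) as Promise<T> | undefined;
    if (!hit) {
      hit = load();
      this.entries.set(key, hit);
    }
    return hit;
  }

  get size(): number {
    return this.entries.size;
  }
}

// Usage: two components ask for the same user within one request.
let loads = 0;
const loadUser = (): Promise<{ id: string }> => {
  loads += 1; // stands in for a database round trip
  return Promise.resolve({ id: "u1" });
};

const perRequest = new RequestCache();
void perRequest.get("user:u1", loadUser);
void perRequest.get("user:u1", loadUser); // served from the Map, no second load
```

Creating the instance in per-request middleware and letting it be garbage-collected afterwards is what keeps it safe; a single instance shared across requests would become an unbounded cache with no invalidation at all.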

What to avoid: caching in your domain logic. Every time you write 'if not cached, compute and cache,' you are adding an invalidation problem. Push caching to the edge of the system (HTTP layer, infrastructure layer) where invalidation is bounded. Most cache bugs are invalidation bugs, and most invalidation bugs come from caching deep inside the application.

Decision 3: Queues from day one

Any work that takes more than 500 milliseconds and does not strictly need to complete inside the user's request should be a queued job. Email sending, webhook delivery, image processing, AI calls, third-party API calls, exports — all of it. The cost of introducing a queue on day one is one library and a worker process. The cost of retrofitting one at 100K users is a quarter of engineering time and at least one production incident.

In 2026, the right defaults: BullMQ on Redis if you are in Node, Sidekiq on Redis if you are in Ruby, SQS or Cloud Tasks if you are deep on AWS or GCP. Resist the urge to use Kafka for queue work — Kafka is a log, not a queue, and the operational cost is not worth it until you actually have stream processing requirements.

Three rules for queue hygiene:

  1. Every job must be idempotent. Assume it will run twice. Wrap it in 'check if already done' logic.
  2. Every job must have a dead-letter queue and an alert on it. Silent failures here are how you lose customer data.
  3. Every job must have a max retry count and a backoff. Infinite retries are how you DDoS yourself.
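The three rules fit in a few lines. This is an in-memory illustration, not how BullMQ or Sidekiq implement them internally; those libraries expose retries, backoff, and failed-job handling as configuration. `Job`, `runJob`, and the 5-attempt, 1-second-base backoff are assumptions chosen for the example.

```typescript
// In-memory illustration of the three queue rules. All names are hypothetical.
interface Job {
  id: string; // stable ID, e.g. derived from the triggering event
  payload: string;
}

type Outcome = "done" | "skipped" | "dead";

// Rule 1: record of finished jobs. In production this must be durable
// (e.g. a row with a unique constraint), not an in-memory Set.
const completed = new Set<string>();
// Rule 2: jobs that exhausted their retries; alert when this is non-empty.
const deadLetter: Job[] = [];
// Rule 3: bounded retries with exponential backoff.
const MAX_ATTEMPTS = 5;
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 5 * 60 * 1000;

// 1s, 2s, 4s, 8s, ... capped at 5 minutes.
function backoffMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** (attempt - 1), MAX_DELAY_MS);
}

async function runJob(
  job: Job,
  handler: (job: Job) => Promise<void>,
  sleep: (ms: number) => Promise<void>,
): Promise<Outcome> {
  // Idempotency: a duplicate delivery of a finished job is a no-op.
  if (completed.has(job.id)) return "skipped";

  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    try {
      await handler(job);
      completed.add(job.id);
      return "done";
    } catch {
      // Wait before the next attempt; no wait after the final one.
      if (attempt < MAX_ATTEMPTS) await sleep(backoffMs(attempt));
    }
  }

  deadLetter.push(job);
  return "dead";
}
```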

Decision 4: Observability before launch, not after

A pre-launch observability stack costs one engineer one day and saves you weeks of debugging after launch. The minimum viable setup in 2026:

  • Sentry for error tracking. Free tier covers most pre-revenue startups.
  • Structured logging (Pino, Winston) writing JSON to stdout, shipped to a log aggregator (Better Stack, formerly Logtail; Datadog; or even Vercel's built-in viewer).
  • One uptime monitor (Better Stack, formerly Better Uptime, or Checkly) hitting a /health endpoint every minute.
  • Basic metrics: request rate, error rate, p95 latency, queue depth, database connection count. Grafana on top of Postgres or Prometheus is fine.
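Two items from that list, sketched without dependencies: a JSON-lines logger writing one object per line to stdout (roughly the shape Pino and Winston produce, minus their features) and a nearest-rank p95 over recorded request durations. Field names such as `route` and `ms` are illustrative assumptions.

```typescript
type Level = "info" | "warn" | "error";

// One JSON object per line on stdout; the platform or a log shipper forwards
// it to the aggregator. Returns the line so callers can inspect it.
function log(level: Level, msg: string, fields: Record<string, unknown> = {}): string {
  const line = JSON.stringify({ time: new Date().toISOString(), level, msg, ...fields });
  console.log(line);
  return line;
}

// Nearest-rank percentile: the value at rank ceil(p * n / 100) of the sorted samples.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Usage: log each request with its duration, then report the p95.
const durationsMs = [12, 15, 11, 240, 14, 13, 16, 18, 12, 900];
log("info", "request finished", { route: "/health", status: 200, ms: 12 });
log("info", "latency summary", { p95Ms: percentile(durationsMs, 95) });
```

In practice Grafana or Prometheus computes the percentile for you; the point of the sketch is what the number means. The p95 here is dominated by two slow requests that an average would hide.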

What you do not need pre-1M users: Datadog APM, distributed tracing across 12 services, custom Prometheus dashboards. Those are 5-engineer-team problems. At smaller scale, Sentry plus structured logs covers 95 percent of incidents.

Decision 5: CDN and edge — push compute outward

Every byte you serve from your origin is a byte your origin pays for in CPU, bandwidth, and latency. By 1M users, anything that can be edge-served must be edge-served.

  • Static assets (JS, CSS, images, fonts): edge-cached, immutable, 1-year cache headers, fingerprinted filenames
  • Anonymous pages (marketing, public content): cached at the edge with stale-while-revalidate
  • Personalized pages: rendered at the edge if possible (Vercel Edge Functions, Cloudflare Workers), with KV at the edge for low-latency reads
  • API endpoints used by anonymous traffic: edge-cached with short TTLs
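One way to encode those four rules is a single function that maps a content class to a `Cache-Control` header. The class names and TTLs are assumptions for illustration. `s-maxage` applies only to shared caches such as a CDN, and `stale-while-revalidate` takes effect only where the CDN supports it, so check your provider's documentation.

```typescript
// Hypothetical content classes mirroring the list above.
type ContentClass =
  | "static-asset"
  | "anonymous-page"
  | "personalized-page"
  | "anonymous-api";

function cacheControlFor(kind: ContentClass): string {
  switch (kind) {
    case "static-asset":
      // Fingerprinted filenames change on every build, so a year is safe.
      return "public, max-age=31536000, immutable";
    case "anonymous-page":
      // CDN treats it as fresh for 5 minutes, then may serve it stale for up
      // to a day while refetching from the origin in the background.
      return "public, s-maxage=300, stale-while-revalidate=86400";
    case "personalized-page":
      // Must never be stored by a shared cache.
      return "private, no-store";
    case "anonymous-api":
      // Short TTL: absorbs bursts without serving very old data.
      return "public, s-maxage=30";
  }
}
```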

In 2026, Cloudflare is the default for cost, DX, and the Workers/KV/D1 primitives. CloudFront is the right call only if you are deeply integrated with AWS IAM and origin signing. Vercel handles a lot of this transparently if you are on Next.js — use it.

Decision 6: Schema migrations — boring and disciplined

Schema migrations break products at scale. There are two failure modes: a long-running migration that locks a table for hours, and an incompatible migration deployed before the application code is ready.

The discipline that prevents both:

  1. Every migration is forward-only and backward-compatible for at least one deploy. Add a column nullable, deploy, backfill, deploy, then make it required.
  2. No long migrations on the hot path. If a migration touches more than 100K rows, it runs as a background batch job.
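  3. Index changes use CREATE INDEX CONCURRENTLY. Never lock a hot table to add an index. Postgres refuses to run it inside a transaction block, so turn off your migration tool's per-migration transaction for that file.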
  4. Migrations run in CI, on a copy of production data (anonymized), before they hit prod. Surprises in prod are unacceptable at this scale.
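Rules 1 and 2 together, sketched for adding a required column: expand with a nullable column, backfill in small batches rather than one giant `UPDATE`, and contract with `SET NOT NULL` only after the backfill and a deploy. The `users` table, the `name`/`display_name` columns, and the `Exec` function (any client call that resolves to an affected-row count) are hypothetical.

```typescript
// Hypothetical DB call: runs one statement, resolves to the affected row count.
type Exec = (sql: string) => Promise<number>;

// Step 1 (expand): nullable column, so code that does not know it keeps working.
const expand = "ALTER TABLE users ADD COLUMN display_name text";

// Step 2 (backfill): one small batch per statement, so each transaction is
// short and holds row locks only briefly. COALESCE guarantees every touched
// row leaves the IS NULL set, so the loop always terminates.
function backfillBatch(size: number): string {
  return (
    "UPDATE users SET display_name = COALESCE(name, '') " +
    `WHERE id IN (SELECT id FROM users WHERE display_name IS NULL LIMIT ${size})`
  );
}

// Step 3 (contract): only after the backfill and a deploy that writes the column.
const contract = "ALTER TABLE users ALTER COLUMN display_name SET NOT NULL";

async function backfill(
  exec: Exec,
  size = 1000,
  pause: () => Promise<void> = async () => {},
): Promise<number> {
  let total = 0;
  for (;;) {
    const updated = await exec(backfillBatch(size));
    total += updated;
    if (updated < size) return total; // a short batch means nothing is left
    await pause(); // let replicas and the hot path catch up
  }
}
```

On a large table, `SET NOT NULL` scans every row under an exclusive lock. On Postgres 12 and later you can first add a `CHECK (display_name IS NOT NULL) NOT VALID` constraint and `VALIDATE` it, after which `SET NOT NULL` skips the scan.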

The tooling here is mature: Prisma, Drizzle, Knex, Rails ActiveRecord, Django migrations all support these patterns if you use them deliberately. The bug is almost always cultural, not tooling — engineers writing migrations as an afterthought instead of treating them as production deployments.

Decision 7: Rate limiting and abuse protection

Above 100K users, you will be attacked. Credential stuffing, scraping, abuse of free tiers, spam signups. The cost of not having rate limits is at minimum a higher cloud bill, at maximum a database outage.

Three layers, all required:

  • WAF at the CDN (Cloudflare, AWS WAF) — blocks bot traffic and known bad actors before it reaches your app
  • Application-level rate limiting per user, per IP, per endpoint, using Redis counters. Lower limits on auth endpoints (5 per minute is reasonable for login).
  • Per-tenant limits — every paid tenant has a quota, and free tiers have hard caps on expensive operations (AI calls, exports, search)
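The application-level layer is commonly built as a fixed-window counter: increment a key scoped to the caller and the current window, give it an expiry on first use, and reject once the count passes the limit. A minimal sketch, assuming a hypothetical `CounterStore` interface. With Redis this would be `INCR` plus `EXPIRE` (for example in one `MULTI`). The in-memory store exists only so the logic runs standalone.

```typescript
// Hypothetical store interface; with Redis this is INCR + EXPIRE on the key.
interface CounterStore {
  incr(key: string, ttlSeconds: number): Promise<number>;
}

// In-memory stand-in. Old windows are never read again because the window
// number is part of the key; Redis would drop them via the TTL.
class MemoryCounterStore implements CounterStore {
  private counts = new Map<string, number>();

  async incr(key: string, _ttlSeconds: number): Promise<number> {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }
}

// Returns true if the request is allowed. `scope` names the limit (e.g. an
// endpoint); `id` is the user ID, IP, or tenant being limited.
async function allow(
  store: CounterStore,
  scope: string,
  id: string,
  limit: number,
  windowSeconds: number,
  nowMs: number = Date.now(),
): Promise<boolean> {
  const windowIndex = Math.floor(nowMs / 1000 / windowSeconds);
  const key = `rl:${scope}:${id}:${windowIndex}`;
  const count = await store.incr(key, windowSeconds);
  return count <= limit;
}

// Example: login limited to 5 attempts per minute per IP.
// const ok = await allow(store, "login", clientIp, 5, 60);
```

Fixed windows are simple but let a client burst up to twice the limit across a window boundary; a sliding-window or token-bucket variant smooths that out at the cost of slightly more Redis work.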

What does not matter at 1M

A short list of things that founders worry about way too early:

  • Multi-region deployments. Unless you have explicit latency or compliance requirements, single-region is fine well into the millions of users
  • Kubernetes. Vercel, Fly.io, Render, or even Heroku will outperform a self-managed Kubernetes cluster for any team under 20 engineers
  • Microservices. See the previous post — modular monolith wins until at least the 15-engineer mark
  • Custom search infrastructure. Postgres full-text plus pg_trgm covers up to 10M rows easily. Save Elasticsearch for when you genuinely need it
  • GraphQL. REST or RPC works fine. GraphQL is a tax you pay for one specific benefit (typed cross-team API) that does not apply at small scale

The pattern across all seven decisions

Every one of these decisions follows the same pattern: pay a small cost early to avoid a large cost later. Set up Sentry on day one, not after the first outage. Add a queue when you have one async job, not when you have ten. Cache at the edge before your origin gets crushed. Make index migrations concurrent before you have a table that locks for hours.

Founders who scale gracefully are not smarter than founders who rewrite. They just made these seven decisions early and stuck to them. The decisions are mostly boring. That is the point. Boring infrastructure scales. Exotic infrastructure rewrites.

If you want a senior pair of eyes on your stack before the next growth wave, an architecture audit covers all seven of these decisions plus the specifics of your codebase. We map the bottlenecks and write a 90-day remediation plan. Cheaper than a rewrite by an order of magnitude.

Frequently asked questions

At what user count do I need read replicas?

When your primary Postgres consistently sits above 60 percent CPU during business hours and most of that load is reads, or when a single analytics query degrades transactional latency. For most B2B SaaS that is around 10K to 50K active users; for B2C content apps it can arrive at 1K.

Should I cache with Redis or with Postgres materialized views?

Both, for different things. Use Postgres materialized views for slow analytical queries that refresh hourly. Use Redis for hot keys, session data, rate limiting, and anything that changes second-to-second. Do not put your primary read path through Redis until you have measured the real bottleneck.

What breaks first as a SaaS scales?

Almost always the database. Specifically, N+1 queries, missing indexes, and unbounded list endpoints with no pagination. Fix those three before you touch infra, queues, or microservices.
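The unbounded-list fix is mostly two habits: clamp the page size on the server, and paginate with a keyset cursor instead of `OFFSET`, so page 500 costs about the same as page 1. A minimal sketch that only builds the SQL; the `projects` table, its columns, and the 100-row cap are assumptions.

```typescript
// Hypothetical cursor: the sort key of the last row on the previous page.
interface Cursor {
  createdAt: string; // ISO timestamp
  id: number;
}

const MAX_PAGE_SIZE = 100;

function listProjectsQuery(
  tenantId: number,
  requestedLimit: number,
  after?: Cursor,
): { sql: string; params: unknown[] } {
  // Never trust the client's page size: clamp to 1..MAX_PAGE_SIZE.
  const limit = Math.min(Math.max(Math.trunc(requestedLimit) || 1, 1), MAX_PAGE_SIZE);
  const params: unknown[] = [tenantId];
  let where = "tenant_id = $1";

  if (after) {
    // Row-value comparison matches the ORDER BY, so ties on created_at are
    // broken by id and no row is skipped or repeated between pages.
    params.push(after.createdAt, after.id);
    where += " AND (created_at, id) < ($2, $3)";
  }

  params.push(limit);
  const sql =
    `SELECT id, created_at, name FROM projects WHERE ${where} ` +
    `ORDER BY created_at DESC, id DESC LIMIT $${params.length}`;
  return { sql, params };
}
```

This stays fast only with an index that matches the sort, here one on `(tenant_id, created_at, id)`.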

Is Cloudflare or AWS CloudFront better for SaaS at scale?

Cloudflare for cost, DX, and the Workers/KV/D1 edge primitives. CloudFront if you are deeply on AWS and need tight IAM integration and origin signing. For most SaaS startups, Cloudflare is the default in 2026.

When should I introduce a queue?

The first time a request handler does work that takes more than 500ms, or work that does not need to finish for the user to see the action succeed. Email sending, webhook delivery, image processing, AI calls: all of these are queue jobs from day one.

Scaling · SaaS · Performance
