Cost · 11 min read

How to Cut Your AWS Bill by 50% Without Breaking Things

Most AWS bills have 30 to 55 percent fat that comes off without architecture changes or downtime. Here is the audit playbook I run, in priority order, with real numbers from real cuts.

Krishan K Agarwal

Senior System Architect & Fractional CTO

Published Apr 2026 · Updated May 2026


Most AWS bills have 30 to 55 percent fat on them. Not 'with enough engineering work' fat — fat that comes off in a quarter, with no downtime, often without any code changes. I have run this exact playbook on bills from $3K a month to $80K a month, and the pattern barely changes.

Below is the playbook in the order I actually run it: highest savings per hour first, riskiest changes last. Numbers are from real audits, names redacted but stacks typical (EC2 plus RDS plus S3 plus CloudFront, often with EKS or Fargate).

Where AWS bills actually bloat

Seven patterns account for almost all the waste I find. Before any optimization work, pull a Cost Explorer report grouped by service for the last 90 days and compare it against this list. If your top three services do not match the usual suspects, you have an unusual stack and the rest of this playbook needs adjusting.

Idle and oversized EC2 (and ECS/EKS nodes): typically 20 to 40 percent of the bill, of which 30 to 50 percent is fat
Oversized or always-on RDS: 10 to 25 percent of the bill, with non-prod environments running 24/7
NAT Gateway data processing: the silent killer at $0.045 per GB plus $0.045 per hour, easily 5 to 15 percent of the total bill on chatty services
Unattached EBS volumes and old snapshots: pure waste, 1 to 5 percent of the bill on accounts older than 18 months
CloudWatch Logs with no retention policy: often 2 to 8 percent of the bill, almost entirely waste
S3 in the Standard tier when it should be in Intelligent-Tiering or Glacier: 3 to 10 percent of the bill
Untagged resources nobody owns: not a line item, but the reason most of the above persists
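The 90-day, by-service view can also be pulled programmatically. Cost Explorer's `get_cost_and_usage` call (the boto3 `ce` client) groups spend by the `SERVICE` dimension. The sketch below assumes you pass in such a client and then ranks the services. The function names and the top-three cut are illustrative choices, not part of any AWS tool.

```python
from collections import defaultdict


def fetch_cost_by_service(ce_client, start: str, end: str) -> dict:
    """Call Cost Explorer's get_cost_and_usage grouped by SERVICE and merge
    paginated pages into one response-shaped dict. `end` is exclusive,
    e.g. start="2026-02-01", end="2026-05-01" covers three months."""
    kwargs = {
        "TimePeriod": {"Start": start, "End": end},
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [{"Type": "DIMENSION", "Key": "SERVICE"}],
    }
    merged = {"ResultsByTime": []}
    while True:
        page = ce_client.get_cost_and_usage(**kwargs)
        merged["ResultsByTime"].extend(page["ResultsByTime"])
        token = page.get("NextPageToken")
        if not token:
            return merged
        kwargs["NextPageToken"] = token


def top_services(response: dict, n: int = 3) -> list:
    """Sum cost per service across every period, largest first."""
    totals = defaultdict(float)
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            service = group["Keys"][0]
            # Cost Explorer returns amounts as strings, e.g. "123.45"
            totals[service] += float(group["Metrics"]["UnblendedCost"]["Amount"])
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]
```

In practice this would be called as `top_services(fetch_cost_by_service(boto3.client("ce"), start, end))`. Cost Explorer bills each API request, so run it occasionally rather than in a loop.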

Step 1: Kill zombies (5-10% savings, half a day)

The cheapest, fastest, lowest-risk wins. Before you touch anything that runs production traffic, run a zombie hunt. The Trusted Advisor 'Idle' and 'Underutilized' checks plus a manual sweep for unattached resources will find the obvious waste in under an hour.

Unattached EBS volumes (gp3 at $0.08/GB-month): list all volumes with state=available, snapshot what you cannot identify, then delete
EBS snapshots older than 90 days that are not part of a backup policy: delete them, or bring them under an Amazon Data Lifecycle Manager or AWS Backup retention policy
Old AMIs and their backing snapshots: every CI/CD pipeline accumulates these — set a 30-day retention
Unassociated Elastic IPs: $0.005 per hour each, $3.60 per month per zombie. I have seen accounts with 40 of them
EC2 instances stopped for more than 30 days: stopped instances still bill for their EBS volumes, so terminate or restart them
NAT Gateways in dev/staging VPCs that nobody uses on weekends: $32 per month each, just sitting there
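The two easiest zombies to detect by script are unattached volumes and unassociated Elastic IPs. The sketch below works on dicts shaped like the EC2 `describe_volumes` and `describe_addresses` responses, which you would fetch with boto3, for example `describe_volumes(Filters=[{"Name": "status", "Values": ["available"]}])`. It only builds the hit list and a rough monthly cost. It assumes every unattached volume is gp3 and uses the 720-hour month implied by the $3.60 figure above. Deletion is deliberately left out, so you can snapshot first as described.

```python
HOURS_PER_MONTH = 720       # the article's convention: $0.005/h x 720 = $3.60
GP3_PER_GB_MONTH = 0.08     # gp3 rate quoted above; other volume types differ
EIP_PER_HOUR = 0.005        # unassociated Elastic IP


def find_zombies(volumes_resp: dict, addresses_resp: dict) -> dict:
    """Flag unattached EBS volumes and unassociated Elastic IPs.

    volumes_resp is shaped like EC2 describe_volumes output,
    addresses_resp like EC2 describe_addresses output.
    Nothing is deleted here; this only builds the hit list.
    """
    unattached = [
        v for v in volumes_resp.get("Volumes", []) if v.get("State") == "available"
    ]
    idle_ips = [
        a for a in addresses_resp.get("Addresses", []) if "AssociationId" not in a
    ]
    # Rough estimate that prices every unattached volume at the gp3 rate
    volume_cost = sum(v["Size"] for v in unattached) * GP3_PER_GB_MONTH
    ip_cost = len(idle_ips) * EIP_PER_HOUR * HOURS_PER_MONTH
    return {
        "volume_ids": [v["VolumeId"] for v in unattached],
        "allocation_ids": [a.get("AllocationId") for a in idle_ips],
        "est_monthly_usd": round(volume_cost + ip_cost, 2),
    }
```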

Step 2: CloudWatch Logs retention (3-8% savings, 2 hours)

By default, every CloudWatch Log Group keeps logs forever, at $0.50 per GB ingested plus $0.03 per GB-month stored. On a busy app, the stored total is genuinely shocking. The fix is one Terraform or Cloud Custodian sweep that sets a default retention on every log group: 30 days for app logs, 90 days for audit logs, longer only for compliance-regulated paths. Note that retention only caps the storage charge. Ingestion is billed on every GB regardless, so the ingest side shrinks only when you log less.

Bonus: add log-group-level subscription filters that stream the logs you genuinely need long-term to S3 (via Kinesis Data Firehose), and keep them there in Standard-IA or Glacier Deep Archive at a fraction of the storage cost. CloudWatch is fine for hot debugging; it is a terrible long-term log store.
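For teams that are not on Terraform or Cloud Custodian, the same sweep can be written against the CloudWatch Logs API with boto3. The sketch below is that swapped-in approach. It pages through `describe_log_groups`, skips any group that already has a `retentionInDays`, and calls `put_retention_policy`. The `/audit/` prefix is a hypothetical naming convention for audit logs. Compliance-regulated groups should be excluded before you apply anything. Both 30 and 90 are among the retention values the API accepts.

```python
# Hypothetical naming convention: audit log groups start with this prefix.
AUDIT_PREFIX = "/audit/"


def all_log_groups(logs_client):
    """Yield every log group dict, following describe_log_groups pagination."""
    for page in logs_client.get_paginator("describe_log_groups").paginate():
        yield from page["logGroups"]


def plan_retention(log_groups, app_days: int = 30, audit_days: int = 90):
    """Return (name, days) for every log group that has no retention set."""
    plan = []
    for group in log_groups:
        if "retentionInDays" in group:
            continue  # someone already chose a policy; do not override it
        name = group["logGroupName"]
        days = audit_days if name.startswith(AUDIT_PREFIX) else app_days
        plan.append((name, days))
    return plan


def apply_retention(logs_client, plan, dry_run: bool = True):
    """Apply the plan via put_retention_policy; only print when dry_run."""
    for name, days in plan:
        if dry_run:
            print(f"would set {name} -> {days} days")
        else:
            logs_client.put_retention_policy(logGroupName=name, retentionInDays=days)
```

Run it once with `dry_run=True`, review the printed plan, then rerun with `dry_run=False`. Leaving groups that already have a retention untouched keeps the sweep safe to repeat monthly.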

Step 3: Right-size EC2 and RDS (8-15% savings, 1 week)

AWS Compute Optimizer (once you opt in) flags every EC2 instance, RDS instance, EBS volume, Lambda function, and Auto Scaling group that is over-provisioned. Most accounts have 30 to 50 percent of their EC2 fleet running below 20 percent CPU and 40 percent memory utilization, often on the wrong instance family. Memory figures require the CloudWatch agent on the instance; without it, the recommendations ignore memory.

Standard moves: m5.large to m7g.medium (Graviton on ARM, often 30 to 50 percent cheaper), gp2 to gp3 (20 percent cheaper per GB, often faster, migrated in place with no downtime), RDS r5 to r7g (Graviton on RDS, 20 percent off list, drop-in for most workloads), and turning off non-prod databases overnight with Lambda plus EventBridge schedules. A non-prod RDS db.r5.large running 24/7 costs $145/month. Running it 8am to 8pm on weekdays cuts it to $48: 12 hours on about 20 weekdays is 240 of roughly 720 hours in a month, a third of the compute bill. Storage keeps billing while the instance is stopped.
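A minimal sketch of both halves of that schedule follows: the arithmetic behind $145 to $48, and a Lambda handler that two EventBridge rules could invoke. The event payload (`action` plus `db_instances`) is a hypothetical shape you would define in each rule's input. It is not an AWS-defined event. `stop_db_instance` and `start_db_instance` are the boto3 RDS calls. Aurora clusters use `stop_db_cluster` and `start_db_cluster` instead.

```python
HOURS_PER_MONTH = 720  # same convention as the $145 -> $48 example


def scheduled_cost(always_on_monthly: float, hours_per_day: int = 12,
                   days_per_month: int = 20) -> float:
    """Compute-only cost of running on a schedule (storage still bills)."""
    return always_on_monthly * hours_per_day * days_per_month / HOURS_PER_MONTH


def handler(event, context, rds_client=None):
    """Lambda entry point. Expects a hypothetical payload such as
    {"action": "stop", "db_instances": ["staging-db"]}."""
    if rds_client is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        rds_client = boto3.client("rds")
    action = event["action"]
    if action not in ("stop", "start"):
        raise ValueError(f"unknown action: {action}")
    done = []
    for db_id in event.get("db_instances", []):
        if action == "stop":
            rds_client.stop_db_instance(DBInstanceIdentifier=db_id)
        else:
            rds_client.start_db_instance(DBInstanceIdentifier=db_id)
        done.append(db_id)
    return {"action": action, "db_instances": done}
```

Two rules such as `cron(0 8 ? * MON-FRI *)` and `cron(0 20 ? * MON-FRI *)` would drive it. EventBridge rules evaluate cron in UTC, so shift the hours for your time zone. RDS automatically restarts an instance that has been stopped for seven days, but a weekday schedule never reaches that limit. The handler does not catch the error RDS raises when an instance is already in the target state; add that handling if your schedules can overlap with manual stops.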

Action	Typical savings	Effort	Risk	When to do it
Delete unattached EBS volumes	1-3%	Hours	None	Always, first
Set CloudWatch Logs retention	3-8%	Hours	None	Always, first week
Right-size EC2 with Compute Optimizer	8-15%	1 week	Low	Before any commitment
Replace NAT egress with VPC endpoints (S3, DDB)	3-8%	1 day	None — endpoints are free	If NAT is over $200/mo
NAT Instance instead of NAT Gateway (dev only)	2-5%	1-2 days	Medium — you operate it	Dev/staging only, low traffic
Compute Savings Plan, 1-year, no upfront	15-30%	1 day	Low	Once spend is steady-state
Graviton (ARM) migration on EC2/RDS	10-20%	1-2 weeks	Low-medium	After right-sizing
S3 Intelligent-Tiering on >128KB objects	3-8% on S3	1 day	None	Always, on storage-heavy buckets
Spot for batch and CI workloads	60-80% on those	2-3 days	Low — interruption-tolerant	Async/batch jobs
Reserved Instances, 3-year all-upfront	50-65%	1 day	High — locked-in	Rarely; only ultra-stable workloads

AWS cost reduction actions ranked by ROI. Run roughly top-to-bottom for fastest payback.

Step 4: NAT Gateway is probably 5-15% of your bill

NAT Gateway is the silent budget killer. It costs $0.045 per GB processed plus $0.045 per hour ($32.40 per month per AZ, before traffic). Critically, every call from a private subnet to S3, DynamoDB, ECR, Secrets Manager, SSM, KMS, or any other AWS service goes through NAT and racks up data-processing fees, unless you have set up VPC endpoints.

The fix is mechanical. Add VPC Gateway Endpoints for S3 and DynamoDB (both free). Add VPC Interface Endpoints for ECR, Secrets Manager, SSM, KMS, STS, and CloudWatch ($0.01 per hour each per AZ, plus $0.01 per GB processed, much cheaper than NAT). ECR needs two interface endpoints (ecr.api and ecr.dkr) plus the S3 gateway endpoint, because image layers are pulled from S3. On almost every audit I run, this single change cuts the NAT bill by 50 to 80 percent.

Real example: a Fargate-based product with chatty ECR pulls and a heavy S3 read pattern was paying $1,800 per month in NAT. After adding the S3 Gateway Endpoint and an ECR Interface Endpoint, NAT dropped to $310. The endpoints cost $42 per month. Net savings: $1,448 per month, half a day of work.
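To estimate the payoff before touching the VPC, the rates quoted above reduce to a few lines of arithmetic. The sketch assumes those list prices and a 720-hour month. Check your region's pricing page. The traffic volumes in the usage line are hypothetical. Note the design choice: the NAT hourly charge is not counted as saved, because the gateway usually stays for genuine internet egress. Only the per-GB processing moves.

```python
HOURS_PER_MONTH = 720        # matches the $32.40/month-per-AZ figure above
NAT_HOURLY = 0.045           # per NAT Gateway (one per AZ)
NAT_PER_GB = 0.045
IFACE_HOURLY_PER_AZ = 0.01   # per interface endpoint, per AZ
IFACE_PER_GB = 0.01


def nat_monthly(gb: float, azs: int) -> float:
    """Monthly NAT cost: hourly charge per AZ plus per-GB processing."""
    return azs * NAT_HOURLY * HOURS_PER_MONTH + gb * NAT_PER_GB


def endpoint_savings(gb_to_gateway: float, gb_to_interface: float,
                     interface_endpoints: int, azs: int) -> float:
    """Monthly saving from moving traffic off NAT onto VPC endpoints.

    Gateway endpoints (S3, DynamoDB) are free; interface endpoints cost
    an hourly fee per AZ plus a per-GB fee.
    """
    nat_saved = (gb_to_gateway + gb_to_interface) * NAT_PER_GB
    endpoint_cost = (interface_endpoints * azs * IFACE_HOURLY_PER_AZ * HOURS_PER_MONTH
                     + gb_to_interface * IFACE_PER_GB)
    return nat_saved - endpoint_cost
```

For example, `endpoint_savings(10_000, 1_000, interface_endpoints=2, azs=2)` (10 TB of S3 reads, 1 TB of ECR traffic, the two ECR endpoints in two AZs) comes to about $456 a month.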

Step 5: Savings Plans (15-30% savings, 1 day)

Buy only once steady-state usage is clear, which means after right-sizing, not before: buying SPs on bloated instances locks in the bloat. Then buy a Compute Savings Plan at the 1-year no-upfront level on roughly 70 to 80 percent of your baseline compute. This is the no-regret move: 30 to 40 percent off list across EC2, Fargate, and Lambda, applied to whatever instance family you end up on, with no cash up front.
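One detail trips people up when sizing the purchase. A Savings Plan commitment is stated in dollars per hour at Savings Plan rates, not at on-demand rates. So covering $X/h of on-demand usage needs roughly X × (1 − discount). The sketch below assumes you already have post-right-sizing hourly on-demand compute spend. It treats the quietest hour as a conservative baseline and uses a single assumed average discount. Real discounts vary by family, region, and term. Cost Explorer's own Savings Plans recommendations are the better source once you have the data.

```python
def sp_commitment(hourly_on_demand: list, coverage: float = 0.75,
                  discount: float = 0.30) -> float:
    """Suggest an hourly Compute Savings Plan commitment in $/h.

    hourly_on_demand: post-right-sizing on-demand compute spend per hour.
    coverage: share of the baseline to cover (the 70-80 percent above).
    discount: ASSUMED average Savings Plan discount for your mix.
    """
    if not hourly_on_demand:
        raise ValueError("need at least one hour of spend data")
    # Conservative baseline: usage never dipped below the quietest hour.
    baseline = min(hourly_on_demand)
    # Commitment is billed at Savings Plan rates, i.e. after the discount.
    return round(baseline * coverage * (1 - discount), 2)
```

With a $10/h floor, 75 percent coverage, and an assumed 30 percent discount, this suggests committing about $5.25/h.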

Avoid 3-year all-upfront SPs unless you are post-Series B with predictable workloads. The 50 to 65 percent saving is real, but the lock-in is unforgiving: if your usage drops 30 percent, or you move workloads off EC2, Fargate, and Lambda, you eat the unused commitment. EC2 Instance Savings Plans and Standard Reserved Instances are stricter still, since they also tie you to one instance family in one region. RDS Reserved Instances are the same story: only commit on workloads you are sure will exist in 12 months.

Step 6: Graviton migration (10-20% savings, 1-2 weeks)

ARM-based Graviton instances (m7g, c7g, r7g, t4g) are typically 20 percent cheaper than x86 equivalents, with 10 to 40 percent better price-performance on most modern workloads. In 2026, almost every common runtime — Node.js, Python, Go, Java, Rust, .NET — runs natively on ARM, and most container base images are multi-arch.

The migration is mostly mechanical: rebuild your Docker images for linux/arm64 (use buildx), test on a Graviton instance, switch your ASG/ECS/EKS node group to the t4g/m7g family. The exceptions are workloads with x86-only binaries (some legacy ML libraries, some database clients, some commercial software) — test those first.

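RDS Graviton (r7g, m7g) is an instance-class change in the console on any supported engine version. The modification restarts the instance, so apply it in a maintenance window; on Multi-AZ it becomes a short failover. I have never seen it cause a regression on a standard Postgres or MySQL workload. 20 percent off, no app changes. There is no excuse not to do this.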
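When planning the switch across a fleet, it helps to generate the candidate target type for each instance mechanically. The sketch below is a hypothetical lookup table covering a few common x86 families. It is not an AWS API, and Compute Optimizer remains the authoritative recommender. It keeps the same size, whereas the m5.large to m7g.medium move mentioned earlier combines the switch with a downsize. It returns `None` when the size has no Graviton counterpart. RDS identifiers with the `db.` prefix pass through, but confirm the size is offered for your engine.

```python
from typing import Optional

# Hypothetical mapping from common x86 families to Graviton counterparts.
GRAVITON_FAMILY = {
    "m5": "m7g", "m6i": "m7g",
    "c5": "c7g", "c6i": "c7g",
    "r5": "r7g", "r6i": "r7g",
    "t3": "t4g",
}
# Sizes offered by the target families (m7g/c7g/r7g top out at 16xlarge).
_STD_SIZES = {"medium", "large", "xlarge", "2xlarge", "4xlarge",
              "8xlarge", "12xlarge", "16xlarge", "metal"}
GRAVITON_SIZES = {
    "m7g": _STD_SIZES,
    "c7g": _STD_SIZES,
    "r7g": _STD_SIZES,
    "t4g": {"nano", "micro", "small", "medium", "large", "xlarge", "2xlarge"},
}


def graviton_equivalent(instance_type: str) -> Optional[str]:
    """Map e.g. 'm5.xlarge' -> 'm7g.xlarge', 'db.r5.large' -> 'db.r7g.large'."""
    prefix = ""
    if instance_type.startswith("db."):
        prefix, instance_type = "db.", instance_type[3:]
    family, _, size = instance_type.partition(".")
    target = GRAVITON_FAMILY.get(family)
    if target is None or size not in GRAVITON_SIZES[target]:
        return None  # unknown family, or no same-size Graviton type
    return f"{prefix}{target}.{size}"
```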

Step 7: S3 lifecycle and Spot for batch (3-15% savings, 1-3 days)

S3 Intelligent-Tiering automatically moves objects between Frequent, Infrequent, and Archive tiers based on access patterns, for $0.0025 per 1,000 objects per month in monitoring fees. On any bucket with mixed-access content (user uploads, logs, backups), this saves 30 to 60 percent on storage with zero ops burden. S3 has no bucket-level default storage class, so make Intelligent-Tiering the effective default on every bucket whose objects are over 128KB: upload with that storage class, or add a lifecycle rule that transitions objects on day 0. Objects under 128KB never leave the Frequent tier, so they gain nothing.

For Glacier-tier content (compliance archives, old backups), move it explicitly with a lifecycle rule — Glacier Deep Archive is $0.00099 per GB per month, roughly 23x cheaper than Standard. Just set the lifecycle: Standard -> Standard-IA at 30 days -> Glacier Deep Archive at 180 days for anything you keep but never read.
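Both policies can be expressed as S3 lifecycle rules. The sketch below builds the configuration dicts in the shape boto3's `put_bucket_lifecycle_configuration` expects. The rule IDs and the `backups/` prefix are hypothetical. Apply the Intelligent-Tiering rule and the archive rule to different buckets, or scope them with non-overlapping prefixes. An object already moved into Intelligent-Tiering cannot then transition to Standard-IA.

```python
IT_MIN_BYTES = 128 * 1024  # objects under 128 KB are never auto-tiered


def intelligent_tiering_rule(rule_id: str = "to-intelligent-tiering") -> dict:
    """Day-0 transition of objects larger than 128 KB into Intelligent-Tiering."""
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Filter": {"ObjectSizeGreaterThan": IT_MIN_BYTES},
        "Transitions": [{"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}],
    }


def archive_rule(prefix: str, rule_id: str = "archive") -> dict:
    """Standard -> Standard-IA at 30 days -> Glacier Deep Archive at 180 days."""
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
        ],
    }


def lifecycle_config(*rules: dict) -> dict:
    """Wrap rules in the LifecycleConfiguration shape."""
    return {"Rules": list(rules)}
```

Usage would look like `s3.put_bucket_lifecycle_configuration(Bucket="example-backups", LifecycleConfiguration=lifecycle_config(archive_rule("backups/")))`. That call replaces the bucket's entire existing lifecycle configuration, so merge in any rules already there first.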

Spot Instances run at 60 to 80 percent off on-demand and are a no-brainer for any workload that can tolerate interruption: CI builds, batch ETL, ML training, async media processing. Karpenter (on EKS) and Spot capacity providers on ECS, including Fargate Spot, make this almost transparent. The two-minute interruption notice is enough for a well-designed batch job.
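On plain EC2 Spot, a batch job sees the two-minute notice through the instance metadata service. The `spot/instance-action` path returns 404 until an interruption is scheduled, then returns a small JSON document with `action` and `time`. The sketch below reads it with IMDSv2, using only the standard library. It checks for a notice before each item and checkpoints if one appears. The `process` and `checkpoint` callbacks are hypothetical stand-ins for your own job logic, and `fetch` is injectable so the loop can be exercised off EC2.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Optional

IMDS = "http://169.254.169.254/latest"


def imds_instance_action() -> Optional[str]:
    """Read spot/instance-action via IMDSv2; None while no notice exists."""
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    token = urllib.request.urlopen(token_req, timeout=1).read().decode()
    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        return urllib.request.urlopen(req, timeout=1).read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # no interruption scheduled
        raise


def interruption_notice(fetch: Callable[[], Optional[str]] = imds_instance_action):
    """Return the parsed notice, e.g. {"action": "terminate", "time": ...}, or None."""
    body = fetch()
    return json.loads(body) if body else None


def run_batch(items, process, checkpoint, fetch=imds_instance_action) -> int:
    """Process items in order; on a notice, checkpoint the next index and stop."""
    for i, item in enumerate(items):
        if interruption_notice(fetch) is not None:
            checkpoint(i)  # the replacement instance resumes from item i
            return i
        process(item)
    return len(items)
```

Checking before every item is fine when items take seconds. For very fast items, check on a timer every few seconds instead, so you do not hammer the metadata endpoint.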

Putting it all together: real before/after numbers

Composite of three audits I ran in 2025 on similar-shape startups (Next.js plus Postgres on RDS plus Fargate plus S3, $7K to $9K/month bills). The savings stack roughly multiplies, with diminishing returns past the 50 percent mark.

Run zombie sweep — delete unattached EBS, EIPs, old snapshots, idle NAT in dev
Set CloudWatch Logs retention to 30 days, push compliance logs to S3
Run AWS Compute Optimizer, right-size every flagged instance and EBS volume
Add VPC endpoints for S3, DynamoDB, ECR, Secrets Manager, SSM
Migrate gp2 EBS volumes to gp3 (always cheaper, often faster)
Buy 1-year no-upfront Compute Savings Plan covering 70-80 percent of baseline
Migrate EC2 fleet and RDS to Graviton (m7g/r7g/t4g)
Apply S3 Intelligent-Tiering as bucket default, set Glacier lifecycle on archive paths
Move CI and batch jobs to Spot capacity
Re-tag everything for ownership and run monthly cost-by-team reviews

Bill before: $8,200/month. Bill 90 days after: $3,650/month. Cut: 55 percent. Engineering effort: roughly 4 weeks of one senior engineer's time, spread across the quarter. ROI: more than 10x in year one.
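"The savings stack roughly multiplies" means each step's percentage applies to whatever bill is left after the previous steps. A 10 percent cut followed by a 20 percent cut therefore removes 28 percent, not 30. That compounding is also why returns diminish past the 50 percent mark. The short sketch below shows that arithmetic. The inputs in its checks are illustrative, and the only real figures are the $8,200 and $3,650 above.

```python
from functools import reduce


def stacked_cut(cuts: list) -> float:
    """Combined fractional cut when each step trims a share of what is left."""
    remaining = reduce(lambda left, cut: left * (1 - cut), cuts, 1.0)
    return 1 - remaining


def actual_cut(before: float, after: float) -> float:
    """Fractional reduction between two monthly bills."""
    return 1 - after / before
```

`actual_cut(8200, 3650)` is about 0.555, the 55 percent quoted above, or $4,550 a month.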

If your bill is north of $5K/month and you have not run this playbook, it is the highest-leverage engineering work you can do this quarter. The architecture audit is built for exactly this. For related cost work, the cost-per-user post and the DevOps-on-$500-a-month guide pair well, and if you are still at $0 to $1K/month, the budget DevOps stack is probably more relevant than this AWS playbook.

Frequently asked questions

How much can I realistically cut from my AWS bill in one quarter?

On bills between $3K and $50K per month, I typically cut 30 to 55 percent inside one quarter without any downtime. The first 30 percent comes from low-risk hygiene: deleting unattached EBS volumes, right-sizing EC2, fixing CloudWatch retention, killing zombie NAT Gateway traffic. The next 20 percent is Savings Plans and Graviton migrations. Past 55 percent usually requires real architecture work and starts to risk regressions.

Are Savings Plans worth it for a startup that might pivot?

Compute Savings Plans at the 1-year no-upfront level give you 30 to 40 percent off list with almost no commitment risk — they apply to whatever instance family you end up on. The 3-year all-upfront tier saves 50 to 65 percent but locks cash and instance choices, so I rarely recommend it pre-Series A. Buy the 1-year compute SP on your steady-state workload and stay flexible on the rest.

Why is my NAT Gateway bill so high?

Almost always because every outbound call from your private subnets, including calls to S3, DynamoDB, ECR, Secrets Manager, and CloudWatch, runs through NAT at $0.045 per GB. A single chatty service can rack up $1,500 per month in NAT processing fees alone. The fix is VPC Gateway Endpoints for S3 and DynamoDB (free) and Interface Endpoints for the rest ($0.01 per hour per AZ plus $0.01 per GB, which pays back fast).

Should I move to Kubernetes to save money?

Almost never, if cost is your only reason. EKS adds a $73-per-month control plane fee, a steep ops burden, and usually three times the engineering hours of ECS Fargate or App Runner. Kubernetes saves money at scale (dozens of services, hundreds of pods), when bin-packing matters. Below that, the operational tax outweighs the compute savings.

What is the single biggest waste line I see on audits?

It is a tie between unattached EBS volumes (gp3 at $0.08/GB-month adds up fast on an account that is several years old) and CloudWatch Logs with infinite retention. I have saved teams $400 to $2,000 a month from these two alone, in under an hour of work. They are the easiest wins on every AWS audit I run.


Want a senior eye on your stack?

If you are scoping an MVP, scaling a SaaS, or staring at an inherited codebase, book a 30-minute call. No pitch deck required.
