Cloud Cost Optimization: $840K Annual Savings
SaaS / Technology • 2 months initial optimization, ongoing refinement
Results Achieved
- Annual savings: $840,000 (38% reduction)
- Per-customer cost reduced 45%
- Performance improved despite cost cuts
- Established FinOps practice with monthly cost reviews
- Forecasting accuracy improved from ±40% to ±5%
The Problem
A B2B SaaS startup had grown from $500K to $15M ARR in 18 months. Great news—except their AWS bill grew even faster:
| Month | Revenue | AWS Cost | % of Revenue |
|---|---|---|---|
| Jan | $1.0M | $45K | 4.5% |
| Jul | $1.2M | $75K | 6.3% |
| Dec | $1.4M | $115K | 8.2% |
At this trajectory, cloud costs would hit 12% of revenue within a year—unsustainable for a SaaS business (target: 3-5%).
Worse, nobody knew why costs were increasing. The team was “too busy shipping features” to investigate.
The board mandated: Cut cloud costs by 30% within 90 days.
Discovery: Where’s the Money Going?
We conducted a week-long audit using AWS Cost Explorer, CloudWatch, and custom analysis scripts.
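Most of that week went into slicing spend by service and by tag. Here is a minimal sketch of the kind of Cost Explorer query such an audit starts from, in Python with boto3 (the date range is illustrative):

```python
import boto3

ce = boto3.client('ce')  # Cost Explorer

# Last three months of unblended cost, grouped by AWS service
response = ce.get_cost_and_usage(
    TimePeriod={'Start': '2024-01-01', 'End': '2024-04-01'},  # illustrative range
    Granularity='MONTHLY',
    Metrics=['UnblendedCost'],
    GroupBy=[{'Type': 'DIMENSION', 'Key': 'SERVICE'}],
)

for period in response['ResultsByTime']:
    print(period['TimePeriod']['Start'])
    groups = sorted(period['Groups'],
                    key=lambda g: float(g['Metrics']['UnblendedCost']['Amount']),
                    reverse=True)
    for group in groups[:10]:  # top ten services by spend
        service = group['Keys'][0]
        amount = float(group['Metrics']['UnblendedCost']['Amount'])
        print(f"  {service}: ${amount:,.0f}")
```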
Finding #1: Over-Provisioned Instances
The Issue:
- 120 EC2 instances, mostly m5.4xlarge (16 vCPU, 64GB RAM)
- Average CPU utilization: 12%
- Average memory utilization: 28%
Why it happened: Engineers provisioned for peak load (Black Friday), then never scaled down.
The fix: Rightsized 80% of instances to m5.xlarge or m5.2xlarge.
```bash
# Analysis script to find underutilized instances (7-day average CPU below 20%)
aws ec2 describe-instances \
  --query 'Reservations[*].Instances[*].[InstanceId,InstanceType]' \
  --output text | while read -r instance type; do
  cpu_avg=$(aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions Name=InstanceId,Value="$instance" \
    --start-time "$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
    --period 3600 \
    --statistics Average \
    --query 'Datapoints[*].Average' \
    --output text | awk '{sum+=$1; count++} END {if (count) print sum/count}')
  # Skip instances with no datapoints (e.g. stopped during the window)
  if [[ -n "$cpu_avg" ]] && (( $(echo "$cpu_avg < 20" | bc -l) )); then
    echo "Instance $instance ($type) - CPU: $cpu_avg% - RIGHTSIZE CANDIDATE"
  fi
done
```
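The script only flags candidates; the resize itself is a stop/modify/start cycle per instance. A minimal sketch in Python with boto3 (the instance ID and target type are placeholders, and the smaller size should be load-tested first):

```python
import boto3

ec2 = boto3.client('ec2')

def rightsize_instance(instance_id, new_type):
    """Stop an instance, change its type, and start it again."""
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter('instance_stopped').wait(InstanceIds=[instance_id])

    # The instance type can only be changed while the instance is stopped
    ec2.modify_instance_attribute(InstanceId=instance_id,
                                  InstanceType={'Value': new_type})

    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter('instance_running').wait(InstanceIds=[instance_id])

# Example: shrink an over-provisioned m5.4xlarge to m5.xlarge
rightsize_instance('i-0123456789abcdef0', 'm5.xlarge')
```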
Savings: $18K/month
Finding #2: No Reserved Instances or Savings Plans
The Issue: All compute was running on-demand, despite stable, predictable workloads.
The fix:
- Purchased 1-year Compute Savings Plans (covers EC2, Fargate, Lambda)
- Committed to 70% of baseline usage (sized roughly as sketched below)
- Reserved 30% for on-demand scaling
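Sizing that commitment is straightforward arithmetic: measure the stable hourly on-demand baseline, commit to a fraction of it, and let everything above the line stay on-demand. A back-of-the-envelope sketch with purely illustrative numbers (actual discounts vary by term and payment option):

```python
# Illustrative commitment sizing; these figures are examples, not the client's actuals
baseline_hourly_on_demand = 100.00  # stable compute spend in $/hour, measured over ~30 days
commit_fraction = 0.70              # share of the baseline covered by the Savings Plan
discount = 0.40                     # approximate 1-year Compute Savings Plan discount

covered_on_demand_spend = baseline_hourly_on_demand * commit_fraction
hourly_commitment = covered_on_demand_spend * (1 - discount)  # what you commit to pay
hourly_savings = covered_on_demand_spend * discount

print(f"Commit to ${hourly_commitment:.2f}/hour; "
      f"save roughly ${hourly_savings * 730:,.0f}/month on covered usage")
```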
Savings: $32K/month (40% discount on committed usage)
Finding #3: Massive S3 Costs
The Issue:
- S3 bill: $28K/month
- 480TB of data, 95% never accessed after 30 days
- Everything in S3 Standard storage class
The fix: Implemented lifecycle policies:
```json
{
  "Rules": [
    {
      "Id": "MoveToInfrequentAccess",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "uploads/"
      },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER_IR" },
        { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
      ]
    }
  ]
}
```
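Attaching the policy is a single API call per bucket. A minimal sketch with boto3, assuming the rules above are saved as lifecycle.json; the bucket name is a placeholder:

```python
import json

import boto3

s3 = boto3.client('s3')

# Load the lifecycle rules shown above and attach them to the bucket
with open('lifecycle.json') as f:
    lifecycle = json.load(f)

s3.put_bucket_lifecycle_configuration(
    Bucket='example-uploads-bucket',  # placeholder bucket name
    LifecycleConfiguration=lifecycle,
)
```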
Savings: $19K/month (68% reduction in S3 storage costs)
Finding #4: RDS Over-Provisioning
The Issue:
- Production database: db.r5.8xlarge (32 vCPU, 256GB RAM)
- Actual usage: 15% CPU, 40% memory
- Multi-AZ (good!), but oversized
The fix:
- Downsized to db.r5.2xlarge (still multi-AZ)
- Purchased Reserved Instance (1-year, all upfront)
- Set up CloudWatch alarms to monitor performance
Savings: $11K/month
Performance impact: None. P95 query latency actually improved (better cache hit ratios with right-sized instance).
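To catch any regression from the downsizing early, the monitoring can be as simple as threshold alarms on the new instance. A minimal sketch with boto3; the DB identifier and SNS topic ARN are placeholders:

```python
import boto3

cloudwatch = boto3.client('cloudwatch')

# Page the team if the downsized database runs hot for 15 minutes straight
cloudwatch.put_metric_alarm(
    AlarmName='prod-db-cpu-high',
    Namespace='AWS/RDS',
    MetricName='CPUUtilization',
    Dimensions=[{'Name': 'DBInstanceIdentifier', 'Value': 'prod-db'}],  # placeholder identifier
    Statistic='Average',
    Period=300,
    EvaluationPeriods=3,
    Threshold=80.0,
    ComparisonOperator='GreaterThanThreshold',
    AlarmActions=['arn:aws:sns:us-east-1:123456789012:ops-alerts'],     # placeholder SNS topic
)
```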
Finding #5: Data Transfer Costs
The Issue:
- $8K/month in data transfer
- API responses averaging 2MB (mostly redundant data)
- Images served from S3 without CloudFront
The fix:
- Optimized API responses (removed unnecessary fields)
- Implemented CloudFront CDN for static assets
- Enabled gzip compression
Savings: $5K/month
Finding #6: Zombie Resources
The Issue:
- 34 EBS volumes not attached to instances ($2.1K/month)
- 18 Elastic IPs not associated with instances ($4.3K/month)
- 12 load balancers with zero traffic ($2.9K/month)
- Snapshots retained indefinitely (8TB, $1.6K/month)
The fix:
```bash
# Delete unattached EBS volumes (review the list before piping it into delete)
aws ec2 describe-volumes --filters Name=status,Values=available \
  --query 'Volumes[*].VolumeId' --output text | \
  xargs -n 1 aws ec2 delete-volume --volume-id

# Release Elastic IPs that have no association (unassociated EIPs are billed hourly)
aws ec2 describe-addresses \
  --query 'Addresses[?AssociationId==null].AllocationId' --output text | \
  xargs -n 1 aws ec2 release-address --allocation-id

# Delete idle load balancers
# (manual review first to avoid breaking things)

# Implement snapshot retention policy (keep 7 daily, 4 weekly, 12 monthly)
```
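That last item can be automated as well. The sketch below is a simplified age-based cleanup in Python, deleting self-owned snapshots older than a cutoff, rather than the full 7-daily/4-weekly/12-monthly rotation; the one-year cutoff is illustrative:

```python
from datetime import datetime, timedelta, timezone

import boto3
from botocore.exceptions import ClientError

ec2 = boto3.client('ec2')
cutoff = datetime.now(timezone.utc) - timedelta(days=365)  # illustrative retention window

# Walk every snapshot owned by this account and drop anything older than the cutoff
paginator = ec2.get_paginator('describe_snapshots')
for page in paginator.paginate(OwnerIds=['self']):
    for snapshot in page['Snapshots']:
        if snapshot['StartTime'] < cutoff:
            try:
                ec2.delete_snapshot(SnapshotId=snapshot['SnapshotId'])
            except ClientError:
                # Skip snapshots that are still referenced, e.g. by an AMI
                pass
```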
Savings: $11K/month
Finding #7: Development/Staging Running 24/7
The Issue:
- Dev and staging environments identical to production
- Running 24/7, even though devs only worked 9-5 weekdays
- Cost: $22K/month
The fix: Auto-shutdown non-prod environments:
```python
# Lambda function to stop dev/staging at 7 PM weekdays (invoked on a schedule)
import boto3

def lambda_handler(event, context):
    ec2 = boto3.client('ec2')
    rds = boto3.client('rds')

    # Stop EC2 instances tagged Environment=dev or Environment=staging
    # (get_non_prod_instances / get_non_prod_databases look up resources by that tag)
    instance_ids = get_non_prod_instances(ec2)
    if instance_ids:
        ec2.stop_instances(InstanceIds=instance_ids)

    # Stop RDS instances
    for db in get_non_prod_databases(rds):
        rds.stop_db_instance(DBInstanceIdentifier=db)

    return {'statusCode': 200, 'body': 'Non-prod environments stopped'}
```
Developers start environments when needed via Slack bot.
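The wake-up path is the mirror image of the shutdown Lambda; a minimal sketch of the start handler the Slack bot could invoke, using the same tag-lookup helpers as above:

```python
import boto3

def start_handler(event, context):
    ec2 = boto3.client('ec2')
    rds = boto3.client('rds')

    # Start everything tagged Environment=dev or Environment=staging
    instance_ids = get_non_prod_instances(ec2)
    if instance_ids:
        ec2.start_instances(InstanceIds=instance_ids)

    for db in get_non_prod_databases(rds):
        rds.start_db_instance(DBInstanceIdentifier=db)

    return {'statusCode': 200, 'body': 'Non-prod environments started'}
```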
Savings: $14K/month (64% reduction in non-prod costs)
The Results
Cost Breakdown
| Category | Before | After | Monthly Savings |
|---|---|---|---|
| EC2 Compute | $42K | $24K | $18K |
| Savings Plans | — | — | $32K |
| S3 Storage | $28K | $9K | $19K |
| RDS | $18K | $7K | $11K |
| Data Transfer | $8K | $3K | $5K |
| Zombie Resources | $11K | — | $11K |
| Dev/Staging | $22K | $8K | $14K |
| Total | $129K | $59K | $70K/month |
Annual savings: $840,000
Business Impact
- Gross margin improved from 68% to 75%
- Runway extended by 4 months without additional fundraising
- Per-customer cost dropped 45% (better unit economics)
- Performance improved: Lower latency (CloudFront), faster queries (rightsized RDS)
Process Improvements
Implemented FinOps practice:
- Monthly cost review meeting (30 minutes)
- Cost attribution by team/product (tagging strategy)
- Budget alerts (notify when exceeding forecast by 10%; see the sketch after this list)
- Quarterly rightsizing reviews
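For the budget alerts, AWS Budgets covers the basic case. A minimal sketch with boto3; the account ID, budget amount, and email address are placeholders:

```python
import boto3

budgets = boto3.client('budgets')

# Alert when forecasted monthly spend exceeds the budget by 10%
budgets.create_budget(
    AccountId='123456789012',  # placeholder account ID
    Budget={
        'BudgetName': 'monthly-cloud-spend',
        'BudgetLimit': {'Amount': '60000', 'Unit': 'USD'},  # illustrative figure
        'TimeUnit': 'MONTHLY',
        'BudgetType': 'COST',
    },
    NotificationsWithSubscribers=[{
        'Notification': {
            'NotificationType': 'FORECASTED',
            'ComparisonOperator': 'GREATER_THAN',
            'Threshold': 110.0,
            'ThresholdType': 'PERCENTAGE',
        },
        'Subscribers': [{'SubscriptionType': 'EMAIL', 'Address': 'finops@example.com'}],
    }],
)
```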
Cost forecasting:
- Before: ±40% accuracy (basically guessing)
- After: ±5% accuracy (reliable planning)
Developer awareness: Added cost visibility to CI/CD:
```yaml
# GitHub Action: Estimate infrastructure cost changes
name: Cost Estimation
on: [pull_request]
jobs:
  cost:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Infracost
        uses: infracost/actions/comment@v1
        with:
          path: terraform/
```
Lessons Learned
1. Start with low-hanging fruit
Zombie resources and lifecycle policies are easy wins. We achieved 20% savings in the first week.
2. Measure everything
Can’t optimize what you don’t measure. Tag resources, track utilization, monitor trends.
3. Automate cost controls
Auto-shutdown dev environments, lifecycle policies, budget alerts. Don’t rely on manual discipline.
4. Rightsizing > Spot Instances (for most workloads)
Spot instances save more but add complexity. Rightsizing is simpler and safer for production.
5. Reserved Instances are free money
If you have predictable workloads, not buying RIs/Savings Plans is leaving money on the table.
6. Cost optimization is ongoing
It’s not a one-time project. Resources drift, usage patterns change. Review quarterly.
Common Objections (and Responses)
“Optimization takes engineering time away from features.”
True. But so do outages from over-complicated infrastructure. Plus, lower costs = longer runway = more time to build.
“We’ll optimize once we’re bigger.”
Cost inefficiencies compound. A startup wasting 30% at $100K/month wastes $360K/year. At $1M/month? $3.6M/year.
“Cloud costs are just the price of doing business.”
No. Many successful SaaS companies run at 3-5% of revenue. If you’re at 10%+, you have problems.
“Our architecture is already optimized.”
Every team we’ve worked with thought this. Every time we found 20-40% savings.
Quick Wins You Can Implement Today
- Find zombie resources: Unattached EBS volumes, idle load balancers, unused Elastic IPs
- Enable S3 Intelligent-Tiering: Automatic cost optimization with zero config
- Set up budget alerts: Get notified when spending exceeds forecast
- Review EC2 instance types: Check CloudWatch CPU/memory utilization
- Auto-shutdown dev/staging: Save 60% on non-prod environments
Each of these takes <1 hour to implement.
The Bottom Line
Cloud costs don’t have to spiral out of control. With systematic analysis and automation, most companies can cut 20-40% without impacting performance.
This client went from a cost crisis to best-in-class unit economics in 90 days. Their cloud costs now scale linearly with revenue—exactly what a healthy SaaS business should see.
Have a Similar Problem?
Let's talk. We'll figure out if we can help and give you a clear plan.
Book a Free Call