
Cloud computing promised to eliminate wasteful capital expenditure. In practice, many organisations have simply replaced predictable CAPEX with unpredictable—and often shocking—OPEX. Cloud bills grow 20–30% year-over-year for the average enterprise, and Gartner estimates that organisations waste 35% of their cloud spend on idle or over-provisioned resources. This guide provides a systematic approach to reclaiming that waste.
Understanding Cloud Cost Drivers
Before optimising, you need to understand what is driving your bill:
Compute (60–70% of typical cloud bills)
- Virtual machine instances running at low utilisation
- Instances left running 24/7 when they are only needed during business hours
- Over-provisioned instance sizes chosen for "safety margin" and never reviewed
Storage (15–25%)
- Old snapshots, backups, and AMIs never cleaned up
- Data stored in high-performance tiers that does not require fast access
- Unattached EBS volumes (AWS) or orphaned managed disks (Azure) from deleted VMs
Data Transfer / Egress (5–15%)
- Data transferred out of the cloud to the internet or between regions
- Poorly architected applications making unnecessary cross-region calls
- Static content served directly from origin instead of through a CDN
Managed Services
- Databases, Kubernetes clusters, and load balancers running at low utilisation
- Redundant services deployed in multiple regions without genuine HA need
The FinOps Framework
FinOps (Financial Operations) is the cultural and organisational practice of bringing financial accountability to cloud spending. It involves three functions working together:
- Engineering: Build cost-efficient architectures; understand the cost impact of technical decisions
- Finance: Forecast cloud costs accurately; budget for cloud as OPEX
- Business: Prioritise investments; make trade-off decisions between cost and performance
FinOps Lifecycle
- Inform: Full visibility into who is spending what on which resources
- Optimise: Identify and eliminate waste; right-size resources
- Operate: Continuous cost management embedded into engineering workflows
Cost Optimisation Tactics
1. Rightsizing Compute
Most organisations provision instances based on peak projected demand and never revisit them. In reality:
- 40–60% of cloud VMs run at < 20% CPU utilisation consistently
- Memory is similarly over-provisioned
Action:
- Review CPU and memory utilisation metrics over a 14–30 day period
- Downsize instances where average utilisation is below 40% and peak is below 60%
- Use AWS Compute Optimizer or Azure Advisor for automated rightsizing recommendations
- Do not resize production without a test period in staging
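The screening rule above can be sketched as a small function (the thresholds and sampling window are assumptions to tune per workload, not a provider recommendation):

```python
from statistics import mean

def recommend_downsize(cpu_samples, avg_threshold=40.0, peak_threshold=60.0):
    """Return True if an instance is a downsizing candidate.

    cpu_samples: CPU utilisation percentages sampled over a 14-30 day
    window. Rule of thumb from the text: average below 40% AND peak
    below 60%.
    """
    if not cpu_samples:
        return False  # no data: never resize blind
    return mean(cpu_samples) < avg_threshold and max(cpu_samples) < peak_threshold
```

A VM with a low average but a high burst peak correctly fails the check, which is why the rule looks at both statistics rather than the average alone.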
2. Reserved Instances and Savings Plans
On-demand pricing is the most expensive way to run cloud infrastructure. For workloads with predictable usage, commit to reserved capacity:
| Commitment | Discount vs On-Demand |
|---|---|
| 1-year Reserved Instance (no upfront) | 30–40% |
| 1-year Reserved Instance (all upfront) | 40–50% |
| 3-year Reserved Instance (all upfront) | 50–65% |
| AWS Savings Plans (compute) | 20–66% |
| Azure Hybrid Benefit (Windows Server) | Up to 40% additional |
Strategy: Reserve your stable baseline capacity (running 24/7) and use on-demand or Spot for variable workloads.
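A minimal sketch of the baseline-reservation maths, assuming an illustrative 40% RI discount (actual discounts vary by term, payment option, and instance family, per the table above):

```python
def blended_hourly_cost(hourly_counts, od_rate, ri_discount=0.40):
    """Cost of covering a fluctuating fleet with RIs for the stable
    baseline and on-demand for the peaks.

    hourly_counts: instances needed in each hour of a sample window.
    od_rate: on-demand price per instance-hour.
    """
    baseline = min(hourly_counts)            # always-on capacity -> reserve it
    ri_rate = od_rate * (1 - ri_discount)    # discounted reserved rate
    ri_cost = baseline * ri_rate * len(hourly_counts)
    od_cost = sum(max(n - baseline, 0) for n in hourly_counts) * od_rate
    return ri_cost + od_cost
```

Reserving above the baseline risks paying for committed capacity that sits idle; reserving below it leaves guaranteed savings on the table.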
3. Spot / Preemptible Instances
AWS Spot Instances and Azure Spot VMs use spare cloud capacity at 60–90% discount vs on-demand pricing. They can be interrupted at short notice (2 minutes on AWS; as little as 30 seconds on Azure), making them suitable for:
- Batch processing and data analytics
- CI/CD build agents
- Stateless, fault-tolerant application tiers with auto-scaling
- Development and test environments
Not suitable for: Stateful databases, synchronous customer-facing APIs, anything without graceful shutdown handling.
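The "graceful shutdown handling" requirement is the crux. The interruption notice actually arrives via the instance metadata service; many batch and orchestration setups translate it into a SIGTERM to the workload, which is what this sketch assumes:

```python
import signal

interrupted = False

def _drain(signum, frame):
    # Mark the worker as draining: finish the current job, take no new ones.
    global interrupted
    interrupted = True

# Assumption: the orchestrator delivers the spot interruption notice
# to the worker process as SIGTERM.
signal.signal(signal.SIGTERM, _drain)

def process_batch(jobs):
    """Process jobs until an interruption is signalled; unfinished jobs
    are left in the queue for another worker to pick up."""
    done = []
    for job in jobs:
        if interrupted:
            break
        done.append(job)  # placeholder for the real unit of work
    return done
```

The pattern is stop-pulling-new-work rather than abort: the notice window is short, so each unit of work must be small enough to finish or safe to re-queue.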
4. Auto-Scaling
Implement auto-scaling so you pay for compute only when you need it:
- Scale out during peak demand; scale in (and stop billing) during off-peak periods
- For dev/test environments: schedule automatic shutdown outside business hours (saves roughly 70% of instance-hours on hours-based billing)
- Use predictive auto-scaling (available on AWS and Azure) for workloads with predictable traffic patterns
Quick win: Identify all non-production environments and schedule automatic shutdown 18:00–08:00 on weekdays and all weekend. Instances then run only 50 of 168 weekly hours, so for a 50-instance dev environment this alone cuts compute cost by roughly 70%.
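The arithmetic behind this quick win, as a sketch (an 18:00–08:00 weekday shutdown plus full weekends leaves instances running 10 hours a day, five days a week):

```python
def scheduled_savings(on_hours_per_weekday=10, weekdays=5):
    """Fraction of weekly instance-hours saved by an off-hours
    shutdown schedule. Note: stopped instances may still incur
    storage and reserved-IP charges, so total savings are slightly lower."""
    week_hours = 24 * 7                        # 168 hours in a week
    running = on_hours_per_weekday * weekdays  # 50 hours actually running
    return 1 - running / week_hours
```

Tightening the schedule (e.g. 9 running hours per weekday) pushes the saved fraction higher still; the function makes that trade-off easy to quote in a business case.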
5. Storage Optimisation
Snapshot and backup hygiene:
- Implement lifecycle policies to automatically delete EBS snapshots / Azure disk snapshots older than 30 days (adjust to your retention policy)
- Audit orphaned volumes (unattached disks) monthly and delete if no longer needed
- Clean up old AMIs (AWS) and custom images (Azure) systematically
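A sketch of the retention filter at the heart of such a lifecycle policy, assuming your inventory tooling yields (id, creation-time) pairs rather than calling a cloud API directly:

```python
from datetime import datetime, timedelta, timezone

def expired_snapshots(snapshots, retention_days=30, now=None):
    """Select snapshot IDs older than the retention window.

    snapshots: iterable of (snapshot_id, created_at) pairs with
    timezone-aware datetimes. retention_days should match your
    organisation's retention policy, not a hard-coded 30.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [sid for sid, created in snapshots if created < cutoff]
```

In practice the managed equivalents (e.g. lifecycle policies in the provider's backup tooling) are preferable to hand-rolled deletion scripts, but the selection logic is the same.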
Storage tiering:
- Move infrequently accessed data to cheaper tiers: AWS S3 Intelligent-Tiering, Azure Cool/Archive Blob Storage
- Enable S3 Intelligent-Tiering for any bucket where access patterns are uncertain—it automatically moves objects between tiers based on access frequency, with no retrieval fees (a small per-object monitoring charge applies)
Target storage costs by access frequency:
- Accessed daily → SSD/hot tier
- Accessed monthly → standard
- Accessed quarterly → cold
- Accessed rarely → archive (80–95% cheaper than SSD)
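That ladder can be expressed as a simple tier-selection helper (the tier names are generic labels from the text, not provider SKUs; the day thresholds are assumptions to adjust):

```python
def recommend_tier(days_between_accesses):
    """Map observed access frequency to a storage tier, following the
    daily / monthly / quarterly / rarely ladder above."""
    if days_between_accesses <= 1:
        return "hot"       # SSD / hot tier
    if days_between_accesses <= 30:
        return "standard"
    if days_between_accesses <= 90:
        return "cold"
    return "archive"       # cheapest, but slow and costly to retrieve
```

Retrieval latency and per-GB retrieval fees rise as you move down the ladder, so frequency of access, not just cost per GB, should drive the choice.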
6. Tagging and Cost Allocation
Without tagging, cloud bills are a black box. Implement a mandatory tagging policy:
Required tags for every resource:
- `Environment`: production / staging / development / test
- `Owner`: team or individual responsible
- `CostCentre`: department budget code
- `Application`: application or project name
- `Expiry`: for ephemeral resources (auto-delete after date)
Enforce tagging via AWS Service Control Policies or Azure Policy (resources without required tags cannot be created).
Use tag-based cost allocation reports to show each team their actual cloud spend and hold them accountable.
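A minimal compliance check for this policy, useful in CI or a scheduled audit job (a sketch; `Expiry` is conditional on the resource being ephemeral, so it is enforced separately):

```python
# The four unconditionally required tags from the policy above.
REQUIRED_TAGS = {"Environment", "Owner", "CostCentre", "Application"}

def missing_tags(tags):
    """Return the required tag keys a resource lacks.

    tags: mapping of tag key -> value as exported by your inventory
    tooling. An empty result means the resource is compliant.
    """
    return REQUIRED_TAGS - set(tags)
```

The preventive enforcement (SCPs / Azure Policy) stops new untagged resources; a check like this is for sweeping up the existing estate.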
7. Eliminate Zombie Resources
Zombie resources are idle or abandoned cloud assets still generating charges:
- Idle load balancers with no healthy targets
- Elastic IPs (AWS) not associated with running instances (charged when idle)
- Empty S3 buckets with replication enabled
- Stopped VMs (still charged for storage and reserved IPs)
- Unused NAT Gateways
Tools: AWS Cost Explorer, Azure Cost Management + Billing, CloudHealth, and Apptio Cloudability all provide idle resource reports.
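The detection rules behind those reports are simple predicates over an inventory. A sketch over a normalised record format (the field names are assumptions about your own tooling, not a cloud API):

```python
def find_zombies(inventory):
    """Flag idle resources from a normalised inventory of dicts.

    Rules mirror the list above: load balancers with no healthy
    targets, unassociated elastic IPs, and unattached volumes.
    """
    zombies = []
    for r in inventory:
        if r["type"] == "load_balancer" and r.get("healthy_targets", 0) == 0:
            zombies.append(r["id"])
        elif r["type"] == "elastic_ip" and not r.get("associated", False):
            zombies.append(r["id"])
        elif r["type"] == "volume" and not r.get("attached", False):
            zombies.append(r["id"])
    return zombies
```

Flag first, delete later: a human review step between detection and deletion avoids destroying something that was idle for a legitimate reason (e.g. a disaster-recovery standby).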
8. Architect for Cost
Cost efficiency should be a first-class architectural requirement:
- Serverless (AWS Lambda, Azure Functions): Pay only for execution time; no idle cost. Ideal for event-driven, intermittent workloads
- Containers on managed Kubernetes (EKS, AKS): Higher density than VMs; bin-packing reduces per-workload cost
- CDN for static content: CloudFront/Azure CDN is dramatically cheaper than serving static assets from compute instances
- Regional architecture review: Data transfer between AWS regions is charged; unnecessary cross-region calls add up
Building a FinOps Practice
Immediate Actions (Week 1)
- Enable Cost Explorer (AWS) or Cost Analysis (Azure) and review last 90 days of spend by service, region, and tag
- Identify top 10 most expensive resources — investigate utilisation
- Schedule auto-shutdown for all non-production environments
30-Day Actions
- Complete rightsizing analysis; implement recommendations for 5+ largest instances
- Purchase Reserved Instances for stable production workloads
- Implement tagging policy and enforce via Policy-as-Code
- Clean up orphaned volumes, old snapshots, and idle load balancers
90-Day Actions
- Establish monthly FinOps review cadence with engineering and finance
- Integrate cloud cost reporting into engineering team dashboards
- Set budgets and anomaly detection alerts in cloud cost management tools
- Develop storage lifecycle policies across all environments
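The idea behind anomaly alerts can be sketched as a simple threshold on daily spend (managed tools use more sophisticated, seasonality-aware models, but the principle is the same):

```python
from statistics import mean, stdev

def spend_anomalies(daily_spend, sigma=3.0):
    """Return indices of days whose spend exceeds mean + sigma * stddev
    of the series -- a basic outlier rule for cost spikes."""
    if len(daily_spend) < 2:
        return []  # not enough history to establish a baseline
    mu, sd = mean(daily_spend), stdev(daily_spend)
    threshold = mu + sigma * sd
    return [i for i, spend in enumerate(daily_spend) if spend > threshold]
```

A spike flagged on day one of the billing cycle is worth far more than the same finding in a month-end review, which is why alerting belongs in the 90-day plan rather than only in retrospectives.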
Cloud cost optimisation is not a one-time project—it is a continuous discipline. Organisations that embed FinOps practices consistently reduce cloud spend by 20–35% without sacrificing performance or reliability.
