Azure Cost Optimisation: Beyond the Obvious — Techniques That Actually Save Money

Architecture, Articles, Azure, CloudOps, Cost Optimization



Azure can get expensive fast. In this post I’ll share cost optimisation techniques from real workloads, starting with the well-known levers and then moving to the non-obvious ones most teams miss. The non-obvious ones are where the real money is.

Figure 1: The four main levers of Azure cost optimisation

The Standard Levers

1. Right-Size VMs with Azure Advisor

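Azure Advisor → Cost flags underused VMs from their CPU and memory utilisation over a configurable lookback window (7 days by default). Typical savings on a resized VM: 30–50%. About 15 minutes to identify candidates, 30 to fix. To check a VM’s utilisation yourself, pull two weeks of hourly CPU averages: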

			
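# Without --offset (or --start-time), the query covers only the last hour.
az monitor metrics list \
  --resource /subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Compute/virtualMachines/{vm} \
  --metric "Percentage CPU" --aggregation Average \
  --interval PT1H --offset 14d \
  --query "value[0].timeseries[0].data[*].average" -o json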
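To turn that output into a decision, here is a minimal Python sketch. It assumes you saved the JSON array from the command above (for example with `> cpu.json`) and applies a hypothetical rule of thumb: a VM whose P95 hourly CPU stays under 40% is a downsize candidate. The threshold and function names are my own illustration, not Advisor’s logic.

```python
import math
from statistics import mean


def nearest_rank_percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile (pct between 0 and 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def assess_cpu(hourly_cpu: list, p95_threshold: float = 40.0) -> dict:
    """Summarise hourly 'Percentage CPU' averages and flag a downsize candidate.

    The 40% P95 threshold is a hypothetical rule of thumb, not Azure Advisor's logic.
    """
    samples = [float(v) for v in hourly_cpu if v is not None]  # skip gaps defensively
    if not samples:
        raise ValueError("no CPU samples returned")
    p95 = nearest_rank_percentile(samples, 95)
    return {
        "samples": len(samples),
        "mean": mean(samples),
        "p95": p95,
        "downsize_candidate": p95 < p95_threshold,
    }
```

`assess_cpu(json.load(open("cpu.json")))` returns the sample count, mean, P95 and the flag. Check memory separately before resizing: a VM that is idle on CPU can still be memory-bound.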

2. Reserved Instances

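1-year: roughly 40% off. 3-year: roughly 60–65%. If a VM runs 24/7 in production, commit. With instance size flexibility, a reservation applies across sizes in the same flexibility group in proportion to each size’s ratio, so buying in the group’s smallest size gives you the finest-grained coverage.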
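Why 24/7 matters: a reservation bills every hour of the term whether the VM runs or not. A minimal sketch of the break-even, assuming a flat percentage discount and Azure’s usual 730-hour month (function names are illustrative):

```python
from typing import Optional

HOURS_PER_MONTH = 730  # Azure's usual hours-per-month convention


def break_even_utilisation(discount: float) -> float:
    """Share of hours a VM must run before a reservation beats pay-as-you-go.

    A reservation bills every hour at (1 - discount) of the PAYG rate whether or
    not the VM runs, so it wins once utilisation exceeds (1 - discount).
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1 - discount


def monthly_cost(payg_hourly: float, utilisation: float, discount: Optional[float] = None) -> float:
    """Monthly compute cost: PAYG bills hours used; a reservation bills every hour."""
    if discount is None:
        return payg_hourly * HOURS_PER_MONTH * utilisation
    return payg_hourly * HOURS_PER_MONTH * (1 - discount)
```

With a 1-year discount of about 40%, the reservation wins once the VM runs more than about 60% of the time; at 60–65% off for 3 years, the bar drops to roughly 35–40%.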

3. Spot VMs

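Up to 90% discount, but Azure can evict a Spot VM with as little as 30 seconds’ notice, delivered through Scheduled Events. Use for: batch, CI/CD agents, ML training, dev/test. Never for production services that can’t tolerate interruption.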
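Spot eviction notices arrive through the Instance Metadata Service (IMDS) Scheduled Events endpoint as events of type `Preempt`. A minimal polling sketch, assuming it runs inside the Spot VM itself (IMDS is not reachable from anywhere else) and that `api-version=2020-07-01` suits your VM; the function names are illustrative:

```python
import json
import urllib.request

# IMDS Scheduled Events endpoint; link-local, reachable only from inside the VM.
SCHEDULED_EVENTS_URL = "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"


def fetch_scheduled_events(timeout: float = 2.0) -> dict:
    """Read the Scheduled Events document; IMDS requires the Metadata header."""
    req = urllib.request.Request(SCHEDULED_EVENTS_URL, headers={"Metadata": "true"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def preempt_events(doc: dict) -> list[dict]:
    """Return the Spot eviction notices ("Preempt" events) in a Scheduled Events document."""
    return [e for e in doc.get("Events", []) if e.get("EventType") == "Preempt"]
```

Poll every few seconds; when `preempt_events(fetch_scheduled_events())` returns anything, checkpoint the work and stop taking new jobs.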

4. Auto-Shutdown Dev VMs

			
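# --time is UTC (hhmm). Auto-shutdown deallocates the VM: compute billing stops, disk billing doesn’t.
az vm auto-shutdown --resource-group rg-dev --name vm-dev-01 --time 1900 --email you@example.com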
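What a shutdown schedule is worth, as a quick calculation. A minimal sketch assuming a hypothetical dev VM that someone starts at 08:00 and auto-shutdown stops at 19:00, on weekdays only (auto-shutdown does not start VMs for you):

```python
HOURS_PER_WEEK = 24 * 7


def compute_saving(running_hours_per_week: float) -> float:
    """Share of compute cost avoided versus running 24/7.

    Deallocated VMs stop accruing compute charges, but managed disks keep billing,
    so this is a saving on compute only.
    """
    if not 0 <= running_hours_per_week <= HOURS_PER_WEEK:
        raise ValueError("running hours must be between 0 and 168")
    return 1 - running_hours_per_week / HOURS_PER_WEEK


# Hypothetical schedule: started at 08:00, auto-shutdown at 19:00, weekdays only.
weekday_hours = 11 * 5
print(f"{compute_saving(weekday_hours):.0%}")  # prints 67%
```

Because disks keep billing while the VM is deallocated, the saving on the VM’s total bill is somewhat lower than the compute figure.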

5. Azure Hybrid Benefit

Already own Windows Server or SQL Server licences with Software Assurance? Azure Hybrid Benefit takes up to 40% off Windows VMs and up to 55% off SQL. It’s a single setting that many teams forget.

6. Blob Lifecycle Policies

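Move blobs to the Cool tier (~50% cheaper storage) after 30 days and to Archive (~90% cheaper) after 180 days. Set once, runs forever. Two caveats: Cool and Archive have 30- and 180-day early-deletion minimums, and Archive blobs must be rehydrated (which can take hours) before they can be read, so only archive data you won’t need in a hurry.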
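A lifecycle policy is just JSON. A minimal Python sketch that builds the 30/180-day rule described here; the rule name and the `logs/` prefix are hypothetical, and the schema follows Azure Storage lifecycle management (`tierToCool`, `tierToArchive`, `daysAfterModificationGreaterThan`):

```python
import json
from typing import Optional, Sequence


def lifecycle_policy(cool_after_days: int = 30, archive_after_days: int = 180,
                     prefixes: Optional[Sequence[str]] = None) -> dict:
    """Build an Azure Storage lifecycle management policy: Cool, then Archive."""
    if archive_after_days <= cool_after_days:
        raise ValueError("archive_after_days must be greater than cool_after_days")
    filters: dict = {"blobTypes": ["blockBlob"]}  # tiering applies to block blobs
    if prefixes:
        filters["prefixMatch"] = list(prefixes)  # "container/prefix" strings
    return {
        "rules": [
            {
                "enabled": True,
                "name": "coolThenArchive",  # hypothetical rule name
                "type": "Lifecycle",
                "definition": {
                    "filters": filters,
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {"daysAfterModificationGreaterThan": cool_after_days},
                            "tierToArchive": {"daysAfterModificationGreaterThan": archive_after_days},
                        }
                    },
                },
            }
        ]
    }


# "logs" is a hypothetical container name.
print(json.dumps(lifecycle_policy(prefixes=["logs/"]), indent=2))
```

Save the output as policy.json and apply it with `az storage account management-policy create --account-name <account> --resource-group <rg> --policy @policy.json`.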

7. Reserved Database Capacity

Azure SQL, Cosmos DB, PostgreSQL: 1yr (~35%) and 3yr (~55%) reserved capacity. Same concept as VM reservations.

8. Cost Budgets and Alerts

Cost Management → Budgets → Add. Alert at 80% and 100%. Doesn’t reduce spend but gives early warning.

9. Enforce Resource Tagging

Assign Azure Policy’s built-in “Require a tag on resources” definition once per tag: Environment, Owner and CostCentre. Without cost attribution, Cost Management is noise.
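Policy blocks new untagged resources, but it doesn’t fix the ones you already have. A minimal sketch for finding them, assuming you export inventory with `az resource list -o json` (each item has an `id` and a `tags` object that may be null); the function name is illustrative:

```python
from typing import Dict, Iterable, List

REQUIRED_TAGS = ("Environment", "Owner", "CostCentre")


def missing_tags(resources: Iterable[dict], required=REQUIRED_TAGS) -> Dict[str, List[str]]:
    """Map each resource ID to the required tag names it lacks.

    Azure treats tag names case-insensitively, so the comparison does too.
    """
    report: Dict[str, List[str]] = {}
    for res in resources:
        present = {name.lower() for name in (res.get("tags") or {})}  # tags may be null
        gaps = [tag for tag in required if tag.lower() not in present]
        if gaps:
            report[res["id"]] = gaps
    return report
```

`missing_tags(json.load(open("resources.json")))` maps each offending resource ID to the tags it lacks.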

🔴 Log Analytics Workspace — The Silent Budget Killer

This is the one I’ve seen cause the biggest bill shocks. A Log Analytics workspace (LAW) charges around $2.30/GB for pay-as-you-go ingestion (the exact rate varies by region). That sounds manageable until a developer enables verbose diagnostic logs on a busy service, retention gets set to 2 years across every table, and nobody notices for three months. I’ve seen LAW bills go from $200/month to over $8,000/month after a “temporary” debug session that never got turned off.
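The arithmetic behind that bill shock. A minimal sketch, assuming the ~$2.30/GB pay-as-you-go rate quoted here and a 30-day month, and ignoring free allowances and retention charges:

```python
PAYG_PRICE_PER_GB = 2.30  # approximate pay-as-you-go rate quoted in the text; varies by region
DAYS_PER_MONTH = 30


def monthly_ingestion_cost(gb_per_day: float, price_per_gb: float = PAYG_PRICE_PER_GB) -> float:
    """Ingestion-only cost; ignores free allowances and retention charges."""
    return gb_per_day * DAYS_PER_MONTH * price_per_gb


def gb_per_day_for_bill(monthly_bill: float, price_per_gb: float = PAYG_PRICE_PER_GB) -> float:
    """Invert the formula: how much daily ingestion produces a given monthly bill?"""
    return monthly_bill / (DAYS_PER_MONTH * price_per_gb)


print(f"{gb_per_day_for_bill(200):.1f} GB/day")    # $200/month baseline: prints 2.9 GB/day
print(f"{gb_per_day_for_bill(8_000):.0f} GB/day")  # $8,000/month: prints 116 GB/day
```

So the jump from $200 to $8,000 a month corresponds to going from roughly 3 GB/day to roughly 116 GB/day of ingestion.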

Find what you’re actually ingesting

			
// Top billable data types by ingestion volume (last 30 days); Quantity is in MB
Usage
| where TimeGenerated > ago(30d)
| where IsBillable == true
| summarize TotalGB = sum(Quantity) / 1000 by DataType
| order by TotalGB desc
| take 20

		

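AppTraces, AzureDiagnostics and ContainerLog are the usual offenders. Then find tables nobody queries. Usage only records what was ingested, so this needs the query audit log (LAQueryLogs), which fills only after you enable query auditing in the workspace’s diagnostic settings:

// High ingestion, no matching queries in 30 days: paying for nothing.
// Matches table names as substrings of logged query text, so some unused
// tables may be hidden; treat the results as candidates, not proof.
let Queries = LAQueryLogs
    | where TimeGenerated > ago(30d)
    | project QueryText, k = 1;
Usage
| where TimeGenerated > ago(30d) and IsBillable == true
| summarize IngestedGB = sum(Quantity) / 1000 by DataType
| extend k = 1
| join kind=leftouter (Queries) on k
| summarize Queries = countif(indexof(QueryText, DataType) >= 0) by DataType, IngestedGB
| where Queries == 0
| order by IngestedGB desc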

		

Per-table retention — not one global setting

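AppTraces doesn’t need the same retention as SecurityEvent. In my experience, setting retention per table alone cuts retention costs by 40–60% on workspaces where a compliance retention period was applied globally. For data you keep only for compliance and rarely query, pairing a short --retention-time with a longer --total-retention-time moves older data into cheaper long-term retention instead. Set each table independently: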

			
# AppTraces: 30 days
az monitor log-analytics workspace table update \
  --resource-group rg-monitoring --workspace-name law-prod \
  --name AppTraces --retention-time 30
# SecurityEvent: 2 years for compliance
az monitor log-analytics workspace table update \
  --resource-group rg-monitoring --workspace-name law-prod \
  --name SecurityEvent --retention-time 730
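Retention beyond the included period is billed per GB per month, so its cost scales with daily volume times the extra days kept. A minimal steady-state sketch, assuming 31 included days (some tables include more) and a hypothetical $0.10/GB/month retention price; check the Azure Monitor pricing page for your region:

```python
def billable_retention_gb(gb_per_day: float, retention_days: int, included_days: int = 31) -> float:
    """Steady-state GB held beyond the included retention window."""
    return gb_per_day * max(0, retention_days - included_days)


def monthly_retention_cost(gb_per_day: float, retention_days: int,
                           price_per_gb_month: float, included_days: int = 31) -> float:
    """Monthly retention charge once the retention window has filled up."""
    return billable_retention_gb(gb_per_day, retention_days, included_days) * price_per_gb_month


# Hypothetical: 20 GB/day of AppTraces, assumed retention price of $0.10/GB/month.
for days in (730, 30):
    print(f"{days} days -> ${monthly_retention_cost(20, days, 0.10):,.0f}/month")
```

The estimate ignores the first months after a change, while the retention window is still filling.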

		

Data Collection Rules — filter before it arrives

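Data Collection Rules (DCRs) can transform and filter log data before it reaches the workspace, and you pay ingestion only for what actually lands there. Drop verbose traces entirely. Column names and values depend on the table’s schema; in AppTraces, SeverityLevel is an integer where 0 means Verbose, so the filter is:

"transformKql": "source | where SeverityLevel > 0"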
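Filtering isn’t entirely free when it drops most of the data. Azure Monitor’s documentation describes a data-processing charge on filtered-out data above half of what enters the pipeline: processed GB = filtered GB - incoming GB / 2, floored at zero. A minimal sketch of that formula (function names are illustrative; check current pricing for the per-GB rate):

```python
def transformation_processing_gb(incoming_gb: float, filtered_gb: float) -> float:
    """GB subject to the data-processing charge for an ingestion-time transformation.

    Formula as described in Azure Monitor's documentation:
    filtered-out GB minus half of the GB entering the pipeline, floored at zero.
    """
    if not 0 <= filtered_gb <= incoming_gb:
        raise ValueError("filtered_gb must be between 0 and incoming_gb")
    return max(0.0, filtered_gb - incoming_gb / 2)


def ingested_gb(incoming_gb: float, filtered_gb: float) -> float:
    """GB that reaches the workspace and is billed at the ingestion rate."""
    return incoming_gb - filtered_gb


# Hypothetical day: 100 GB arrives and the Verbose filter drops 70 GB.
print(ingested_gb(100, 70), transformation_processing_gb(100, 70))
```

Dropping 70 GB of a 100 GB day leaves 30 GB billed at the ingestion rate and 20 GB subject to the processing charge.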

Application Insights adaptive sampling — almost nobody enables this

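Check what your telemetry SDK actually sends. The classic Application Insights SDK for ASP.NET Core enables adaptive sampling by default, but it often gets switched off while debugging and never switched back on, and other setups (OpenTelemetry-based exporters, for example) may send 100% of telemetry unless you configure sampling. A busy API can generate 15–50GB/day in App Insights alone. Adaptive sampling holds volume near a target rate while keeping request counts, error rates and latency percentiles statistically accurate: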

			
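// appsettings.json (.NET, Microsoft.ApplicationInsights.AspNetCore 2.15+)
// Adaptive sampling is on by default in this SDK; this makes sure nothing has turned it off.
// The target rate (default 5 items/s per host) is not set here: to change it, disable the
// default and call UseAdaptiveSampling(maxTelemetryItemsPerSecond: ...) in code.
{
  "ApplicationInsights": {
    "EnableAdaptiveSampling": true
  }
}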

		

💡 Real impact: 15GB/day → under 2GB/day on a production API, with no loss of diagnostic value that we noticed.
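Why sampling caps volume: adaptive sampling targets a number of telemetry items per second per server host, so daily volume is roughly target rate × hosts × seconds per day × average item size. A minimal sketch; the 4 hosts and 1,500-byte average item are hypothetical, and anything excluded from sampling adds to the total:

```python
SECONDS_PER_DAY = 86_400


def sampled_gb_per_day(items_per_second: float, hosts: int, avg_item_bytes: float) -> float:
    """Rough daily volume when adaptive sampling holds each host near its target rate."""
    return items_per_second * hosts * SECONDS_PER_DAY * avg_item_bytes / 1e9


# Hypothetical: default target of 5 items/s, 4 instances, 1,500-byte average item.
print(f"{sampled_gb_per_day(5, 4, 1_500):.1f} GB/day")  # prints 2.6 GB/day
```

Because the target is per host, scaling out raises the ceiling proportionally.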

LAW ingestion commitment tiers

Like Reserved Instances for compute: commit to an ingestion tier (tiers start at 100GB/day) for up to roughly 30% off pay-as-you-go, with a 31-day commitment period. The tier is billed in full even on quiet days, so do this after you’ve cleaned up your volume.
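When does a commitment tier pay off? You pay the tier’s daily fee even on quiet days, and usage above the tier is billed at the tier’s effective per-GB rate. A minimal sketch; the $195.50/day fee for a 100 GB/day tier is a hypothetical 15% discount off 100 × $2.30, not a published price:

```python
def daily_cost_payg(gb: float, payg_price: float) -> float:
    return gb * payg_price


def daily_cost_commitment(gb: float, tier_gb: float, tier_daily_fee: float) -> float:
    """Fixed daily fee; usage above the tier is billed at the tier's effective per-GB rate."""
    effective_rate = tier_daily_fee / tier_gb
    return tier_daily_fee + max(0.0, gb - tier_gb) * effective_rate


def break_even_gb(tier_daily_fee: float, payg_price: float) -> float:
    """Daily volume above which the tier costs less than pay-as-you-go."""
    return tier_daily_fee / payg_price


# Hypothetical: $2.30/GB PAYG; 100 GB/day tier at $195.50/day (15% off 100 x $2.30).
print(f"{break_even_gb(195.50, 2.30):.1f} GB/day")  # prints 85.0 GB/day
```

If your cleaned-up volume reliably stays above that line, the tier saves money; if it often dips below, pay-as-you-go is cheaper.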

🤖 AI Workload Costs — Model Routing Saves 60–80%

Most teams I work with send every LLM request through the most expensive model, yet in my experience 60–80% of enterprise AI requests are simple enough for a mini model:

Request type	Right model	Approx. price per 1M input tokens (output costs more)
FAQ, classification, simple retrieval	GPT-4o-mini / Phi-3	~$0.15
Summarisation, moderate reasoning	GPT-4o (cached input)	~$1.25
Complex analysis, long context	GPT-4o / o1	~$2.50–15
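The saving comes from the traffic mix. A minimal sketch of the blended price, assuming list input prices of about $0.15 (GPT-4o-mini) and $2.50 (GPT-4o) per 1M input tokens and similar prompt sizes on each route; ignoring output tokens is a simplification:

```python
from typing import Dict

# Assumed list prices per 1M input tokens (illustrative; check current Azure OpenAI pricing).
PRICES: Dict[str, float] = {"gpt-4o-mini": 0.15, "gpt-4o": 2.50}


def blended_price(mix: Dict[str, float], prices: Dict[str, float] = PRICES) -> float:
    """Average price per 1M input tokens for a traffic mix whose shares sum to 1."""
    if abs(sum(mix.values()) - 1) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(share * prices[model] for model, share in mix.items())


routed = blended_price({"gpt-4o-mini": 0.7, "gpt-4o": 0.3})
baseline = blended_price({"gpt-4o": 1.0})
print(f"blended ${routed:.3f} vs ${baseline:.2f}; saving {1 - routed / baseline:.0%}")
```

Routing 70% of traffic to the mini model brings the blended input price from $2.50 to about $0.86 per 1M tokens, a saving of about two thirds.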

🔗 AI Model Router: How to Cut Your LLM Bill by 60–80% Without Sacrificing Quality →

🔍 Hidden Costs Most Teams Ignore

Orphaned resources at scale

Organisations that have churned through 100+ VMs over two years can have 20–40 unattached managed disks billing silently. A Premium SSD P30 (1 TiB) runs around $135/month each, depending on region:

			
# Resource Graph returns 100 rows by default; --first raises that to 1000.
az graph query -q "
Resources
| where type == 'microsoft.compute/disks'
| where properties.diskState == 'Unattached'
| project name, resourceGroup, skuName = tostring(sku.name), sizeGB = toint(properties.diskSizeGB)
| order by sizeGB desc" --subscriptions {sub-id} --first 1000

		

Also: unused public IPs ($3.65/month each), empty App Service Plans, unused Load Balancers, old snapshots.
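To put a monthly number on the cleanup, a minimal sketch that totals unattached disks (the SKU and size from each query row) plus idle public IPs. The price table is hypothetical; fill it from the Azure pricing page for your region. The $3.65/month IP price is the figure quoted here:

```python
from typing import Dict, Iterable, List, Tuple

PUBLIC_IP_MONTHLY = 3.65  # per unused Standard public IP, as quoted in the text


def monthly_waste(
    disks: Iterable[Tuple[str, int]],
    disk_prices: Dict[Tuple[str, int], float],
    idle_public_ips: int = 0,
) -> Tuple[float, List[Tuple[str, int]]]:
    """Total monthly cost of unattached disks plus idle public IPs.

    disks: (sku_name, size_gb) pairs taken from the Resource Graph query rows.
    disk_prices: hypothetical price list keyed by (sku_name, size_gb).
    Returns the total and any disks missing from the price list.
    """
    total = idle_public_ips * PUBLIC_IP_MONTHLY
    unpriced: List[Tuple[str, int]] = []
    for sku, size_gb in disks:
        price = disk_prices.get((sku, int(size_gb)))
        if price is None:
            unpriced.append((sku, size_gb))
        else:
            total += price
    return total, unpriced
```

Thirty unattached 1 TiB Premium_LRS disks at an assumed $135 each plus ten idle IPs come to $4,086.50/month.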

Azure Firewall per-subscription sprawl

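About $900/month per Azure Firewall Standard instance in deployment charges (Premium costs more), before per-GB data processing. 15–20 subscriptions with independent firewalls = $13,500–18,000/month in firewall costs alone. Fix: hub-spoke with a single Firewall in each regional hub, plus Firewall Policy inheritance (parent and child policies). Same coverage, far fewer instances.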
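The sprawl arithmetic, as a minimal sketch. It assumes the Standard SKU deployment price of about $1.25/hour (which is where roughly $900/month comes from) and a 730-hour month, and ignores per-GB processing:

```python
HOURS_PER_MONTH = 730
FIREWALL_STANDARD_HOURLY = 1.25  # assumed Standard SKU deployment price; Basic and Premium differ


def firewall_monthly(instances: int, hourly: float = FIREWALL_STANDARD_HOURLY) -> float:
    """Deployment charge only; data processed through the firewall is billed per GB on top."""
    return instances * hourly * HOURS_PER_MONTH


for n in (1, 2, 15, 20):
    print(n, f"${firewall_monthly(n):,.0f}/month")
```

Collapsing 15 subscription-level firewalls into one hub firewall per region (two regions, say) cuts the deployment charge from about $13,690 to about $1,825 a month.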

AKS user node pools on spot

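System pools can’t use Spot. User pools, where your application workloads run, absolutely can. AKS taints Spot nodes with kubernetes.azure.com/scalesetpriority=spot:NoSchedule, so only pods that tolerate that taint are scheduled there; keep anything that can’t survive eviction on regular nodes: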

			
az aks nodepool add \
  --cluster-name aks-prod --resource-group rg-aks \
  --name spotnodes --priority Spot \
  --eviction-policy Delete --spot-max-price -1 \
  --node-count 3 --node-vm-size Standard_D4s_v5
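Pods only land on Spot nodes if they tolerate the Spot taint (key `kubernetes.azure.com/scalesetpriority`, value `spot`, effect `NoSchedule`); the nodes also carry a label with the same key and value, which you can use to pin interruption-tolerant workloads there. A minimal Python sketch that patches a Deployment manifest held as a dict; the function name is illustrative:

```python
SPOT_KEY = "kubernetes.azure.com/scalesetpriority"  # taint and label key AKS puts on Spot nodes


def allow_spot(deployment: dict, require_spot: bool = False) -> dict:
    """Let a Deployment's pods tolerate the AKS Spot taint; optionally require Spot nodes."""
    pod_spec = (
        deployment.setdefault("spec", {})
        .setdefault("template", {})
        .setdefault("spec", {})
    )
    toleration = {"key": SPOT_KEY, "operator": "Equal", "value": "spot", "effect": "NoSchedule"}
    tolerations = pod_spec.setdefault("tolerations", [])
    if toleration not in tolerations:  # idempotent: don't add it twice
        tolerations.append(toleration)
    if require_spot:
        # Replaces any existing nodeAffinity on the pod spec.
        pod_spec.setdefault("affinity", {})["nodeAffinity"] = {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {"matchExpressions": [{"key": SPOT_KEY, "operator": "In", "values": ["spot"]}]}
                ]
            }
        }
    return deployment
```

Serialise the result with `json.dumps` and apply it with `kubectl apply -f`, which accepts JSON manifests. Setting `require_spot=True` replaces any existing node affinity, so merge by hand if you already have one.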

		

Cost Management anomaly detection — built in, almost nobody uses it

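Cost Management → Cost alerts → Add → Anomaly alert. Alerts are set per subscription and go to the email recipients you list. Cost data lands with some delay, so you hear about a spike within a day or two of it starting, not at month end when the damage is done.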

NAT Gateway vs per-VM public IPs

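20 VMs with public IPs = $73/month just in IPs. One NAT Gateway (about $32/month, plus a per-GB data-processing charge and its own public IP) handles all outbound traffic, is more secure, and is simpler to manage; inbound access then goes through a load balancer or Bastion instead.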
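Whether NAT Gateway is also cheaper depends on egress, because it adds a per-GB processing charge (internet egress bandwidth is billed either way). A minimal break-even sketch, assuming about $0.045/hour and $0.045/GB for the gateway, one $3.65/month public IP attached to it, and a 730-hour month; check current regional pricing:

```python
HOURS_PER_MONTH = 730
PUBLIC_IP_MONTHLY = 3.65  # Standard static IPv4 at $0.005/hour, as quoted in the text
NAT_HOURLY = 0.045        # assumed NAT Gateway resource price per hour
NAT_PER_GB = 0.045        # assumed NAT Gateway data-processing price per GB


def per_vm_ip_cost(vms: int) -> float:
    return vms * PUBLIC_IP_MONTHLY


def nat_cost(gb_processed: float, nat_ips: int = 1) -> float:
    """Gateway hours + data processed + the public IP(s) the gateway itself uses."""
    return NAT_HOURLY * HOURS_PER_MONTH + NAT_PER_GB * gb_processed + nat_ips * PUBLIC_IP_MONTHLY


def nat_break_even_gb(vms: int, nat_ips: int = 1) -> float:
    """Monthly outbound GB above which NAT Gateway costs more than per-VM public IPs."""
    fixed = NAT_HOURLY * HOURS_PER_MONTH + nat_ips * PUBLIC_IP_MONTHLY
    return max(0.0, (per_vm_ip_cost(vms) - fixed) / NAT_PER_GB)


print(f"{nat_break_even_gb(20):.0f} GB/month")
```

For 20 VMs, NAT Gateway wins on price below roughly 800 GB of outbound traffic a month; with 10 or fewer VMs the per-VM IPs are cheaper on price alone, and the case rests on security and simplicity.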

Where to Start

Run the LAW ingestion KQL now. If you find a table over 5GB/month with zero queries — that’s immediate recoverable money.
Open Azure Advisor → Cost. Tells you exactly which VMs to resize.
Enable Cost Management anomaly alerts. Two minutes. Already built into your subscription.

Cost optimisation is ongoing, not a project. Monthly review: Advisor, LAW ingestion audit, anomaly alerts, reserved instance coverage.

What’s your biggest Azure cost surprise? Drop a comment below.

Andrey Krasikov

Senior Cloud Architect with 25+ years in IT and 10+ years designing enterprise Azure and AWS solutions. Microsoft Azure Solutions Architect Expert. Specialising in cloud-native architectures, Infrastructure as Code (Terraform, Bicep), DevOps pipelines, data platforms, and AI-powered workloads. Helped 100+ organisations migrate, modernise, and optimise their cloud environments. Based in the USA — connect on LinkedIn or explore my open-source work on GitHub.

Azure Cost Optimisation: Beyond the Obvious — Techniques That Actually Save Money

June 6, 2026



Discover more from HandsOnAzure