ECS Fargate Autoscaling: How We Cut Infrastructure Costs by 35%

When I joined the company, the backend ran on a fleet of EC2 instances sized for peak traffic and sitting at 15% CPU utilisation most of the time. Scaling was manual: someone would notice latency going up, SSH into a box to check what was happening, then provision more capacity if needed. Deployments required coordination to drain the load balancer and restart services one by one. Migrating to ECS Fargate with autoscaling was the single biggest infrastructure improvement we made: costs dropped 35%, deployments became zero-downtime, and on-call became less stressful.

Why ECS Fargate Over EC2-Backed ECS

ECS can run on two launch types: EC2 (you manage the instances) and Fargate (AWS manages the compute). I chose Fargate for three reasons:

No instance management: no more AMI updates, no instance type selection, no patching
Bin packing is AWS's problem: with EC2-backed ECS you need the right EC2 instance size to fit your tasks efficiently. Fargate handles this transparently.
Per-task pricing: you pay for the vCPU and memory each task requests (not what it actually uses), billed per second with a one-minute minimum

The Fargate premium (roughly 20–30% more expensive per vCPU than equivalent EC2 on-demand) is more than offset by eliminating over-provisioned capacity and operational overhead.
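To make that trade-off concrete, here is a rough break-even sketch. It assumes a flat Fargate premium over EC2 on-demand and a Fargate service that tracks demand perfectly, and it ignores operational overhead, Savings Plans, and Spot. The function names and inputs are illustrative, not measured figures.

```python
def fargate_break_even_utilisation(premium: float) -> float:
    """Utilisation of an EC2 fleet below which right-sized Fargate is cheaper.

    premium is the assumed Fargate markup over EC2 on-demand (0.25 = 25%).
    An EC2 fleet costs the same however busy it is, while a right-sized
    Fargate service pays (1 + premium) only for the capacity it needs.
    """
    return 1.0 / (1.0 + premium)


def relative_fargate_cost(utilisation: float, premium: float) -> float:
    """Fargate cost as a fraction of the EC2 cost for the same workload."""
    return utilisation * (1.0 + premium)


for premium in (0.20, 0.30):
    print(f"premium {premium:.0%}: break-even at "
          f"{fargate_break_even_utilisation(premium):.0%} utilisation")
```

In this idealised model, a fleet idling at 15% utilisation would cost about a fifth as much on Fargate even with a 30% premium. Real savings are smaller because of minimum task counts and the headroom left by the scaling target.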

Right-Sizing Task Definitions

The most common Fargate mistake is provisioning too much CPU/memory because you are not sure what the task needs. Profile your service under production load first using CloudWatch Container Insights:

{
  "family": "api-service",
  "cpu": "512",
  "memory": "1024",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::123456789012:role/api-service-task-role",
  "containerDefinitions": [{
    "name": "api",
    "image": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/api:latest",
    "portMappings": [{"containerPort": 8080, "protocol": "tcp"}],
    "environment": [
      {"name": "ENV", "value": "production"}
    ],
    "secrets": [
      {"name": "DATABASE_URL", "valueFrom": "arn:aws:secretsmanager:eu-west-1:123456789012:secret:db-url"}
    ],
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "/ecs/api-service",
        "awslogs-region": "eu-west-1",
        "awslogs-stream-prefix": "ecs"
      }
    },
    "healthCheck": {
      "command": ["CMD-SHELL", "curl -sf http://localhost:8080/health || exit 1"],
      "interval": 10,
      "timeout": 5,
      "retries": 3,
      "startPeriod": 30
    }
  }]
}

The healthCheck field matters: ECS uses it to decide whether a container is healthy, waits for it to pass before counting a new task as healthy during a deployment, and replaces tasks that become unhealthy. Without it, ECS treats a task as healthy as soon as the container is running, so for a service behind a load balancer the target group health check is the only thing that stops traffic from reaching a task before it is ready.
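Returning to right-sizing: once Container Insights has given you peak CPU and memory figures, picking a task size is a lookup against the CPU/memory combinations Fargate allows. The sketch below assumes a 20% headroom and covers only Linux sizes up to 4 vCPU. The function name and the headroom value are my own choices, not anything ECS prescribes.

```python
import math

# Fargate CPU units -> allowed memory values in MiB (Linux tasks, up to 4 vCPU).
# The 8 and 16 vCPU sizes exist too but are left out to keep the sketch short.
VALID_MEMORY_MIB = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
}


def smallest_task_size(peak_cpu_units, peak_memory_mib, headroom=0.2):
    """Return the lowest CPU tier, and lowest memory within it, that fits.

    peak_cpu_units and peak_memory_mib are observed peaks under production
    load; headroom=0.2 is an assumed 20% safety margin on both.
    """
    need_cpu = math.ceil(peak_cpu_units * (1 + headroom))
    need_mem = math.ceil(peak_memory_mib * (1 + headroom))
    for cpu in sorted(VALID_MEMORY_MIB):
        if cpu < need_cpu:
            continue
        for mem in VALID_MEMORY_MIB[cpu]:
            if mem >= need_mem:
                return cpu, mem
    raise ValueError("peak usage exceeds the sizes covered by this sketch")


print(smallest_task_size(350, 700))  # (512, 1024)
```

A service peaking at 350 CPU units and 700 MiB lands on the 512/1024 size used in the task definition above. A memory-heavy service may be pushed into a higher CPU tier than it needs, which is worth knowing before committing to Fargate for it.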

Service Configuration and Deployment

The ECS service definition controls rolling deployments. The maximumPercent and minimumHealthyPercent values determine how ECS replaces tasks during a deployment:

aws ecs create-service \
  --cluster production \
  --service-name api-service \
  --task-definition api-service:1 \
  --desired-count 3 \
  --launch-type FARGATE \
  --deployment-configuration '{
    "maximumPercent": 200,
    "minimumHealthyPercent": 100,
    "deploymentCircuitBreaker": {
      "enable": true,
      "rollback": true
    }
  }' \
  --network-configuration '{
    "awsvpcConfiguration": {
      "subnets": ["subnet-xxx", "subnet-yyy"],
      "securityGroups": ["sg-xxx"],
      "assignPublicIp": "DISABLED"
    }
  }' \
  --load-balancers '[{
    "targetGroupArn": "arn:aws:elasticloadbalancing:...",
    "containerName": "api",
    "containerPort": 8080
  }]'

The deploymentCircuitBreaker with rollback: true is essential for production: if a new deployment fails health checks, ECS automatically rolls back to the previous task definition. This saved us from extended outages on several occasions.
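For intuition about maximumPercent and minimumHealthyPercent, this small sketch computes the task-count window ECS stays within during a rolling update. It follows the documented rounding (lower bound rounded up, upper bound rounded down). The function name is illustrative.

```python
import math


def rolling_update_bounds(desired_count, minimum_healthy_percent, maximum_percent):
    """Task-count limits ECS keeps to during a rolling deployment.

    Returns (min_running, max_running): the fewest healthy tasks ECS will
    keep and the most tasks it will run while old and new tasks overlap.
    """
    min_running = math.ceil(desired_count * minimum_healthy_percent / 100)
    max_running = math.floor(desired_count * maximum_percent / 100)
    return min_running, max_running


print(rolling_update_bounds(3, 100, 200))  # (3, 6)
```

With the values used above, ECS can start three new tasks alongside the three old ones and never drops below full capacity, which is what makes the deployment zero-downtime. The price is briefly paying for double the tasks.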

Target Tracking Autoscaling

Application Auto Scaling with target tracking is the simplest autoscaling configuration and covers most use cases. Target 60% CPU utilisation: high enough to keep costs down, low enough to absorb traffic spikes before new tasks start (Fargate task startup takes 30–60 seconds):

# Register the scalable target
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --resource-id service/production/api-service \
  --scalable-dimension ecs:service:DesiredCount \
  --min-capacity 2 \
  --max-capacity 20

# Create the scaling policy
aws application-autoscaling put-scaling-policy \
  --policy-name api-cpu-target-tracking \
  --service-namespace ecs \
  --resource-id service/production/api-service \
  --scalable-dimension ecs:service:DesiredCount \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 60.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
    },
    "ScaleOutCooldown": 60,
    "ScaleInCooldown": 300
  }'

The asymmetric cooldowns matter: scale out fast (60s cooldown — get more capacity quickly during traffic spikes) but scale in slowly (300s cooldown — avoid terminating tasks only to need them again 5 minutes later).
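A useful mental model for target tracking is proportional: scale the task count by the ratio of the observed metric to the target. AWS does not publish the exact algorithm, so treat the sketch below as an approximation. It ignores cooldowns and alarm evaluation periods, and the function name is my own.

```python
import math


def estimated_desired_count(current_tasks, metric_value, target_value,
                            min_capacity, max_capacity):
    """Approximate the task count that brings the metric back to its target.

    Simplified proportional model: capacity is scaled by metric / target,
    rounded up, and clamped to the registered minimum and maximum.
    """
    raw = math.ceil(current_tasks * metric_value / target_value)
    return max(min_capacity, min(max_capacity, raw))


print(estimated_desired_count(4, 90.0, 60.0, 2, 20))  # 6
```

Four tasks running at 90% CPU against a 60% target need roughly six tasks to get back to the target. The same arithmetic shows why 60% leaves room for a spike: load can grow by about two thirds before the existing tasks are saturated.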

Custom Metrics for SQS-Backed Workers

Target tracking on CPU does not work well for SQS consumers. If messages arrive faster than your consumer can process them, CPU might stay low while the queue depth grows. Use the queue depth metric instead:

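# The policy name and worker service name below are placeholders
aws application-autoscaling put-scaling-policy \
  --policy-name worker-queue-depth-target-tracking \
  --service-namespace ecs \
  --resource-id service/production/worker-service \
  --scalable-dimension ecs:service:DesiredCount \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{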
    "TargetValue": 100,
    "CustomizedMetricSpecification": {
      "MetricName": "ApproximateNumberOfMessagesVisible",
      "Namespace": "AWS/SQS",
      "Dimensions": [{"Name": "QueueName", "Value": "my-queue"}],
      "Statistic": "Average"
    },
    "ScaleOutCooldown": 30,
    "ScaleInCooldown": 120
  }'

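With the raw queue metric, this policy targets a total backlog of about 100 visible messages, not 100 messages per task: Application Auto Scaling adds tasks while the queue stays above the target and removes them once it falls below. To get proportional behaviour, where 1,000 queued messages means 10 tasks, the metric needs to be backlog per task (visible messages divided by running tasks), either published as a custom CloudWatch metric or computed with metric math. Adjust the target based on your typical message processing time and desired end-to-end latency.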
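One way to choose the target is to divide the latency you can tolerate by the average time a task needs per message, then size the fleet from backlog per task. The sketch below shows that arithmetic. The function names, the 50-second latency, and the 0.5-second processing time are assumptions for illustration.

```python
import math


def backlog_target_per_task(acceptable_latency_s, avg_processing_time_s):
    """Messages one task may have queued while still meeting the latency goal."""
    return max(1, math.floor(acceptable_latency_s / avg_processing_time_s))


def tasks_for_backlog(visible_messages, target_per_task, min_tasks, max_tasks):
    """Task count that keeps backlog per task at or below the target."""
    needed = math.ceil(visible_messages / target_per_task)
    return max(min_tasks, min(max_tasks, needed))


target = backlog_target_per_task(acceptable_latency_s=50, avg_processing_time_s=0.5)
print(target)                                  # 100
print(tasks_for_backlog(1000, target, 2, 20))  # 10
```

If processing slows to one second per message, the same latency goal halves the target to 50 messages per task, and the fleet needed for a given backlog doubles.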

Fargate Spot for Non-Critical Workloads

Fargate Spot uses spare AWS capacity at up to 70% discount. The trade-off: Spot capacity can be reclaimed with a 2-minute notice. For stateless workers that checkpoint their progress, this is an excellent cost saving:

"capacityProviderStrategy": [
  {
    "capacityProvider": "FARGATE_SPOT",
    "weight": 4,
    "base": 0
  },
  {
    "capacityProvider": "FARGATE",
    "weight": 1,
    "base": 1
  }
]

We run our report generation workers on 80% Spot / 20% On-Demand. The base of 1 on-demand task ensures the service never goes to zero capacity even during a Spot reclamation event. Workers handle the SIGTERM from Spot reclamation by completing the current batch and exiting cleanly.
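The clean-exit behaviour can be sketched as a handler that only records the stop request, plus a worker loop that checks it between batches. This is a minimal illustration, not our production worker: run_worker, the batch iterable, and process_batch are hypothetical stand-ins for a real queue poller and job handler. Keep in mind that ECS follows SIGTERM with SIGKILL once the container's stopTimeout expires (30 seconds by default, at most 120 on Fargate), so a batch has to fit in that window.

```python
import signal
import threading

# Set when ECS asks the task to stop (Spot reclamation, scale-in, deployment).
stop_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record the stop request; the worker loop checks it between batches."""
    stop_requested.set()


def install_handler():
    """Register the SIGTERM handler. Must be called from the main thread."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def run_worker(batches, process_batch):
    """Process batches until they run out or a stop has been requested.

    Returns the number of batches completed. A batch that has already
    started always runs to completion; no new batch starts after SIGTERM.
    """
    completed = 0
    for batch in batches:
        if stop_requested.is_set():
            break
        process_batch(batch)
        completed += 1
    return completed
```

A worker would call install_handler() once at startup and then run_worker(...) with its own batch source, exiting with status 0 when the function returns.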

Cost Breakdown

After the migration, our monthly compute bill went from €4,200 (3× m5.xlarge, on-demand) to €2,730 (ECS Fargate with Spot for workers). The 35% saving came from:

Eliminating idle capacity (EC2 provisioned for peak, Fargate scales to actual demand)
Fargate Spot for non-critical workers (70% discount on ~40% of compute)
Right-sized task definitions instead of instance-level provisioning
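As a quick sanity check on those numbers, the arithmetic below reproduces the overall saving from the two bills quoted above and estimates the most Spot could contribute on its own. The second figure treats 70% as the discount actually achieved, which is an upper bound since the Spot discount is "up to" 70%.

```python
before_eur = 4200  # monthly compute bill on EC2, from the figures above
after_eur = 2730   # monthly compute bill on Fargate

saving = (before_eur - after_eur) / before_eur
print(f"overall saving: {saving:.0%}")  # overall saving: 35%


def spot_blended_discount(spot_share, spot_discount):
    """Discount on the whole Fargate bill when only part of it runs on Spot."""
    return spot_share * spot_discount


# Upper bound: assumes the full 70% discount on the 40% of compute on Spot.
print(f"Spot contribution: {spot_blended_discount(0.40, 0.70):.0%}")  # Spot contribution: 28%
```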

The operational improvements are harder to quantify but more impactful: zero-downtime deployments, automatic rollback on failures, and no more 2am alerts about instance health checks.
