ECS Worker Autoscaling with Queue Depth and Lag Metrics


CPU-based autoscaling works well for web services. It works poorly for queue workers. A worker can be at 20% CPU and still be dangerously behind because the queue is receiving messages faster than it can process them. For SQS workers on ECS, the better scaling signal is backlog per task and message age.

The Metric That Matters

The metric I start with is backlog per running task. If there are 20,000 visible messages and 20 ECS tasks, each task effectively owns 1,000 messages. If the processing rate is known, that number can be translated into expected drain time.

backlog_per_task = visible_messages / max(running_tasks, 1)
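As a rough sketch of that arithmetic in Go: `backlogPerTask` and `estimatedDrainTime` are illustrative names, and the per-task throughput is an assumed input you would measure yourself. The estimate ignores messages that arrive while the queue drains, so while traffic is still flowing it is optimistic.

```go
package main

import (
	"fmt"
	"time"
)

// backlogPerTask implements visible_messages / max(running_tasks, 1).
func backlogPerTask(visibleMessages, runningTasks int) float64 {
	if runningTasks < 1 {
		runningTasks = 1 // avoid dividing by zero when no tasks are running
	}
	return float64(visibleMessages) / float64(runningTasks)
}

// estimatedDrainTime converts backlog per task into a drain time, given a
// measured throughput in messages per second per task. It ignores new
// arrivals, so treat the result as a best case.
func estimatedDrainTime(backlog, perTaskRate float64) (time.Duration, bool) {
	if perTaskRate <= 0 {
		return 0, false // no throughput measurement: drain time is unknown
	}
	return time.Duration(backlog / perTaskRate * float64(time.Second)), true
}

func main() {
	b := backlogPerTask(20000, 20)     // the example from the text
	d, ok := estimatedDrainTime(b, 10) // 10 msg/s per task is a made-up rate
	fmt.Println(b, d, ok)              // 1000 1m40s true
}
```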

For workloads with variable processing time, combine it with the approximate age of the oldest message (the SQS ApproximateAgeOfOldestMessage metric in CloudWatch). Queue depth tells you how much work exists. Age tells you whether users are waiting too long.
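One way to read the two signals together, sketched under assumptions: `QueueSignals`, `LagState`, and `classify` are hypothetical names; the inputs would come from the SQS CloudWatch metrics ApproximateNumberOfMessagesVisible and ApproximateAgeOfOldestMessage plus the ECS running task count; and the target backlog and age threshold are placeholders. Age is checked first because it reflects user impact directly.

```go
package main

import (
	"fmt"
	"time"
)

// QueueSignals holds the two inputs discussed above plus the task count.
type QueueSignals struct {
	VisibleMessages int
	RunningTasks    int
	OldestAge       time.Duration
}

// LagState is a hypothetical classification used only in this sketch.
type LagState string

const (
	Healthy     LagState = "healthy"
	Backlogged  LagState = "backlogged"   // lots of work, but nobody waits too long yet
	UserVisible LagState = "user-visible" // oldest message exceeds the user-facing threshold
)

// classify checks age first (user impact), then backlog per task (amount of work).
func classify(s QueueSignals, targetBacklog float64, maxAge time.Duration) LagState {
	tasks := s.RunningTasks
	if tasks < 1 {
		tasks = 1
	}
	backlog := float64(s.VisibleMessages) / float64(tasks)
	switch {
	case s.OldestAge > maxAge:
		return UserVisible
	case backlog > targetBacklog:
		return Backlogged
	default:
		return Healthy
	}
}

func main() {
	s := QueueSignals{VisibleMessages: 20000, RunningTasks: 20, OldestAge: 45 * time.Second}
	fmt.Println(classify(s, 500, 5*time.Minute)) // backlogged
}
```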

Scaling Policy Shape

A simple target tracking policy can work, but I prefer step scaling for important worker pools because it lets you react aggressively when lag is high and scale down slowly when the queue is healthy.

Scale out when backlog per task is above the target for several minutes.
Scale out faster if oldest message age crosses a user-visible threshold.
Scale in slowly to avoid oscillation.
Keep a small minimum capacity for warm connections and predictable latency.
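The four rules above can be written as a pure decision function, which makes thresholds easy to unit-test before they are encoded as CloudWatch alarms and step adjustments. This is a sketch, not how Application Auto Scaling evaluates step policies: `Policy`, `Observation`, and `desiredTasks` are hypothetical, and the 25% step, doubling on an age breach, and "below half the target" scale-in condition are illustrative choices.

```go
package main

import (
	"fmt"
	"time"
)

// Policy thresholds are placeholders; tune them per worker pool.
type Policy struct {
	TargetBacklogPerTask float64       // scale out when backlog per task stays above this
	BreachPeriods        int           // consecutive periods above target before scaling out
	AgeThreshold         time.Duration // oldest-message age that users would notice
	ScaleInPeriods       int           // consecutive quiet periods before removing a task
	MinTasks, MaxTasks   int           // MinTasks keeps a warm floor
}

// Observation is one evaluation period, e.g. one CloudWatch datapoint.
type Observation struct {
	BacklogPerTask float64
	OldestAge      time.Duration
}

// trailing reports whether the last k observations all satisfy cond.
func trailing(h []Observation, k int, cond func(Observation) bool) bool {
	if k <= 0 || len(h) < k {
		return false
	}
	for _, o := range h[len(h)-k:] {
		if !cond(o) {
			return false
		}
	}
	return true
}

func atLeastOne(n int) int {
	if n < 1 {
		return 1
	}
	return n
}

func (p Policy) clamp(n int) int {
	if n < p.MinTasks {
		return p.MinTasks
	}
	if n > p.MaxTasks {
		return p.MaxTasks
	}
	return n
}

// desiredTasks applies the rules to history (oldest observation first).
func desiredTasks(p Policy, current int, history []Observation) int {
	if len(history) == 0 {
		return p.clamp(current)
	}
	last := history[len(history)-1]
	switch {
	case last.OldestAge > p.AgeThreshold:
		// User-visible lag: react immediately by doubling capacity.
		return p.clamp(current + atLeastOne(current))
	case trailing(history, p.BreachPeriods, func(o Observation) bool {
		return o.BacklogPerTask > p.TargetBacklogPerTask
	}):
		// Sustained backlog: add about 25% more tasks.
		return p.clamp(current + atLeastOne(current/4))
	case trailing(history, p.ScaleInPeriods, func(o Observation) bool {
		return o.BacklogPerTask < p.TargetBacklogPerTask/2
	}):
		// Long quiet stretch: remove a single task.
		return p.clamp(current - 1)
	default:
		return p.clamp(current)
	}
}

func main() {
	p := Policy{TargetBacklogPerTask: 500, BreachPeriods: 3, AgeThreshold: 5 * time.Minute,
		ScaleInPeriods: 15, MinTasks: 2, MaxTasks: 100}
	busy := []Observation{{800, time.Minute}, {900, time.Minute}, {1000, 2 * time.Minute}}
	fmt.Println(desiredTasks(p, 20, busy)) // 25
}
```

The asymmetry is deliberate: scale-out reacts after a few periods or immediately on an age breach, while scale-in needs a long quiet stretch and only removes one task at a time.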

Worker Concurrency

Scaling tasks is only half of the story. Each task may process messages concurrently. The safe concurrency depends on downstream limits: database connections, API rate limits, and memory pressure.

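import "time"

// WorkerConfig holds the per-task knobs that bound how much work one ECS task takes on.
type WorkerConfig struct {
    MaxMessagesInFlight int           // concurrent messages per task, sized to downstream limits
    VisibilityTimeout   time.Duration // how long a received message stays hidden from other consumers
    ShutdownTimeout     time.Duration // drain budget after SIGTERM; keep it below the ECS stop timeout
}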

I like keeping concurrency explicit in config and exporting it as a metric. During incidents, it should be obvious whether the bottleneck is the number of tasks, concurrency inside each task, or the downstream dependency.
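A minimal sketch of explicit per-task concurrency, assuming a buffered channel used as a semaphore and sized from MaxMessagesInFlight. `Pool`, `Submit`, `InFlight`, and `Peak` are hypothetical names, and the metrics exporter is left out. Because `Submit` blocks when every slot is busy, a polling loop that calls it stops pulling messages when the task is saturated instead of buffering them in memory.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Pool caps how many messages one task processes at once. InFlight and Peak
// are meant to be exported as gauges by the service's metrics library.
type Pool struct {
	sem      chan struct{}
	inFlight atomic.Int64
	peak     atomic.Int64
	wg       sync.WaitGroup
}

func NewPool(maxInFlight int) *Pool {
	if maxInFlight < 1 {
		maxInFlight = 1
	}
	return &Pool{sem: make(chan struct{}, maxInFlight)}
}

// Submit blocks until a slot is free, then runs handle in its own goroutine.
func (p *Pool) Submit(handle func()) {
	p.sem <- struct{}{} // acquire a slot; blocks when the task is saturated
	n := p.inFlight.Add(1)
	// Record the high-water mark with a compare-and-swap loop.
	for {
		old := p.peak.Load()
		if n <= old || p.peak.CompareAndSwap(old, n) {
			break
		}
	}
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		defer func() { <-p.sem }() // release the slot after the counter drops
		defer p.inFlight.Add(-1)
		handle()
	}()
}

func (p *Pool) InFlight() int64 { return p.inFlight.Load() }
func (p *Pool) Peak() int64     { return p.peak.Load() }
func (p *Pool) Wait()           { p.wg.Wait() }

func main() {
	pool := NewPool(4) // would come from WorkerConfig.MaxMessagesInFlight
	for i := 0; i < 20; i++ {
		pool.Submit(func() { time.Sleep(10 * time.Millisecond) }) // simulated handler
	}
	pool.Wait()
	fmt.Println("peak:", pool.Peak(), "in flight:", pool.InFlight())
}
```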

Avoiding Scale-In Damage

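ECS stops tasks during scale-in by sending SIGTERM to each container and, if a container has not exited by the end of its stop timeout (30 seconds by default), SIGKILL. If the worker does not handle SIGTERM correctly, messages can be abandoned mid-processing and reappear on the queue once their visibility timeout expires. That is usually fine if handlers are idempotent, but it still creates noise in the form of duplicate processing and retries. Graceful shutdown should stop polling, wait for in-flight messages within a budget shorter than the stop timeout, and extend the visibility timeout of messages that need more time.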
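A sketch of that shutdown sequence, assuming hypothetical `Receive`, `Handle`, and `ExtendVisibility` hooks where the SQS ReceiveMessage, DeleteMessage, and ChangeMessageVisibility calls would go. `Run` stops polling once the done channel closes (wired to SIGTERM here via `signal.NotifyContext`), waits up to `ShutdownTimeout` for in-flight handlers, and extends each message's visibility at half the timeout while its handler runs. Concurrency bounding is omitted to keep it short.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"os/signal"
	"sync"
	"syscall"
	"time"
)

// Message stands in for an SQS message.
type Message struct{ ID string }

// Worker wires hypothetical hooks; real code would call the SQS API here.
type Worker struct {
	Receive           func() []Message             // one poll; should block briefly (e.g. long polling) or Run spins
	Handle            func(Message)                // processes and deletes one message
	ExtendVisibility  func(Message, time.Duration) // e.g. wraps ChangeMessageVisibility; may be nil
	VisibilityTimeout time.Duration
	ShutdownTimeout   time.Duration // keep below the ECS stop timeout
}

// Run polls until done is closed, then drains in-flight handlers.
func (w Worker) Run(done <-chan struct{}) error {
	var wg sync.WaitGroup
	for {
		select {
		case <-done:
			return w.drain(&wg)
		default:
		}
		for _, m := range w.Receive() {
			wg.Add(1)
			go func(m Message) {
				defer wg.Done()
				stop := w.keepVisible(m)
				defer stop()
				w.Handle(m)
			}(m)
		}
	}
}

func (w Worker) drain(wg *sync.WaitGroup) error {
	finished := make(chan struct{})
	go func() { wg.Wait(); close(finished) }()
	select {
	case <-finished:
		return nil
	case <-time.After(w.ShutdownTimeout):
		return errors.New("shutdown timeout: unfinished messages will reappear after their visibility timeout")
	}
}

// keepVisible extends visibility every half timeout until stop is called. It
// ignores the shutdown signal on purpose so draining messages stay hidden.
func (w Worker) keepVisible(m Message) (stop func()) {
	if w.ExtendVisibility == nil || w.VisibilityTimeout/2 <= 0 {
		return func() {}
	}
	quit := make(chan struct{})
	go func() {
		t := time.NewTicker(w.VisibilityTimeout / 2)
		defer t.Stop()
		for {
			select {
			case <-quit:
				return
			case <-t.C:
				w.ExtendVisibility(m, w.VisibilityTimeout)
			}
		}
	}()
	return func() { close(quit) }
}

func main() {
	// ECS sends SIGTERM on scale-in; os.Interrupt makes Ctrl-C work locally.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, os.Interrupt)
	defer stop()
	ctx, cancel := context.WithTimeout(ctx, 2*time.Second) // demo only: stop on our own
	defer cancel()
	w := Worker{
		Receive: func() []Message {
			time.Sleep(100 * time.Millisecond) // stands in for a long poll
			return []Message{{ID: "demo"}}
		},
		Handle:            func(m Message) { time.Sleep(300 * time.Millisecond); fmt.Println("handled", m.ID) },
		VisibilityTimeout: 30 * time.Second,
		ShutdownTimeout:   20 * time.Second,
	}
	fmt.Println("stopped, err:", w.Run(ctx.Done()))
}
```

If `Receive` long-polls (SQS allows waits of up to 20 seconds), that wait also comes out of the ECS stop timeout before draining even starts, so the poll wait and `ShutdownTimeout` together should fit inside it.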

The result is an autoscaling system that follows work, not CPU. For background processing, that is the difference between a quiet dashboard and a morning spent explaining stale data.
