Posts

Showing posts from April, 2026

From Logs to Alerts: SLOs for Go APIs on AWS

Image
Logs are useful after something breaks. SLOs are useful before users start sending screenshots. The shift from log-based debugging to service-level objectives is one of the biggest maturity jumps a backend team can make. For Go APIs on AWS, I like starting with a small set of SLOs that match user pain: availability, latency, and freshness. Everything else can grow from there. Define What Good Means A service-level indicator is the measurement. A service-level objective is the target. For an API, the indicators are usually request success rate and latency. For a data pipeline, freshness matters too. 99.9% of API requests should return non-5xx responses over 30 days. 95% of dashboard requests should complete under 500ms. 95% of reporting data should be less than 15 minutes stale. Instrument at the Edge Measure user-visible behavior at the edge of the service. Handler middleware is a good place for request count, status, and duration. Do not build an SLO from internal function timi...

Cost-Aware LLM Routing: Reducing AI API Bills by 60%

Image
Three months after shipping the AI feature, our Anthropic API bill had grown faster than the revenue it was generating. The naive solution was to reduce usage. The right solution was to use the right model for each task. A cost-aware router that directs simple tasks to cheaper models and complex reasoning to powerful ones reduced our monthly AI spend by 60% while maintaining — and in some cases improving — output quality. The Insight: Not All Tasks Are Equal We were using Claude Opus for everything. Extracting a number from a JSON field does not need the same model as synthesising a 500-word campaign performance narrative. Classifying a keyword into one of five categories does not need the same model as generating a multi-step bid adjustment strategy. Using Opus for classification is like using a Ferrari to go grocery shopping. The Anthropic model family maps naturally to task complexity: Claude Haiku : fast, cheap (~50× cheaper than Opus per token), excellent for structured ex...