Posts

Showing posts from May, 2026

Designing Backfill Jobs That Do Not Take Production Down

Backfills are deceptively dangerous. The code is often simple: read old rows, compute a missing value, write it back. The danger is scale. A job that behaves perfectly on ten thousand rows can overload a database, fill a queue, or starve production traffic when it runs across hundreds of millions of records.

A production-safe backfill is designed like a service: observable, resumable, throttled, and boring to stop.

Make Progress Durable

Do not rely on an in-memory cursor for long backfills. Store progress in a table so the job can resume after deploys, crashes, or manual pauses.

    type BackfillCheckpoint struct {
        JobName   string
        LastID    int64
        UpdatedAt time.Time
    }

    func (b *Backfill) Run(ctx context.Context) error {
        checkpoint, err := b.store.LoadCheckpoint(ctx, "campaign-currency")
        if err != nil {
            return err
        }
        return b.processFrom(ctx, checkpoint.LastID)
    }

Chunk Everything

Large transactions are the enemy. Process small batches, co...
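As a rough illustration of how the durable checkpoint and small batches described above might fit together, here is a minimal sketch. It is not the post's implementation: `CheckpointStore`, `SaveCheckpoint`, `BatchFunc`, the in-memory `memStore`, and the `Pause` throttle are all hypothetical names introduced for this example, and a real store would write to a database table.

```go
package main

import (
	"context"
	"errors"
	"time"
)

// BackfillCheckpoint mirrors the struct shown in the post.
type BackfillCheckpoint struct {
	JobName   string
	LastID    int64
	UpdatedAt time.Time
}

// CheckpointStore is a hypothetical interface; SaveCheckpoint is an assumption.
type CheckpointStore interface {
	LoadCheckpoint(ctx context.Context, job string) (BackfillCheckpoint, error)
	SaveCheckpoint(ctx context.Context, cp BackfillCheckpoint) error
}

// memStore is an in-memory stand-in for a checkpoint table (illustration only).
type memStore map[string]BackfillCheckpoint

func (s memStore) LoadCheckpoint(_ context.Context, job string) (BackfillCheckpoint, error) {
	if cp, ok := s[job]; ok {
		return cp, nil
	}
	return BackfillCheckpoint{JobName: job}, nil // no checkpoint yet: start from ID 0
}

func (s memStore) SaveCheckpoint(_ context.Context, cp BackfillCheckpoint) error {
	s[cp.JobName] = cp
	return nil
}

// BatchFunc (hypothetical) handles up to limit rows with id > afterID and
// returns the highest ID it handled and how many rows it touched.
type BatchFunc func(ctx context.Context, afterID int64, limit int) (lastID int64, n int, err error)

type Backfill struct {
	Store     CheckpointStore
	JobName   string
	BatchSize int
	Pause     time.Duration // throttle between batches
	Process   BatchFunc
}

// Run resumes from the stored checkpoint and saves progress only after a
// batch succeeds, so a crash repeats at most one batch. Process should
// therefore be idempotent.
func (b *Backfill) Run(ctx context.Context) error {
	cp, err := b.Store.LoadCheckpoint(ctx, b.JobName)
	if err != nil {
		return err
	}
	for {
		lastID, n, err := b.Process(ctx, cp.LastID, b.BatchSize)
		if err != nil {
			return err
		}
		if n == 0 {
			return nil // nothing left to backfill
		}
		cp.LastID, cp.UpdatedAt = lastID, time.Now()
		if err := b.Store.SaveCheckpoint(ctx, cp); err != nil {
			return err
		}
		select {
		case <-ctx.Done(): // cancelling the context stops the job between batches
			return ctx.Err()
		case <-time.After(b.Pause):
		}
	}
}

// exampleRun backfills 25 fake rows in batches of 10. The first run fails on
// its second batch; the second run resumes from the saved checkpoint.
func exampleRun() (lastID int64, starts []int64) {
	const maxID = 25
	failOnce := true
	process := func(_ context.Context, afterID int64, limit int) (int64, int, error) {
		starts = append(starts, afterID) // record where each batch started
		if afterID == 10 && failOnce {
			failOnce = false
			return 0, 0, errors.New("simulated crash")
		}
		last := afterID + int64(limit)
		if last > maxID {
			last = maxID
		}
		return last, int(last - afterID), nil
	}
	store := memStore{}
	job := &Backfill{Store: store, JobName: "campaign-currency", BatchSize: 10, Process: process}
	_ = job.Run(context.Background()) // stops at the simulated crash
	_ = job.Run(context.Background()) // resumes after ID 10
	cp, _ := store.LoadCheckpoint(context.Background(), job.JobName)
	return cp.LastID, starts
}
```

In `exampleRun`, the second call to `Run` starts at ID 10 rather than 0, which is the property the durable checkpoint is meant to provide. Stopping the job is just a matter of cancelling the context.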

Amazon Ads Bulk Operations: Designing for Partial Failure

Bulk operations are where clean API abstractions go to suffer. Updating one campaign budget is simple. Updating ten thousand bids across hundreds of advertiser profiles is a different system. Some updates succeed, some fail validation, some hit rate limits, some time out, and the product still needs to tell the user exactly what happened.

The main design principle is to treat partial failure as the normal case. If the code assumes all-or-nothing success, the first real advertiser account will break the workflow.

Represent Work Explicitly

A bulk operation should become a durable job with child items. Each item has its own status, request payload, response payload, retry count, and error message. This makes the operation resumable and auditable.

    type BulkItemStatus string

    const (
        ItemPending   BulkItemStatus = "PENDING"
        ItemRunning   BulkItemStatus = "RUNNING"
        ItemSucceeded BulkItemStatus = "SUCCEEDED"
        ItemFailed    BulkItemStatus = ...
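To make the per-item bookkeeping described above concrete, the following is a small sketch of how child items could be tallied into a report for the user. The `BulkItem` field names, `BulkSummary`, `Summarize`, and `Retryable` are hypothetical, and because the excerpt is cut off before the value of `ItemFailed`, the string `"FAILED"` used here is only a guess that follows the pattern of the other constants.

```go
package main

type BulkItemStatus string

const (
	ItemPending   BulkItemStatus = "PENDING"
	ItemRunning   BulkItemStatus = "RUNNING"
	ItemSucceeded BulkItemStatus = "SUCCEEDED"
	// Assumption: the source is truncated before this value; "FAILED" is a guess.
	ItemFailed BulkItemStatus = "FAILED"
)

// BulkItem carries the per-item fields the post lists. Field names are hypothetical.
type BulkItem struct {
	ID           string
	Status       BulkItemStatus
	Request      []byte
	Response     []byte
	RetryCount   int
	ErrorMessage string
}

// BulkSummary is one possible shape for "what happened" in a bulk operation.
type BulkSummary struct {
	Counts map[BulkItemStatus]int // number of items in each status
	Errors map[string]string      // item ID -> error message, failed items only
}

// Summarize tallies items by status and collects the error for each failed item.
func Summarize(items []BulkItem) BulkSummary {
	s := BulkSummary{Counts: map[BulkItemStatus]int{}, Errors: map[string]string{}}
	for _, it := range items {
		s.Counts[it.Status]++
		if it.Status == ItemFailed {
			s.Errors[it.ID] = it.ErrorMessage
		}
	}
	return s
}

// Retryable returns the IDs of failed items that are still under the retry budget.
func Retryable(items []BulkItem, maxRetries int) []string {
	var ids []string
	for _, it := range items {
		if it.Status == ItemFailed && it.RetryCount < maxRetries {
			ids = append(ids, it.ID)
		}
	}
	return ids
}
```

Because every item keeps its own status and retry count, a partially failed operation can be summarized or resumed by reading the items back, without assuming the whole batch succeeded or failed together.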

Amazon Ads API at Scale: Rate Limiting, Pagination and Bulk Operations in Go

After three years of building and maintaining a platform that manages Amazon advertising campaigns for thousands of advertisers, I have made every mistake possible with the Amazon Ads API. This post is a practical guide to operating the API at scale: how to stay within rate limits across thousands of advertiser profiles, how to paginate correctly, and how to bulk-process operations without hammering the API into returning 429s.

The Scale Problem

When you have one advertiser, the Amazon Ads API is straightforward. When you have 2,000 advertisers, each with dozens of campaigns, hundreds of ad groups, and thousands of keywords, the same operations become an engineering challenge. A nightly sync that takes 3 seconds per advertiser profile takes over an hour and a half across the fleet when run sequentially (2,000 × 3 s = 6,000 s, or 100 minutes). Any operation that requires multiple API calls per entity (reading, computing, then writing) multiplies that cost.

The constraints you need to design around:

Rate limits are per profile (per ...
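Since the excerpt states that rate limits apply per profile, one way to respect that is to keep a separate token bucket for each profile ID, so a busy profile cannot use up another profile's budget. The sketch below uses only the standard library. `ProfileLimiter`, its rate and burst values, and the profile IDs are illustrative assumptions, not actual Amazon Ads API limits.

```go
package main

import (
	"sync"
	"time"
)

// bucket is a token bucket for a single profile.
type bucket struct {
	tokens float64
	last   time.Time // last time tokens were refilled
}

// ProfileLimiter (hypothetical) keeps one token bucket per advertiser profile.
// The rate and burst values are assumptions; real limits must come from the API.
type ProfileLimiter struct {
	mu      sync.Mutex
	rate    float64 // tokens added per second
	burst   float64 // maximum tokens a bucket can hold
	buckets map[string]*bucket
}

func NewProfileLimiter(ratePerSec float64, burst int) *ProfileLimiter {
	return &ProfileLimiter{rate: ratePerSec, burst: float64(burst), buckets: map[string]*bucket{}}
}

// Allow reports whether a request for profileID may be sent at time now.
// Passing the clock in explicitly keeps the sketch deterministic to test.
func (l *ProfileLimiter) Allow(profileID string, now time.Time) bool {
	l.mu.Lock()
	defer l.mu.Unlock()

	b, ok := l.buckets[profileID]
	if !ok {
		// A profile seen for the first time starts with a full bucket.
		b = &bucket{tokens: l.burst, last: now}
		l.buckets[profileID] = b
	}
	// Refill in proportion to elapsed time, capped at the burst size.
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens += elapsed * l.rate
		if b.tokens > l.burst {
			b.tokens = l.burst
		}
		b.last = now
	}
	if b.tokens < 1 {
		return false // caller should wait or re-queue the request
	}
	b.tokens--
	return true
}

// exampleLimiter allows 1 request per second with a burst of 2 per profile.
func exampleLimiter() []bool {
	l := NewProfileLimiter(1, 2)
	t0 := time.Unix(0, 0)
	return []bool{
		l.Allow("profile-A", t0),                  // first token of the burst
		l.Allow("profile-A", t0),                  // second token of the burst
		l.Allow("profile-A", t0),                  // bucket empty
		l.Allow("profile-B", t0),                  // separate bucket, unaffected
		l.Allow("profile-A", t0.Add(time.Second)), // one token refilled
	}
}
```

In `exampleLimiter`, the third call for `profile-A` is rejected while `profile-B` is still allowed, which is the isolation a per-profile limit calls for. A production version would more likely block or re-queue rather than return `false`, and would still need to handle 429 responses from the API itself.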