Amazon Ads Bulk Operations: Designing for Partial Failure

Bulk operations are where clean API abstractions go to suffer. Updating one campaign budget is simple. Updating ten thousand bids across hundreds of advertiser profiles is a different system. Some updates succeed, some fail validation, some hit rate limits, some time out, and the product still needs to tell the user exactly what happened.

The main design principle is to treat partial failure as the normal case. If the code assumes all-or-nothing success, the first real advertiser account will break the workflow.

Represent Work Explicitly

A bulk operation should become a durable job with child items. Each item has its own status, request payload, response payload, retry count, and error message. This makes the operation resumable and auditable.

type BulkItemStatus string

const (
    ItemPending   BulkItemStatus = "PENDING"
    ItemRunning   BulkItemStatus = "RUNNING"
    ItemSucceeded BulkItemStatus = "SUCCEEDED"
    ItemFailed    BulkItemStatus = "FAILED"
)

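type BulkItem struct {
    ID         int64
    JobID      int64
    ProfileID  int64
    EntityID   int64
    Payload    json.RawMessage // request body sent to the API
    Response   json.RawMessage // raw API response, kept for auditing
    Status     BulkItemStatus
    RetryCount int
    Error      string
}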
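
To show how per-item status keeps a job resumable, here is a minimal sketch of recording the outcome of one attempt. The `applyResult` helper, the `maxAttempts` budget, the `retriable` flag, and the trimmed-down `Item` type are all assumptions made for illustration, not part of any Amazon Ads SDK.

```go
package main

import (
	"errors"
	"fmt"
)

type BulkItemStatus string

const (
	ItemPending   BulkItemStatus = "PENDING"
	ItemRunning   BulkItemStatus = "RUNNING"
	ItemSucceeded BulkItemStatus = "SUCCEEDED"
	ItemFailed    BulkItemStatus = "FAILED"
)

// Item is a trimmed-down stand-in for BulkItem.
type Item struct {
	ID         int64
	Status     BulkItemStatus
	Error      string
	RetryCount int
}

// maxAttempts is an assumed retry budget, not an Amazon Ads value.
const maxAttempts = 3

// errThrottled is a hypothetical transient error used in the example.
var errThrottled = errors.New("throttled")

// applyResult records the outcome of one attempt on a single item.
// err is nil on success. retriable says whether the caller classified
// the error as transient (throttling, timeout).
func applyResult(it *Item, err error, retriable bool) {
	if err == nil {
		it.Status = ItemSucceeded
		it.Error = ""
		return
	}
	it.Error = err.Error()
	it.RetryCount++
	if retriable && it.RetryCount < maxAttempts {
		it.Status = ItemPending // back to the queue for another attempt
		return
	}
	it.Status = ItemFailed
}

func main() {
	it := &Item{ID: 1, Status: ItemRunning}
	applyResult(it, errThrottled, true)
	fmt.Println(it.Status, it.RetryCount, it.Error)
}
```

Because the item itself carries the attempt count and the last error, a worker that restarts can pick up every `PENDING` row without knowing anything about the process that ran before it.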

Validate Before Calling the API

Every error you can catch locally saves API quota. Validate required fields, allowed ranges, entity ownership, and marketplace constraints before sending anything to Amazon. The API will still reject some items, but the obvious failures should never leave your system.
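
A local validation pass might look like the sketch below. The `BidUpdate` shape, the `minBid` and `maxBid` bounds, and the `owned` lookup are hypothetical. Real limits vary by marketplace and ad type, so treat the numbers as placeholders.

```go
package main

import "fmt"

// BidUpdate is a hypothetical payload shape; real payloads depend on the endpoint.
type BidUpdate struct {
	ProfileID int64
	EntityID  int64
	Bid       float64
}

// Assumed bounds for illustration only.
const (
	minBid = 0.02
	maxBid = 1000.00
)

// validate returns every local problem it finds, so the user sees all of
// them at once instead of fixing one row error per upload.
// owned is the set of entity IDs that belong to the profile, assumed to be
// loaded by the caller before validation starts.
func validate(u BidUpdate, owned map[int64]bool) []string {
	var problems []string
	if u.ProfileID == 0 {
		problems = append(problems, "profile ID is required")
	}
	if u.EntityID == 0 {
		problems = append(problems, "entity ID is required")
	} else if !owned[u.EntityID] {
		problems = append(problems,
			fmt.Sprintf("entity %d does not belong to profile %d", u.EntityID, u.ProfileID))
	}
	if u.Bid < minBid || u.Bid > maxBid {
		problems = append(problems,
			fmt.Sprintf("bid %.2f is outside [%.2f, %.2f]", u.Bid, minBid, maxBid))
	}
	return problems
}

func main() {
	owned := map[int64]bool{42: true}
	for _, p := range validate(BidUpdate{ProfileID: 7, EntityID: 99, Bid: 0}, owned) {
		fmt.Println(p)
	}
}
```

Items that come back with problems can be marked `FAILED` right away with the joined messages as their error, without spending a single API call on them.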

Group by Rate Limit Boundary

Amazon Ads rate limits often depend on profile, operation, and endpoint. A bulk worker should group items by the boundary that matters for throttling. That lets one large advertiser slow down without blocking unrelated profiles.

  • Group work by profile ID.
  • Use separate token buckets for different endpoint families.
  • Persist retry-after information when the API returns it.
  • Base user-visible progress on item statuses, not on the job status alone.
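
The grouping and throttling ideas above can be sketched as follows. This is a minimal, single-goroutine token bucket written for illustration. The `bucketKey` type, the endpoint family label, and the capacity and rate values are assumptions, not documented Amazon Ads limits.

```go
package main

import (
	"fmt"
	"time"
)

type Item struct {
	ID        int64
	ProfileID int64
}

// groupByProfile buckets items by profile so each profile can be
// throttled independently of the others.
func groupByProfile(items []Item) map[int64][]Item {
	groups := make(map[int64][]Item)
	for _, it := range items {
		groups[it.ProfileID] = append(groups[it.ProfileID], it)
	}
	return groups
}

// bucketKey identifies one throttling boundary: a profile plus an
// endpoint family (a hypothetical label such as "keywords").
type bucketKey struct {
	ProfileID int64
	Family    string
}

// TokenBucket is a minimal token bucket. It is not safe for concurrent use.
type TokenBucket struct {
	capacity     float64
	rate         float64 // tokens added per second
	tokens       float64
	last         time.Time
	blockedUntil time.Time // set from a retry-after hint
}

func NewTokenBucket(capacity, ratePerSec float64, now time.Time) *TokenBucket {
	return &TokenBucket{capacity: capacity, rate: ratePerSec, tokens: capacity, last: now}
}

// Allow reports whether one request may be sent at time now.
func (b *TokenBucket) Allow(now time.Time) bool {
	if now.Before(b.blockedUntil) {
		return false
	}
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens += elapsed * b.rate
		if b.tokens > b.capacity {
			b.tokens = b.capacity
		}
		b.last = now
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// Backoff blocks the bucket until the given time, for example
// now plus the retry-after value from a throttled response.
func (b *TokenBucket) Backoff(until time.Time) {
	b.blockedUntil = until
}

func main() {
	items := []Item{{ID: 1, ProfileID: 100}, {ID: 2, ProfileID: 200}, {ID: 3, ProfileID: 100}}
	now := time.Now()
	buckets := map[bucketKey]*TokenBucket{}
	for profileID, group := range groupByProfile(items) {
		key := bucketKey{ProfileID: profileID, Family: "keywords"}
		buckets[key] = NewTokenBucket(2, 1, now)
		fmt.Println(profileID, len(group), buckets[key].Allow(now))
	}
}
```

Storing `blockedUntil` alongside the job, rather than only in memory, is what lets a restarted worker honor a retry-after hint it received before the restart. For production use, a maintained limiter such as `golang.org/x/time/rate` is likely a better fit than a hand-rolled bucket.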

Return a Useful Result

A bulk operation is not done when the worker finishes. It is done when the user can understand the result. The response should include counts, downloadable error rows, and enough metadata to retry only failed items.

type BulkSummary struct {
    Total      int
    Succeeded  int
    Failed     int
    Retriable  int
    StartedAt  time.Time
    FinishedAt *time.Time
}
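
One way to derive that summary from item rows is sketched below, along with a helper that selects only the items worth retrying. The `Retriable` flag on the item and the `summarize` and `retriableIDs` helpers are hypothetical names, and the sketch assumes a job counts as finished once every item is either succeeded or failed.

```go
package main

import (
	"fmt"
	"time"
)

type BulkItemStatus string

const (
	ItemPending   BulkItemStatus = "PENDING"
	ItemRunning   BulkItemStatus = "RUNNING"
	ItemSucceeded BulkItemStatus = "SUCCEEDED"
	ItemFailed    BulkItemStatus = "FAILED"
)

// Item is a trimmed-down stand-in for BulkItem.
type Item struct {
	ID        int64
	Status    BulkItemStatus
	Error     string
	Retriable bool // hypothetical flag set when the failure looked transient
}

type BulkSummary struct {
	Total      int
	Succeeded  int
	Failed     int
	Retriable  int
	StartedAt  time.Time
	FinishedAt *time.Time
}

// summarize counts item outcomes. FinishedAt stays nil until every item
// has reached a terminal status.
func summarize(items []Item, startedAt, now time.Time) BulkSummary {
	s := BulkSummary{Total: len(items), StartedAt: startedAt}
	for _, it := range items {
		switch it.Status {
		case ItemSucceeded:
			s.Succeeded++
		case ItemFailed:
			s.Failed++
			if it.Retriable {
				s.Retriable++
			}
		}
	}
	if s.Succeeded+s.Failed == s.Total {
		s.FinishedAt = &now
	}
	return s
}

// retriableIDs returns the failed items that are worth sending again,
// leaving successful changes untouched.
func retriableIDs(items []Item) []int64 {
	var ids []int64
	for _, it := range items {
		if it.Status == ItemFailed && it.Retriable {
			ids = append(ids, it.ID)
		}
	}
	return ids
}

func main() {
	start := time.Now()
	items := []Item{
		{ID: 1, Status: ItemSucceeded},
		{ID: 2, Status: ItemFailed, Error: "throttled", Retriable: true},
		{ID: 3, Status: ItemFailed, Error: "bid below minimum"},
	}
	s := summarize(items, start, time.Now())
	fmt.Println(s.Total, s.Succeeded, s.Failed, s.Retriable, s.FinishedAt != nil)
	fmt.Println(retriableIDs(items))
}
```

Computing the summary from item rows on demand, instead of maintaining counters on the job, keeps the numbers consistent with what the user sees in the downloadable error rows.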

Designing for partial failure makes the system feel more reliable because it is honest. Users do not need magic. They need clear progress, safe retries, and confidence that successful changes will not be rolled back because one row had a bad bid.