
Amazon Ads API at Scale: Rate Limiting, Pagination and Bulk Operations in Go

After three years of building and maintaining a platform that manages Amazon advertising campaigns for thousands of advertisers, I have made every mistake possible with the Amazon Ads API. This post is a practical guide to operating the API at scale: how to stay within rate limits across thousands of advertiser profiles, how to paginate correctly, and how to bulk-process operations without hammering the API into returning 429s.

The Scale Problem

When you have one advertiser, the Amazon Ads API is straightforward. When you have 2,000 advertisers, each with dozens of campaigns, hundreds of ad groups, and thousands of keywords, the same operations become an engineering challenge. A nightly sync that takes 3 seconds per advertiser profile takes 100 minutes across the fleet if run sequentially. Any operation that requires multiple API calls per entity (reading, computing, then writing) multiplies that cost.

The constraints you need to design around:

  • Rate limits are per profile (per advertiser), not per API key
  • Most write endpoints allow 2–5 requests per second per profile
  • Most batch endpoints accept up to 1,000 entities per request
  • The reporting API has separate, stricter limits
  • Access tokens expire after 1 hour and must be refreshed

Token Management at Scale

With 2,000 advertisers you likely have a different refresh token per advertiser (each granted access via Login with Amazon). Refreshing all tokens on startup is impractical. Instead, use a lazy-loading token cache:

type TokenCache struct {
    mu     sync.RWMutex
    tokens map[string]*tokenEntry // profileID → entry
    client *http.Client
}

type tokenEntry struct {
    accessToken  string
    expiresAt    time.Time
    refreshToken string
}

func (c *TokenCache) Get(ctx context.Context, profileID string) (string, error) {
    c.mu.RLock()
    entry, ok := c.tokens[profileID]
    c.mu.RUnlock()

    if ok && time.Until(entry.expiresAt) > 60*time.Second {
        return entry.accessToken, nil // still valid
    }

    // Refresh. Note: the write lock is held across the network call,
    // so refreshes for different profiles are serialized.
    c.mu.Lock()
    defer c.mu.Unlock()

    // Double-check after acquiring write lock
    entry, ok = c.tokens[profileID]
    if ok && time.Until(entry.expiresAt) > 60*time.Second {
        return entry.accessToken, nil
    }

    newToken, expiresIn, err := c.refresh(ctx, profileID)
    if err != nil {
        return "", fmt.Errorf("token refresh for profile %s: %w", profileID, err)
    }

    next := &tokenEntry{
        accessToken: newToken,
        expiresAt:   time.Now().Add(time.Duration(expiresIn) * time.Second),
    }
    if ok {
        // Carry the refresh token over so the next refresh still has it
        next.refreshToken = entry.refreshToken
    }
    c.tokens[profileID] = next
    return newToken, nil
}

The key detail: refresh happens lazily, only when the token is within 60 seconds of expiring, and the double-checked locking pattern prevents a stampede when multiple goroutines try to refresh the same profile's token simultaneously.
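One caveat with a single cache-wide lock is that it stays held while the refresh request is in flight, so a slow refresh for one profile delays token lookups for every other profile. A minimal sketch of one way around that, assuming a hypothetical refreshFunc in place of the real Login with Amazon call and leaving out context handling for brevity: give each profile its own mutex and hold the map lock only long enough to find the entry. The names PerProfileCache, entryFor and countRefreshes are illustrative, not part of the platform code.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// refreshFunc is a hypothetical stand-in for the real token refresh call.
type refreshFunc func(profileID string) (token string, ttl time.Duration, err error)

type entry struct {
	mu        sync.Mutex // guards this profile's token only
	token     string
	expiresAt time.Time
}

type PerProfileCache struct {
	mu      sync.Mutex // guards the map; never held during a refresh
	entries map[string]*entry
	refresh refreshFunc
}

func NewPerProfileCache(r refreshFunc) *PerProfileCache {
	return &PerProfileCache{entries: make(map[string]*entry), refresh: r}
}

// entryFor returns the entry for a profile, creating an empty one if needed.
func (c *PerProfileCache) entryFor(profileID string) *entry {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[profileID]
	if !ok {
		e = &entry{}
		c.entries[profileID] = e
	}
	return e
}

func (c *PerProfileCache) Get(profileID string) (string, error) {
	e := c.entryFor(profileID)
	e.mu.Lock() // concurrent callers for the same profile queue here
	defer e.mu.Unlock()
	if time.Until(e.expiresAt) > 60*time.Second {
		return e.token, nil
	}
	token, ttl, err := c.refresh(profileID)
	if err != nil {
		return "", fmt.Errorf("token refresh for profile %s: %w", profileID, err)
	}
	e.token, e.expiresAt = token, time.Now().Add(ttl)
	return token, nil
}

// countRefreshes calls Get from n goroutines for one profile and reports
// how many times the refresh function actually ran.
func countRefreshes(n int) int64 {
	var calls atomic.Int64
	cache := NewPerProfileCache(func(string) (string, time.Duration, error) {
		calls.Add(1)
		return "token", time.Hour, nil
	})
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			cache.Get("profile-1")
		}()
	}
	wg.Wait()
	return calls.Load()
}

func main() {
	fmt.Println("refresh calls:", countRefreshes(20))
}
```

The trade-off is that reads for the same profile now serialize on that profile's mutex, which is usually acceptable because the per-profile request rate is already capped at a few requests per second.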

Per-Profile Rate Limiting

Naively processing advertisers in parallel will cause 429 errors on every profile that gets more than ~5 requests per second. You need a rate limiter per profile, not a global one:

type ProfileLimiter struct {
    mu       sync.Mutex
    limiters map[string]*rate.Limiter
}

func (pl *ProfileLimiter) get(profileID string) *rate.Limiter {
    pl.mu.Lock()
    defer pl.mu.Unlock()
    if l, ok := pl.limiters[profileID]; ok {
        return l
    }
    // 3 requests/second, burst of 5 (rate is golang.org/x/time/rate)
    l := rate.NewLimiter(rate.Limit(3), 5)
    pl.limiters[profileID] = l
    return l
}

func (pl *ProfileLimiter) Wait(ctx context.Context, profileID string) error {
    return pl.get(profileID).Wait(ctx)
}

// Usage in API client:
func (c *AdsClient) Do(ctx context.Context, profileID string, req *http.Request) (*http.Response, error) {
    if err := c.limiter.Wait(ctx, profileID); err != nil {
        return nil, err
    }
    // Fetch the token through the cache and surface refresh failures
    token, err := c.tokens.Get(ctx, profileID)
    if err != nil {
        return nil, err
    }
    req.Header.Set("Amazon-Advertising-API-Scope", profileID)
    req.Header.Set("Authorization", "Bearer "+token)
    return c.http.Do(req)
}
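To get a feel for what a limit of 3 requests per second with a burst of 5 means for one profile, the arithmetic can be sketched as below. This is a back-of-the-envelope estimate, not a simulation of the limiter: it assumes a token bucket that starts full and ignores request latency. The function name minIssueTime is made up for this example.

```go
package main

import (
	"fmt"
	"time"
)

// minIssueTime estimates the earliest moment the n-th request can be issued
// by a token bucket that starts full: the first `burst` requests go out
// immediately, and every later one waits for a token refilled at perSecond.
func minIssueTime(n int, perSecond float64, burst int) time.Duration {
	if n <= burst {
		return 0
	}
	waitSeconds := float64(n-burst) / perSecond
	return time.Duration(waitSeconds * float64(time.Second))
}

func main() {
	// 100 sequential requests for one profile at 3 requests/second, burst 5.
	fmt.Println(minIssueTime(100, 3, 5).Round(time.Second))
}
```

Under these assumptions, 100 requests against a single profile need a little over half a minute before the last one can even be sent, which is why request count per profile matters more than raw concurrency.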

Pagination: Never Assume One Page

These endpoints use offset-based pagination via the startIndex and count parameters. An advertiser with 10,000 keywords will require 100 API calls to read them all at page size 100. Always paginate:

func (c *AdsClient) ListAllCampaigns(ctx context.Context, profileID string) ([]Campaign, error) {
    var all []Campaign
    startIndex := 0
    const pageSize = 100

    for {
        var page []Campaign
        url := fmt.Sprintf(
            "%s/v2/sp/campaigns?startIndex=%d&count=%d&stateFilter=enabled,paused",
            c.baseURL, startIndex, pageSize,
        )

        if err := c.getJSON(ctx, profileID, url, &page); err != nil {
            return nil, fmt.Errorf("page at index %d: %w", startIndex, err)
        }

        all = append(all, page...)

        if len(page) < pageSize {
            break // last page
        }
        startIndex += pageSize
    }

    return all, nil
}

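The subtle point here is the stop condition. Stopping only when len(page) == 0 is correct, but it always costs one extra request. Stopping when you receive fewer items than you requested saves that request in the common case. When the total happens to be an exact multiple of pageSize, the loop makes one more call, receives an empty page, and stops.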
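The same loop can be written once and reused for every list endpoint. The sketch below is one possible shape, assuming Go 1.21 or later: fetchAll is a hypothetical generic helper, and its fetch callback stands in for a single API call that returns at most count items starting at startIndex. countCalls pages through an in-memory list so the number of requests is easy to see.

```go
package main

import "fmt"

// fetchAll reads every item from an offset-paginated source. fetch is a
// hypothetical callback standing in for one API call.
func fetchAll[T any](pageSize int, fetch func(startIndex, count int) ([]T, error)) ([]T, error) {
	var all []T
	for startIndex := 0; ; startIndex += pageSize {
		page, err := fetch(startIndex, pageSize)
		if err != nil {
			return nil, fmt.Errorf("page at index %d: %w", startIndex, err)
		}
		all = append(all, page...)
		if len(page) < pageSize {
			return all, nil // short (or empty) page: nothing left to read
		}
	}
}

// countCalls pages through an in-memory list of `total` items and reports
// how many items came back and how many fetches that took.
func countCalls(total, pageSize int) (items, calls int) {
	data := make([]int, total)
	fetch := func(startIndex, count int) ([]int, error) {
		calls++
		if startIndex >= len(data) {
			return nil, nil
		}
		end := min(startIndex+count, len(data))
		return data[startIndex:end], nil
	}
	all, _ := fetchAll(pageSize, fetch)
	return len(all), calls
}

func main() {
	items, calls := countCalls(250, 100)
	fmt.Println(items, calls) // 250 items in 3 fetches: 100 + 100 + 50
	items, calls = countCalls(200, 100)
	fmt.Println(items, calls) // 200 items in 3 fetches: 100 + 100 + 0
}
```

The helper relies on the source returning full pages whenever more data exists. If an endpoint can return short pages mid-stream, the stop condition has to fall back to an empty page.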

Batch Writes: Maximise Throughput

The Ads API accepts arrays for most write operations (create/update campaigns, ad groups, keywords). Updating 500 keywords one request at a time takes close to three minutes at 3 requests per second; sent as a batch, the same update is a single request:

func (c *AdsClient) UpdateKeywordBids(ctx context.Context, profileID string, bids []BidUpdate) error {
    const batchSize = 1000 // API maximum

    for i := 0; i < len(bids); i += batchSize {
        end := i + batchSize
        if end > len(bids) {
            end = len(bids)
        }
        batch := bids[i:end]

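        body, err := json.Marshal(batch)
        if err != nil {
            return fmt.Errorf("batch %d: marshal: %w", i/batchSize, err)
        }
        req, err := http.NewRequestWithContext(ctx, http.MethodPut,
            c.baseURL+"/v2/sp/keywords", bytes.NewReader(body))
        if err != nil {
            return fmt.Errorf("batch %d: build request: %w", i/batchSize, err)
        }
        req.Header.Set("Content-Type", "application/json")

        resp, err := c.Do(ctx, profileID, req)
        if err != nil {
            return fmt.Errorf("batch %d: %w", i/batchSize, err)
        }

        // Parse partial success: the API returns per-entity result codes
        var results []struct {
            KeywordID int64  `json:"keywordId"`
            Code      string `json:"code"`
        }
        err = json.NewDecoder(resp.Body).Decode(&results)
        resp.Body.Close()
        if err != nil {
            // An error body (for example on a 4xx) is not a result array
            return fmt.Errorf("batch %d: decode response (HTTP %d): %w", i/batchSize, resp.StatusCode, err)
        }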

        for _, r := range results {
            if r.Code != "SUCCESS" {
                log.Warnf("keyword %d: %s", r.KeywordID, r.Code)
            }
        }
    }
    return nil
}

Critical detail: the Ads API returns partial success for batch writes. A batch of 1,000 might have 998 successes and 2 failures. Always parse the per-entity response codes, not just the HTTP status code.
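Logging the failures is a start, but callers usually need to act on them, for example by retrying or flagging the rejected entities. One way to surface them, sketched under the assumption that results have the keywordId/code shape shown above: return a typed error that carries the failed entries. PartialFailure, checkBatch and failedIDs are hypothetical names, and the "INVALID_ARGUMENT" code in main is only an illustrative value.

```go
package main

import (
	"errors"
	"fmt"
)

// batchResult mirrors the per-entity shape decoded from a batch response.
type batchResult struct {
	KeywordID int64  `json:"keywordId"`
	Code      string `json:"code"`
}

// PartialFailure reports which entities in a batch were rejected.
type PartialFailure struct {
	Failed []batchResult
	Total  int
}

func (e *PartialFailure) Error() string {
	return fmt.Sprintf("%d of %d entities failed", len(e.Failed), e.Total)
}

// checkBatch returns nil when every entity succeeded and a *PartialFailure otherwise.
func checkBatch(results []batchResult) error {
	var failed []batchResult
	for _, r := range results {
		if r.Code != "SUCCESS" {
			failed = append(failed, r)
		}
	}
	if len(failed) == 0 {
		return nil
	}
	return &PartialFailure{Failed: failed, Total: len(results)}
}

// failedIDs extracts the rejected keyword IDs from err, if it is a *PartialFailure.
func failedIDs(err error) []int64 {
	var pf *PartialFailure
	if !errors.As(err, &pf) {
		return nil
	}
	ids := make([]int64, 0, len(pf.Failed))
	for _, r := range pf.Failed {
		ids = append(ids, r.KeywordID)
	}
	return ids
}

func main() {
	err := checkBatch([]batchResult{
		{KeywordID: 1, Code: "SUCCESS"},
		{KeywordID: 2, Code: "INVALID_ARGUMENT"}, // illustrative failure code
	})
	fmt.Println(err, failedIDs(err))
}
```

Because the error is typed, a caller can tell "the request failed" apart from "the request succeeded but some entities were rejected" with errors.As and handle each case differently.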

Exponential Backoff for 429s

Despite the per-profile rate limiter, you will occasionally hit 429s — especially during Amazon's peak processing times. A well-tuned backoff avoids hammering an already-overloaded endpoint:

func withRetry(ctx context.Context, fn func() (*http.Response, error)) (*http.Response, error) {
    backoff := 1 * time.Second
    for attempt := 0; attempt < 5; attempt++ {
        resp, err := fn()
        if err != nil {
            return nil, err
        }
        if resp.StatusCode != http.StatusTooManyRequests {
            return resp, nil
        }
        resp.Body.Close()

        // Respect Retry-After header if present
        if ra := resp.Header.Get("Retry-After"); ra != "" {
            if secs, err := strconv.Atoi(ra); err == nil {
                backoff = time.Duration(secs) * time.Second
            }
        }

        select {
        case <-ctx.Done():
            return nil, ctx.Err()
        case <-time.After(backoff):
        }
        backoff = min(backoff*2, 60*time.Second) // cap at 60s
    }
    return nil, ErrMaxRetriesExceeded
}
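One refinement worth considering: with many workers throttled at the same moment, plain doubling makes them all retry in lockstep. Adding random jitter spreads the retries out. The sketch below uses the "full jitter" variant, where the delay is drawn uniformly between zero and the capped exponential ceiling. The names jitteredDelay, baseDelay and maxDelay are assumptions for this example, and a Retry-After value from the server should still take precedence when present.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

const (
	baseDelay = 1 * time.Second
	maxDelay  = 60 * time.Second
)

// jitteredDelay returns a random delay in [0, ceiling), where ceiling starts
// at baseDelay, doubles with every attempt, and is capped at maxDelay.
// rnd must return a value in [0, 1); rand.Float64 fits that contract.
func jitteredDelay(attempt int, rnd func() float64) time.Duration {
	ceiling := baseDelay
	for i := 0; i < attempt && ceiling < maxDelay; i++ {
		ceiling *= 2
	}
	if ceiling > maxDelay {
		ceiling = maxDelay
	}
	return time.Duration(rnd() * float64(ceiling))
}

func main() {
	for attempt := 0; attempt < 5; attempt++ {
		fmt.Printf("attempt %d: sleep %v\n", attempt, jitteredDelay(attempt, rand.Float64))
	}
}
```

Passing the random source as a parameter keeps the function deterministic under test while production code simply passes rand.Float64.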

Processing the Fleet: Worker Pool Pattern

To process all 2,000 advertisers efficiently without blowing the per-profile rate limits, use a bounded worker pool. Each worker handles one advertiser at a time; the per-profile limiter throttles concurrent requests for the same advertiser:

func (s *SyncService) SyncAll(ctx context.Context, profileIDs []string) error {
    jobs := make(chan string, len(profileIDs))
    for _, id := range profileIDs {
        jobs <- id
    }
    close(jobs)

    var wg sync.WaitGroup
    errCh := make(chan error, len(profileIDs))

    // 50 concurrent workers — each throttled per-profile by the rate limiter
    for i := 0; i < 50; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for profileID := range jobs {
                if err := s.syncProfile(ctx, profileID); err != nil {
                    errCh <- fmt.Errorf("profile %s: %w", profileID, err)
                }
            }
        }()
    }

    wg.Wait()
    close(errCh)

    var errs []error
    for err := range errCh {
        errs = append(errs, err)
    }
    return errors.Join(errs...)
}
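For a rough sense of what the pool buys, the wall-clock time can be estimated from the fleet size, the worker count, and the per-profile sync time. This is a simplified model that assumes every profile takes the same time and that workers never sit idle; fleetSyncEstimate is a made-up helper for the arithmetic, and the 3-second figure is the per-profile sync time used earlier in this post.

```go
package main

import (
	"fmt"
	"time"
)

// fleetSyncEstimate gives a rough estimate of wall-clock sync time,
// assuming every profile takes perProfile and workers stay busy.
func fleetSyncEstimate(profiles, workers int, perProfile time.Duration) time.Duration {
	if workers < 1 {
		workers = 1
	}
	rounds := (profiles + workers - 1) / workers // ceiling division
	return time.Duration(rounds) * perProfile
}

func main() {
	fmt.Println(fleetSyncEstimate(2000, 1, 3*time.Second))  // sequential: 1h40m0s
	fmt.Println(fleetSyncEstimate(2000, 50, 3*time.Second)) // 50 workers: 2m0s
}
```

Real runs will be slower than the estimate because profile sizes vary widely and the largest advertisers dominate the tail, but the model is useful for choosing a worker count.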

Observability: What to Instrument

Running at scale means you need metrics. The three most useful things to track for the Ads API:

  • API call duration per endpoint: know which endpoints are slow so you can batch or cache them
  • 429 rate per profile: profiles with high 429 rates indicate your per-profile limiter is tuned too aggressively or the advertiser has unusually strict limits
  • Batch partial failure rate: batch writes with >1% failures often indicate data quality issues (invalid bids, archived entities)

func (c *AdsClient) recordMetrics(profileID, endpoint string, duration time.Duration, statusCode int) {
    metrics.RecordAPICall(endpoint, strconv.Itoa(statusCode), duration)
    if statusCode == 429 {
        metrics.IncrCounter("amazon_ads.rate_limited", map[string]string{"profile": profileID})
    }
}

Summary

The Amazon Ads API is reliable and capable, but it rewards careful engineering. The patterns that matter at scale: lazy token caching with double-checked locking, per-profile rate limiting, strict pagination without assuming you have all data after one page, maximum batch sizes with partial-failure handling, and instrumentation on the metrics that actually predict problems before they become incidents.

None of this is glamorous engineering. It is the kind of careful plumbing that keeps a multi-advertiser platform running at 2am without anyone needing to intervene.
