
Building a Production LLM Layer in Go

In Q4 2024 we shipped our AI feature: a layer that analyses campaign performance data, surfaces anomalies, and generates human-readable recommendations for advertisers. Building it taught me more about production AI systems than any course or blog post. This is the unfiltered account: what worked, what failed, and the architecture we ended up with after several iterations.

The Problem We Were Solving

Our platform manages advertising campaigns for hundreds of advertisers on Amazon. Each advertiser has dozens of campaigns, hundreds of ad groups, thousands of keywords, and daily performance metrics for all of them. Identifying what needs attention (which keyword bid is too high, which campaign is bleeding budget without converting, which new product launch is outperforming expectations) requires reading a lot of data and making nuanced judgments. We were doing this manually in customer success calls. Our AI product was our attempt to automate it.

Architecture: Thin LLM Service at the Edge

The most important architectural decision was keeping the LLM integration isolated. A thin Go service handles all AI interactions: prompt construction, rate limiting, response parsing, caching, and fallbacks. The rest of the backend — campaign management, bidding, reporting — has no knowledge that Claude exists.

// Package structure
intelligence/
├── service.go       // main service — orchestrates analysis
├── prompts.go       // prompt templates
├── claude.go        // Claude API client wrapper
├── cache.go         // Redis caching layer
├── parser.go        // structured response extraction
└── ratelimit.go     // per-advertiser rate limiting

type IntelligenceService struct {
    claude    *ClaudeClient
    cache     *redis.Client
    limiter   *PerAccountLimiter
    db        *sql.DB
}

func (s *IntelligenceService) AnalyseAccount(ctx context.Context, accountID string) (*AccountAnalysis, error) {
    // Same key as the caching layer: account ID + UTC date + 12-hour slot.
    key := cacheKey(accountID)
    if cached := s.getCache(ctx, key); cached != nil {
        return cached, nil
    }

    // Summarise account data before sending to Claude
    summary, err := s.buildAccountSummary(ctx, accountID)
    if err != nil {
        return nil, err
    }

    if err := s.limiter.Wait(ctx, accountID); err != nil {
        return nil, err
    }

    analysis, err := s.claude.AnalyseWithSchema(ctx, summary)
    if err != nil {
        return nil, err
    }

    // Keep the entry for the length of one slot.
    s.setCache(ctx, key, analysis, 12*time.Hour)
    return analysis, nil
}

Prompt Engineering: What Actually Works in Production

The prompt engineering lessons that survived contact with real data:

1. Summarise before analysing. Do not send raw data. A Sponsored Products account with 500 campaigns has hundreds of thousands of rows of daily metrics. Compress it: aggregate by campaign, identify outliers, compute ratios (ACOS, CTR, CVR). Send the summary, not the data.

func (s *IntelligenceService) buildAccountSummary(ctx context.Context, accountID string) (AccountSummary, error) {
    var summary AccountSummary

    // Aggregate last 7 days, one row per campaign.
    // The ::numeric cast avoids integer division when clicks and impressions
    // are stored as integer columns (otherwise CTR would always round to 0).
    rows, err := s.db.QueryContext(ctx, `
        SELECT
            campaign_id, campaign_name,
            SUM(spend) as total_spend,
            SUM(sales) as total_sales,
            SUM(impressions) as impressions,
            SUM(clicks) as clicks,
            ROUND(SUM(spend)/NULLIF(SUM(sales),0)*100, 2) as acos,
            ROUND(SUM(clicks)::numeric/NULLIF(SUM(impressions),0)*100, 4) as ctr
        FROM campaign_performance
        WHERE account_id = $1 AND date >= NOW() - INTERVAL '7 days'
        GROUP BY campaign_id, campaign_name
        ORDER BY total_spend DESC
        LIMIT 50`, accountID) // top 50 by spend, enough for signal
    if err != nil {
        return summary, fmt.Errorf("query campaign performance: %w", err)
    }
    defer rows.Close()

    // ... scan and structure
    return summary, rows.Err()
}
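
The "identify outliers" part of that step can be done in Go before anything reaches the model. The following is a minimal sketch rather than our production code: the CampaignStat type, the flagOutliers helper, and the 1.5 factor are all hypothetical names and values chosen for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// CampaignStat is a hypothetical, trimmed-down row from the 7-day aggregate.
type CampaignStat struct {
	ID    string
	Spend float64
	Sales float64
}

// acos returns spend as a percentage of sales; ok is false when there are no sales.
func acos(spend, sales float64) (pct float64, ok bool) {
	if sales <= 0 {
		return 0, false
	}
	return spend / sales * 100, true
}

// flagOutliers returns campaigns whose ACOS exceeds factor times the
// account-wide ACOS, plus campaigns that spent money without any sales,
// sorted by spend (highest first).
func flagOutliers(stats []CampaignStat, factor float64) []CampaignStat {
	var totalSpend, totalSales float64
	for _, c := range stats {
		totalSpend += c.Spend
		totalSales += c.Sales
	}
	accountACOS, haveAvg := acos(totalSpend, totalSales)

	var out []CampaignStat
	for _, c := range stats {
		a, ok := acos(c.Spend, c.Sales)
		switch {
		case !ok && c.Spend > 0: // spend with zero sales
			out = append(out, c)
		case ok && haveAvg && a > factor*accountACOS:
			out = append(out, c)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Spend > out[j].Spend })
	return out
}

func main() {
	stats := []CampaignStat{
		{ID: "c1", Spend: 100, Sales: 500}, // ACOS 20%
		{ID: "c2", Spend: 400, Sales: 400}, // ACOS 100%
		{ID: "c3", Spend: 50, Sales: 0},    // no sales at all
	}
	for _, c := range flagOutliers(stats, 1.5) {
		fmt.Println(c.ID)
	}
}
```

Sending only the flagged campaigns plus the account-level totals keeps the prompt small and points the model at the rows that matter.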

2. Require structured output in the system prompt. Free-form text is hard to use in a production system: it cannot be rendered, filtered, or validated reliably. We describe the expected JSON shape in the system prompt and validate every response against it before using it:

const systemPrompt = `You are an Amazon advertising specialist analysing campaign performance.

Respond ONLY with valid JSON matching this exact schema:
{
  "overall_health": "good" | "needs_attention" | "critical",
  "key_insights": [string],        // max 3, each under 80 chars
  "top_opportunities": [{
    "campaign_id": string,
    "issue": string,               // what is wrong
    "recommendation": string,      // specific action to take
    "estimated_impact": string     // e.g. "reduce ACOS by ~15%"
  }],                              // max 3 items
  "summary": string                // 2-3 sentence human-readable summary
}

Do not include any text outside the JSON object.`

3. Be specific about what you want. "Analyse this campaign data" produces vague output. "Identify the three campaigns with the highest ACOS relative to the account average and explain what is driving it" produces actionable output.
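
As an illustration of the difference, the user message can carry the narrow task and the number it refers to, not just the data. This is a sketch under assumptions: buildUserPrompt and its parameters are hypothetical, not part of the service shown above.

```go
package main

import (
	"fmt"
)

// buildUserPrompt is a hypothetical helper: it pairs the summary JSON with a
// narrowly scoped task instead of a generic "analyse this" request.
func buildUserPrompt(summaryJSON string, accountACOS float64, topN int) string {
	return fmt.Sprintf(
		"Account-wide ACOS over the last 7 days: %.1f%%.\n"+
			"Identify the %d campaigns with the highest ACOS relative to that average "+
			"and explain what is driving each one.\n\n"+
			"Account summary:\n%s",
		accountACOS, topN, summaryJSON)
}

func main() {
	fmt.Println(buildUserPrompt(`{"campaigns":[]}`, 32.5, 3))
}
```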

The Claude API Client

type ClaudeClient struct {
    client    *anthropic.Client
    model     string
    maxTokens int
}

func (c *ClaudeClient) AnalyseWithSchema(ctx context.Context, summary AccountSummary) (*AccountAnalysis, error) {
    summaryJSON, err := json.Marshal(summary)
    if err != nil {
        return nil, fmt.Errorf("marshal summary: %w", err)
    }
    userText := "Account summary:\n" + string(summaryJSON)

    for attempt := 0; attempt < 2; attempt++ {
        resp, err := c.client.Messages.New(ctx, anthropic.MessageNewParams{
            Model:     anthropic.F(c.model),
            MaxTokens: anthropic.F(int64(c.maxTokens)),
            System: anthropic.F([]anthropic.TextBlockParam{
                anthropic.NewTextBlock(systemPrompt),
            }),
            Messages: anthropic.F([]anthropic.MessageParam{
                anthropic.NewUserMessage(anthropic.NewTextBlock(userText)),
            }),
        })
        if err != nil {
            return nil, err
        }
        if len(resp.Content) == 0 {
            return nil, ErrUnparseable
        }

        text := resp.Content[0].Text
        analysis, err := parseAnalysis(text)
        if err != nil {
            if attempt == 0 {
                // Retry once, appending an explicit correction to the prompt.
                // Slice defensively: the reply may be shorter than 100 bytes.
                slog.Warn("parse failed, retrying with correction", "raw", text[:min(len(text), 100)])
                userText += "\n\nYour previous reply was not valid JSON matching the schema. Respond with the JSON object only."
                continue
            }
            return nil, fmt.Errorf("response unparseable after retry: %w", err)
        }
        return analysis, nil
    }
    return nil, ErrUnparseable
}

func parseAnalysis(raw string) (*AccountAnalysis, error) {
    // Strip markdown code blocks if present
    raw = strings.TrimPrefix(strings.TrimSpace(raw), "```json")
    raw = strings.TrimPrefix(raw, "```")
    raw = strings.TrimSuffix(raw, "```")
    raw = strings.TrimSpace(raw)

    var analysis AccountAnalysis
    if err := json.Unmarshal([]byte(raw), &analysis); err != nil {
        return nil, err
    }
    return validate(&analysis)
}
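
parseAnalysis hands off to validate, which is not shown above. A minimal sketch of what it could look like follows. The struct field names are assumptions that mirror the JSON shape in the system prompt, and the exact checks are illustrative, not a copy of our production validator.

```go
package main

import (
	"errors"
	"fmt"
)

// Opportunity and AccountAnalysis mirror the JSON shape in the system prompt.
// The field names are assumptions; the real structs are not shown in this post.
type Opportunity struct {
	CampaignID      string `json:"campaign_id"`
	Issue           string `json:"issue"`
	Recommendation  string `json:"recommendation"`
	EstimatedImpact string `json:"estimated_impact"`
}

type AccountAnalysis struct {
	OverallHealth    string        `json:"overall_health"`
	KeyInsights      []string      `json:"key_insights"`
	TopOpportunities []Opportunity `json:"top_opportunities"`
	Summary          string        `json:"summary"`
}

// validate enforces the limits the prompt asks for, so a response that parses
// as JSON but ignores the schema is rejected instead of rendered.
func validate(a *AccountAnalysis) (*AccountAnalysis, error) {
	switch a.OverallHealth {
	case "good", "needs_attention", "critical":
	default:
		return nil, fmt.Errorf("invalid overall_health %q", a.OverallHealth)
	}
	if len(a.KeyInsights) > 3 {
		return nil, errors.New("more than 3 key_insights")
	}
	for i, s := range a.KeyInsights {
		if len([]rune(s)) >= 80 {
			return nil, fmt.Errorf("key_insights[%d] is 80 characters or longer", i)
		}
	}
	if len(a.TopOpportunities) > 3 {
		return nil, errors.New("more than 3 top_opportunities")
	}
	for i, o := range a.TopOpportunities {
		if o.CampaignID == "" || o.Issue == "" || o.Recommendation == "" {
			return nil, fmt.Errorf("top_opportunities[%d] is missing a required field", i)
		}
	}
	if a.Summary == "" {
		return nil, errors.New("empty summary")
	}
	return a, nil
}

func main() {
	_, err := validate(&AccountAnalysis{OverallHealth: "fine", Summary: "ok"})
	fmt.Println(err)
}
```

A validation error is treated the same as a JSON parse error by the caller, so it also triggers the single retry.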

Caching: The Biggest Cost Lever

Without caching, every page load that shows analysis results would call Claude. With 2,000 advertisers checking their dashboards throughout the day, that would be financially unsustainable. Our cache key combines the account ID, the UTC date, and a 12-hour slot, so each account is re-analysed at most twice per day, at a fraction of the token cost:

func cacheKey(accountID string) string {
    // Re-analyse twice per day: one slot per 12 hours (UTC)
    now := time.Now().UTC() // read the clock once so date and slot agree
    slot := now.Hour() / 12 // 0 or 1
    return fmt.Sprintf("intelligence:%s:%s:%d",
        accountID, now.Format("2006-01-02"), slot)
}

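func (s *IntelligenceService) getCache(ctx context.Context, key string) *AccountAnalysis {
    data, err := s.cache.Get(ctx, key).Bytes()
    if err != nil {
        return nil // cache miss or Redis error: fall through to a fresh analysis
    }
    var analysis AccountAnalysis
    if err := json.Unmarshal(data, &analysis); err != nil {
        return nil // treat a corrupt entry as a miss instead of returning a zero value
    }
    return &analysis
}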

Caching cut our Claude API costs by 67% in the first month. Insights are slightly less fresh (up to 12 hours stale), but advertisers found this completely acceptable: campaign performance changes are mostly meaningful on a daily basis, not hourly.

Rate Limiting: Per-Account, Not Global

Different advertisers check their dashboards at different times. A single global limit would let a few busy accounts consume the whole budget and starve small accounts during peak hours. We rate-limit per account instead: each account gets at most 5 analysis requests per hour, spaced at least 12 minutes apart:

type PerAccountLimiter struct {
    mu       sync.Mutex
    limiters map[string]*rate.Limiter
}

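func (l *PerAccountLimiter) Wait(ctx context.Context, accountID string) error {
    l.mu.Lock()
    if l.limiters == nil {
        // Make the zero value usable; writing to a nil map would panic
        l.limiters = make(map[string]*rate.Limiter)
    }
    limiter, ok := l.limiters[accountID]
    if !ok {
        // One token every 12 minutes with a burst of 1: 5/hour, evenly spaced
        limiter = rate.NewLimiter(rate.Every(12*time.Minute), 1)
        l.limiters[accountID] = limiter
    }
    l.mu.Unlock()
    // Blocks until a token is available or ctx is done
    return limiter.Wait(ctx)
}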

What We Got Wrong

A few things that failed and what we learned:

  • Sending too much data: our first prompts included raw daily data for every campaign. Token costs were enormous and quality was poor. Summarising first dramatically improved both.
  • Trusting unvalidated responses: early on we used the text response directly in the UI without validation. When Claude occasionally returned malformed JSON, users saw error messages. Schema validation + retry fixed this.
  • Not logging prompt/response pairs: debugging quality issues without a log of what we sent and received was painful. We now log every prompt/response pair (sampled at 10%) to a separate analytics database.
  • Model selection: we started with Claude Opus for everything. The analysis quality was excellent but costs were high. After A/B testing, Claude 3.5 Sonnet provided equivalent quality at 5× lower cost for this use case.
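
For the logging point, sampling can be made deterministic by hashing a request ID, so a given request is either always logged or never logged across retries. This is a sketch under assumptions: the Exchange type, shouldLog, and logExchange are hypothetical, and it writes through slog instead of the separate analytics database mentioned above.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"io"
	"log/slog"
)

// Exchange is a hypothetical record of one prompt/response pair.
type Exchange struct {
	AccountID, RequestID, Prompt, Response string
}

// shouldLog samples roughly `percent` out of every 100 request IDs by hashing,
// so the decision for a given ID is stable.
func shouldLog(requestID string, percent uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(requestID))
	return h.Sum32()%100 < percent
}

// logExchange writes the pair if it falls in the sample and reports whether it did.
func logExchange(logger *slog.Logger, e Exchange, percent uint32) bool {
	if !shouldLog(e.RequestID, percent) {
		return false
	}
	logger.Info("llm exchange",
		"account_id", e.AccountID,
		"request_id", e.RequestID,
		"prompt", e.Prompt,
		"response", e.Response)
	return true
}

func main() {
	// io.Discard keeps the demo quiet; a real handler would write somewhere durable.
	logger := slog.New(slog.NewJSONHandler(io.Discard, nil))
	sampled := 0
	for i := 0; i < 1000; i++ {
		e := Exchange{AccountID: "acct-1", RequestID: fmt.Sprintf("req-%d", i), Prompt: "...", Response: "..."}
		if logExchange(logger, e, 10) {
			sampled++
		}
	}
	fmt.Println("sampled", sampled, "of 1000")
}
```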

Production Results

Our AI product has been in production for several months. Advertisers who use the recommendations feature show better campaign performance on average than those who do not. The AI layer processes ~500 account analyses per day, costs significantly less than an additional customer success hire, and operates without manual intervention. The architecture (thin isolated service, aggressive caching, schema-validated responses) has been stable and cost-predictable from the first month.
