Building a Production LLM Layer in Go
In Q4 2024 we shipped our AI feature: a layer that analyses campaign performance data, surfaces anomalies, and generates human-readable recommendations for advertisers. Building it taught me more about production AI systems than any course or blog post. This is the unfiltered account: what worked, what failed, and the architecture we ended up with after several iterations.
The Problem We Were Solving
Our platform manages advertising campaigns for hundreds of advertisers on Amazon. Each advertiser has dozens of campaigns, hundreds of ad groups, thousands of keywords, and daily performance metrics for all of them. Identifying what needs attention (which keyword bid is too high, which campaign is bleeding budget without converting, which new product launch is outperforming expectations) requires reading a lot of data and making nuanced judgments. We were doing this manually in customer success calls. Our AI product was the attempt to automate it.
Architecture: Thin LLM Service at the Edge
The most important architectural decision was keeping the LLM integration isolated. A thin Go service handles all AI interactions: prompt construction, rate limiting, response parsing, caching, and fallbacks. The rest of the backend — campaign management, bidding, reporting — has no knowledge that Claude exists.
```text
// Package structure
intelligence/
├── service.go    // main service: orchestrates analysis
├── prompts.go    // prompt templates
├── claude.go     // Claude API client wrapper
├── cache.go      // Redis caching layer
├── parser.go     // structured response extraction
└── ratelimit.go  // per-advertiser rate limiting
```
```go
type IntelligenceService struct {
	claude  *ClaudeClient
	cache   *redis.Client
	limiter *PerAccountLimiter
	db      *sql.DB
}

func (s *IntelligenceService) AnalyseAccount(ctx context.Context, accountID string) (*AccountAnalysis, error) {
	// One key per account per 12-hour UTC slot (see cacheKey in the caching section).
	key := cacheKey(accountID)
	if cached := s.getCache(ctx, key); cached != nil {
		return cached, nil
	}

	// Summarise account data before sending to Claude
	summary, err := s.buildAccountSummary(ctx, accountID)
	if err != nil {
		return nil, err
	}

	// Only cache misses consume the per-account rate limit.
	if err := s.limiter.Wait(ctx, accountID); err != nil {
		return nil, err
	}

	analysis, err := s.claude.AnalyseWithSchema(ctx, summary)
	if err != nil {
		return nil, err
	}

	// TTL matches the 12-hour slot, so an entry lives until its key rolls over.
	s.setCache(ctx, key, analysis, 12*time.Hour)
	return analysis, nil
}
```
Prompt Engineering: What Actually Works in Production
The prompt engineering lessons that survived contact with real data:
1. Summarise before analysing. Do not send raw data. A Sponsored Products account with 500 campaigns has hundreds of thousands of rows of daily metrics. Compress it: aggregate by campaign, identify outliers, compute ratios (ACOS, CTR, CVR). Send the summary, not the data.
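```go
func (s *IntelligenceService) buildAccountSummary(ctx context.Context, accountID string) (AccountSummary, error) {
	var summary AccountSummary

	// Aggregate the last 7 days; the top 50 campaigns by spend are enough for signal.
	// The ::numeric casts avoid integer division when the summed columns are
	// integers, and give ROUND(value, places) the numeric argument Postgres requires.
	rows, err := s.db.QueryContext(ctx, `
		SELECT
			campaign_id, campaign_name,
			SUM(spend) AS total_spend,
			SUM(sales) AS total_sales,
			SUM(impressions) AS impressions,
			SUM(clicks) AS clicks,
			ROUND(SUM(spend)::numeric / NULLIF(SUM(sales), 0)::numeric * 100, 2) AS acos,
			ROUND(SUM(clicks)::numeric / NULLIF(SUM(impressions), 0)::numeric * 100, 4) AS ctr
		FROM campaign_performance
		WHERE account_id = $1 AND date >= NOW() - INTERVAL '7 days'
		GROUP BY campaign_id, campaign_name
		ORDER BY total_spend DESC
		LIMIT 50`, accountID)
	if err != nil {
		return summary, fmt.Errorf("query campaign performance: %w", err)
	}
	defer rows.Close()

	// ... scan and structure

	return summary, rows.Err()
}
```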
2. Require structured output in the system prompt. Free-form text responses are useless in a production system. We define a JSON schema in the system prompt and validate every response against it before using it:
```go
const systemPrompt = `You are an Amazon advertising specialist analysing campaign performance.
Respond ONLY with valid JSON matching this exact schema:
{
  "overall_health": "good" | "needs_attention" | "critical",
  "key_insights": [string], // max 3, each under 80 chars
  "top_opportunities": [{
    "campaign_id": string,
    "issue": string, // what is wrong
    "recommendation": string, // specific action to take
    "estimated_impact": string // e.g. "reduce ACOS by ~15%"
  }], // max 3 items
  "summary": string // 2-3 sentence human-readable summary
}
Do not include any text outside the JSON object.`
```
3. Be specific about what you want. "Analyse this campaign data" produces vague output. "Identify the three campaigns with the highest ACOS relative to the account average and explain what is driving it" produces actionable output.
The Claude API Client
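```go
// ErrUnparseable is returned when Claude's reply fails parsing or validation twice.
var ErrUnparseable = errors.New("claude response unparseable")

type ClaudeClient struct {
	client    *anthropic.Client
	model     string
	maxTokens int
}

func (c *ClaudeClient) AnalyseWithSchema(ctx context.Context, summary AccountSummary) (*AccountAnalysis, error) {
	summaryJSON, err := json.Marshal(summary)
	if err != nil {
		return nil, fmt.Errorf("marshal summary: %w", err)
	}
	messages := []anthropic.MessageParam{
		anthropic.NewUserMessage(anthropic.NewTextBlock("Account summary:\n" + string(summaryJSON))),
	}

	var lastErr error
	for attempt := 0; attempt < 2; attempt++ {
		resp, err := c.client.Messages.New(ctx, anthropic.MessageNewParams{
			Model:     anthropic.F(c.model),
			MaxTokens: anthropic.F(int64(c.maxTokens)),
			System: anthropic.F([]anthropic.TextBlockParam{
				anthropic.NewTextBlock(systemPrompt),
			}),
			Messages: anthropic.F(messages),
		})
		if err != nil {
			return nil, fmt.Errorf("claude request: %w", err)
		}
		if len(resp.Content) == 0 {
			return nil, fmt.Errorf("%w: empty response", ErrUnparseable)
		}
		text := resp.Content[0].Text

		analysis, err := parseAnalysis(text)
		if err == nil {
			return analysis, nil
		}
		lastErr = err
		// Bound the logged prefix: text[:100] panics on replies shorter than 100 bytes.
		slog.Warn("parse failed", "attempt", attempt, "err", err, "raw", text[:min(len(text), 100)])

		// Retry once with an explicit correction: replay the bad reply and say what was wrong.
		messages = append(messages,
			anthropic.NewAssistantMessage(anthropic.NewTextBlock(text)),
			anthropic.NewUserMessage(anthropic.NewTextBlock(
				"Your previous reply did not match the required schema ("+err.Error()+
					"). Respond again with only the corrected JSON object.")),
		)
	}
	return nil, fmt.Errorf("%w after retry: %v", ErrUnparseable, lastErr)
}
```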
```go
func parseAnalysis(raw string) (*AccountAnalysis, error) {
	// Strip a markdown code fence if the model wrapped its JSON in one
	raw = strings.TrimPrefix(strings.TrimSpace(raw), "```json")
	raw = strings.TrimPrefix(raw, "```")
	raw = strings.TrimSuffix(raw, "```")
	raw = strings.TrimSpace(raw)

	var analysis AccountAnalysis
	if err := json.Unmarshal([]byte(raw), &analysis); err != nil {
		return nil, err
	}
	return validate(&analysis)
}
```
Caching: The Biggest Cost Lever
Without caching, every page load that shows analysis results would call Claude. With 2,000 advertisers checking their dashboards throughout the day, that would be financially unsustainable. Our cache key combines the account ID, the current UTC date, and a 12-hour slot, so each account is analysed at most twice a day, at a fraction of the token cost:
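```go
func cacheKey(accountID string) string {
	// Re-analyse at most twice per day: slot 0 covers 00:00-11:59 UTC,
	// slot 1 covers 12:00-23:59 UTC.
	now := time.Now().UTC() // read the clock once so date and slot always agree
	slot := now.Hour() / 12
	return fmt.Sprintf("intelligence:%s:%s:%d",
		accountID, now.Format("2006-01-02"), slot)
}

func (s *IntelligenceService) getCache(ctx context.Context, key string) *AccountAnalysis {
	data, err := s.cache.Get(ctx, key).Bytes()
	if err != nil {
		return nil // redis.Nil (a miss) or a Redis error: both are treated as a miss
	}
	var analysis AccountAnalysis
	if err := json.Unmarshal(data, &analysis); err != nil {
		return nil // corrupt entry: re-analyse rather than serve a zero-value analysis
	}
	return &analysis
}
```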
Caching cut our Claude API costs by 67% in the first month. The trade-off is freshness: an analysis can be up to 12 hours stale. Advertisers found this completely acceptable, because changes in campaign performance are mostly meaningful day to day, not hour to hour.
Rate Limiting: Per-Account, Not Global
Different advertisers check their dashboards at different times. A global rate limiter would starve small accounts during peak hours. We rate-limit per account instead: each account can trigger at most 5 Claude calls per hour (one every 12 minutes), and cache hits never reach the limiter:
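```go
type PerAccountLimiter struct {
	mu       sync.Mutex
	limiters map[string]*rate.Limiter
}

// NewPerAccountLimiter initialises the map; writing to a nil map panics.
func NewPerAccountLimiter() *PerAccountLimiter {
	return &PerAccountLimiter{limiters: make(map[string]*rate.Limiter)}
}

// Wait blocks until accountID may make another Claude call, or ctx is done.
func (l *PerAccountLimiter) Wait(ctx context.Context, accountID string) error {
	l.mu.Lock()
	limiter, ok := l.limiters[accountID]
	if !ok {
		// One token every 12 minutes with a burst of 1: at most 5 calls per hour.
		limiter = rate.NewLimiter(rate.Every(12*time.Minute), 1)
		l.limiters[accountID] = limiter
	}
	l.mu.Unlock()
	// Wait outside the mutex so one throttled account cannot block the others.
	return limiter.Wait(ctx)
}
```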
What We Got Wrong
A few things that failed and what we learned:
- Sending too much data: our first prompts included raw daily data for every campaign. Token costs were enormous and quality was poor. Summarising first dramatically improved both.
- Trusting unvalidated responses: early on we used the text response directly in the UI without validation. When Claude occasionally returned malformed JSON, users saw error messages. Schema validation + retry fixed this.
- Not logging prompt/response pairs: debugging quality issues without a log of what we sent and received was painful. We now log a 10% sample of prompt/response pairs to a separate analytics database.
- Model selection: we started with Claude Opus for everything. The analysis quality was excellent, but costs were high. After A/B testing, Claude 3.5 Sonnet provided equivalent quality at 5× lower cost for this use case.
Production Results
Our AI product has been in production for several months. Advertisers who use the recommendations feature show better campaign performance on average than those who do not. The AI layer processes ~500 account analyses per day, costs significantly less than an additional customer success hire, and operates without manual intervention. The architecture (thin isolated service, aggressive caching, schema-validated responses) has been stable and cost-predictable from the first month.