Idempotency Keys in Distributed Go Services
Distributed systems are fundamentally unreliable. Networks drop packets, services restart mid-request, clients retry on timeout, and load balancers reroute connections. The standard response to this reality is to design every state-modifying operation to be idempotent — safe to call multiple times with the same result as calling it once. Idempotency keys are the primary tool for achieving this at the API level.
The Core Problem
Consider a client that sends a POST request to create a campaign. The server receives the request, creates the campaign, and then a network failure prevents the response from reaching the client. The client, having received no response, retries the request. Without idempotency, the server creates a second campaign. Now you have two identical campaigns — a data integrity problem that is difficult to detect and painful to clean up.
We manage campaigns for thousands of advertisers. A double-creation bug is not just a data integrity issue — it is a budget issue (two campaigns competing in the same auction, doubling spend) and a trust issue with the advertiser. Idempotency is not optional.
The Idempotency Key Pattern
The client generates a unique key for each logical operation and sends it in a request header. The server uses this key to detect duplicate requests and return the cached response without repeating the operation:
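// Client side: generate a stable key per logical operation
campaignKey := uuid.NewString() // generated once per operation, stored client-side
req.Header.Set("Idempotency-Key", campaignKey)
// First attempt
resp, err := client.Do(req)
// On a transport error or 5xx response, retry with the SAME key
if err != nil || resp.StatusCode >= 500 {
	if resp != nil {
		resp.Body.Close() // release the connection before retrying
	}
	retry := req.Clone(ctx) // Clone copies the headers, including the key
	if req.GetBody != nil {
		retry.Body, _ = req.GetBody() // Clone shares the consumed body; get a fresh reader
	}
	resp, err = client.Do(retry)
}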
Server-Side Implementation: HTTP Middleware
The cleanest server-side implementation is a middleware that intercepts requests with an Idempotency-Key header, checks the store, returns cached responses for duplicates, and caches new responses for future duplicates:
type IdempotencyStore interface {
	// Get returns (response, true, nil) if the key exists and (nil, false, nil) if it is new.
	Get(ctx context.Context, key string) ([]byte, bool, error)
	// Set stores the response. It must be atomic (no race between check and set).
	Set(ctx context.Context, key string, response []byte, ttl time.Duration) error
}

// Uses bytes, context, log, net/http, time, and github.com/google/uuid.
func IdempotencyMiddleware(store IdempotencyStore) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			key := r.Header.Get("Idempotency-Key")
			if key == "" || r.Method == http.MethodGet {
				// GET requests are inherently idempotent, so no key is needed
				next.ServeHTTP(w, r)
				return
			}

			// Validate key format (must be UUID to prevent abuse)
			if _, err := uuid.Parse(key); err != nil {
				http.Error(w, "Idempotency-Key must be a valid UUID", http.StatusBadRequest)
				return
			}

			// Check cache. A store error is treated as a miss (fail open), so the
			// request is processed without duplicate protection.
			if cached, found, err := store.Get(r.Context(), key); err == nil && found {
				w.Header().Set("Idempotent-Replayed", "true")
				w.Header().Set("Content-Type", "application/json") // assumes JSON handlers
				w.WriteHeader(http.StatusOK)                       // replays answer 200, not the original status
				w.Write(cached)
				return
			}

			// Capture response
			rec := &responseRecorder{ResponseWriter: w, code: http.StatusOK}
			next.ServeHTTP(rec, r)

			// Cache successful responses (2xx only; do not cache errors).
			// context.WithoutCancel (Go 1.21+) keeps the write alive when the client
			// has already disconnected, which is the case that triggers retries.
			if rec.code >= 200 && rec.code < 300 {
				ctx := context.WithoutCancel(r.Context())
				if err := store.Set(ctx, key, rec.body.Bytes(), 24*time.Hour); err != nil {
					log.Printf("idempotency: storing response for key %s: %v", key, err)
				}
			}
		})
	}
}

// responseRecorder passes the response through to the client while keeping
// a copy of the status code and body for the idempotency store.
type responseRecorder struct {
	http.ResponseWriter
	code int
	body bytes.Buffer
}

func (r *responseRecorder) WriteHeader(code int) {
	r.code = code
	r.ResponseWriter.WriteHeader(code)
}

func (r *responseRecorder) Write(b []byte) (int, error) {
	r.body.Write(b)
	return r.ResponseWriter.Write(b)
}
Redis Implementation
Redis is a common backing store for idempotency keys. Use SET with the NX and PX options for an atomic set-if-not-exists with a TTL:
type RedisIdempotencyStore struct {
	rdb *redis.Client
}

func (s *RedisIdempotencyStore) Get(ctx context.Context, key string) ([]byte, bool, error) {
	data, err := s.rdb.Get(ctx, "idem:"+key).Bytes()
	if errors.Is(err, redis.Nil) {
		return nil, false, nil // key not present: this is a new request
	}
	if err != nil {
		return nil, false, err
	}
	return data, true, nil
}

func (s *RedisIdempotencyStore) Set(ctx context.Context, key string, response []byte, ttl time.Duration) error {
	// SET NX never overwrites an existing entry, so when concurrent requests both
	// finish, the first stored response wins and the later write is a no-op.
	return s.rdb.SetNX(ctx, "idem:"+key, response, ttl).Err()
}
Handling In-Progress Requests: The Race Condition
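A subtle issue: what if two identical requests arrive simultaneously? Both check the store, both find nothing, and both start processing, so two handlers create the same campaign. SET NX on the response does not help here, because it runs only after the work is done. The solution is to claim the key before processing: store a "processing" marker with an atomic set-if-not-exists, then replace it with the final response. Only the request that wins the claim runs the handler: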
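const statusProcessing = "PROCESSING"

// Lock claims the key as "in-progress". It returns true only for the first caller.
func (s *RedisIdempotencyStore) Lock(ctx context.Context, key string) (bool, error) {
	// The 30-second TTL releases the claim if the handler crashes mid-request.
	return s.rdb.SetNX(ctx, "idem:"+key, statusProcessing, 30*time.Second).Result()
}

// In middleware, in place of the plain cache check (the store type must expose Lock):
locked, err := store.Lock(r.Context(), key)
if err != nil {
	// Failing closed here; failing open would process without duplicate protection.
	http.Error(w, "idempotency store unavailable", http.StatusServiceUnavailable)
	return
}
if !locked {
	// Another request holds or has finished this key: poll for its stored response.
	for i := 0; i < 10; i++ {
		if cached, found, _ := store.Get(r.Context(), key); found && string(cached) != statusProcessing {
			w.Header().Set("Idempotent-Replayed", "true")
			w.Write(cached)
			return
		}
		select {
		case <-r.Context().Done():
			return // client went away
		case <-time.After(500 * time.Millisecond):
		}
	}
	http.Error(w, "concurrent request still processing", http.StatusConflict)
	return
}
// The lock holder now runs the handler. It must overwrite the marker with the final
// response using a plain SET (SetNX would leave "PROCESSING" in place), and delete
// the key if the handler fails so that a retry can proceed.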
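The claim approach gives every key one of three outcomes: the caller owns it, another caller is still working on it, or a response is already stored. The sketch below models that lifecycle in memory so it can be exercised in isolation. ClaimStore, Claim, Complete, and Release are hypothetical names, and the model omits the 30-second expiry that the Redis version uses to recover from crashed handlers.

```go
package main

import (
	"errors"
	"sync"
)

// ClaimResult tells a caller what to do with a key.
type ClaimResult int

const (
	Claimed    ClaimResult = iota // caller owns the key and must run the operation
	InProgress                    // another caller owns it: wait, or answer 409
	Completed                     // a response is stored: replay it
)

// ErrNotClaimed is returned when Complete is called for a key that is not in progress.
var ErrNotClaimed = errors.New("key is not in progress")

type claim struct {
	done     bool
	response []byte
}

// ClaimStore is a hypothetical in-memory model of the claim lifecycle.
type ClaimStore struct {
	mu     sync.Mutex
	claims map[string]*claim
}

func NewClaimStore() *ClaimStore {
	return &ClaimStore{claims: make(map[string]*claim)}
}

// Claim checks and reserves the key in one atomic step, so two concurrent
// callers can never both receive Claimed for the same key.
func (s *ClaimStore) Claim(key string) (ClaimResult, []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	c, ok := s.claims[key]
	switch {
	case !ok:
		s.claims[key] = &claim{}
		return Claimed, nil
	case c.done:
		return Completed, c.response
	default:
		return InProgress, nil
	}
}

// Complete replaces the in-progress marker with the final response.
func (s *ClaimStore) Complete(key string, response []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	c, ok := s.claims[key]
	if !ok || c.done {
		return ErrNotClaimed
	}
	c.done = true
	c.response = append([]byte(nil), response...)
	return nil
}

// Release drops an unfinished claim so a retry can run after a failed attempt.
// Completed keys are left untouched.
func (s *ClaimStore) Release(key string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if c, ok := s.claims[key]; ok && !c.done {
		delete(s.claims, key)
	}
}
```

Releasing on failure matters because error responses are not cached: without it, a client that retries after a 5xx is blocked until the marker expires.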
DynamoDB Implementation for Critical Operations
For financial operations where Redis unavailability would mean lost idempotency guarantees, DynamoDB conditional writes provide a more durable option:
// DynamoIdempotencyStore keeps idempotency records in a DynamoDB table.
// dynamotypes aliases github.com/aws/aws-sdk-go-v2/service/dynamodb/types.
type DynamoIdempotencyStore struct {
	dynamo *dynamodb.Client
}

func (s *DynamoIdempotencyStore) Set(ctx context.Context, key string, response []byte, ttl time.Duration) error {
	_, err := s.dynamo.PutItem(ctx, &dynamodb.PutItemInput{
		TableName: aws.String("idempotency-keys"),
		Item: map[string]dynamotypes.AttributeValue{
			"pk":       &dynamotypes.AttributeValueMemberS{Value: key},
			"response": &dynamotypes.AttributeValueMemberB{Value: response},
			// Epoch seconds for DynamoDB TTL. Expired items are deleted in the
			// background, not at the exact expiry time, so reads should check it too.
			"ttl": &dynamotypes.AttributeValueMemberN{
				Value: strconv.FormatInt(time.Now().Add(ttl).Unix(), 10),
			},
		},
		// Only store if this key does not already exist
		ConditionExpression: aws.String("attribute_not_exists(pk)"),
	})
	var condErr *dynamotypes.ConditionalCheckFailedException
	if errors.As(err, &condErr) {
		return nil // a concurrent request stored its response first; keep that one
	}
	return err
}
TTL: How Long to Keep Keys
A common default is 24 hours, which covers any reasonable client retry window. Some payment processors use 7 days for financial idempotency (matching their reconciliation window). Match your TTL to your retry window: a client that gives up retrying after 2 hours does not need a 7-day TTL.
Never store idempotency keys indefinitely. The storage cost is small per key but accumulates at scale, and it creates a misleading guarantee (a key from 6 months ago should not be treated as a duplicate today).
Testing Idempotency
Idempotency should be an explicit test category:
func TestCreateCampaign_Idempotent(t *testing.T) {
	key := uuid.NewString()

	// First request: should succeed
	resp1 := createCampaign(t, key)
	defer resp1.Body.Close()
	assert.Equal(t, http.StatusCreated, resp1.StatusCode)
	var c1 Campaign
	require.NoError(t, json.NewDecoder(resp1.Body).Decode(&c1))

	// Second request with same key: should return cached response
	resp2 := createCampaign(t, key)
	defer resp2.Body.Close()
	assert.Equal(t, http.StatusOK, resp2.StatusCode)
	assert.Equal(t, "true", resp2.Header.Get("Idempotent-Replayed"))
	var c2 Campaign
	require.NoError(t, json.NewDecoder(resp2.Body).Decode(&c2))

	// Must be identical
	assert.Equal(t, c1.ID, c2.ID)

	// Only one campaign should exist in the database
	count := countCampaignsInDB(t)
	assert.Equal(t, 1, count)
}
When to Apply Idempotency Keys
Apply idempotency keys to any operation that: creates a resource, transfers money or allocates budget, sends an external notification (email, SMS), or modifies state in a way that is difficult to reverse. You do not need idempotency keys for read operations (inherently idempotent) or operations that are naturally safe to repeat (e.g., setting a value to a specific amount rather than incrementing it).
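The difference between a naturally repeatable operation and one that needs a key can be shown in a few lines. Budget, SetTo, and IncreaseBy are hypothetical names used only for illustration.

```go
package main

// Budget is a hypothetical campaign budget in cents.
type Budget struct {
	Cents int64
}

// SetTo is naturally idempotent: applying it twice leaves the same state.
func (b *Budget) SetTo(cents int64) {
	b.Cents = cents
}

// IncreaseBy is not idempotent: every retry adds again, so the endpoint
// that exposes it needs an idempotency key.
func (b *Budget) IncreaseBy(cents int64) {
	b.Cents += cents
}
```

A retried SetTo(5000) still ends at 5000, while a retried IncreaseBy(1000) silently adds 2000.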