Idempotency Keys
Network world:
- Exactly-once delivery is not realistic
- Exactly-once processing / effect is realistic
- Transport layer
- HTTP, queues, network
- Can drop, retry, duplicate, reorder
- So we accept “at least once delivery”
- Application layer
- Handler + DB + idempotency table
- Goal: “exactly once processing” of the business operation
- Achieved by:
- key is unique
- first request does the work and stores result
- later duplicates only read result
- The process must rely on the database unique constraint at the start of the process.
- Attempt to insert (key, status=IN_PROGRESS).
- If the insert fails (unique violation), this is a duplicate. Wait for the other thread to finish, or read the stored result.
- If the insert succeeds, proceed to process.
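The insert-first check can be sketched as follows. This is a minimal illustration, not a definitive implementation: it uses an in-memory SQLite database as a stand-in for the real one, and the names `open_db`, `try_claim`, and `handle_once` are hypothetical.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    # In-memory database with a minimal, hypothetical idempotency table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE idempotency ("
        " idempotency_key TEXT PRIMARY KEY,"
        " status TEXT NOT NULL,"
        " response_body TEXT)"
    )
    return conn


def try_claim(conn: sqlite3.Connection, key: str) -> bool:
    """Return True if this request owns the key, False if it is a duplicate."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO idempotency (idempotency_key, status) "
                "VALUES (?, 'IN_PROGRESS')",
                (key,),
            )
        return True
    except sqlite3.IntegrityError:
        # Unique violation: another request already inserted this key.
        return False


def handle_once(conn: sqlite3.Connection, key: str, do_work):
    """Run do_work() for the first request; later duplicates read the result."""
    if try_claim(conn, key):
        result = do_work()
        with conn:
            conn.execute(
                "UPDATE idempotency SET status = 'COMPLETED', response_body = ? "
                "WHERE idempotency_key = ?",
                (result, key),
            )
        return result
    status, body = conn.execute(
        "SELECT status, response_body FROM idempotency WHERE idempotency_key = ?",
        (key,),
    ).fetchone()
    if status == "COMPLETED":
        return body
    # The first request is still running; the caller should wait and retry.
    raise RuntimeError("operation still in progress")
```

The database, not application code, decides who is first: two concurrent requests can both reach the `INSERT`, but only one of them can succeed.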
Formula:
at least once delivery + idempotent handler = exactly once effect
Where “effect” means:
- one payment created
- one subscription created
- one email sent
Important detail
- Exactly-once processing holds only inside our system
- If we call external systems, we need idempotency there too
- outbox table
- idempotency key in request to external API
- or accept that their side is only “at least once”
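One way to read the outbox option is sketched below, under stated assumptions: the `payments` and `outbox` tables, the key format, and the `call_external_api` callback are all hypothetical, and SQLite stands in for the real database. The business row and the outbox row are written in one transaction, and a separate relay step sends each outbox row with its own idempotency key.

```python
import json
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE payments (id TEXT PRIMARY KEY, amount INTEGER NOT NULL);
        CREATE TABLE outbox (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            idempotency_key TEXT NOT NULL UNIQUE,
            payload TEXT NOT NULL,
            sent INTEGER NOT NULL DEFAULT 0
        );
        """
    )
    return conn


def create_payment(conn: sqlite3.Connection, payment_id: str, amount: int) -> None:
    # Both rows commit together or not at all.
    with conn:
        conn.execute(
            "INSERT INTO payments (id, amount) VALUES (?, ?)", (payment_id, amount)
        )
        conn.execute(
            "INSERT INTO outbox (idempotency_key, payload) VALUES (?, ?)",
            (
                f"payment-created:{payment_id}",
                json.dumps({"payment_id": payment_id, "amount": amount}),
            ),
        )


def relay_outbox(conn: sqlite3.Connection, call_external_api) -> int:
    """Send unsent outbox rows; return how many were sent."""
    rows = conn.execute(
        "SELECT id, idempotency_key, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, key, payload in rows:
        # A crash after this call but before the UPDATE causes a re-send,
        # so the external side must deduplicate on the key.
        call_external_api(key, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

The relay itself is still at-least-once, which is why the key travels with every outgoing call.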
Client side
- Client creates the idempotency key.
- Use UUID v7 per operation.
- Store it in the DB while the operation is pending.
- Send the same key on every retry of that operation.
- New business action → new key.
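A rough sketch of the client-side rules, with several assumptions: `send` is a hypothetical callable that performs the request, `TransientError` is a hypothetical error type for timeouts and 5xx responses, and `uuid.uuid4()` is swapped in for UUID v7 only because it is available in the standard library of every current Python version. Persisting the key while the operation is pending is left out.

```python
import uuid


class TransientError(Exception):
    """Hypothetical error type for timeouts and 5xx responses."""


def new_operation_key() -> str:
    # The notes recommend UUID v7; uuid4 is a stand-in here (assumption).
    return str(uuid.uuid4())


def send_with_retries(send, payload, key: str, max_attempts: int = 3):
    """Call send(payload, key) until it succeeds, reusing the same key."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(payload, key)
        except TransientError as exc:
            last_error = exc  # same key on the next attempt
    raise last_error
```

The key is created once per business action, outside the retry loop, so every retry of that action carries the same value.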
Server side - data model
Table idempotency (example):
- idempotency_key
- endpoint or operation_type
- request_hash (method + path + normalized body)
- status enum: IN_PROGRESS, COMPLETED, FAILED
- response_code
- response_body (or pointer to stored result)
- created_at, updated_at
- TTL
Constraints:
- Unique index on (idempotency_key, endpoint, tenant)
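The table and its unique index could look roughly like this. The column types, the `expires_at` column used for the TTL, and the index name are assumptions; SQLite is used only so the sketch runs on its own.

```python
import sqlite3

SCHEMA = """
CREATE TABLE idempotency (
    idempotency_key TEXT NOT NULL,
    endpoint        TEXT NOT NULL,
    tenant          TEXT NOT NULL,
    request_hash    TEXT NOT NULL,
    status          TEXT NOT NULL
        CHECK (status IN ('IN_PROGRESS', 'COMPLETED', 'FAILED')),
    response_code   INTEGER,
    response_body   TEXT,
    created_at      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at      TEXT  -- assumed representation of the TTL
);
CREATE UNIQUE INDEX idempotency_key_scope
    ON idempotency (idempotency_key, endpoint, tenant);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)
```

Scoping the index by endpoint and tenant means the same key value used by two tenants, or on two endpoints, does not collide.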
If idempotency_key exists but request_hash (or user ID) does not match the incoming request, we must return a hard error (422 Unprocessable Entity or 409 Conflict) instead of the cached result.
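A small sketch of the hash and the mismatch check. The choice of SHA-256, the JSON normalization, and the `KeyReuseError` type are assumptions; the HTTP layer would map that error to 422 or 409.

```python
import hashlib
import json


class KeyReuseError(Exception):
    """Hypothetical error the HTTP layer would map to 422 or 409."""


def request_hash(method: str, path: str, body: dict) -> str:
    # Normalize the body so key order and whitespace do not change the hash.
    normalized = json.dumps(body, sort_keys=True, separators=(",", ":"))
    raw = f"{method.upper()}\n{path}\n{normalized}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def check_reuse(stored_hash: str, method: str, path: str, body: dict) -> None:
    """Raise if an existing key is being reused for a different request."""
    if request_hash(method, path, body) != stored_hash:
        raise KeyReuseError("idempotency key reused with a different request")
```

Without this check, a client bug that reuses a key for a different payload would silently receive the result of an unrelated operation.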
Data Retention
If keys are kept forever, the idempotency table grows without bound and lookups slow down. UUID indexes are large. Usually, idempotency keys are valid only for a fixed window (24 or 48 hours).
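A periodic cleanup job could be as small as the sketch below. It assumes `created_at` is stored as Unix epoch seconds and that the window is 24 hours; both are illustrative choices, and `purge_expired` is a hypothetical name.

```python
import sqlite3


def purge_expired(
    conn: sqlite3.Connection, now_epoch: int, window_seconds: int = 24 * 3600
) -> int:
    """Delete keys older than the validity window; return the number removed."""
    cutoff = now_epoch - window_seconds
    with conn:
        cur = conn.execute(
            "DELETE FROM idempotency WHERE created_at < ?", (cutoff,)
        )
    return cur.rowcount
```

Once a key has been purged, a late retry with that key is treated as a new request, so the window should be longer than the client's longest retry horizon.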
- Idempotency = “at most one success”, not “never retry after failure”
1. Strict cache semantics
I think this is wrong
Same key → same result forever
- COMPLETED → always return stored success
- FAILED_FINAL → always return stored error
- No more processing for that key
Use when:
- money
- irreversible actions
- you want perfect audit “this key = this outcome”
Downside
- Transient problems (timeout, 500) get frozen as a permanent failure.
- But we should never store a transient error (like a DB connection timeout or a 500 Internal Server Error) as a permanent entry in the idempotency table. If we store a 500 error against key A, and the client retries key A five seconds later when the server is healthy, the server will return the cached 500 error. This breaks retries: the client would have to create a new key for every new attempt.
- If an error is transient, do not commit the idempotency row (or roll back the transaction). Let the request fail. When the client retries with the same key, the server should find no record and attempt the work again. Only store successful results or non-retryable errors (like 400 Bad Request).
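The rollback rule can be sketched like this, again with SQLite as a stand-in. `TransientError`, `FinalError`, `open_db`, and `handle_request` are hypothetical names; the point is only that the idempotency row and the outcome share one transaction, so a transient failure leaves nothing behind.

```python
import sqlite3


class TransientError(Exception):
    """Hypothetical: timeouts, 5xx, lost DB connections."""


class FinalError(Exception):
    """Hypothetical: non-retryable errors such as a 400 Bad Request."""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE idempotency ("
        " idempotency_key TEXT PRIMARY KEY,"
        " status TEXT NOT NULL,"
        " response_body TEXT)"
    )
    return conn


def handle_request(conn: sqlite3.Connection, key: str, do_work):
    """Return (status, body); store only successes and final errors."""
    row = conn.execute(
        "SELECT status, response_body FROM idempotency WHERE idempotency_key = ?",
        (key,),
    ).fetchone()
    if row is not None:
        return row  # cached outcome for a duplicate
    # One transaction: a TransientError escaping this block rolls back the
    # INSERT, so the next retry with the same key finds no record.
    with conn:
        conn.execute(
            "INSERT INTO idempotency (idempotency_key, status) "
            "VALUES (?, 'IN_PROGRESS')",
            (key,),
        )
        try:
            outcome = ("COMPLETED", do_work())
        except FinalError as exc:
            outcome = ("FAILED", str(exc))
        conn.execute(
            "UPDATE idempotency SET status = ?, response_body = ? "
            "WHERE idempotency_key = ?",
            (*outcome, key),
        )
    return outcome
```

This only protects work done inside the same transaction; side effects outside the database are not undone by the rollback.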
2. Retryable failures semantics
Same key → at most one success, failures can be retried
Extend your status:
- IN_PROGRESS
- COMPLETED
- FAILED_RETRYABLE (I think that's wrong: if the server crashes before it can write FAILED_RETRYABLE to the DB, the state machine breaks.)
- FAILED_FINAL
On new request with same key:
- COMPLETED → return stored success
- FAILED_FINAL → return stored error
- FAILED_RETRYABLE → try the operation again
- if success → set COMPLETED
- if same type of error again → maybe bump counter, decide when to switch to FAILED_FINAL
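The dispatch on an existing key can be written as two small pure functions. This is a sketch only: the `Action` enum, the `MAX_ATTEMPTS` limit of 3, and treating IN_PROGRESS as "wait" are assumptions layered on top of the statuses listed here.

```python
from enum import Enum


class Status(Enum):
    IN_PROGRESS = "IN_PROGRESS"
    COMPLETED = "COMPLETED"
    FAILED_RETRYABLE = "FAILED_RETRYABLE"
    FAILED_FINAL = "FAILED_FINAL"


class Action(Enum):
    RETURN_STORED = "return_stored"
    RUN_AGAIN = "run_again"
    WAIT = "wait"


MAX_ATTEMPTS = 3  # hypothetical limit, not from the notes


def on_duplicate(status: Status) -> Action:
    """Decide what to do when a request arrives for an existing key."""
    if status in (Status.COMPLETED, Status.FAILED_FINAL):
        return Action.RETURN_STORED
    if status is Status.FAILED_RETRYABLE:
        return Action.RUN_AGAIN
    return Action.WAIT  # IN_PROGRESS: another request owns the key


def after_attempt(succeeded: bool, retryable: bool, attempts: int) -> Status:
    """Pick the status to store after an attempt has finished."""
    if succeeded:
        return Status.COMPLETED
    if retryable and attempts < MAX_ATTEMPTS:
        return Status.FAILED_RETRYABLE
    return Status.FAILED_FINAL
```

Neither function covers a crash that happens before any status is written; in that case the row stays IN_PROGRESS and needs separate handling, such as a timeout.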
How to decide retryable vs final
Examples:
- Retryable
- DB deadlock
- external API 5xx
- network timeout
- Final
- validation error
- “insufficient funds”
- “resource not found” for given input
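The examples above map onto a simple classifier. All exception classes below except the built-in `TimeoutError` and `ConnectionError` are hypothetical stand-ins for whatever the real driver or HTTP client raises, and treating unknown errors as final is an assumption.

```python
class DeadlockError(Exception):
    """Hypothetical: raised by the DB layer on a deadlock."""


class UpstreamHTTPError(Exception):
    """Hypothetical: an external API answered with an error status."""

    def __init__(self, status: int):
        super().__init__(f"upstream returned {status}")
        self.status = status


class ValidationError(Exception):
    """Hypothetical: the request failed validation."""


class InsufficientFundsError(Exception):
    """Hypothetical: business rule rejection."""


class ResourceNotFoundError(Exception):
    """Hypothetical: the referenced resource does not exist."""


def is_retryable(error: Exception) -> bool:
    """Return True for failures worth retrying with the same key."""
    if isinstance(error, (TimeoutError, ConnectionError, DeadlockError)):
        return True
    if isinstance(error, UpstreamHTTPError):
        return 500 <= error.status < 600  # external API 5xx
    # Validation, insufficient funds, not found, and anything unknown
    # are treated as final (assumption).
    return False
```

Keeping the decision in one function makes it easy to audit which failures are allowed to be retried under the same key.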