Idempotency Keys

Network world:

Exactly-once delivery is not realistic
Exactly-once processing (effect) is realistic

Transport layer
- HTTP, queues, network
- Can drop, retry, duplicate, reorder
- So we accept “at least once delivery”

Application layer
- Handler + DB + idempotency table
- Goal: “exactly once processing” of the business operation
- Achieved by:
  - key is unique
  - first request does the work and stores result
  - later duplicates only read result

The process must rely on a database unique constraint at the very start, before any work is done.

Attempt to insert (key, status=IN_PROGRESS).
If the insert fails (unique violation), this is a duplicate. Wait for the other request to finish, or read the stored result if it already has.
If the insert succeeds, proceed to process.
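
The claim step above could be sketched as follows. This is a minimal illustration using SQLite from the Python standard library; the table layout and the function name `claim_key` are assumptions made for the example, not part of the original design.

```python
import sqlite3

def claim_key(conn: sqlite3.Connection, key: str) -> bool:
    """Return True if this caller claimed the key, False if it is a duplicate."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO idempotency (idempotency_key, status) "
                "VALUES (?, 'IN_PROGRESS')",
                (key,),
            )
        return True
    except sqlite3.IntegrityError:
        # The unique constraint rejected the row: someone already holds this key.
        return False

# Hypothetical minimal table; the primary key provides the unique constraint.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE idempotency (idempotency_key TEXT PRIMARY KEY, status TEXT NOT NULL)"
)

first = claim_key(conn, "key-1")   # first request: proceeds to do the work
second = claim_key(conn, "key-1")  # duplicate: must wait or read the result
```

The point of the design is that the database, not application code, decides who is first, so two concurrent requests cannot both pass the check.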

Formula:

at least once delivery + idempotent handler = exactly once effect

Where “effect” means:

one payment created
one subscription created
one email sent

Important detail

Exactly-once processing holds only inside our system
If we call external systems, we need idempotency there too
- outbox table
- idempotency key in request to external API
- or accept that their side is only “at least once”
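
One way to combine the first two options is sketched below, assuming SQLite and hypothetical names (`payments`, `outbox`, `create_payment`, `relay_once`, and the `send` callback standing in for the external API client). The business row and the outbox row are written in one transaction, and a separate relay delivers outbox rows with an idempotency key so a resend is harmless on the other side.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payments (id TEXT PRIMARY KEY, amount_cents INTEGER NOT NULL);
CREATE TABLE outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    idempotency_key TEXT NOT NULL,
    payload TEXT NOT NULL,
    sent INTEGER NOT NULL DEFAULT 0
);
""")

def create_payment(payment_id: str, amount_cents: int) -> None:
    # Both rows commit together or not at all.
    with conn:
        conn.execute("INSERT INTO payments VALUES (?, ?)", (payment_id, amount_cents))
        conn.execute(
            "INSERT INTO outbox (idempotency_key, payload) VALUES (?, ?)",
            (f"payment-{payment_id}", json.dumps({"id": payment_id, "amount_cents": amount_cents})),
        )

def relay_once(send) -> int:
    """Deliver unsent outbox rows; returns how many rows were picked up."""
    rows = conn.execute(
        "SELECT id, idempotency_key, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, key, payload in rows:
        # If send raises, the row stays unsent and is retried with the same key.
        send(key, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

If the relay crashes after `send` succeeds but before the row is marked as sent, the next run sends the same message again with the same key, which is why the external side still needs to honor that key.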

Client side

Client creates the idempotency key.
Use UUID v7 per operation.
Store it in the client's DB while the operation is pending, so a retry after a client restart can reuse it.
Send the same key on every retry of that operation.
New business action → new key.
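
A rough sketch of the client-side rule "one key per operation, reused on every retry". The names `call_with_retries` and `send` are hypothetical. Two simplifications are assumed: the key is held in memory rather than persisted, and `uuid.uuid4()` is used as a stand-in because `uuid.uuid7()` is only available in recent Python versions.

```python
import uuid

def call_with_retries(send, payload, max_attempts: int = 3):
    """Call send(key, payload) until it succeeds, always with the same key."""
    # One key per business operation, generated once, before the first attempt.
    # Stand-in: the notes recommend UUID v7; uuid4 is used here for portability.
    key = str(uuid.uuid4())
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(key, payload)
        except ConnectionError as exc:
            # Transport failure: we do not know whether the server did the work,
            # so we retry with the SAME key and let the server deduplicate.
            last_error = exc
    raise last_error
```

Calling `call_with_retries` a second time represents a new business action, so it generates a new key.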

Server side - data model
Table idempotency (example):

idempotency_key
endpoint or operation_type
request_hash (method + path + normalized body)
status enum: IN_PROGRESS, COMPLETED, FAILED
response_code
response_body (or pointer to stored result)
created_at, updated_at
TTL

Constraints:

Unique index on (idempotency_key, endpoint, tenant)
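
The table and its constraint could look roughly like this, sketched with SQLite DDL run from Python. The column types, the `tenant` column (needed by the unique index but not in the field list above), the `expires_at` column used to represent the TTL, and the index name are all assumptions for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE idempotency (
    idempotency_key TEXT NOT NULL,
    endpoint        TEXT NOT NULL,
    tenant          TEXT NOT NULL,
    request_hash    TEXT NOT NULL,
    status          TEXT NOT NULL
                    CHECK (status IN ('IN_PROGRESS', 'COMPLETED', 'FAILED')),
    response_code   INTEGER,
    response_body   TEXT,
    created_at      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at      TEXT  -- TTL expressed as an absolute expiry time
);
-- The same key may be reused across endpoints or tenants, but not within one.
CREATE UNIQUE INDEX idempotency_key_scope
    ON idempotency (idempotency_key, endpoint, tenant);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```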

If the idempotency_key exists but the request_hash (or user ID) does not match the incoming request, return a hard error (422 Unprocessable Entity or 409 Conflict) instead of the cached result.
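
A small sketch of the hash check, assuming JSON bodies and SHA-256. The normalization used here (sorted keys, compact separators) is just one possible choice, and `request_hash` and `same_request` are hypothetical helper names.

```python
import hashlib
import json

def request_hash(method: str, path: str, body: dict) -> str:
    """Hash of method + path + normalized body."""
    # Normalize the body so key order and whitespace do not change the hash.
    normalized = json.dumps(body, sort_keys=True, separators=(",", ":"))
    raw = f"{method.upper()}\n{path}\n{normalized}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()

def same_request(stored_hash: str, method: str, path: str, body: dict) -> bool:
    """False means: same key, different request, so answer 422 or 409."""
    return stored_hash == request_hash(method, path, body)
```

For example, a stored hash for `{"amount": 100, "currency": "EUR"}` still matches when the retry sends the same fields in a different order, but not when the amount changes.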

Data Retention

If keys are kept forever, the idempotency table grows without bound and lookups slow down. UUID indexes are large. Usually, idempotency keys are only valid for a limited window (24 or 48 hours), after which the rows can be deleted.
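
A periodic cleanup job could be as simple as the sketch below. It assumes `created_at` is stored as integer epoch seconds and that the retention window is 24 hours; `purge_expired` is a hypothetical name.

```python
import sqlite3

def purge_expired(conn: sqlite3.Connection, now_epoch: int,
                  window_seconds: int = 24 * 3600) -> int:
    """Delete rows older than the retention window; returns rows removed."""
    cutoff = now_epoch - window_seconds
    with conn:
        cur = conn.execute("DELETE FROM idempotency WHERE created_at < ?", (cutoff,))
    return cur.rowcount

# Hypothetical minimal table for the demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE idempotency (idempotency_key TEXT PRIMARY KEY, created_at INTEGER NOT NULL)"
)
now = 1_700_000_000
conn.execute("INSERT INTO idempotency VALUES ('old', ?)", (now - 48 * 3600,))
conn.execute("INSERT INTO idempotency VALUES ('fresh', ?)", (now - 3600,))
conn.commit()

removed = purge_expired(conn, now)
```

After a row is purged, a late retry with that key is treated as a new request, so the window should be longer than the longest retry period clients are expected to use.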

Idempotency = “at most one success”, not “never retry after failure”

1. Strict cache semantics

I think this approach is wrong (see the downside below).

Same key → same result forever

COMPLETED → always return stored success
FAILED_FINAL → always return stored error
No more processing for that key

Use when:

money
irreversible actions
you want perfect audit “this key = this outcome”

Downside

transient problems (timeout, 500) get frozen as permanent failures
- We should never store a transient error (such as a DB connection timeout or a 500 Internal Server Error) as a permanent entry in the idempotency table.
  
  If we store a 500 error against key A and the client retries key A five seconds later, when the server is healthy again, we return the cached 500 error. This breaks retries: the client would have to create a new key for the new attempt, which contradicts the rule that one operation keeps one key.
- If an error is transient, do not commit the idempotency row (or roll back the transaction). Let the request fail. When the client retries with the same key, the server finds no record and attempts the work again. Only store successful results or non-retryable errors (like 400 Bad Request).
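
The rollback rule can be sketched as follows. The assumptions are that the key row and the business work share one database transaction, that errors are already classified into the two hypothetical exception types `TransientError` and `PermanentError`, and that concurrency handling is left out to keep the example short.

```python
import sqlite3

class TransientError(Exception):
    """Retryable problem, e.g. a timeout. Never stored."""

class PermanentError(Exception):
    """Non-retryable problem, e.g. a validation error. Stored."""

def handle(conn: sqlite3.Connection, key: str, work):
    row = conn.execute(
        "SELECT response_code, response_body FROM idempotency WHERE idempotency_key = ?",
        (key,),
    ).fetchone()
    if row is not None:
        return row  # duplicate: replay the stored outcome
    try:
        with conn:  # one transaction: key row + result; rolled back on exception
            conn.execute("INSERT INTO idempotency (idempotency_key) VALUES (?)", (key,))
            try:
                code, body = 200, work()
            except PermanentError as exc:
                code, body = 400, str(exc)
            conn.execute(
                "UPDATE idempotency SET response_code = ?, response_body = ? "
                "WHERE idempotency_key = ?",
                (code, body, key),
            )
    except TransientError:
        # The transaction was rolled back, so no row exists for this key
        # and a retry with the same key runs the work again.
        return (503, "retry later")
    return (code, body)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE idempotency ("
    "idempotency_key TEXT PRIMARY KEY, response_code INTEGER, response_body TEXT)"
)
```

A side effect of this design is that a server crash mid-request also leaves no row behind, because the uncommitted transaction is discarded.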

2. Retryable failures semantics

Same key → at most one success, failures can be retried

Extend your status:

IN_PROGRESS
COMPLETED
FAILED_RETRYABLE (I think that's wrong: if the server crashes before it can write FAILED_RETRYABLE to the DB, the state machine breaks.)
FAILED_FINAL

On new request with same key:

COMPLETED
- return stored success
FAILED_FINAL
- return stored error
FAILED_RETRYABLE
- try the operation again
- if success → set COMPLETED
- if the same type of error happens again → maybe bump a retry counter and decide when to switch to FAILED_FINAL
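
The transitions above can be written down as two small pure functions. This is only a sketch: the function names, the returned action strings, and the `max_attempts` cutoff of 3 are assumptions, and the handling of IN_PROGRESS follows the earlier "wait or read the result" rule.

```python
from enum import Enum

class Status(Enum):
    IN_PROGRESS = "IN_PROGRESS"
    COMPLETED = "COMPLETED"
    FAILED_RETRYABLE = "FAILED_RETRYABLE"
    FAILED_FINAL = "FAILED_FINAL"

def on_duplicate(status: Status) -> str:
    """What to do when a request arrives for a key that already has a row."""
    if status in (Status.COMPLETED, Status.FAILED_FINAL):
        return "return_stored"  # stored success or stored error
    if status is Status.FAILED_RETRYABLE:
        return "retry"          # try the operation again
    return "wait"               # IN_PROGRESS: another request holds the key

def after_attempt(succeeded: bool, attempts: int, max_attempts: int = 3):
    """Return (new_status, new_attempt_count) after a retried operation."""
    if succeeded:
        return Status.COMPLETED, attempts
    attempts += 1  # bump the failure counter
    if attempts >= max_attempts:
        return Status.FAILED_FINAL, attempts
    return Status.FAILED_RETRYABLE, attempts
```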