Idempotency Keys

Network world:

  • Exactly-once delivery is not realistic
  • Exactly-once processing / effect is realistic
  1. Transport layer
    • HTTP, queues, network
    • Can drop, retry, duplicate, reorder
    • So we accept “at least once delivery”
  2. Application layer - Handler + DB + idempotency table
    • Goal: “exactly once processing” of the business operation

    • Achieved by:

      • key is unique
      • first request does the work and stores result
      • later duplicates only read result

The process must rely on a database unique constraint, checked at the very start, before any work is done.

  • Attempt to insert (key, status=IN_PROGRESS).
  • If insertion fails (unique violation), this is a duplicate. Wait for the other request to finish, or read the stored result.
  • If insertion succeeds, proceed to process.
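
The insert-first steps above can be sketched roughly as follows. This is a minimal illustration using SQLite from the Python standard library; the table layout and the `claim_key` helper are assumptions made for the example, not a prescribed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE idempotency ("
    " idempotency_key TEXT PRIMARY KEY,"   # the unique constraint we rely on
    " status TEXT NOT NULL,"
    " response_body TEXT)"
)

def claim_key(conn, key):
    """Hypothetical helper: True if this request claimed the key, False if duplicate."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO idempotency (idempotency_key, status) "
                "VALUES (?, 'IN_PROGRESS')",
                (key,),
            )
        return True
    except sqlite3.IntegrityError:
        # Unique violation: another request already owns this key.
        return False

first = claim_key(conn, "key-1")   # claims the key, may proceed
second = claim_key(conn, "key-1")  # duplicate, must wait or read the result
```

The point of inserting first, instead of doing SELECT and then INSERT, is that the database decides the race between two concurrent duplicates.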

Formula:

at least once delivery + idempotent handler = exactly once effect

Where “effect” means:

  • one payment created
  • one subscription created
  • one email sent

Important detail

  • Exactly once processing is inside our system
  • If we call external systems, we need idempotency there too
    • outbox table
    • idempotency key in request to external API
    • or accept that their side is only “at least once”

Client side

  • Client creates the idempotency key.
  • Use UUID v7 per operation.
  • Store it in db while the operation is pending.
  • Send the same key on every retry of that operation.
  • New business action → new key.
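
A rough sketch of the client-side rules above. The `pending` dict stands in for the client's own database, `send` is a hypothetical transport function, and `uuid.uuid4()` is used only as a stand-in because a UUID v7 generator is not available in every Python version.

```python
import uuid

pending = {}  # assumption: stand-in for the client's table of pending operations

def key_for(operation_id):
    """One key per business action; reused for as long as the operation is pending."""
    if operation_id not in pending:
        # Assumption: uuid4 as a placeholder; the notes recommend UUID v7.
        pending[operation_id] = str(uuid.uuid4())
    return pending[operation_id]

def send_with_retries(operation_id, send, max_attempts=5):
    key = key_for(operation_id)
    for _ in range(max_attempts):
        try:
            result = send(key)         # same key on every retry
            del pending[operation_id]  # operation finished, key no longer needed
            return result
        except TimeoutError:
            continue                   # retry with the same key
    raise RuntimeError("gave up; key stays in pending for a later retry")

seen_keys = []

def flaky_send(key):
    """Simulated transport: times out twice, then succeeds."""
    seen_keys.append(key)
    if len(seen_keys) < 3:
        raise TimeoutError("simulated network timeout")
    return "ok"

result = send_with_retries("order-42", flaky_send)
```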

Server side - data model
Table idempotency (example):

  • idempotency_key
  • endpoint or operation_type
  • request_hash (method + path + normalized body)
  • status enum: IN_PROGRESS, COMPLETED, FAILED
  • response_code
  • response_body (or pointer to stored result)
  • created_at, updated_at
  • TTL

Constraints:

  • Unique index on (idempotency_key, endpoint, tenant); the tenant column is not in the list above and is only needed in multi-tenant setups
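
One possible way to declare the table and the unique index, shown with SQLite through Python. The column types, the `tenant_id` column, and `expires_at` as the representation of the TTL are assumptions for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE idempotency (
    idempotency_key TEXT NOT NULL,
    endpoint        TEXT NOT NULL,
    tenant_id       TEXT NOT NULL,     -- assumption: multi-tenant setup
    request_hash    TEXT NOT NULL,
    status          TEXT NOT NULL
        CHECK (status IN ('IN_PROGRESS', 'COMPLETED', 'FAILED')),
    response_code   INTEGER,
    response_body   TEXT,
    created_at      INTEGER NOT NULL,  -- epoch seconds
    updated_at      INTEGER NOT NULL,
    expires_at      INTEGER NOT NULL   -- assumption: TTL stored as absolute expiry
);
CREATE UNIQUE INDEX idempotency_uniq
    ON idempotency (idempotency_key, endpoint, tenant_id);
""")

def insert_row(key, endpoint, tenant):
    with conn:
        conn.execute(
            "INSERT INTO idempotency VALUES "
            "(?, ?, ?, 'hash', 'IN_PROGRESS', NULL, NULL, 0, 0, 86400)",
            (key, endpoint, tenant),
        )

insert_row("k1", "/payments", "tenant-a")
insert_row("k1", "/payments", "tenant-b")  # same key, other tenant: allowed
try:
    insert_row("k1", "/payments", "tenant-a")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```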

If idempotency_key exists but request_hash (or user ID) does not match the incoming request, we must throw a hard error (422 Unprocessable Entity or 409 Conflict), not return the cached result.
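
A small sketch of the mismatch check. The normalization (sorted JSON keys, upper-cased method) and the choice of 422 over 409 are assumptions; `check_duplicate` is a hypothetical helper that is only called when the key already exists.

```python
import hashlib
import json

def request_hash(method, path, body):
    """Hash of method + path + normalized body (JSON with sorted keys)."""
    normalized = json.dumps(body, sort_keys=True, separators=(",", ":"))
    raw = f"{method.upper()}\n{path}\n{normalized}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()

def check_duplicate(stored_row, method, path, body):
    """Return (status_code, body) for a request whose key already exists."""
    if stored_row["request_hash"] != request_hash(method, path, body):
        # Same key, different request: hard error, never the cached result.
        return 422, "idempotency key reused with a different request"
    return stored_row["response_code"], stored_row["response_body"]

stored = {
    "request_hash": request_hash("POST", "/payments", {"amount": 10, "currency": "EUR"}),
    "response_code": 201,
    "response_body": "created",
}
# Same request (key order and method case differ): cached result is returned.
same = check_duplicate(stored, "post", "/payments", {"currency": "EUR", "amount": 10})
# Different amount under the same key: rejected.
different = check_duplicate(stored, "POST", "/payments", {"amount": 99, "currency": "EUR"})
```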

Data Retention

If keys are kept forever, the idempotency table grows without bound and lookups slow down. UUID indexes are large. Usually, idempotency keys are valid only for a fixed window (24 or 48 hours).
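
A cleanup job for the retention window might look roughly like this. The 24-hour value comes from the notes; storing `created_at` as epoch seconds and the `purge_expired` helper are assumptions for the sketch.

```python
import sqlite3

TTL_SECONDS = 24 * 3600  # 24-hour validity window

def purge_expired(conn, now):
    """Delete rows older than the TTL window; return how many were removed."""
    with conn:
        cur = conn.execute(
            "DELETE FROM idempotency WHERE created_at < ?",
            (now - TTL_SECONDS,),
        )
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE idempotency ("
    " idempotency_key TEXT PRIMARY KEY,"
    " created_at INTEGER NOT NULL)"
)
now = 1_000_000  # fixed "current time" so the example is deterministic
conn.execute("INSERT INTO idempotency VALUES ('old', ?)", (now - 2 * TTL_SECONDS,))
conn.execute("INSERT INTO idempotency VALUES ('fresh', ?)", (now - 60,))
conn.commit()

deleted = purge_expired(conn, now)
remaining = [row[0] for row in conn.execute("SELECT idempotency_key FROM idempotency")]
```

A production version would probably delete in batches and might treat IN_PROGRESS rows separately; both are left out here.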


  • Idempotency = “at most one success”, not “never retry after failure”

1. Strict cache semantics

I think this is wrong

Same key → same result forever

  • COMPLETED → always return stored success
  • FAILED_FINAL → always return stored error
  • No more processing for that key

Use when:

  • money
  • irreversible actions
  • you want perfect audit “this key = this outcome”

Downside

  • transient problems (timeout, 500) get frozen as permanent failure

    • But we should never store a transient error (like a DB connection timeout or a 500 Internal Server Error) as a permanent entry in idempotency table.

      If we store a 500 error against Key A and the client retries Key A five seconds later, when the server is healthy again, we will return the cached 500 error. This breaks the system, because the client would have to create a new key for a new attempt.

    • If an error is transient, do not commit the idempotency row (or roll back the transaction). Let the request fail. When the client retries with the same key, the server should find no record and attempt the work again. Only store successful results or non-retryable errors (like 400 Bad Request).
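
The rollback rule above can be sketched as follows, assuming the idempotency row and the business write share one transaction. `TransientError`, `ValidationError`, and `handle` are hypothetical names for this example.

```python
import sqlite3

class TransientError(Exception):
    """E.g. DB connection timeout; must not be stored."""

class ValidationError(Exception):
    """Non-retryable; safe to store as the final outcome."""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE idempotency ("
    " idempotency_key TEXT PRIMARY KEY, status TEXT NOT NULL,"
    " response_code INTEGER, response_body TEXT)"
)

def handle(conn, key, operation):
    stored = conn.execute(
        "SELECT response_code, response_body FROM idempotency "
        "WHERE idempotency_key = ? AND status = 'COMPLETED'", (key,)
    ).fetchone()
    if stored:
        return stored  # duplicate: only read the stored result
    try:
        with conn:  # one transaction: rolled back if an exception escapes
            conn.execute(
                "INSERT INTO idempotency (idempotency_key, status) "
                "VALUES (?, 'IN_PROGRESS')", (key,))
            try:
                code, body = 200, operation()
            except ValidationError as exc:
                code, body = 400, str(exc)  # non-retryable: store it
            conn.execute(
                "UPDATE idempotency SET status = 'COMPLETED', "
                "response_code = ?, response_body = ? "
                "WHERE idempotency_key = ?", (code, body, key))
        return code, body
    except TransientError:
        # Transaction was rolled back, so no row exists for this key.
        return 503, "temporary failure, retry with the same key"

calls = {"n": 0}

def charge():
    calls["n"] += 1
    if calls["n"] == 1:
        raise TransientError("db timeout")
    return "charged"

r1 = handle(conn, "key-a", charge)  # transient failure, nothing stored
rows_after_failure = conn.execute("SELECT COUNT(*) FROM idempotency").fetchone()[0]
r2 = handle(conn, "key-a", charge)  # retry with same key does the work
r3 = handle(conn, "key-a", charge)  # duplicate reads the stored result
```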

2. Retryable failures semantics

Same key → at most one success, failures can be retried

Extend your status:

  • IN_PROGRESS
  • COMPLETED
  • FAILED_RETRYABLE (I think that's wrong: if the server crashes before it can write FAILED_RETRYABLE to the DB, the state machine breaks.)
  • FAILED_FINAL

On new request with same key:

  • COMPLETED
    • return stored success
  • FAILED_FINAL
    • return stored error
  • FAILED_RETRYABLE
    • try the operation again
    • if success → set COMPLETED
    • if same type of error again → maybe bump counter, decide when to switch to FAILED_FINAL
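
The dispatch above could be sketched like this. The row is a plain dict instead of a database record, `MAX_ATTEMPTS` is an arbitrary assumed limit, and the exception classes are hypothetical. IN_PROGRESS handling is deliberately left out.

```python
class RetryableError(Exception):
    pass

class FinalError(Exception):
    pass

MAX_ATTEMPTS = 3  # assumption: when to give up and switch to FAILED_FINAL

def on_duplicate(row, operation):
    """Handle a new request for a key whose row already exists (mutates row)."""
    status = row["status"]
    if status in ("COMPLETED", "FAILED_FINAL"):
        return row["result"]  # stored success or stored error
    if status != "FAILED_RETRYABLE":
        raise ValueError(f"status {status!r} is not handled in this sketch")
    row["attempts"] += 1  # bump counter, then try the operation again
    try:
        row["result"] = operation()
        row["status"] = "COMPLETED"
    except FinalError as exc:
        row["result"], row["status"] = str(exc), "FAILED_FINAL"
    except RetryableError as exc:
        row["result"] = str(exc)
        exhausted = row["attempts"] >= MAX_ATTEMPTS
        row["status"] = "FAILED_FINAL" if exhausted else "FAILED_RETRYABLE"
    return row["result"]

def always_timeout():
    raise RetryableError("network timeout")

def must_not_run():
    raise AssertionError("operation ran again for a COMPLETED key")

row = {"status": "FAILED_RETRYABLE", "result": "network timeout", "attempts": 1}
on_duplicate(row, always_timeout)
status_after_second = row["status"]  # attempts == 2, still retryable
on_duplicate(row, always_timeout)
status_after_third = row["status"]   # attempts == 3, frozen as final

ok_row = {"status": "FAILED_RETRYABLE", "result": "network timeout", "attempts": 1}
first = on_duplicate(ok_row, lambda: "payment created")
second = on_duplicate(ok_row, must_not_run)  # returns stored success
```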

How to decide retryable vs final

Examples:

  • Retryable
    • DB deadlock
    • external API 5xx
    • network timeout
  • Final
    • validation error
    • “insufficient funds”
    • “resource not found” for given input
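
The examples above could be encoded as a simple lookup. The error code strings are made up for this sketch, and treating unknown errors as final is an assumption; the opposite default is equally defensible.

```python
# Hypothetical error codes mirroring the examples above.
RETRYABLE = {"db_deadlock", "external_api_5xx", "network_timeout"}
FINAL = {"validation_error", "insufficient_funds", "resource_not_found"}

def classify(error_code):
    """Map an error code to the status that should be stored for the key."""
    if error_code in RETRYABLE:
        return "FAILED_RETRYABLE"
    if error_code in FINAL:
        return "FAILED_FINAL"
    # Assumption: unknown errors are treated as final so they get investigated.
    return "FAILED_FINAL"

deadlock_status = classify("db_deadlock")
funds_status = classify("insufficient_funds")
unknown_status = classify("something_else")
```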
