webhook-gateway

Self-hosted webhook reliability with exponential backoff + DLQ.

NestJS Postgres BullMQ TypeScript

Elevator pitch

A self-hosted webhook receiver that persists every inbound delivery, retries on failure with exponential backoff, parks the dead ones in a DLQ, and lets me search / replay anything by signature, route, or payload shape — without paying for Hookdeck.

What it is

A NestJS service backed by Postgres + BullMQ. The flow is:

Inbound webhooks arrive as POSTs to /in/{source} (Stripe, Shopify, GitHub, generic HMAC).
The receiver verifies the signature, immediately persists the raw envelope (headers + body + auth context) to Postgres, and ACKs with a 200 (p99 < 30ms).
A BullMQ job is enqueued for the downstream delivery — to my own service, or to a third-party endpoint.
On failure: retry against the schedule [30s, 2m, 10m, 1h, 6h, 24h]. If the initial attempt and all six retries fail, the delivery is parked in the DLQ.
Operators can search by source, route, status, time, signature, or arbitrary JSON path, and bulk-replay anything from the DLQ.
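A minimal sketch of steps 2 and 3 of that flow. It assumes hypothetical in-memory stand-ins (`DeliveryStore`, `Queue`) for the Postgres table and the BullMQ queue, plus a hypothetical `enqueued` flag and `sweep` job for re-enqueueing rows that were persisted while Redis was unreachable. Signature verification is assumed to have already passed. The point is the ordering: the row is written before the enqueue is attempted, so a queue failure still leaves a durable, replayable record. This sketch awaits the enqueue before returning the 200; the writeup doesn't say whether the real receiver does.

```typescript
// Hypothetical shapes; the real schema lives in Postgres and is not shown in this writeup.
interface Envelope {
  source: string;
  headers: Record<string, string>;
  body: string;
}

interface StoredDelivery {
  id: number;
  envelope: Envelope;
  status: "pending" | "delivered" | "dead";
  enqueued: boolean; // hypothetical flag: has a delivery job been handed to the queue?
}

// In-memory stand-in for the deliveries table.
class DeliveryStore {
  private rows: StoredDelivery[] = [];

  insert(envelope: Envelope): StoredDelivery {
    const row: StoredDelivery = { id: this.rows.length + 1, envelope, status: "pending", enqueued: false };
    this.rows.push(row);
    return row;
  }

  markEnqueued(id: number): void {
    const row = this.rows.find((r) => r.id === id);
    if (row) row.enqueued = true;
  }

  // Rows that were persisted but never made it onto the queue.
  unenqueued(): StoredDelivery[] {
    return this.rows.filter((r) => r.status === "pending" && !r.enqueued);
  }
}

// Stand-in for the BullMQ queue; add() rejects when Redis is unreachable.
interface Queue {
  add(deliveryId: number): Promise<void>;
}

// Persist first, then try to enqueue; the ACK does not depend on Redis being up.
async function receive(store: DeliveryStore, queue: Queue, envelope: Envelope): Promise<number> {
  const row = store.insert(envelope); // durable before anything else happens
  try {
    await queue.add(row.id);
    store.markEnqueued(row.id);
  } catch {
    // Queue unavailable: the row stays pending and unenqueued for sweep() to pick up.
  }
  return 200;
}

// Periodic job that re-enqueues anything persisted while the queue was down.
async function sweep(store: DeliveryStore, queue: Queue): Promise<number> {
  let requeued = 0;
  for (const row of store.unenqueued()) {
    try {
      await queue.add(row.id);
      store.markEnqueued(row.id);
      requeued++;
    } catch {
      break; // still down; try again on the next sweep
    }
  }
  return requeued;
}

async function demo(): Promise<void> {
  const store = new DeliveryStore();
  const redisDown: Queue = { add: async () => { throw new Error("ECONNREFUSED"); } };
  const redisUp: Queue = { add: async () => {} };

  console.log(await receive(store, redisDown, { source: "stripe", headers: {}, body: "{}" })); // 200
  console.log(store.unenqueued().length); // 1
  console.log(await sweep(store, redisUp)); // 1
}
void demo();
```

The sweep is what makes "even if Redis explodes" recoverable without operator action; the same query could equally back a manual re-enqueue button.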

Status


Repo	github.com/mateokadiu/webhook-gateway
License	MIT
Status	v1.0.0
Stack	NestJS 11, Postgres 16, BullMQ 5, Redis 7

The problem I was solving

Webhook reliability is the kind of thing every team rebuilds and every team gets wrong on the third edge case. The default Stripe / Shopify / GitHub setup is “we’ll retry a few times then give up”, with no visibility into what was actually sent, no way to replay, and no audit trail.

I needed a layer in front of my own services that:

ACKs fast (otherwise the source thinks I’m dead and starts disabling my endpoint)
Persists the raw envelope before doing anything else (so I can replay after a bad deploy)
Retries with the right backoff curve for the downstream’s actual recovery time
Lets me search and bulk-replay

Key decisions

Persist-first, deliver-second. The receiver writes the envelope to Postgres before enqueueing the delivery job. Even if Redis explodes, every delivery is on disk and can be re-enqueued.
Exponential backoff schedule fixed at [30s, 2m, 10m, 1h, 6h, 24h]. Empirically tuned — short-window retries cover transient blips, long-window retries cover deploys and database failovers.
DLQ is just a status, not a separate table. Same deliveries table; status flips to dead. Replay is “set status back to pending, requeue”.
Signature verification at the edge. If Stripe-Signature verification fails, I return 400 immediately; I never persist a request whose origin I can’t prove.
HMAC routes for generic sources. Anything that can sign with a shared secret can target /in/generic/{routeId}.
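A minimal sketch of the retry and DLQ-as-status decisions together. It assumes a hypothetical `Delivery` record with a `failedAttempts` counter and a `nextAttemptAt` timestamp; the real column names aren't given. The gaps in the schedule grow by factors of 4x to 6x rather than one fixed multiplier, which is a reason to keep it as an explicit table instead of computing `base * factor^n`. "Dead" is just another status value, and replay flips it back to pending; resetting the attempt counter on replay, so the delivery gets the full schedule again, is my assumption.

```typescript
// Delays between retries, after the initial attempt: 30s, 2m, 10m, 1h, 6h, 24h.
const RETRY_SCHEDULE_MS: readonly number[] = [
  30 * 1000,
  2 * 60 * 1000,
  10 * 60 * 1000,
  60 * 60 * 1000,
  6 * 60 * 60 * 1000,
  24 * 60 * 60 * 1000,
];

// The DLQ is a status value on the same deliveries row, not a separate table.
type DeliveryStatus = "pending" | "delivered" | "dead";

// Hypothetical row shape for illustration.
interface Delivery {
  id: string;
  status: DeliveryStatus;
  failedAttempts: number; // failed attempts so far, counting the initial one
  nextAttemptAt: number | null; // epoch ms of the next retry, null if none is scheduled
}

// Delay before the next retry, or null once the schedule is exhausted.
function retryDelayMs(failedAttempts: number): number | null {
  const index = failedAttempts - 1; // 1 failure -> first retry delay
  return index >= 0 && index < RETRY_SCHEDULE_MS.length ? RETRY_SCHEDULE_MS[index] : null;
}

// Apply one failed attempt: schedule the next retry or park the row in the DLQ.
function onAttemptFailed(d: Delivery, now: number): Delivery {
  const failedAttempts = d.failedAttempts + 1;
  const delay = retryDelayMs(failedAttempts);
  if (delay === null) {
    return { ...d, status: "dead", failedAttempts, nextAttemptAt: null };
  }
  return { ...d, status: "pending", failedAttempts, nextAttemptAt: now + delay };
}

// Replay: flip a dead row back to pending and make it due now.
// Resetting failedAttempts (so the full schedule applies again) is an assumption.
function replay(d: Delivery, now: number): Delivery {
  if (d.status !== "dead") return d;
  return { ...d, status: "pending", failedAttempts: 0, nextAttemptAt: now };
}

// Initial attempt + six retries all fail -> dead after the seventh failure.
let d: Delivery = { id: "dlv_1", status: "pending", failedAttempts: 0, nextAttemptAt: 0 };
for (let i = 0; i < 7; i++) d = onAttemptFailed(d, 0);
console.log(d.status, d.failedAttempts); // dead 7
console.log(replay(d, 0).status); // pending
```

BullMQ lets a worker supply a custom backoff strategy in its settings, so a function shaped like `retryDelayMs` could presumably sit behind it; I haven't checked how the repo actually wires the schedule in.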
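For `/in/generic/{routeId}`, a minimal sketch of verify-at-the-edge using Node's `crypto` module. It assumes the sender puts a hex-encoded HMAC-SHA256 of the raw body in a header and that secrets are looked up per route; `ROUTE_SECRETS` is a hypothetical stand-in for wherever the repo keeps them, and the exact header name and encoding the gateway expects aren't stated in this writeup. The comparison uses `timingSafeEqual` so response timing doesn't leak how many bytes matched. A `false` result maps to an immediate 400 with nothing persisted.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical per-route shared secrets.
const ROUTE_SECRETS = new Map<string, string>([["route_123", "s3cr3t"]]);

// True only if signatureHex is the hex HMAC-SHA256 of rawBody under the route's secret.
function verifyGenericSignature(routeId: string, rawBody: Buffer, signatureHex: string): boolean {
  const secret = ROUTE_SECRETS.get(routeId);
  if (secret === undefined) return false; // unknown route

  // SHA-256 digests are 32 bytes = 64 hex chars; reject anything else up front,
  // which also guarantees equal lengths for timingSafeEqual (it throws otherwise).
  if (!/^[0-9a-fA-F]{64}$/.test(signatureHex)) return false;

  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const provided = Buffer.from(signatureHex, "hex");
  return timingSafeEqual(provided, expected);
}

// Sign the raw bytes exactly as received, never a re-serialized JSON object.
const body = Buffer.from('{"event":"ping"}');
const sig = createHmac("sha256", "s3cr3t").update(body).digest("hex");
console.log(verifyGenericSignature("route_123", body, sig)); // true
console.log(verifyGenericSignature("route_123", body, "0".repeat(64))); // false
```

Verifying against the raw body matters: if a framework parses and re-serializes the JSON first, key order or whitespace can change and a legitimate signature will fail.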

Numbers

p99 < 30ms ack time for inbound webhooks
6 retries on the default schedule, 31h 12m 30s total window
3 source presets out of the box: Stripe, Shopify, GitHub
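The total retry window is the sum of the six delays. A quick check, assuming each attempt itself takes negligible time (real attempts add their own latency and queue wait), printing when each retry fires relative to the initial attempt:

```typescript
// Retry delays in seconds: 30s, 2m, 10m, 1h, 6h, 24h.
const RETRY_DELAYS_S = [30, 120, 600, 3_600, 21_600, 86_400];

function formatDuration(totalSeconds: number): string {
  const h = Math.floor(totalSeconds / 3600);
  const m = Math.floor((totalSeconds % 3600) / 60);
  const s = totalSeconds % 60;
  return `${h}h ${m}m ${s}s`;
}

// Cumulative offset of each retry from the initial attempt.
let elapsed = 0;
RETRY_DELAYS_S.forEach((delay, i) => {
  elapsed += delay;
  console.log(`retry ${i + 1}: +${formatDuration(elapsed)}`);
});
// retry 1: +0h 0m 30s
// retry 2: +0h 2m 30s
// retry 3: +0h 12m 30s
// retry 4: +1h 12m 30s
// retry 5: +7h 12m 30s
// retry 6: +31h 12m 30s
```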