Reliability

Reliability that an alerting platform has to earn

The whole point of an alerting tool is to survive the worst minutes of the year. Here is what WardenPoint does to be ready for them: channel redundancy, predictable retries, acknowledgement-aware chains and an audit log you can read.


Uptime target: 99.99%
P99 dispatch: ~142ms
Channels per step: 3+

An alert that survives a bad day

+0s · API ingest: accepted, UUID assigned
+1s · Telegram: voice message dispatched
+10s · Voice fallback: Telegram unreachable, PSTN call placed
+14s · Responder: DTMF 1, chain cancelled

Four pillars

How we keep an alert alive

Four reliability concerns that an alerting platform has to address. Each lists what is wired in code today, not what we hope to ship.

01 · Channel redundancy

Multiple channels per step

Each escalation step can fire more than one channel in parallel. If Telegram is down, the phone call still rings. If both fail, SMS is the configured fallback.

Per-step channel arrays: fire many channels in parallel
A fallback for each channel: if Telegram fails, SMS takes over
Carrier rotation for voice and SMS, so no single provider failure stops dispatch
Channel health tracked in the audit log per dispatch
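A minimal sketch of the per-step fan-out, assuming a hypothetical `dispatchStep()` helper and a `$send` callable that returns `true` when a channel delivers. Every channel in the step is attempted, and a channel's fallback is tried only when that channel itself fails. For readability the sketch sends one channel after another, whereas the step described above fires them in parallel. None of these names are WardenPoint's actual API.

```php
<?php

/**
 * Hypothetical sketch: dispatch one escalation step.
 *
 * @param string[]               $channels  channels fired by this step
 * @param array<string, string>  $fallbacks channel => its fallback channel
 * @param callable(string): bool $send      returns true if delivery succeeded
 * @return array<string, string> channel => channel that delivered, or 'failed'
 */
function dispatchStep(array $channels, array $fallbacks, callable $send): array
{
    $result = [];
    foreach ($channels as $channel) {
        // Every channel in the step is attempted; one failure does not block the others.
        if ($send($channel)) {
            $result[$channel] = $channel;
            continue;
        }
        // Primary failed: try this channel's own fallback, if one is configured.
        $fallback = $fallbacks[$channel] ?? null;
        $result[$channel] = ($fallback !== null && $send($fallback)) ? $fallback : 'failed';
    }
    return $result;
}

// Usage: Telegram is down, voice works, and SMS backs up Telegram.
$down = ['telegram'];
$outcome = dispatchStep(
    ['telegram', 'voice_call'],
    ['telegram' => 'sms'],
    fn (string $ch): bool => !in_array($ch, $down, true),
);
print_r($outcome); // telegram => sms, voice_call => voice_call
```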

02 · Retry policy

Predictable, bounded retries

We retry the same channel before moving on, but never indefinitely. Retry intervals are explicit; failures fall through to the next channel.

Configurable per-channel attempts (default 3 for voice, 2 for Telegram)
Exponential or linear backoff per channel
Idempotency keyed on (api_key, idempotency_key): duplicate dispatches collapse into one
Hard cap per chain so a runaway alert cannot page forever
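To make the idempotency and hard-cap items concrete, here is a sketch assuming an in-memory store and a hypothetical `DispatchGuard` class. A repeated `(api_key, idempotency_key)` pair returns the first notification's UUID instead of creating a new one, and `allowSend()` refuses further sends once a chain reaches its cap. The cap of 5 is made up for the example; the actual per-chain limit is not stated here.

```php
<?php

/**
 * Hypothetical sketch: collapse duplicate dispatches and cap total sends per chain.
 * The in-memory arrays stand in for whatever shared store a real dispatcher uses.
 */
final class DispatchGuard
{
    /** @var array<string, string> composite (api_key, idempotency_key) => notification UUID */
    private array $seen = [];
    /** @var array<string, int> notification UUID => sends so far */
    private array $sent = [];

    public function __construct(private int $chainCap) {}

    /** Returns the existing UUID for a duplicate request, or registers the new one. */
    public function accept(string $apiKey, string $idempotencyKey, string $newUuid): string
    {
        // A NUL separator keeps ("ab", "c") and ("a", "bc") from producing the same key.
        $key = $apiKey . "\0" . $idempotencyKey;
        // A repeated pair maps to the first notification's UUID.
        return $this->seen[$key] ??= $newUuid;
    }

    /** True if another send is allowed for this chain; counts it when it is. */
    public function allowSend(string $uuid): bool
    {
        $count = $this->sent[$uuid] ?? 0;
        if ($count >= $this->chainCap) {
            return false; // hard cap reached: the chain stops paging
        }
        $this->sent[$uuid] = $count + 1;
        return true;
    }
}

$guard = new DispatchGuard(chainCap: 5);
$first = $guard->accept('key_live', 'deploy-42', 'uuid-a');
$again = $guard->accept('key_live', 'deploy-42', 'uuid-b'); // duplicate collapses
echo $first === $again ? "collapsed\n" : "distinct\n";     // prints "collapsed"
```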

03 · Acknowledgement-aware

The chain stops when someone owns it

Acknowledgement from any channel that received the alert cancels the rest of the chain. The audit log records who acked and when.

Ack from Telegram button, SMS reply, voice DTMF, email click or dashboard tap
Resolved hook from monitoring source cancels the chain automatically
Re-dispatch handled cleanly when the responder hands off mid-incident
No lingering retries after an ack, guaranteed by the queue cancellation step
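The cancellation step can be sketched as a queue keyed by notification UUID, assuming a hypothetical in-memory `RetryQueue` class: an ack drops every job still pending for that notification and appends an audit entry with the actor and channel. The `'acknowledged'` status string is illustrative, and a real entry would also carry the time of the ack that the audit log records.

```php
<?php

/**
 * Hypothetical sketch: pending retries are queued per notification UUID, and an
 * ack removes every job still queued for that notification before it can run.
 */
final class RetryQueue
{
    /** @var array<string, list<array{channel: string, runAt: int}>> */
    private array $pending = [];
    /** @var list<array<string, string>> audit entries written by this sketch */
    public array $audit = [];

    public function schedule(string $uuid, string $channel, int $runAt): void
    {
        $this->pending[$uuid][] = ['channel' => $channel, 'runAt' => $runAt];
    }

    /** Cancels all pending jobs for $uuid; returns how many were dropped. */
    public function ack(string $uuid, string $actor, string $viaChannel): int
    {
        $dropped = count($this->pending[$uuid] ?? []);
        unset($this->pending[$uuid]);
        $this->audit[] = [
            'notification_uuid' => $uuid,
            'channel' => $viaChannel,
            'status' => 'acknowledged',
            'actor' => $actor,
        ];
        return $dropped;
    }

    /** Jobs due at or before $now that have not been cancelled. */
    public function due(int $now): array
    {
        $jobs = [];
        foreach ($this->pending as $uuid => $list) {
            foreach ($list as $job) {
                if ($job['runAt'] <= $now) {
                    $jobs[] = [$uuid, $job['channel']];
                }
            }
        }
        return $jobs;
    }
}

$q = new RetryQueue();
$q->schedule('uuid-a', 'voice_call', 30);
$q->schedule('uuid-a', 'sms', 60);
$q->ack('uuid-a', 'alice', 'telegram'); // responder taps the Telegram button
echo count($q->due(120)), "\n";         // prints 0
```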

04 · Audit trail

Structured, queryable, exportable

Every dispatch, ack, escalation and resolve writes a JSON line. The shape is stable; old fields stay; new fields are added without breaking parsers.

JSON Lines audit log per company
Stable schema with notification_uuid, channel, status, actor, ip, request_id
CSV export per recipient group for SLA reviews
Linked to application logs via request_id
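A sketch of consuming the log, assuming a hypothetical `auditToCsv()` function and the six fields listed above. Fields are picked by name, so fields added later are ignored and never shift the CSV columns. It filters by `notification_uuid` rather than by recipient group, since no group field appears in the listed schema; the sample status values are made up.

```php
<?php

/**
 * Hypothetical sketch: read a JSON Lines audit log, keep only the documented
 * fields (unknown future fields are ignored), and write CSV for one notification.
 */
const AUDIT_FIELDS = ['notification_uuid', 'channel', 'status', 'actor', 'ip', 'request_id'];

function auditToCsv(string $jsonl, string $uuid): string
{
    $out = fopen('php://temp', 'r+');
    fputcsv($out, AUDIT_FIELDS, ',', '"', '');
    foreach (preg_split('/\R/', trim($jsonl)) as $line) {
        if ($line === '') {
            continue;
        }
        $row = json_decode($line, true, 512, JSON_THROW_ON_ERROR);
        if (($row['notification_uuid'] ?? null) !== $uuid) {
            continue;
        }
        // Pick fields by name so added fields never shift the CSV columns.
        $values = array_map(fn (string $f): string => (string) ($row[$f] ?? ''), AUDIT_FIELDS);
        fputcsv($out, $values, ',', '"', '');
    }
    rewind($out);
    $csv = stream_get_contents($out);
    fclose($out);
    return $csv;
}

$log = <<<'JSONL'
{"notification_uuid":"uuid-a","channel":"telegram","status":"sent","actor":"system","ip":"203.0.113.7","request_id":"req-1"}
{"notification_uuid":"uuid-b","channel":"sms","status":"sent","actor":"system","ip":"203.0.113.7","request_id":"req-2"}
{"notification_uuid":"uuid-a","channel":"telegram","status":"acknowledged","actor":"alice","ip":"198.51.100.4","request_id":"req-3","new_field":true}
JSONL;
echo auditToCsv($log, 'uuid-a');
```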

Retry policy

What we retry, and when we stop

Retries should be predictable. For each channel, pick the number of attempts, the backoff and the fallback. WardenPoint ships sensible defaults and lets paid plans override them.

Voice calls retry up to 3× with exponential backoff before falling back to Telegram voice
Telegram retries 2× with linear backoff, then falls back to SMS
SMS retries 2× with provider rotation; carrier 5xx fails over to the next provider
Email retries on transient SMTP 4xx; permanent 5xx hard-fails and the chain moves on
Every retry decision lands in the audit log so the path is reconstructible

<?php
# config/escalation.php: retry policy
return [
    'retry' => [
        'voice_call' => [
            'attempts' => 3,
            'backoff' => 'exponential',
            'fallback_channel' => 'telegram_voice',
        ],
        'telegram_voice' => [
            'attempts' => 2,
            'backoff' => 'linear',
            'fallback_channel' => 'sms',
        ],
    ],
];
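The retry policy in that config can be expanded into the worst-case attempt sequence with a short sketch. `retryPlan()` is a hypothetical helper, and the 5-second base delay, the backoff formulas (the delay before attempt n is base × 2^(n−2) for exponential and base × (n−1) for linear) and the `sms` entry's backoff are all assumptions, since the page does not publish them. SMS provider rotation and the email 4xx/5xx rule are not modelled.

```php
<?php

/**
 * Hypothetical sketch: expand a per-channel retry policy into the ordered list of
 * attempts made if every attempt fails. Each entry is [channel, attempt, delaySeconds].
 * The base delay and backoff formulas are assumptions, not WardenPoint's values.
 */
function retryPlan(array $policy, string $start, int $baseDelay = 5): array
{
    $plan = [];
    $visited = [];
    $channel = $start;
    while ($channel !== null && isset($policy[$channel]) && !isset($visited[$channel])) {
        $visited[$channel] = true; // guards against a fallback cycle
        $rule = $policy[$channel];
        for ($n = 1; $n <= $rule['attempts']; $n++) {
            // Delay before attempt n: none for the first attempt, then backoff.
            $delay = match (true) {
                $n === 1 => 0,
                $rule['backoff'] === 'exponential' => $baseDelay * 2 ** ($n - 2),
                default => $baseDelay * ($n - 1),
            };
            $plan[] = [$channel, $n, $delay];
        }
        // Attempts exhausted: fall through to this channel's fallback, if any.
        $channel = $rule['fallback_channel'] ?? null;
    }
    return $plan;
}

$policy = [
    'voice_call'     => ['attempts' => 3, 'backoff' => 'exponential', 'fallback_channel' => 'telegram_voice'],
    'telegram_voice' => ['attempts' => 2, 'backoff' => 'linear', 'fallback_channel' => 'sms'],
    'sms'            => ['attempts' => 2, 'backoff' => 'linear'], // assumed: no further fallback
];
foreach (retryPlan($policy, 'voice_call') as [$ch, $n, $delay]) {
    echo "$ch attempt $n after {$delay}s\n";
}
```

With only two or three attempts the exponential and linear schedules coincide; they diverge from the fourth attempt onward.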

Honest numbers

Numbers we operate against

Uptime target: 99.99%
P99 dispatch latency: ~142ms
Channels per step: 3+
Retry hard cap: Per-chain
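For scale, here is the generic availability arithmetic behind a 99.99% target. This is standard error-budget math with a hypothetical `errorBudgetMinutes()` helper, not a published WardenPoint SLA term.

```php
<?php

/** Downtime allowed by an availability target over a window, in minutes. */
function errorBudgetMinutes(float $targetPercent, float $windowDays): float
{
    // Unavailable fraction times the window length in minutes.
    return (1 - $targetPercent / 100) * $windowDays * 24 * 60;
}

printf("per 30 days: %.2f min\n", errorBudgetMinutes(99.99, 30));     // per 30 days: 4.32 min
printf("per year:    %.1f min\n", errorBudgetMinutes(99.99, 365.25)); // per year:    52.6 min
```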

Reliability FAQ

Common reliability questions

What is the uptime target?
99.99% for the public API ingest and the dispatcher tier. The status page reports the actual rolling figure with a 90-day timeline.
What happens during a regional carrier outage?
If WardenPoint itself goes down, what do customers see?
Are retries idempotent?
How do you test reliability changes?

Free plan

Prove the reliability claims with your own test

Set up a recipient, kill Telegram on the responder phone and watch the fallback fire. The audit line tells the whole story.


Free forever plan
Audit log per dispatch
Carrier rotation built in