Architecture
alertkube is a single Go binary and an event-to-alert pipeline: observe Kubernetes resources, detect bad conditions, dedupe/suppress, route, and deliver. It uses client-go informers directly; see ADR-0001.
The pipeline
```mermaid
flowchart TB
    subgraph Sources
        W[9 Watchers<br/>Pod · Node · Deployment · StatefulSet<br/>DaemonSet · Job · CronJob · PVC · HPA]
        RC[Alertmanager receiver<br/>POST /api/v1/alerts]
    end
    W -- emit --> EM[makeEmitter]
    RC -- toAlert --> EM
    EM --> SO[Severity overrides]
    SO --> ST[Store<br/>mute window · dedupe · resolve TTL]
    ST --> RT[Router<br/>silence · inhibition · route match]
    RT --> GR[Grouper<br/>storm folding]
    GR --> DI[Registry.Dispatch<br/>concurrent fan-out · per-sink timeout]
    DI --> SK[Sinks]
    ST <-->|snapshot| PE[(ConfigMap<br/>persistence)]
    SW[Sweeper · 30s] -->|resolve TTL · escalations| ST
```
| Stage | Package | Role |
|---|---|---|
| Watch | internal/watchers | observe a resource, detect a failure condition, emit *alert.Alert |
| Identify | internal/alert | ComputeFingerprint (sha256): stable identity / join key |
| Dedup | internal/alert (Store) | mute window, last-sent tracking, resolve TTL |
| Route | internal/router | silences, inhibitions, route → sink matching |
| Group | internal/group | storm folding (first passes, rest absorbed into a summary) |
| Dispatch | internal/sinks (Registry) | concurrent fan-out, per-sink rate limit + 15s timeout |
| Persist | internal/persist | ConfigMap snapshot, survives restarts |
| Sweep | sweeper.go | synthetic resolves, escalations, history cleanup |
Main wiring: main() -> runController() -> buildWatchers() / buildSinks() -> makeEmitter().
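The Dispatch stage's concurrent fan-out with a per-sink timeout can be sketched as follows. This is a minimal illustration, not the Registry's actual code: `sendFunc`, `dispatch`, and `demoDispatch` are hypothetical names, rate limiting is left out, and the demo uses a 20ms timeout rather than the 15s from the table so that it finishes quickly.

```go
package main

import (
	"context"
	"sync"
	"time"
)

// sendFunc is a hypothetical stand-in for a sink's send call.
type sendFunc func(ctx context.Context) error

// dispatch runs every send concurrently, each under its own timeout,
// and returns the errors keyed by sink name.
func dispatch(parent context.Context, sinks map[string]sendFunc, timeout time.Duration) map[string]error {
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		errs = make(map[string]error)
	)
	for name, send := range sinks {
		wg.Add(1)
		go func(name string, send sendFunc) {
			defer wg.Done()
			// Each sink gets its own deadline, so one slow sink
			// cannot delay delivery to the others.
			ctx, cancel := context.WithTimeout(parent, timeout)
			defer cancel()
			if err := send(ctx); err != nil {
				mu.Lock()
				errs[name] = err
				mu.Unlock()
			}
		}(name, send)
	}
	wg.Wait()
	return errs
}

// demoDispatch fans out to one fast sink and one sink that never
// finishes on its own; only the slow one should report an error.
func demoDispatch() map[string]error {
	sinks := map[string]sendFunc{
		"fast": func(ctx context.Context) error { return nil },
		"slow": func(ctx context.Context) error {
			<-ctx.Done() // block until the per-sink timeout fires
			return ctx.Err()
		},
	}
	return dispatch(context.Background(), sinks, 20*time.Millisecond)
}
```

Giving each goroutine its own derived context means the timeout is per sink, not per dispatch round.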
The fingerprint is the spine
Every downstream stage depends on the alert fingerprint:
```go
// sha256(kind|ns|name|reason), truncated to 12 hex chars
func ComputeFingerprint(kind Kind, ns, name, reason string) string
```
It keys dedupe, grouping, persistence, and PagerDuty/Opsgenie incident correlation. Changing it invalidates persisted state.
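A body consistent with that signature and comment might look like the sketch below. It assumes `Kind` is a string-backed type and that the four fields are joined with a literal `|` before hashing; the real implementation may differ in either respect.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"strings"
)

// Kind is assumed to be a string-backed type; the real definition may differ.
type Kind string

// ComputeFingerprint hashes kind|ns|name|reason with SHA-256 and keeps
// the first 12 hex characters of the digest.
func ComputeFingerprint(kind Kind, ns, name, reason string) string {
	joined := strings.Join([]string{string(kind), ns, name, reason}, "|")
	sum := sha256.Sum256([]byte(joined))
	return hex.EncodeToString(sum[:])[:12]
}
```

Because the inputs are only kind, namespace, name, and reason, the same failing condition on the same object always produces the same fingerprint, while a different reason on that object produces a different one.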
The suppression triple
alertkube has three suppression mechanisms, plus an annotation-driven variant of silence:
| Mechanism | Where | Keyed on | Purpose |
|---|---|---|---|
| Mute window | Store | fingerprint + time | don't resend the same alert within N seconds |
| Silence | Router | label matchers + until | suppress matching alerts until a timestamp |
| Inhibition | Router | source alert active | alert A suppresses dependent alert B |
| Annotation silence | Router | alert-silence-until | a workload silences itself (can be disabled) |
See silence vs inhibition vs mute window.
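To make the mute window concrete, here is a minimal sketch of fingerprint-plus-time suppression. `muteStore`, `newMuteStore`, `ShouldSend`, and `demoMute` are hypothetical names used for illustration; the real Store also tracks resolve TTL and persistence, which this omits.

```go
package main

import (
	"sync"
	"time"
)

// muteStore is a hypothetical stand-in for the Store's mute-window logic.
type muteStore struct {
	mu       sync.Mutex
	window   time.Duration
	lastSent map[string]time.Time // keyed by alert fingerprint
}

func newMuteStore(window time.Duration) *muteStore {
	return &muteStore{window: window, lastSent: make(map[string]time.Time)}
}

// ShouldSend reports whether an alert with this fingerprint may be sent
// at now, and records the send time when it may.
func (s *muteStore) ShouldSend(fp string, now time.Time) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if last, ok := s.lastSent[fp]; ok && now.Sub(last) < s.window {
		return false // still inside the mute window
	}
	s.lastSent[fp] = now
	return true
}

// demoMute sends the same fingerprint three times against a 5-minute window.
func demoMute() [3]bool {
	s := newMuteStore(5 * time.Minute)
	t0 := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	first := s.ShouldSend("abc123", t0)                     // first sighting
	second := s.ShouldSend("abc123", t0.Add(time.Minute))   // inside the window
	third := s.ShouldSend("abc123", t0.Add(6*time.Minute))  // window elapsed
	return [3]bool{first, second, third}
}
```

In this sketch a muted attempt does not refresh the timestamp, so the window is measured from the last alert actually sent.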
Two sink families
All sinks implement Name, Send, and Supports. They split into two families:
- HTTP-push sinks: Slack, Teams, Discord, Telegram, generic webhook.
- Stateful incident sinks: PagerDuty and Opsgenie, keyed by fingerprint.
Stateful sinks must receive every resolve and must never receive grouped summaries. internal/app/pipeline.go enforces that split (statefulSinks + dropStateful/keepStateful).
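The split can be pictured with the sketch below. Only the names `Name`, `Send`, `Supports`, `statefulSinks`, `dropStateful`, and `keepStateful` come from the text above; every signature, the trimmed `Alert` type, the sink name strings, and `fakeSink` are assumptions made for illustration.

```go
package main

import "context"

// Alert is a trimmed, hypothetical stand-in for alert.Alert.
type Alert struct {
	Fingerprint string
	Resolved    bool
}

// Sink lists the three methods every sink implements; signatures are assumed.
type Sink interface {
	Name() string
	Send(ctx context.Context, a *Alert) error
	Supports(a *Alert) bool
}

// statefulSinks names the sinks that correlate incidents by fingerprint.
// The key strings are illustrative.
var statefulSinks = map[string]bool{"pagerduty": true, "opsgenie": true}

// keepStateful returns only the stateful incident sinks,
// e.g. for a resolve that must reach every one of them.
func keepStateful(sinks []Sink) []Sink {
	var out []Sink
	for _, s := range sinks {
		if statefulSinks[s.Name()] {
			out = append(out, s)
		}
	}
	return out
}

// dropStateful returns only the HTTP-push sinks,
// e.g. for a grouped summary that stateful sinks must never see.
func dropStateful(sinks []Sink) []Sink {
	var out []Sink
	for _, s := range sinks {
		if !statefulSinks[s.Name()] {
			out = append(out, s)
		}
	}
	return out
}

// fakeSink is a no-op sink used only to exercise the filters.
type fakeSink string

func (f fakeSink) Name() string                             { return string(f) }
func (f fakeSink) Send(ctx context.Context, a *Alert) error { return nil }
func (f fakeSink) Supports(a *Alert) bool                   { return true }
```

Given sinks named `slack`, `pagerduty`, and `opsgenie`, `keepStateful` would return the last two and `dropStateful` only the first.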
Durability
internal/persist snapshots active alerts and mute history to a ConfigMap. Restarts still send pending resolves and do not re-page standing conditions. Snapshots strip Details and enforce a size guard; see ADR-0003.
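As a rough illustration of stripping Details and guarding size before a snapshot is written, consider the sketch below. The `Alert` fields, `encodeSnapshot`, `errSnapshotTooLarge`, the JSON encoding, and the choice to reject an oversized snapshot are all assumptions; the actual guard behavior is described in ADR-0003 and may trim instead.

```go
package main

import (
	"encoding/json"
	"errors"
)

// Alert is a trimmed, hypothetical stand-in for alert.Alert.
type Alert struct {
	Fingerprint string            `json:"fingerprint"`
	Reason      string            `json:"reason"`
	Details     map[string]string `json:"details,omitempty"`
}

// errSnapshotTooLarge is a hypothetical error for the size guard.
var errSnapshotTooLarge = errors.New("snapshot exceeds size limit")

// encodeSnapshot serializes the active alerts without their Details and
// refuses to return a payload larger than maxBytes.
func encodeSnapshot(active []Alert, maxBytes int) ([]byte, error) {
	trimmed := make([]Alert, len(active))
	for i, a := range active {
		a.Details = nil // a is a copy, so the caller's alert keeps its Details
		trimmed[i] = a
	}
	data, err := json.Marshal(trimmed)
	if err != nil {
		return nil, err
	}
	if len(data) > maxBytes {
		return nil, errSnapshotTooLarge
	}
	return data, nil
}
```

A size check of this kind matters because a ConfigMap cannot hold more than roughly 1 MiB of data.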
High availability
With leaderElection.enabled=true, a coordination.k8s.io Lease ensures only the leader dispatches. Followers serve metrics and health while /readyz stays 503 until they acquire leadership. See run alertkube in HA.
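The readiness behavior can be sketched with a leadership flag that a leader-election callback would flip. `leaderFlag`, `readyStatus`, and `readyz` are hypothetical names, and the Lease handling itself is left out.

```go
package main

import (
	"net/http"
	"sync/atomic"
)

// leaderFlag is a hypothetical holder for the current leadership state.
// A leader-election callback would call leading.Store(true) on acquisition.
type leaderFlag struct {
	leading atomic.Bool
}

// readyStatus maps leadership to the /readyz status code:
// 200 for the leader, 503 for followers.
func readyStatus(isLeader bool) int {
	if isLeader {
		return http.StatusOK
	}
	return http.StatusServiceUnavailable
}

// readyz is an http.HandlerFunc-compatible method for the /readyz endpoint.
func (f *leaderFlag) readyz(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(readyStatus(f.leading.Load()))
}
```

Registering it is one line, for example `http.HandleFunc("/readyz", flag.readyz)`. Since only the leader reports ready, a Service that selects on readiness would route traffic to the leader alone.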