Infra status page
A single screen showing the health of all four environments: prod, preprod, sandbox, and demo. Operators can see at a glance whether everything is up; superadmins can see deeper diagnostics.
Open at https://status.flatsbratislava.com (separate subdomain, no login required for the public health board).
What's on the page
Environment health cards
One card per environment:
| Card | Color signals |
|---|---|
| prod | green = HTTP 200 + asset smoke OK + browser mount OK; yellow = degraded (slow response, partial chunks); red = down |
| preprod | same checks |
| sandbox | same checks, plus the always-on machine status (min_machines_running=1 per gh#f001ec8c) |
| demo | same checks, plus the gate-password sole-login flag |
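The color rules in the table can be sketched as a small classifier. This is a minimal illustration, not the worker's actual code: the `CheckResult` fields and the `SLOW_RESPONSE_MS` threshold are hypothetical, and treating a failed browser mount behind an HTTP 200 as yellow is an assumption the table does not settle.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold: the source does not say how slow counts as "slow".
SLOW_RESPONSE_MS = 2000


@dataclass
class CheckResult:
    http_status: Optional[int]   # None when the request failed outright
    response_ms: float           # root URL response time
    chunks_ok: int               # chunk URLs that loaded successfully
    chunks_total: int            # chunk URLs found in the served HTML
    browser_mount_ok: bool       # result of the headless mount check


def card_color(r: CheckResult) -> str:
    """Map one check result to green, yellow, or red."""
    if r.http_status != 200:
        return "red"                      # down
    if r.response_ms > SLOW_RESPONSE_MS:
        return "yellow"                   # degraded: slow response
    if r.chunks_ok < r.chunks_total:
        return "yellow"                   # degraded: partial chunks
    if not r.browser_mount_ok:
        return "yellow"                   # assumption: not specified in the table
    return "green"


print(card_color(CheckResult(200, 150.0, 5, 5, True)))   # green
print(card_color(CheckResult(200, 150.0, 3, 5, True)))   # yellow
print(card_color(CheckResult(None, 0.0, 0, 0, False)))   # red
```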
Each card shows:
- HTTP smoke (current response time + size)
- Last successful asset smoke time
- Last successful browser-mount time
- DB ping status
Per-component DB ping
Below the cards, a per-env DB ping row:
- Connection latency
- Migration version (commit SHA the env is on)
- Row counts for sentinel tables (tenants, reservations, messages)
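A DB ping along these lines could look like the sketch below. It uses an in-memory SQLite database purely as a stand-in so the example runs anywhere; the real database engine is not named in this manual. Latency is measured here as the round trip of a trivial query, which is an assumption about what "connection latency" means. The migration-version lookup is left out because the source does not say where the commit SHA is stored.

```python
import sqlite3
import time

SENTINEL_TABLES = ("tenants", "reservations", "messages")


def db_ping(conn) -> dict:
    """Return query round-trip latency and sentinel-table row counts."""
    start = time.perf_counter()
    conn.execute("SELECT 1").fetchone()
    latency_ms = (time.perf_counter() - start) * 1000

    counts = {}
    for table in SENTINEL_TABLES:
        # Table names come from the fixed tuple above, never from user input.
        counts[table] = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    return {"latency_ms": latency_ms, "row_counts": counts}


# Stand-in database with empty sentinel tables and one tenant row.
conn = sqlite3.connect(":memory:")
for table in SENTINEL_TABLES:
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO tenants DEFAULT VALUES")

result = db_ping(conn)
print(result["row_counts"])  # {'tenants': 1, 'reservations': 0, 'messages': 0}
```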
Recent incidents
A chronological list of the last 30 days of incident windows (from app_errors + manual operator entries). Each row links to the post-mortem doc when available.
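As a rough sketch of how the incident list might be assembled, the function below merges the two sources and keeps the last 30 days. The record shape (`started_at`, `postmortem_url`) is hypothetical, as is the choice to sort oldest first.

```python
from datetime import datetime, timedelta, timezone


def recent_incidents(app_errors, manual_entries, now, days=30):
    """Merge both sources and keep incidents that started within the window."""
    cutoff = now - timedelta(days=days)
    merged = [i for i in [*app_errors, *manual_entries] if i["started_at"] >= cutoff]
    # Oldest first, so the list reads chronologically.
    return sorted(merged, key=lambda i: i["started_at"])


# Sample data for illustration only.
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
app_errors = [
    {"started_at": datetime(2024, 6, 20, tzinfo=timezone.utc), "postmortem_url": None},
    {"started_at": datetime(2024, 4, 1, tzinfo=timezone.utc), "postmortem_url": None},
]
manual_entries = [
    {"started_at": datetime(2024, 6, 10, tzinfo=timezone.utc),
     "postmortem_url": "https://example.test/pm/1"},
]

rows = recent_incidents(app_errors, manual_entries, now)
print([r["started_at"].date().isoformat() for r in rows])  # ['2024-06-10', '2024-06-20']
```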
Public view vs operator view
- Public (no login): green/yellow/red cards only. Used for "is the site up?" checks.
- Logged-in operators: also see the DB ping and recent incidents, with the env they're connected to highlighted.
- Superadmins: see the deep diagnostic panel, which covers the Fly machine list per env, Redis health, S3 egress, and recent deploy commits.
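The three tiers can be pictured as a role-to-panel lookup. The role and panel names below are hypothetical labels, and the sketch assumes superadmins also see everything operators see, which the list above implies but does not state outright.

```python
PANELS_BY_ROLE = {
    "public": ["health_cards"],
    "operator": ["health_cards", "db_ping", "recent_incidents"],
    "superadmin": ["health_cards", "db_ping", "recent_incidents", "deep_diagnostics"],
}


def visible_panels(role: str) -> list:
    """Return the panels a role may see; unknown roles get the public view."""
    return PANELS_BY_ROLE.get(role, PANELS_BY_ROLE["public"])


print(visible_panels("operator"))  # ['health_cards', 'db_ping', 'recent_incidents']
print(visible_panels("anonymous"))  # ['health_cards']
```

Falling back to the public view for unrecognized roles keeps the default on the least-privileged side.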
How the checks run
A separate worker (not the main app) runs every minute:
- Curls each env's root URL.
- Pulls the served HTML, extracts chunk URLs, curls each one.
- Runs a headless Playwright mount check.
- Pings the env's database.
- Writes the result row to an infra_health table.
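The chunk-extraction step in the list above could be approximated with the standard-library HTML parser. This is a sketch under assumptions: it treats `<script src>` plus stylesheet and modulepreload `<link>` tags as "chunks", which may not match what the real worker collects, and `example.test` is a placeholder host.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ChunkCollector(HTMLParser):
    """Collect script and stylesheet URLs from served HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("src"):
            self.urls.append(urljoin(self.base_url, attrs["src"]))
        elif (tag == "link" and attrs.get("href")
              and attrs.get("rel") in ("stylesheet", "modulepreload")):
            self.urls.append(urljoin(self.base_url, attrs["href"]))


def extract_chunk_urls(html, base_url):
    """Return absolute asset URLs in document order."""
    parser = ChunkCollector(base_url)
    parser.feed(html)
    return parser.urls


html = (
    '<html><head><link rel="stylesheet" href="/assets/app.css">'
    '<script type="module" src="/assets/index-abc.js"></script></head></html>'
)
urls = extract_chunk_urls(html, "https://example.test/")
print(urls)
# ['https://example.test/assets/app.css', 'https://example.test/assets/index-abc.js']
```

Each returned URL would then be fetched in turn; a partial failure is what the cards report as "partial chunks".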
The status page reads the latest row per env.
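One way to read the latest row per env is a join against the per-env maximum timestamp. The column names (`env`, `checked_at`, `color`) are hypothetical, since the infra_health schema is not given here, and SQLite again stands in for the real database.

```python
import sqlite3

# Latest row per env: join each row against that env's newest timestamp.
LATEST_PER_ENV = """
SELECT h.env, h.checked_at, h.color
FROM infra_health AS h
JOIN (
    SELECT env, MAX(checked_at) AS latest
    FROM infra_health
    GROUP BY env
) AS m ON h.env = m.env AND h.checked_at = m.latest
ORDER BY h.env
"""

# Sample rows for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE infra_health (env TEXT, checked_at TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO infra_health VALUES (?, ?, ?)",
    [
        ("prod", "2024-01-01T10:00:00Z", "green"),
        ("prod", "2024-01-01T10:01:00Z", "yellow"),
        ("demo", "2024-01-01T10:01:00Z", "green"),
    ],
)

latest = conn.execute(LATEST_PER_ENV).fetchall()
print(latest)
# [('demo', '2024-01-01T10:01:00Z', 'green'), ('prod', '2024-01-01T10:01:00Z', 'yellow')]
```

ISO 8601 timestamps in a single timezone sort correctly as text, which is why `MAX(checked_at)` works on a TEXT column in this sketch.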
What this is NOT
- It's not a per-feature health board (that lives in the Cost Observability Dashboard for AI surfaces and inline error indicators in V28 for per-screen state).
- It's not an SLA or uptime certificate. It's an operator-facing visibility surface.
Implements: gh#533 (Infra status page — single view of all 4 envs' app + DB health).
Source: the FlatsBratislava operator manual.