Infra status page

A single screen showing the health of all four environments: prod, preprod, sandbox, and demo. Operators can see at a glance whether everything is up; superadmins can see deeper diagnostics.

Open at https://status.flatsbratislava.com (separate subdomain, no login required for the public health board).

What's on the page

Environment health cards

One card per environment:

Card      Color signals
prod      green = HTTP 200 + asset smoke OK + browser mount OK; yellow = degraded (slow response, partial chunks); red = down
preprod   same checks as prod
sandbox   same checks as prod, plus the always-on machine status (min_machines_running=1 per gh#f001ec8c)
demo      same checks as prod, plus the gate-password sole-login flag
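
The color rules above can be sketched as a small classification function. This is a minimal illustration, not the page's actual logic: the `CheckResult` fields, the `SLOW_RESPONSE_MS` threshold, and the choice to treat a failed browser mount as red are all assumptions, since the source does not define what counts as "slow" or how a mount failure is colored.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold: the source does not say what counts as "slow".
SLOW_RESPONSE_MS = 2000


@dataclass
class CheckResult:
    http_status: Optional[int]     # None when the request failed outright
    response_ms: Optional[float]   # None when no response was received
    chunks_total: int              # chunk URLs found in the served HTML
    chunks_ok: int                 # chunk URLs that loaded successfully
    browser_mount_ok: bool


def card_color(r: CheckResult) -> str:
    """Map one check result to a card color: green, yellow, or red."""
    # No answer, or any non-200 answer, is treated as down.
    if r.http_status != 200:
        return "red"
    # Assumption: no loadable chunks or a failed mount means the app is unusable.
    if r.chunks_ok == 0 or not r.browser_mount_ok:
        return "red"
    # Degraded: slow response or only some of the chunks loaded.
    slow = r.response_ms is not None and r.response_ms > SLOW_RESPONSE_MS
    partial = r.chunks_ok < r.chunks_total
    if slow or partial:
        return "yellow"
    return "green"
```

For example, `card_color(CheckResult(200, 150.0, 5, 3, True))` returns `"yellow"` because only three of five chunks loaded.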

Each card shows:

  • HTTP smoke (current response time + size)
  • Last successful asset smoke time
  • Last successful browser-mount time
  • DB ping status

Per-component DB ping

Below the cards, a per-env DB ping row:

  • Connection latency
  • Migration version (commit SHA the env is on)
  • Row counts for sentinel tables (tenants, reservations, messages)
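
As a rough sketch of how the three values in this row could be collected, the function below runs against any DB-API style connection. It uses `sqlite3` only as a stand-in so the example is self-contained; the real database engine is not named in the source. The `schema_migrations` table and its `commit_sha` and `applied_at` columns are hypothetical, and the latency measured here is the round trip of a trivial query on an already-open connection, which may differ from how the page defines connection latency.

```python
import sqlite3
import time

# Sentinel tables named in the manual.
SENTINEL_TABLES = ("tenants", "reservations", "messages")


def db_ping(conn: sqlite3.Connection) -> dict:
    """Collect latency, migration version, and sentinel row counts for one env."""
    # Time a trivial query as a proxy for connection latency.
    start = time.perf_counter()
    conn.execute("SELECT 1").fetchone()
    latency_ms = (time.perf_counter() - start) * 1000

    # Hypothetical table: the source does not say where the commit SHA is stored.
    row = conn.execute(
        "SELECT commit_sha FROM schema_migrations "
        "ORDER BY applied_at DESC LIMIT 1"
    ).fetchone()

    # Table names come from a fixed tuple, never from user input, so
    # interpolating them into the SQL string is safe here.
    counts = {
        table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        for table in SENTINEL_TABLES
    }
    return {
        "latency_ms": latency_ms,
        "migration_version": row[0] if row else None,
        "row_counts": counts,
    }
```

A sudden drop to zero in any sentinel row count is the kind of signal this row is meant to surface, which is why the counts are returned per table rather than summed.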

Recent incidents

A chronological list of the last 30 days of incident windows (from app_errors + manual operator entries). Each row links to the post-mortem doc when available.
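
One way to build such a list is to merge the two sources, keep only the windows that started within the last 30 days, and sort by start time. The sketch below assumes each incident is a dict with a timezone-aware `started_at` datetime and an optional `postmortem_url`; those field names, and the oldest-first ordering, are illustrative assumptions rather than the real schema.

```python
from datetime import datetime, timedelta, timezone


def recent_incidents(app_errors, manual_entries, now, days=30):
    """Merge both incident sources and keep the last `days` days, oldest first."""
    cutoff = now - timedelta(days=days)
    merged = [
        incident
        for incident in [*app_errors, *manual_entries]
        if incident["started_at"] >= cutoff
    ]
    return sorted(merged, key=lambda incident: incident["started_at"])


# Usage with made-up data:
now = datetime(2024, 3, 31, tzinfo=timezone.utc)
from_errors = [{"started_at": datetime(2024, 3, 20, tzinfo=timezone.utc),
                "postmortem_url": None}]
from_operators = [{"started_at": datetime(2024, 1, 5, tzinfo=timezone.utc),
                   "postmortem_url": None},
                  {"started_at": datetime(2024, 3, 10, tzinfo=timezone.utc),
                   "postmortem_url": None}]
rows = recent_incidents(from_errors, from_operators, now)
```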

Public view vs operator view

  • Public (no login): green/yellow/red cards only. Used for "is the site up?" checks.
  • Logged-in operators: see DB ping and recent incidents, with the env they're connected to highlighted.
  • Superadmins: see the deep diagnostic panel, which covers the Fly machine list per env, Redis health, S3 egress, and recent deploy commits.
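
The three views can be thought of as a mapping from role to visible panels. The sketch below is illustrative only: the role and panel names are made up, and it assumes the tiers are cumulative (each higher role also sees everything the lower one does), which the manual implies but does not state outright.

```python
# Hypothetical role and panel names; not taken from the real codebase.
PANELS_BY_ROLE = {
    "public": {"health_cards"},
    "operator": {"health_cards", "db_ping", "recent_incidents"},
    "superadmin": {
        "health_cards",
        "db_ping",
        "recent_incidents",
        "deep_diagnostics",
    },
}


def visible_panels(role: str) -> set:
    """Return the panels a role may see; unknown roles get the public view."""
    return PANELS_BY_ROLE.get(role, PANELS_BY_ROLE["public"])
```

Falling back to the public view for an unrecognized role keeps the default safe: a misconfigured session sees less, not more.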

How the checks run

A separate worker (not the main app) runs every minute:

  1. Curls each env's root URL.
  2. Pulls the served HTML, extracts chunk URLs, curls each one.
  3. Runs a headless Playwright mount check.
  4. Pings the DB.
  5. Writes the result row to an infra_health table.
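
Step 2 of the list above (extracting chunk URLs from the served HTML) could look roughly like the following, using only the Python standard library. Which tags count as "chunks" is an assumption here: the sketch collects `<script src>` plus `<link>` tags whose `rel` is `stylesheet` or `modulepreload`, and the real worker may use a different rule.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ChunkExtractor(HTMLParser):
    """Collect absolute URLs of script and stylesheet assets in a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "script" and attributes.get("src"):
            # Inline scripts have no src and are skipped.
            self.urls.append(urljoin(self.base_url, attributes["src"]))
        elif (
            tag == "link"
            and attributes.get("href")
            and attributes.get("rel") in ("stylesheet", "modulepreload")
        ):
            self.urls.append(urljoin(self.base_url, attributes["href"]))


def extract_chunk_urls(html: str, base_url: str) -> list:
    """Return asset URLs in document order, resolved against base_url."""
    parser = ChunkExtractor(base_url)
    parser.feed(html)
    return parser.urls
```

Each returned URL would then be fetched in turn; a card's "partial chunks" state corresponds to some, but not all, of those fetches succeeding.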

The status page reads the latest row per env.
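
A "latest row per env" read is commonly written as a join against the per-env maximum timestamp. The sketch below shows that pattern with `sqlite3` as a stand-in engine; the `env`, `checked_at`, and `color` columns are assumed for illustration, since the source names only the infra_health table itself.

```python
import sqlite3

# Assumed columns: env, checked_at (ISO 8601 text), color.
LATEST_PER_ENV = """
SELECT h.env, h.checked_at, h.color
FROM infra_health AS h
JOIN (
    SELECT env, MAX(checked_at) AS checked_at
    FROM infra_health
    GROUP BY env
) AS latest
  ON latest.env = h.env AND latest.checked_at = h.checked_at
ORDER BY h.env
"""


def latest_rows(conn: sqlite3.Connection) -> list:
    """Return the most recent health row for each environment."""
    return conn.execute(LATEST_PER_ENV).fetchall()
```

Because the worker appends a row every minute, an index on `(env, checked_at)` would likely be worth adding so this read stays cheap as the table grows.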

What this is NOT

  • It's not a per-feature health board (that lives in the Cost Observability Dashboard for AI surfaces and inline error indicators in V28 for per-screen state).
  • It's not an SLA / uptime certificate. It's an operator-facing visibility surface.

Implements: gh#533 (Infra status page — single view of all 4 envs' app + DB health).

Source: the FlatsBratislava operator manual.