Outbound validator + refine loop

Configurable blocking threshold

The outbound validator emits findings at one of four severity levels (info, minor, major, critical). The blocking threshold decides which severities block a send and which only log a finding and let the send go through.

The threshold is configurable per binding and per template (gh#850).

The three threshold levels

Threshold                | Blocks on             | Use when
block-all                | Any finding (info+)   | Strictest. Use for high-risk templates (legal-related sends, payment confirmations).
major+critical (default) | major + critical only | Balanced. Default for most templates.
critical-only            | critical only         | Loosest. Use for high-volume, low-stakes templates (Thank-you message, generic FAQ replies).
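
The table can be read as a mapping from threshold name to the set of severities that block a send. A minimal Python sketch, assuming the threshold identifiers are the strings shown in the table; `BLOCKING_SEVERITIES` and `blocks_send` are illustrative names, not taken from the product:

```python
# Hypothetical mapping: threshold and severity names come from the table,
# the data structure and function are illustrative assumptions.
BLOCKING_SEVERITIES = {
    "block-all": {"info", "minor", "major", "critical"},
    "major+critical": {"major", "critical"},  # the default threshold
    "critical-only": {"critical"},
}


def blocks_send(threshold: str, severity: str) -> bool:
    """Return True if a finding of this severity blocks the send."""
    return severity in BLOCKING_SEVERITIES[threshold]
```

Under this reading, a minor finding blocks only when the threshold is block-all.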

Per-validator-binding default

Each validator binding (e.g. parking_info_relevance, language_match, factual_accuracy) carries a default threshold that applies tenant-wide.

Edit defaults in AI Cockpit → Validators → click the binding → Default threshold.

Per-template override

A specific task template can override the binding's default. In the Task Templates editor → Validator behavior:

  • Inherit binding default (most templates).
  • Override → pick a different threshold for THIS template only.

Useful when one template (say, Welcome message) needs critical-only because too many major-rated findings are false positives on welcome-message content.
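
The inherit-or-override choice amounts to a simple fallback. A small sketch, assuming an unset override is represented as `None`; the function and parameter names are hypothetical:

```python
from typing import Optional


def effective_threshold(binding_default: str,
                        template_override: Optional[str] = None) -> str:
    """Pick the template's own threshold if set, else the binding default."""
    # None stands for "Inherit binding default" (an assumption of this sketch).
    if template_override is not None:
        return template_override
    return binding_default
```

For example, `effective_threshold("major+critical", "critical-only")` would model a Welcome message template that overrides the binding default.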

Per-validator confidence threshold (gh#804)

Each validator finding also carries a confidence score (0-1). The binding's confidence threshold decides whether a finding is reported at all:

  • Default 0.6 — only findings with confidence > 0.6 surface.
  • Lower to 0.4 → more findings surface (more false positives, more sensitive).
  • Raise to 0.8 → fewer findings (only high-confidence problems block).

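The confidence threshold is configurable per validator (gh#804). It combines with the blocking threshold: a finding must first clear the confidence threshold to be reported, and only then is its severity compared against the blocking threshold. A major finding with confidence 0.5 therefore doesn't block under default settings, because it falls below the 0.6 confidence threshold.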
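
The interaction of confidence and severity can be sketched as a two-step check. This is an illustration under assumptions, not the validator's actual code: the `Finding` class, `is_blocking`, and the default arguments are hypothetical, and the strict `>` comparison follows the "confidence > 0.6" wording above.

```python
from dataclasses import dataclass

# Severities that block under each threshold (names from this page).
BLOCKING = {
    "block-all": {"info", "minor", "major", "critical"},
    "major+critical": {"major", "critical"},
    "critical-only": {"critical"},
}


@dataclass
class Finding:
    severity: str      # "info", "minor", "major" or "critical"
    confidence: float  # 0-1


def is_blocking(finding: Finding,
                threshold: str = "major+critical",
                min_confidence: float = 0.6) -> bool:
    """Step 1: confidence gate. Step 2: severity against the blocking threshold."""
    if not finding.confidence > min_confidence:
        return False  # finding is never reported, so it cannot block
    return finding.severity in BLOCKING[threshold]
```

With the defaults, `is_blocking(Finding("major", 0.5))` is false; lowering `min_confidence` to 0.4 makes the same finding block.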

Findings classified as non-issue (gh#860)

Some findings are explicitly non-issue — the validator says "I noticed X but X is fine" or returns a truncated finding. After gh#860 these no longer count toward the verdict — only actionable findings produce a block.

Earlier (pre-gh#860), a disregard (non-issue) finding plus a truncated finding could combine to produce a non-clean verdict that blocked a correct send.
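
The post-gh#860 behavior can be pictured as filtering out non-actionable findings before the verdict is computed. A hedged sketch: the `actionable` flag, the `verdict` function, and the "clean"/"block" return values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass
class Finding:
    severity: str
    actionable: bool  # hypothetical flag: False for non-issue or truncated findings


def verdict(findings: Iterable[Finding],
            blocking: FrozenSet[str] = frozenset({"major", "critical"})) -> str:
    """Return "block" only if an actionable finding has a blocking severity."""
    actionable = [f for f in findings if f.actionable]
    if any(f.severity in blocking for f in actionable):
        return "block"
    return "clean"
```

In this model, a non-issue finding and a truncated finding together still yield a clean verdict, whatever their severity.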

Empty-thread-context guard (gh#859)

Before gh#859, an empty translatedBody='' in thread context blanked the guest message for the validator, leading to a "no context" false-positive block. Fix: the validator treats translatedBody='' as NULL and reads the original body instead.
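
The guard can be sketched as a fallback that treats an empty string the same as a missing value. Only `translatedBody` is named in the source; the function and the snake_case parameters are hypothetical.

```python
from typing import Optional


def body_for_validator(original_body: str,
                       translated_body: Optional[str]) -> str:
    """Choose the guest message text the validator should read."""
    # Treat translatedBody='' like NULL and fall back to the original body.
    if translated_body is None or translated_body == "":
        return original_body
    return translated_body
```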

Disabling a binding entirely (gh#767)

An operator can pause a validator binding from the UI: AI Cockpit → Validators → PAUSE. The binding stops running until it is resumed.

Earlier (pre-gh#767), PAUSE, DELETE and EDIT threw raw database 23505 (unique-violation) errors on platform-shared validators. This is fixed: a per-tenant override now correctly suppresses the platform default.
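
One way to picture the tenant override is as a per-tenant layer consulted before the platform default. The sketch below is an assumption for illustration only: the source does not describe the storage model, and `should_run`, `tenant_overrides` and the `paused` key are hypothetical names.

```python
from typing import Dict


def should_run(binding: str, tenant_overrides: Dict[str, dict]) -> bool:
    """A platform-shared binding runs unless this tenant has paused it."""
    override = tenant_overrides.get(binding)
    if override is not None and override.get("paused", False):
        return False  # the tenant override suppresses the platform default
    return True
```

Pausing language_match for one tenant leaves other bindings, and other tenants, unaffected in this model.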


Implements: gh#850 (per-binding + per-template threshold knob), gh#804 (per-finding confidence + threshold), gh#860 (non-issue findings excluded), gh#859 (empty thread-context guard), gh#767 (PAUSE platform-shared binding via tenant override).

Source: the FlatsBratislava operator manual.