Reading the verdict

A claim-check result is deliberately not a single score. It has three independent axes: what kind of evidence backs the claim (maturity), how much of it was checkable (coverage), and how sure Shipmoor is about the intent itself (confidence). They are kept separate so that a high-drift result built on weak evidence and a vague intent never reads like a confident failure.

Anatomy of the block

Intent: Add a Stripe webhook handler for failed payments (payment_intent.payment_failed)
  Source: manual:--intent (manual_string) · agreement: single source · Confidence: low

Claim check  VERIFIED  ·  maturity: verified  ·  coverage: 100%
  ✓ A handler is bound to the Stripe payment_intent.payment_failed event.
  Not yet checked:
    ∅ Failure-path handling
    ∅ Webhook signature verification

Library 0.1.0 · Policy 0.1.0

`Intent:` is the resolved goal text, masked (a secret pasted into the intent never survives here).
`Source:` shows which inputs resolved it, whether they agreed, and the intent confidence. See Providing intent.
The badge: the maturity state as the loud headline word, plus coverage.
✓ / ✗ lines: one per probe that applied. ✓ means satisfied, ✗ means a disclosed gap.
Not yet checked (∅): expectations Shipmoor recognizes for this intent but has no probe for yet. Honest silence, not a pass.
Library / Policy: the probe-library and policy versions that produced the result, part of the reproducibility fingerprint.

The three axes

Axis	Question it answers	Where it shows
Maturity	What kind of evidence stands behind this result?	The badge headline
Coverage	What fraction of applicable checks produced a definite answer?	The badge
Confidence	How sure are we about what the change was meant to do?	The `Source:` line

Maturity: the five states

State	What it means	Terminal cue
`verified`	Deterministic probes fired and were satisfied — the claim is earned on evidence.	green
`partial`	Some expectations were satisfied; others were unmet or couldn’t be checked.	yellow
`gap_disclosed`	A required expectation is openly unmet — an honest, located negative.	red
`unprobed`	No probe applied; there is no deterministic evidence either way.	dim/grey
`inferred`	Only an advisory opinion exists; it carries no deterministic weight.	dim/italic

The weak states are styled to look weak. `partial` is not “wrong”: it means some expectations were satisfied and others were unmet or could not be checked; read the per-expectation lines to see which. Only `gap_disclosed` can ever earn a block, and blocking is a separate opt-in feature (see Turning on the gate).
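The five states and the blocking rule can be written down as a small lookup. This is a minimal sketch, not Shipmoor's implementation: the state names and terminal cues come from the table above, while `is_block_eligible` and its `gate_enabled` parameter are hypothetical names used only to illustrate that eligibility requires both `gap_disclosed` and the opt-in gate.

```python
from enum import Enum


class Maturity(Enum):
    VERIFIED = "verified"
    PARTIAL = "partial"
    GAP_DISCLOSED = "gap_disclosed"
    UNPROBED = "unprobed"
    INFERRED = "inferred"


# Terminal cue per state, copied from the table above.
TERMINAL_CUE = {
    Maturity.VERIFIED: "green",
    Maturity.PARTIAL: "yellow",
    Maturity.GAP_DISCLOSED: "red",
    Maturity.UNPROBED: "dim/grey",
    Maturity.INFERRED: "dim/italic",
}


def is_block_eligible(maturity: Maturity, gate_enabled: bool) -> bool:
    """Hypothetical helper: only gap_disclosed can block, and only with the gate on.

    The real gate may apply further policy; this captures the necessary condition only.
    """
    return gate_enabled and maturity is Maturity.GAP_DISCLOSED


print(is_block_eligible(Maturity("partial"), gate_enabled=True))
print(is_block_eligible(Maturity("gap_disclosed"), gate_enabled=True))
```

The first call prints `False` and the second `True`: a `partial` result stays advisory even with the gate enabled.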

Two kinds of “not checked”

Not yet checked (∅): expectations with no shipped probe yet. These do not lower coverage; there was nothing to run.
The ⚠ footer: probes that did apply but returned `cannot_check` (for example, because the language is unsupported). These do lower coverage, and the footer aggregates the count and reasons:

Claim check  NOT CHECKED  ·  maturity: unprobed  ·  coverage: 0%
  ? A Kubernetes Deployment is present in the change. — no relevant files in this change
⚠ We could not check 3 of 3 expectations (no relevant files in this change). Coverage 0%.

So `coverage: 100%` next to two “Not yet checked” lines reads as: “everything I probe, I could check, and here is what I don’t probe yet.”
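The arithmetic behind the two examples can be sketched as follows. The formula is an assumption inferred from the axis definition ("fraction of applicable checks that produced a definite answer"); Shipmoor's exact computation is not shown here, and returning `None` when no probe applied is a choice made for this sketch only.

```python
from typing import Optional


def coverage(satisfied: int, unsatisfied: int, cannot_check: int) -> Optional[float]:
    """Fraction of applicable probes that produced a definite answer.

    "Not yet checked" expectations are never passed in: no probe exists for them,
    so they cannot move the ratio in either direction.
    """
    applicable = satisfied + unsatisfied + cannot_check
    if applicable == 0:
        return None  # assumption: left undefined when no probe applied
    return (satisfied + unsatisfied) / applicable


# First example: one satisfied probe; the two ∅ lines are simply not counted.
print(coverage(satisfied=1, unsatisfied=0, cannot_check=0))  # 1.0

# Second example: three probes applied and all returned cannot_check.
print(coverage(satisfied=0, unsatisfied=0, cannot_check=3))  # 0.0
```

Note that an unsatisfied probe still counts as a definite answer: a located gap lowers the verdict, not the coverage.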

Useful flags

--explain — expand every expectation with per-probe detail: which fact matched, why a check was cannot_check, the judge’s rationale if one ran.
--quiet-intent — collapse the claim check to a single badge line, for busy CI logs.

Plan drift (from a session)

When you pass --session <transcript>, Shipmoor also compares the agent’s own plan against the diff it produced. That’s a separate question from the claim check: not “did the diff do what the developer asked,” but “did the agent do what it said it would.”

Three conservative probes report it: plan.drift.goal_substitution (the plan and the task share no concept), plan.drift.scope_creep (the diff implements the plan plus unrelated files), and plan.drift.partial_implementation (a planned step only partly realized). Each errs toward silence — a false plan-drift is reviewer noise.

Plan-drift findings land in the normal findings list with category: intent_integrity at severity info, and they never change the exit code — not through the structural gate, not through the claim-check gate. The agent’s plan is never the standard of judgment; the resolved intent is.
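Because plan-drift findings share the normal findings list, a reviewer script may want to pull them out for separate display. A minimal sketch, assuming each finding is a dictionary with `category` and `severity` keys; the `probe` key and the second sample finding are hypothetical and only there to make the example runnable.

```python
from typing import Dict, List


def plan_drift_findings(findings: List[Dict]) -> List[Dict]:
    """Return the findings tagged with the intent_integrity category."""
    return [f for f in findings if f.get("category") == "intent_integrity"]


# Illustrative data only; real findings will carry more fields.
sample = [
    {"probe": "plan.drift.scope_creep", "category": "intent_integrity", "severity": "info"},
    {"probe": "example.other_check", "category": "example_category", "severity": "warning"},
]

for finding in plan_drift_findings(sample):
    # These are advisory: they never change the exit code.
    print(finding["probe"], finding["severity"])
```

This prints `plan.drift.scope_creep info` for the sample data.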

In JSON and SARIF

--json carries the claim check as change_results[] — additive, absent entirely on a no-intent scan:

{
  "verdict": "major_gap",
  "maturity": "gap_disclosed",
  "coverage": 1.0,
  "gate_decision": "not_evaluated",
  "resolved_intent": { "goal_text": "…", "confidence": "medium" },
  "evidence": [ { "result": "unsatisfied", "basis": "deterministic" } ],
  "per_probe_summary": { "satisfied": 3, "unsatisfied": 1, "cannot_check": 0, "unmatched": 1 },
  "fingerprint": "sha256:…"
}

gate_decision is not_evaluated (advisory), passed, would_block, or blocked. unmatched counts probes that were considered but didn’t apply to this change — not an error. --sarif emits SARIF 2.1.0; plan-drift findings appear in the regular findings[] with category: intent_integrity.
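A script consuming the `--json` output might summarize each entry like this. It is a sketch under assumptions: it treats `change_results` as a top-level key of the JSON document, uses only the fields shown in the example above, and fills them with illustrative values. `load_change_results` and `summarize` are hypothetical helper names.

```python
import json


def load_change_results(raw: str) -> list:
    """Parse scan output; the key is absent on a no-intent scan, so default to []."""
    doc = json.loads(raw)
    return doc.get("change_results", [])


def summarize(result: dict) -> str:
    """Render the three axes plus the gate decision on one line."""
    summary = result["per_probe_summary"]
    return (
        f"{result['maturity']} | coverage {result['coverage']:.0%} | "
        f"intent confidence {result['resolved_intent']['confidence']} | "
        f"gate {result['gate_decision']} | "
        f"{summary['unsatisfied']} unsatisfied, {summary['cannot_check']} cannot_check"
    )


# Illustrative payload, not captured from a real scan.
RAW = json.dumps({
    "change_results": [{
        "verdict": "major_gap",
        "maturity": "gap_disclosed",
        "coverage": 1.0,
        "gate_decision": "not_evaluated",
        "resolved_intent": {"goal_text": "…", "confidence": "medium"},
        "per_probe_summary": {"satisfied": 3, "unsatisfied": 1, "cannot_check": 0, "unmatched": 1},
    }]
})

for result in load_change_results(RAW):
    print(summarize(result))
```

Using `.get` with an empty default keeps the script quiet on a no-intent scan instead of raising a `KeyError`.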

Next

Turning on the gate: when a gap_disclosed verdict should block.
BYO-Judge: where inferred results come from.
Providing intent: raising confidence with agreeing sources.
