Reading the verdict

Shipmoor Team
June 11, 2026

A claim-check result is deliberately not one score. It’s three independent axes — how well-checked the claim is, how much was checkable, and how sure Shipmoor is about the intent itself — because a high-drift result built on weak evidence and a vague intent must never read like a confident failure.

Anatomy of the block

Intent: Add a Stripe webhook handler for failed payments (payment_intent.payment_failed)
  Source: manual:--intent (manual_string) · agreement: single source · Confidence: low

Claim check  VERIFIED  ·  maturity: verified  ·  coverage: 100%
  ✓ A handler is bound to the Stripe payment_intent.payment_failed event.
  Not yet checked:
    ∅ Failure-path handling
    ∅ Webhook signature verification

Library 0.1.0 · Policy 0.1.0

  • Intent: — the resolved goal text, masked (a secret pasted into the intent never survives here).
  • Source: — which inputs resolved it, whether they agreed, and the intent confidence. See Providing intent.
  • The badge — the maturity state as the loud headline word, plus coverage.
  • ✓ / ✗ lines — one per probe that applied: ✓ for a satisfied expectation, ✗ for a disclosed gap.
  • Not yet checked (∅) — expectations Shipmoor recognizes for this intent but has no probe for yet. Honest silence, not a pass.
  • Library / Policy — the probe-library and policy versions that produced the result, part of the reproducibility fingerprint.

The three axes

| Axis | Question it answers | Where it shows |
| --- | --- | --- |
| Maturity | What kind of evidence stands behind this result? | The badge headline |
| Coverage | What fraction of applicable checks produced a definite answer? | The badge |
| Confidence | How sure are we about what the change was meant to do? | The Source: line |

Maturity: the five states

| State | What it means | Terminal cue |
| --- | --- | --- |
| verified | Deterministic probes fired and were satisfied; the claim is earned on evidence. | green |
| partial | Some expectations were satisfied; others were unmet or couldn’t be checked. | yellow |
| gap_disclosed | A required expectation is openly unmet: an honest, located negative. | red |
| unprobed | No probe applied; there is no deterministic evidence either way. | dim/grey |
| inferred | Only an advisory opinion exists; it carries no deterministic weight. | dim/italic |

The weak states are styled to look weak. partial is not “wrong”: it means some expectations were satisfied and others were unmet or couldn’t be checked. Read the per-expectation lines to see which. Only gap_disclosed can ever earn a block, and blocking is a separate opt-in feature (see Turning on the gate).
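The blocking rule above can be sketched as a small lookup. This is a minimal Python illustration, not Shipmoor's implementation: the `Maturity` enum and the `may_block` helper are hypothetical names. Only the five state strings and the rule that gap_disclosed alone is eligible to block, and only when the gate is switched on, come from this page.

```python
from enum import Enum


class Maturity(Enum):
    """The five maturity states, using the strings documented on this page."""
    VERIFIED = "verified"
    PARTIAL = "partial"
    GAP_DISCLOSED = "gap_disclosed"
    UNPROBED = "unprobed"
    INFERRED = "inferred"


def may_block(maturity: Maturity, gate_enabled: bool) -> bool:
    """Return True only when the opt-in gate is on AND the state is gap_disclosed.

    Hypothetical helper: it encodes the rule stated in the text. The real gate
    may apply further policy on top of this eligibility check.
    """
    return gate_enabled and maturity is Maturity.GAP_DISCLOSED


if __name__ == "__main__":
    # Print the eligibility of every state with the gate off and on.
    for state in Maturity:
        print(f"{state.value:14} gate off: {may_block(state, False)}  gate on: {may_block(state, True)}")
```

With the gate off, nothing is eligible to block; with it on, only gap_disclosed is.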

Two kinds of “not checked”

  • Not yet checked (∅) — expectations with no shipped probe yet. These do not lower coverage; there was nothing to run.
  • The ⚠ footer — probes that did apply but returned cannot_check (an unsupported language, say). These do lower coverage, and the footer aggregates the count and reasons:

Claim check  NOT CHECKED  ·  maturity: unprobed  ·  coverage: 0%
  ? A Kubernetes Deployment is present in the change. — no relevant files in this change
⚠ We could not check 3 of 3 expectations (no relevant files in this change). Coverage 0%.

So coverage: 100% next to two Not yet checked lines reads: everything I probe, I could check — and here’s what I don’t probe yet.
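The arithmetic behind that reading can be sketched as follows. This is an illustration under stated assumptions, not Shipmoor's actual formula: it assumes a "definite answer" means satisfied or unsatisfied, that cannot_check counts as applicable but indefinite, and that coverage is left undefined when no probe applied. The `coverage` function is a hypothetical helper.

```python
from typing import Optional


def coverage(satisfied: int, unsatisfied: int, cannot_check: int) -> Optional[float]:
    """Fraction of applicable probes that produced a definite answer.

    Assumption: "definite" means satisfied or unsatisfied. Expectations with
    no shipped probe ("not yet checked") are never passed in, so they cannot
    lower the result.
    """
    applicable = satisfied + unsatisfied + cannot_check
    if applicable == 0:
        return None  # nothing applied; this sketch leaves coverage undefined
    return (satisfied + unsatisfied) / applicable


# First sample block: one probe applied and was satisfied. The two
# "not yet checked" expectations are simply not counted.
print(coverage(satisfied=1, unsatisfied=0, cannot_check=0))  # 1.0

# Second sample block: 3 of 3 applicable probes returned cannot_check.
print(coverage(satisfied=0, unsatisfied=0, cannot_check=3))  # 0.0
```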

Useful flags

  • --explain — expand every expectation with per-probe detail: which fact matched, why a check was cannot_check, the judge’s rationale if one ran.
  • --quiet-intent — collapse the claim check to a single badge line, for busy CI logs.

Plan drift (from a session)

When you pass --session <transcript>, Shipmoor also compares the agent’s own plan against the diff it produced. That’s a separate question from the claim check: not “did the diff do what the developer asked,” but “did the agent do what it said it would.”

Three conservative probes report it: plan.drift.goal_substitution (the plan and the task share no concept), plan.drift.scope_creep (the diff implements the plan plus unrelated files), and plan.drift.partial_implementation (a planned step only partly realized). Each errs toward silence — a false plan-drift is reviewer noise.

Plan-drift findings land in the normal findings list with category: intent_integrity at severity info, and they never change the exit code — not through the structural gate, not through the claim-check gate. The agent’s plan is never the standard of judgment; the resolved intent is.
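Because plan-drift findings share the normal findings list, a consumer may want to pull them out for separate display. The sketch below assumes each finding is a dict with the `category` and `severity` keys named above. The `rule_id` key, the `split_plan_drift` helper, and the non-drift sample finding are hypothetical and only serve the illustration.

```python
from typing import Any

Finding = dict[str, Any]


def split_plan_drift(findings: list[Finding]) -> tuple[list[Finding], list[Finding]]:
    """Separate plan-drift findings from everything else.

    Assumption: the probe name lives in a hypothetical "rule_id" key and
    starts with "plan.drift.", matching the three probe names on this page.
    """
    drift: list[Finding] = []
    other: list[Finding] = []
    for finding in findings:
        is_drift = (
            finding.get("category") == "intent_integrity"
            and str(finding.get("rule_id", "")).startswith("plan.drift.")
        )
        (drift if is_drift else other).append(finding)
    return drift, other


# Sample data: the second finding's rule, category, and severity are made up.
findings = [
    {"rule_id": "plan.drift.scope_creep", "category": "intent_integrity", "severity": "info"},
    {"rule_id": "example.other_rule", "category": "structure", "severity": "warning"},
]
drift, other = split_plan_drift(findings)
print([f["rule_id"] for f in drift])  # ['plan.drift.scope_creep']
print(len(other))  # 1
```

Since these findings never change the exit code, a CI script could surface `drift` as reviewer notes without treating it as a pass/fail signal.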

In JSON and SARIF

--json carries the claim check as change_results[] — additive, absent entirely on a no-intent scan:

{
  "verdict": "major_gap",
  "maturity": "gap_disclosed",
  "coverage": 1.0,
  "gate_decision": "not_evaluated",
  "resolved_intent": { "goal_text": "…", "confidence": "medium" },
  "evidence": [ { "result": "unsatisfied", "basis": "deterministic" } ],
  "per_probe_summary": { "satisfied": 3, "unsatisfied": 1, "cannot_check": 0, "unmatched": 1 },
  "fingerprint": "sha256:…"
}

gate_decision is not_evaluated (advisory), passed, would_block, or blocked. unmatched counts probes that were considered but didn’t apply to this change — not an error. --sarif emits SARIF 2.1.0; plan-drift findings appear in the regular findings[] with category: intent_integrity.
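A script consuming the JSON output might reduce each entry to the three axes plus the gate decision. The sketch below uses only field names shown in the sample above. It assumes `change_results` is a top-level key of the report object, and `summarize` is a hypothetical helper rather than part of Shipmoor.

```python
import json

# Trimmed sample shaped like the entry above. The top-level wrapper object
# is an assumption of this sketch.
SAMPLE = """
{
  "change_results": [
    {
      "verdict": "major_gap",
      "maturity": "gap_disclosed",
      "coverage": 1.0,
      "gate_decision": "not_evaluated",
      "resolved_intent": {"goal_text": "…", "confidence": "medium"}
    }
  ]
}
"""


def summarize(report: dict) -> list[str]:
    """Return one line per claim-check result: the three axes and the gate."""
    lines = []
    # change_results is absent on a no-intent scan, so default to an empty list.
    for result in report.get("change_results", []):
        intent = result.get("resolved_intent", {})
        lines.append(
            f"maturity={result['maturity']} "
            f"coverage={result['coverage']:.0%} "
            f"confidence={intent.get('confidence', 'unknown')} "
            f"gate={result['gate_decision']}"
        )
    return lines


for line in summarize(json.loads(SAMPLE)):
    print(line)
# maturity=gap_disclosed coverage=100% confidence=medium gate=not_evaluated
```

Reading `change_results` with a default keeps the same script usable on scans that ran without an intent.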

Last updated on June 11, 2026
