What is Claim Check

Shipmoor Team
June 11, 2026
3 min read

A coding agent confidently produces a diff. It lints clean, it type-checks, the tests it wrote pass — and it did the wrong thing: wired the wrong event, edited a sibling service, or “handled” a case by swallowing it. The reviewer’s scarcest question, the one no linter answers, is: did this change do what the task asked?

Claim Check answers it before a human reviews the diff. You give Shipmoor the change’s intent — a ticket, the agent’s prompt, the session, or a one-line goal — and it compares that intent to what the diff structurally did. In plain language, it reports what the change claims to do, what it verified, where it found a gap, and what it can’t check yet.

Claim Check is part of the Shipmoor IC plan; it needs the intent_scan entitlement. A plain structural scan works without it.

Deterministic probes decide; an LLM only advises

The market is filling with “AI reviews your AI” tools, in which a model judges a diff and gates your merge on its opinion. Shipmoor’s promise is the opposite, and it’s enforced in the engine, not just in the marketing:

A model can never block your merge. Only deterministic, falsifiable evidence can — and only if you opt in.

  • Deterministic probes decide. A library of checks (“a handler is bound to payment_intent.payment_failed”, “the change touched the service the intent named”) runs against the structural facts of the diff. Each returns satisfied, unsatisfied, or cannot_check. Only a deterministic unsatisfied can ever earn a block.
  • An LLM only advises, in your own agent. For the long tail with no probe, Shipmoor can ask your own coding agent for an opinion (BYO-Judge). That opinion is labeled as inferred, excluded from the score, and structurally unable to gate — and Shipmoor hosts and calls no model.
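The two rules above can be sketched as a small gating function. This is an illustration only, not Shipmoor's engine: the names Finding, Source, may_block, and scored are hypothetical, and the only details taken from this article are the three outcomes, the "inferred" label, and the rule that only an opted-in, deterministic unsatisfied result can block.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    SATISFIED = "satisfied"
    UNSATISFIED = "unsatisfied"
    CANNOT_CHECK = "cannot_check"


class Source(Enum):
    # Hypothetical labels for where a finding came from.
    DETERMINISTIC = "deterministic"  # a probe run against structural facts
    INFERRED = "inferred"            # an opinion from your own agent (BYO-Judge)


@dataclass(frozen=True)
class Finding:
    claim: str
    outcome: Outcome
    source: Source


def may_block(findings: list[Finding], blocking_opted_in: bool) -> bool:
    """True only when blocking is opted in AND a deterministic probe was unsatisfied."""
    if not blocking_opted_in:
        return False
    return any(
        f.source is Source.DETERMINISTIC and f.outcome is Outcome.UNSATISFIED
        for f in findings
    )


def scored(findings: list[Finding]) -> list[Finding]:
    """Inferred opinions are reported but left out of anything that is scored."""
    return [f for f in findings if f.source is Source.DETERMINISTIC]
```

In this sketch an inferred finding never reaches the blocking check, whatever its outcome. That is one way to make "structurally unable to gate" a property of the code path and not of a policy setting.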

What it does — and doesn’t — assert

Claim Check catches changes that didn’t earn their claim and says “couldn’t check” where it has no probe. It does not assert that a change definitively succeeded; honest silence is a feature, not a failure. That restraint is what makes the checks it does make worth trusting. See Reading the verdict.
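As a rough illustration of that restraint, the sketch below returns cannot_check when no probe exists for a claim, and its summary never states that a change succeeded. The probe registry, the facts dictionary, the claim names, and the summary wording are all assumptions made for this example. Only the event name and the three outcome values come from this article.

```python
from typing import Callable

Facts = dict  # hypothetical shape: structural facts extracted from a diff
Probe = Callable[[Facts], bool]

# Hypothetical registry with a single probe, modeled on the example above.
PROBES: dict[str, Probe] = {
    "handler_bound_to_event": lambda facts: (
        "payment_intent.payment_failed" in facts.get("bound_events", [])
    ),
}


def check_claim(kind: str, facts: Facts) -> str:
    """Run the probe for a claim, or say so honestly when there is none."""
    probe = PROBES.get(kind)
    if probe is None:
        return "cannot_check"
    return "satisfied" if probe(facts) else "unsatisfied"


def summarize(results: dict[str, str]) -> str:
    """Report gaps and unchecked claims; never assert overall success."""
    gaps = sorted(k for k, v in results.items() if v == "unsatisfied")
    unchecked = sorted(k for k, v in results.items() if v == "cannot_check")
    parts = [
        f"gaps: {', '.join(gaps)}" if gaps
        else "no gap found in the claims that were checked"
    ]
    if unchecked:
        parts.append(f"could not check: {', '.join(unchecked)}")
    return "; ".join(parts)
```

A claim kind with no registered probe, such as a made-up "retries_are_idempotent", comes back as cannot_check and not as a guess. The strongest statement the summary can make is that no gap was found among the claims that were actually checked.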

Last updated on June 11, 2026