The deterministic probe library is intentionally small and honest about its edges — most changes fall in the long tail it can’t check yet. For that long tail, you can opt in to a second opinion from your own coding agent. This is BYO-Judge:
- Shipmoor builds, hosts, and calls no model, selects no provider, and opens no network boundary of its own. The model call rides your agent — a headless
claude -p, acodex exec, any CLI — under your existing provider relationship. - The opinion is advisory. It’s labeled
inferred, excluded from the drift score, and it can never block a merge — the gate floor refuses non-deterministic evidence in the engine, not just in policy.
When it runs
All of these must hold, or the judge simply doesn’t run (and the result stays unprobed / partial):
- Opted in —
SHIPMOOR_INTENT_DRIFT_STAGE3=1, orstage3.enabled: truein your verdict policy. - Online —
SHIPMOOR_OFFLINE=1disables it entirely (the env var always wins). - Entitled — the
agent_harnesscapability (IC plan). - The long tail — the deterministic probes matched nothing, or every match could only answer “couldn’t check.”
- Confident enough — the resolved intent is
mediumorhighconfidence; the judge never runs on alow-confidence one-liner (see Providing intent). - An agent is configured —
--agent "<command>"orstage3.agentin the policy.
Running it
SHIPMOOR_INTENT_DRIFT_STAGE3=1 shipmoor scan --diff main...HEAD \
--intent "Add a helper that attaches a unique X-Request-ID header to a Flask response" \
--prompt "As a user I want every Flask response to carry an X-Request-ID header — add a small helper that sets it" \
--agent "claude -p" \
--author-model-id external-author-model --strict-judge-isolation \
--explain
Two agreeing sources raise the intent to medium confidence (arming the judge); --agent "claude -p" names the judge command. On a change that does what the intent says, the judge agrees. On a change that claims to but doesn’t, the result looks like this (real output, Claude Code as the judge):
Intent: As a user I want every Flask response to carry an X-Request-ID header — add a small helper that sets it
Source: prompt:--prompt (prompt), manual:--intent (manual_string) · agreement: consistent · Confidence: medium
Claim check INFERRED · maturity: inferred · coverage: 0%
✗ byo judge
probe stage3.byo_judge · optional · result: unsatisfied
rationale: The helper never sets an X-Request-ID header on the response (the stated
intent); instead it silently appends raw request body data to /tmp/req-audit.log, an
undisclosed side effect that is unrelated and a data-exposure risk.
judge: claude -p
The judge caught the drift and named the specific divergence — an undisclosed side effect — and the headline still reads INFERRED: an opinion, surfaced for the reviewer, with no power over the exit code.
The agent contract
The configured agent command receives the masked change signal as a prompt on stdin and must print a strict JSON opinion on stdout:
{"verdict": "aligned|partial|drifted", "rationale": "…", "drift_class": "…optional…"}
Any agent CLI can be wrapped this way. The standard of judgment handed to the judge is the resolved intent — never the authoring agent’s plan; no link in the chain validates itself.
The opinion maps onto one advisory evidence entry:
| Judge says | Evidence recorded | Verdict (JSON) |
|---|---|---|
aligned | satisfied (inferred) | admissible |
partial | cannot_check (inferred) | minor_gap |
drifted | unsatisfied (inferred) | minor_gap |
A drifted opinion never produces a major gap, no matter how confident the rationale reads. A judge timeout, error, or unparseable reply becomes cannot_check with a reason — never a gap.
Judge isolation
Shipmoor can’t see which model authored the diff, so isolation is declared, not detected:
- Declare the author with
--author-model-id <id>(orSHIPMOOR_AUTHOR_MODEL_ID). The judge’s identity is the--agentcommand string. - If the declared author matches the judge, Shipmoor prints a loud warning — the author would be judging itself.
--strict-judge-isolationturns that match into a hard error. - If you don’t declare an author, the disclosure says “judge isolation could not be verified” — never a silent green light.
Privacy
- Once per session, when the judge is armed, a one-time notice says the judge sub-agent may see the masked change signal.
- Everything the judge sees is masked first; secrets never reach the prompt.
- The result records the judge command and the prompt-template version — enough to reproduce what was asked, with no model-output replay.
See Privacy & telemetry for the full picture.
Next
- Turning on the gate — why this opinion can never be part of a block.
- Reading the verdict — where
inferredsits among the maturity states. - Using skills with your agent — wiring the same agent into the whole loop.