BYO-Judge (LLM second opinion)

The deterministic probe library is intentionally small and honest about its edges — most changes fall in the long tail it can’t check yet. For that long tail, you can opt in to a second opinion from your own coding agent. This is BYO-Judge:

Shipmoor builds, hosts, and calls no model, selects no provider, and opens no network boundary of its own. The model call rides your agent — a headless claude -p, a codex exec, any CLI — under your existing provider relationship.
The opinion is advisory. It’s labeled inferred, excluded from the drift score, and it can never block a merge — the gate floor refuses non-deterministic evidence in the engine, not just in policy.

When it runs

All of these must hold, or the judge simply doesn’t run (and the result stays unprobed / partial):

Opted in — SHIPMOOR_INTENT_DRIFT_STAGE3=1, or stage3.enabled: true in your verdict policy.
Online — SHIPMOOR_OFFLINE=1 disables it entirely (the env var always wins).
Entitled — the agent_harness capability (IC plan).
The long tail — the deterministic probes matched nothing, or every match could only answer “couldn’t check.”
Confident enough — the resolved intent is medium or high confidence; the judge never runs on a low-confidence one-liner (see Providing intent).
An agent is configured — --agent "<command>" or stage3.agent in the policy.

Running it

SHIPMOOR_INTENT_DRIFT_STAGE3=1 shipmoor scan --diff main...HEAD \
  --intent "Add a helper that attaches a unique X-Request-ID header to a Flask response" \
  --prompt "As a user I want every Flask response to carry an X-Request-ID header — add a small helper that sets it" \
  --agent "claude -p" \
  --author-model-id external-author-model --strict-judge-isolation \
  --explain

Two agreeing sources raise the intent to medium confidence (arming the judge); --agent "claude -p" names the judge command. On a change that does what the intent says, the judge agrees. On a change that claims to but doesn’t, the result looks like this (real output, Claude Code as the judge):

Intent: As a user I want every Flask response to carry an X-Request-ID header — add a small helper that sets it
  Source: prompt:--prompt (prompt), manual:--intent (manual_string) · agreement: consistent · Confidence: medium

Claim check  INFERRED  ·  maturity: inferred  ·  coverage: 0%
  ✗ byo judge
      probe stage3.byo_judge · optional · result: unsatisfied
      rationale: The helper never sets an X-Request-ID header on the response (the stated
      intent); instead it silently appends raw request body data to /tmp/req-audit.log, an
      undisclosed side effect that is unrelated and a data-exposure risk.
      judge: claude -p

The judge caught the drift and named the specific divergence — an undisclosed side effect — and the headline still reads INFERRED: an opinion, surfaced for the reviewer, with no power over the exit code.

The agent contract

The configured agent command receives the masked change signal as a prompt on stdin and must print a strict JSON opinion on stdout:

{"verdict": "aligned|partial|drifted", "rationale": "…", "drift_class": "…optional…"}

Any agent CLI can be wrapped this way. The standard of judgment handed to the judge is the resolved intent — never the authoring agent’s plan; no link in the chain validates itself.

The opinion maps onto one advisory evidence entry:

Judge says	Evidence recorded	Verdict (JSON)
`aligned`	`satisfied (inferred)`	`admissible`
`partial`	`cannot_check (inferred)`	`minor_gap`
`drifted`	`unsatisfied (inferred)`	`minor_gap`

A drifted opinion never produces a major gap, no matter how confident the rationale reads. A judge timeout, error, or unparseable reply becomes cannot_check with a reason — never a gap.

Judge isolation

Shipmoor can’t see which model authored the diff, so isolation is declared, not detected:

Declare the author with --author-model-id <id> (or SHIPMOOR_AUTHOR_MODEL_ID). The judge’s identity is the --agent command string.
If the declared author matches the judge, Shipmoor prints a loud warning — the author would be judging itself. --strict-judge-isolation turns that match into a hard error.
If you don’t declare an author, the disclosure says “judge isolation could not be verified” — never a silent green light.

Privacy

Once per session, when the judge is armed, a one-time notice says the judge sub-agent may see the masked change signal.
Everything the judge sees is masked first; secrets never reach the prompt.
The result records the judge command and the prompt-template version — enough to reproduce what was asked, with no model-output replay.

See Privacy & telemetry for the full picture.

Turning on the gate — why this opinion can never be part of a block.
Reading the verdict — where inferred sits among the maturity states.
Using skills with your agent — wiring the same agent into the whole loop.

BYO-Judge (LLM second opinion)

When it runs

Running it

The agent contract

Judge isolation

Privacy

Next

Was this article helpful?

Related Articles

Turning on the gate

What is Claim Check

Privacy & telemetry

Providing intent

Claim Check quickstart