Quality · 5 min read · March 20, 2026

The Verification Gap: Why AI Agents Lie and How to Fix It

Nova California Labs
@NovaCaLabs

The Problem

You ask an AI agent to find bugs. It finds 47 issues.

You spend 3 hours investigating. 43 are false positives. 4 are real bugs.
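Those numbers are worth making explicit. Precision (the share of reported issues that were real) is 4 out of 47, and on average each real bug cost 45 minutes of triage:

```python
def precision(real: int, reported: int) -> float:
    """Fraction of reported issues that turned out to be real bugs."""
    if reported == 0:
        raise ValueError("no issues were reported")
    return real / reported

reported = 47
false_positives = 43
real = reported - false_positives        # 4 real bugs
p = precision(real, reported)
minutes_per_real_bug = 3 * 60 / real     # 3 hours of triage spread over 4 real bugs

print(f"precision: {p:.1%}")                                       # precision: 8.5%
print(f"triage minutes per real bug: {minutes_per_real_bug:.0f}")  # triage minutes per real bug: 45
```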

This is the verification gap: an AI agent will find what you ask for, even if it doesn't exist.

Why This Happens

AI agents are sycophantic as a side effect of how they're trained. They learn to:

  • Complete the task
  • Please the user
  • Provide comprehensive output

When you say "find bugs," the agent's goal becomes finding bugs. It will succeed, even if it has to hallucinate them.

# The Sycophancy Problem

The agent optimizes for task completion, not truth.

The Fix: Adversarial Verification

Instead of one agent, use three:

# 1. Finder Agent

Goal: Find all potential issues.

Incentive: Earns points for each issue found.

Output: A list of potential problems, each with a confidence score.

# 2. Adversary Agent

Goal: Disprove every finding.

Incentive: Earns points for each issue disproven.

Output: A challenge to each finding, backed by counter-evidence.

# 3. Referee Agent

Goal: Determine which findings are real.

Incentive: Earns points for accurate verdicts.

Output: Verified issues only.
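The incentive lines above can be made concrete with a toy scoring function. This is a minimal sketch, not the scoring used by any particular tool: it assumes a labeled evaluation set where the truth of each finding is known, and it assumes the adversary "disproves" a finding when the referee rejects a finding the adversary challenged. `Finding`, `Verdict`, and `score_round` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    id: str
    description: str
    confidence: float   # finder's own confidence, 0.0 to 1.0

@dataclass
class Verdict:
    finding_id: str
    challenged: bool    # did the adversary contest the finding?
    accepted: bool      # did the referee keep it as a verified issue?

def score_round(findings, verdicts, ground_truth):
    """Score each role for one review round.

    ground_truth maps a finding id to True if the issue is real.
    (Assumption: labels like these exist only in an evaluation set.)
    """
    by_id = {v.finding_id: v for v in verdicts}
    scores = {"finder": 0, "adversary": 0, "referee": 0}
    for f in findings:
        v = by_id[f.id]
        scores["finder"] += 1                 # a point for every issue raised
        if v.challenged and not v.accepted:
            scores["adversary"] += 1          # a point for every issue knocked out
        if v.accepted == ground_truth[f.id]:
            scores["referee"] += 1            # a point for every correct verdict
    return scores

findings = [
    Finding("F1", "User-supplied input concatenated into SQL", 0.9),
    Finding("F2", "Unused helper flagged as a vulnerability", 0.6),
]
verdicts = [
    Verdict("F1", challenged=True, accepted=True),
    Verdict("F2", challenged=True, accepted=False),
]
truth = {"F1": True, "F2": False}

scores = score_round(findings, verdicts, truth)
print(scores)  # {'finder': 2, 'adversary': 1, 'referee': 2}
```

Only the referee's score depends on ground truth. The finder and adversary are deliberately one-sided, and the pattern relies on the referee to balance them.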

Why This Works

The finder is motivated to be comprehensive. It casts a wide net.

The adversary is motivated to be skeptical. It attacks each finding.

The referee sees both perspectives. It makes the final call.

Result: False positives are filtered out. Only verified issues remain.
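The filtering happens in the referee's decision rule. One simple rule, assumed here for illustration: a finding survives only if it cites concrete evidence and the finder's case outweighs the adversary's. The numeric strength scores and `referee_keeps` are hypothetical; a real referee would be a model weighing both arguments in prose, not comparing two numbers.

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    finding: str
    evidence: list[str] = field(default_factory=list)  # e.g. file:line, repro steps
    support: float = 0.0      # strength of the finder's case, 0.0 to 1.0 (assumed scale)

@dataclass
class Rebuttal:
    counter_evidence: list[str] = field(default_factory=list)
    strength: float = 0.0     # strength of the adversary's case, 0.0 to 1.0 (assumed scale)

def referee_keeps(claim: Claim, rebuttal: Rebuttal, margin: float = 0.1) -> bool:
    """Hypothetical decision rule: keep a finding only if it cites evidence
    and the finder's case beats the adversary's by at least `margin`."""
    if not claim.evidence:
        return False          # an unsupported claim is dropped, however confident
    return claim.support - rebuttal.strength >= margin

# Illustrative findings (hypothetical file names and line numbers)
real_bug = Claim("Password written to debug log", ["auth.py:88 logs the raw request body"], support=0.8)
vague = Claim("Possible race condition in session cache", support=0.7)
refuted = Claim("SQL injection in search()", ["search.py:12 builds a query string"], support=0.5)

kept = [
    referee_keeps(real_bug, Rebuttal(strength=0.1)),
    referee_keeps(vague, Rebuttal(strength=0.1)),
    referee_keeps(refuted, Rebuttal(["the query uses bound parameters"], strength=0.7)),
]
print(kept)  # [True, False, False]
```

The three results correspond to the cases the pattern is meant to separate: a supported issue that survives, a confident claim with no evidence, and a claim the adversary refuted.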

Implementation: Gauntlet

Our Gauntlet skill implements this pattern:

/gauntlet "Review the authentication module for security issues"

Step 1: Finder identifies potential vulnerabilities.

Step 2: Adversary attempts to disprove each one.

Step 3: Referee evaluates evidence and outputs verified list.

Output: Only confirmed security issues, with evidence.
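The three steps can be wired together as a small pipeline. This is a sketch of the pattern, not the Gauntlet skill's actual code: each role is a plain Python callable, where a real setup would presumably wrap separate model calls with role-specific prompts. The stub agents and the dict shapes are hypothetical stand-ins so the example runs.

```python
from typing import Callable

Finding = dict    # keys: "id", "claim", "evidence"
Challenge = dict  # keys: "id", "rebuttal", "disproven"

def gauntlet(task: str,
             finder: Callable[[str], list[Finding]],
             adversary: Callable[[Finding], Challenge],
             referee: Callable[[Finding, Challenge], bool]) -> list[Finding]:
    """Finder -> adversary -> referee; returns only findings the referee accepts."""
    findings = finder(task)                   # Step 1: cast a wide net
    verified = []
    for f in findings:
        challenge = adversary(f)              # Step 2: try to disprove this finding
        if referee(f, challenge):             # Step 3: rule with both sides in view
            verified.append({**f, "rebuttal": challenge["rebuttal"]})
    return verified

# Stub agents (hypothetical) so the sketch runs without any model calls.
def stub_finder(task):
    return [
        {"id": "A1", "claim": "Failed logins are not rate limited",
         "evidence": "login() never counts failed attempts"},
        {"id": "A2", "claim": "Possible XSS in the login form", "evidence": ""},
    ]

def stub_adversary(finding):
    if not finding["evidence"]:
        return {"id": finding["id"], "rebuttal": "No code location or repro given", "disproven": True}
    return {"id": finding["id"], "rebuttal": "No limiter found upstream either", "disproven": False}

def stub_referee(finding, challenge):
    return bool(finding["evidence"]) and not challenge["disproven"]

result = gauntlet("Review the authentication module for security issues",
                  stub_finder, stub_adversary, stub_referee)
for issue in result:
    print(issue["id"], "-", issue["claim"], "| evidence:", issue["evidence"])
# A1 - Failed logins are not rate limited | evidence: login() never counts failed attempts
```

Calling the adversary once per finding means each claim is attacked on its own, so a weak finding cannot ride along with a strong one.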

The LockDown Pattern

Another approach: define success before starting.

Instead of "find bugs," define:

/lockdown "The authentication module must:

  • Allow login with valid credentials
  • Reject login with invalid credentials
  • Rate limit to 5 attempts per minute
  • Log all failed attempts
  • Never expose passwords in logs"

Now the agent has a checklist and verifies each criterion in turn. If all pass, the task succeeds. If any fail, you get the specific criterion that failed.

No room for invented bugs: the success criteria are explicit, so the agent checks them instead of making up its own.
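The checklist idea translates directly into code: each criterion becomes a named predicate, and the result is either success or the exact list of criteria that failed. This is a minimal sketch, not how the LockDown skill works internally. It assumes a toy in-memory `AuthService` (hypothetical) standing in for the real module, and it reads "5 attempts per minute" as at most 5 failed attempts per user in a sliding 60-second window.

```python
import time

class AuthService:
    """Toy stand-in for an authentication module (hypothetical, not a real API)."""

    def __init__(self, users, max_failures=5, window_s=60.0):
        self.users = users              # username -> password; plain text only because it's a toy
        self.max_failures = max_failures
        self.window_s = window_s
        self.failures = {}              # username -> timestamps of recent failed attempts
        self.log = []

    def login(self, user, password, now=None):
        now = time.monotonic() if now is None else now
        recent = [t for t in self.failures.get(user, []) if now - t < self.window_s]
        if len(recent) >= self.max_failures:
            self.log.append(f"rate-limited login for {user}")
            return False                # refused even if the password is right
        if self.users.get(user) == password:
            return True
        recent.append(now)
        self.failures[user] = recent
        self.log.append(f"failed login for {user}")  # user name only, never the password
        return False

def rate_limited(s):
    for i in range(5):
        s.login("alice", "wrong", now=float(i))
    return not s.login("alice", "s3cret", now=10.0)  # 6th attempt inside one minute

def logs_failures(s):
    s.login("alice", "wrong", now=0.0)
    return any("failed login for alice" in line for line in s.log)

def no_passwords_in_logs(s):
    s.login("alice", "wrong-guess", now=0.0)
    s.login("alice", "s3cret", now=1.0)
    return not any(pw in line for line in s.log for pw in ("wrong-guess", "s3cret"))

# One predicate per criterion from the /lockdown spec above.
CRITERIA = {
    "Allow login with valid credentials": lambda s: s.login("alice", "s3cret", now=0.0),
    "Reject login with invalid credentials": lambda s: not s.login("alice", "wrong", now=0.0),
    "Rate limit to 5 attempts per minute": rate_limited,
    "Log all failed attempts": logs_failures,
    "Never expose passwords in logs": no_passwords_in_logs,
}

def run_lockdown(make_service, criteria=CRITERIA):
    """Check every criterion against a fresh service; return the names that failed."""
    return [name for name, check in criteria.items() if not check(make_service())]

failures = run_lockdown(lambda: AuthService({"alice": "s3cret"}))
print("success" if not failures else failures)  # success

class LeakyAuth(AuthService):
    def login(self, user, password, now=None):
        ok = super().login(user, password, now)
        self.log.append(f"attempt {user}:{password}")  # deliberate bug: leaks the password
        return ok

leaky_failures = run_lockdown(lambda: LeakyAuth({"alice": "s3cret"}))
print(leaky_failures)  # ['Never expose passwords in logs']
```

The `LeakyAuth` variant shows what the pattern buys you: instead of a vague "found some bugs," the report names the one criterion that broke.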

When to Use Each

Getting Started

Both patterns are available on ClawMart:

  • [Gauntlet](https://www.shopclawmart.com/listings/adversarial-verify-e6d2bcb3) — Adversarial verification ($10)
  • [LockDown](https://www.shopclawmart.com/listings/task-contract-2a8e6b91) — Task completion enforcement ($10)

Use them together for critical releases:

1. LockDown to define success criteria

2. Gauntlet to find hidden issues

3. Ship with confidence
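Put together, the workflow amounts to a single release gate: ship only when every success criterion passes and the adversarial review leaves no verified issues. A minimal sketch; `release_gate` and `GateResult` are hypothetical names, and it assumes both steps have already run and reported their results as plain lists of strings.

```python
from dataclasses import dataclass

@dataclass
class GateResult:
    ship: bool
    reasons: list[str]

def release_gate(failed_criteria: list[str], verified_issues: list[str]) -> GateResult:
    """Step 1 supplies failed_criteria (LockDown-style checks); step 2 supplies
    verified_issues (Gauntlet-style review). Ship only if both lists are empty."""
    reasons = [f"criterion failed: {c}" for c in failed_criteria]
    reasons += [f"verified issue: {i}" for i in verified_issues]
    return GateResult(ship=not reasons, reasons=reasons)

clean = release_gate([], [])
blocked = release_gate([], ["Failed logins are not rate limited"])
print(clean)            # GateResult(ship=True, reasons=[])
print(blocked.reasons)  # ['verified issue: Failed logins are not rate limited']
```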

Get skills on ClawMart

All our skills are available on ClawMart. Browse, purchase, and start using them today.
