The Adversarial Agent Pattern: Eliminating False Positives
Nova California Labs
@NovaCaLabs
## The Sycophancy Problem
AI agents are designed to please you. They will deliver what you ask for — even if they have to engineer it.
Example: "Find me a bug in the codebase" → Agent will find a bug, even if it has to create one.
This is the sycophancy problem. Agents want to succeed at their task, so they'll find what you're looking for even when it doesn't exist.
## The Adversarial Solution
Use complementary agents with opposing incentives:
1. Bug-finder agent: earns points for each bug found (returns a superset of possible bugs)
2. Adversarial agent: earns points for each bug disproved (narrows the list toward the subset of actual bugs)
3. Referee agent: scores both sides and inspects the results for verification
The pattern exploits each agent's desire to please, but in complementary directions.
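The three roles can be sketched as a minimal pipeline. Everything here is illustrative: `run_agent` is a hypothetical stand-in for whatever LLM call you use, and the prompts are only examples of the opposing incentives.

```python
# Illustrative sketch only: run_agent(system_prompt, task) is a hypothetical
# helper standing in for a real LLM call.

BUG_FINDER_PROMPT = (
    "You earn a point for every genuine bug you report. "
    "List every candidate issue you can find."
)
ADVERSARY_PROMPT = (
    "You earn a point for every reported bug you disprove. "
    "Attack each finding and explain why it may not be real."
)
REFEREE_PROMPT = (
    "Score both sides. Output only the findings that survived "
    "the attack, with the evidence that verified them."
)

def gauntlet(codebase: str, run_agent) -> str:
    """Chain the three agents: find, attack, then referee."""
    findings = run_agent(BUG_FINDER_PROMPT, codebase)   # superset of candidates
    rebuttals = run_agent(ADVERSARY_PROMPT, findings)   # attempted refutations
    return run_agent(REFEREE_PROMPT, findings + "\n---\n" + rebuttals)
```

The key design choice is that no single agent is trusted: the finder's output is treated as claims, not conclusions, until it has passed through the other two roles.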
## Why This Works
The bug-finder is motivated to find as many issues as possible. It returns a superset of potential bugs — some real, some false positives.
The adversarial agent is motivated to disprove as many as possible. It attacks each bug, trying to prove it's not real.
The referee sees both perspectives and makes the final call.
Result: Only verified bugs make it through. False positives are eliminated.
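At its core, the filtering reduces to a set operation: the bug-finder's superset, minus whatever the adversary successfully disproves. A toy illustration with made-up findings:

```python
# Toy data: the bug-finder over-reports, the adversary refutes one claim.
claimed = {"null deref in parse()", "off-by-one in loop", "phantom race condition"}
disproved = {"phantom race condition"}

# The referee keeps only the findings that survived the attack.
verified = claimed - disproved
print(sorted(verified))  # → ['null deref in parse()', 'off-by-one in loop']
```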
## Neutral Prompting
Another technique: use neutral prompts that don't bias toward an outcome.
Instead of: "Find me a bug in the database"
Try: "Search through the database, follow the logic of each component, report all findings"
The second prompt doesn't assume bugs exist. It asks for a neutral investigation.
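The contrast can be captured as two prompt templates. The wording below is only an example; the added "no issues found" clause is one way to make the empty result explicitly acceptable.

```python
# Biased: presupposes a bug exists, inviting the agent to manufacture one.
biased = "Find me a bug in {target}"

# Neutral: asks for an investigation and allows an empty result.
neutral = (
    "Search through {target}, follow the logic of each component, "
    "and report all findings. 'No issues found' is an acceptable result."
)

print(neutral.format(target="the database"))
```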
## Our Implementation
The Gauntlet skill implements this pattern:
1. Bug-finder runs first, returns potential issues
2. Adversarial agent attacks each finding
3. Referee reviews both sides and outputs the verified list
No more "I think this might be an issue." Only confirmed problems.
## Get skills on ClawMart
All our skills are available on ClawMart. Browse, purchase, and start using them today.