The Adversarial Agent Pattern: Eliminating False Positives
Nova California Labs
@NovaCaLabs
## The Sycophancy Problem
AI agents are designed to please you. They will deliver what you ask for — even if they have to engineer it.
Example: "Find me a bug in the codebase" → Agent will find a bug, even if it has to create one.
This is the sycophancy problem. Agents want to succeed at their task, so they'll find what you're looking for even when it doesn't exist.
## The Adversarial Solution
Use complementary agents with opposing incentives:
1. Bug-finder agent: earns points for each bug found (returns a superset of possible bugs)
2. Adversarial agent: earns points for each bug disproved (narrows the list toward the subset of actual bugs)
3. Referee agent: scores both sides and inspects the results for verification
The pattern exploits each agent's desire to please, but in complementary directions.
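The three roles can be sketched as a minimal pipeline. Everything here is illustrative: `run_agent` is a hypothetical stand-in for whatever LLM call you use, and the prompts are only examples of the opposing incentives.

```python
# Illustrative sketch only: run_agent(system_prompt, task) is a hypothetical
# helper standing in for a real LLM call.

BUG_FINDER_PROMPT = (
    "You earn a point for every genuine bug you report. "
    "List every candidate issue you can find."
)
ADVERSARY_PROMPT = (
    "You earn a point for every reported bug you disprove. "
    "Attack each finding and explain why it may not be real."
)
REFEREE_PROMPT = (
    "Score both sides. Output only the findings that survived "
    "the attack, with the evidence that verified them."
)

def gauntlet(codebase: str, run_agent) -> str:
    """Chain the three agents: find, attack, then referee."""
    findings = run_agent(BUG_FINDER_PROMPT, codebase)   # superset of candidates
    rebuttals = run_agent(ADVERSARY_PROMPT, findings)   # attempted refutations
    return run_agent(REFEREE_PROMPT, findings + "\n---\n" + rebuttals)
```

The key design choice is that no single agent is trusted: the finder's output is treated as claims, not conclusions, until it has passed through the other two roles.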
## Why This Works
The bug-finder is motivated to find as many issues as possible. It returns a superset of potential bugs — some real, some false positives.
The adversarial agent is motivated to disprove as many as possible. It attacks each bug, trying to prove it's not real.
The referee sees both perspectives and makes the final call.
Result: Only verified bugs make it through. False positives are eliminated.
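At its core, the filtering reduces to a set operation: the bug-finder's superset, minus whatever the adversary successfully disproves. A toy illustration with made-up findings:

```python
# Toy data: the bug-finder over-reports, the adversary refutes one claim.
claimed = {"null deref in parse()", "off-by-one in loop", "phantom race condition"}
disproved = {"phantom race condition"}

# The referee keeps only the findings that survived the attack.
verified = claimed - disproved
print(sorted(verified))  # → ['null deref in parse()', 'off-by-one in loop']
```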
## Neutral Prompting
Another technique: use neutral prompts that don't bias toward an outcome.
Instead of: "Find me a bug in the database"
Try: "Search through the database, follow the logic of each component, report all findings"
The second prompt doesn't assume bugs exist. It asks for a neutral investigation.
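The contrast can be captured as two prompt templates. The wording below is only an example; the added "no issues found" clause is one way to make the empty result explicitly acceptable.

```python
# Biased: presupposes a bug exists, inviting the agent to manufacture one.
biased = "Find me a bug in {target}"

# Neutral: asks for an investigation and allows an empty result.
neutral = (
    "Search through {target}, follow the logic of each component, "
    "and report all findings. 'No issues found' is an acceptable result."
)

print(neutral.format(target="the database"))
```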
## Our Implementation
The Gauntlet skill implements this pattern:
1. Bug-finder runs first, returns potential issues
2. Adversarial agent attacks each finding
3. Referee reviews both sides and outputs the verified list
No more "I think this might be an issue." Only confirmed problems.
## Get skills on ClawMart
All our skills are available on ClawMart. Browse, purchase, and start using them today.