Multi-Agent Debate

N debaters argue, judge synthesizes. Du et al. 2023. Improving Factuality and Reasoning in Language Models through Multiagent Debate. Liang et al. 2023 (divergent thinking via debate). Production patterns in AutoGen GroupChat, CAMEL.

Reserve for high-stakes contested questions where a wrong answer is expensive. The 2026 production literature is cautious. Debate adds 3–5× cost over single-agent and works best on a narrow set of decision-style questions where blind-spot triangulation matters.


   prompt ──► [debater1, debater2, debater3]   ◄── round 1 (parallel)
                              │
                       converged? (Jaccard ≥ 0.85)
                       yes ───► output
                       no  ───► [responses fed back]
                              │
              [debater1, debater2, debater3]    ◄── round 2 (sees prior)
                              │
                              ▼
                          judge ──► output     (or majority vote if no judge)

Pattern

Round 0 (independent). All debaters answer the original question simultaneously, with no awareness of each other. Run in parallel.
Rounds 1..K (debate). Each debater receives the original question + the full transcript so far. They defend or update their position. All debaters in a round run in parallel.
Optional convergence check. If all debaters in a round produce exactly-matching answers (after whitespace normalize), terminate early. Defaults on; disable for adversarial-only debates where you want full rounds.
Synthesize. A judge Agent synthesizes the final answer from the full transcript. Without a judge, majority vote on the final round’s answers.

Usage


from loomflow import Agent
from loomflow.team import Team
 
optimist  = Agent("Argue for the positive case.", model="claude-opus-4-7")
skeptic   = Agent("Argue for the cautious case.", model="claude-opus-4-7")
analyst   = Agent("Argue from the data.", model="gpt-4o")
 
cio = Agent(
    "You are the CIO. Read the full debate transcript and pick the "
    "best decision with one paragraph of justification.",
    model="claude-opus-4-7",
)
 
team = Team.debate(
    debaters=[optimist, skeptic, analyst],
    judge=cio,
    rounds=2,
    convergence_similarity=0.85,
    model="claude-opus-4-7",
)
 
result = await team.run("Should we adopt agent harnesses for our customer support stack?")

When debate pays off

High-stakes decisions. Pricing, hiring, security policy, architecture choices.
Adversarial review. Red-team / blue-team style argumentation.
Domain disagreement is informative. Different specialists with different priors should disagree, and the disagreement is the signal.

When NOT to use

Tasks with a clear right answer (math, code). Debate is wasteful.
Time-sensitive interactive use (chatbots). Too slow.
Cost-sensitive bulk processing. 3–5× Single-agent cost.

Convergence ≠ correctness. Debaters may converge on a wrong answer if they share priors. Pair with diverse models and prompts so blind-spot triangulation is real. For best-of-N with a different- model critic, ActorCritic is cheaper.