MultiAgentDebate
from loomflow.architecture import MultiAgentDebate
N debaters argue; an optional judge synthesizes. References: Du et al. 2023, "Improving Factuality and Reasoning in Language Models through Multiagent Debate"; Liang et al. 2023 (divergent thinking via debate). Production patterns: AutoGen GroupChat, CAMEL.
For the conceptual page see Multi-Agent Debate.
Class signature
class MultiAgentDebate:
    name: str = "debate"

    def __init__(
        self,
        *,
        debaters: list[Agent],
        judge: Agent | None = None,
        rounds: int = 2,
        convergence_check: bool = True,
        convergence_similarity: float = 0.85,
        debater_instructions: str | None = None,
        judge_instructions: str | None = None,
    ) -> None: ...
Constructor parameters
debaters
| Type | list[Agent] |
| Default | required |
The debating agents. Must contain at least 2, otherwise a ValueError
is raised; a debate with one participant is just a single-agent run.
For best results, use diverse models and prompts so blind-spot
triangulation is real.
judge
| Type | Agent \| None |
| Default | None |
Optional synthesising agent. Reads the full debate transcript and
produces the final answer. When None, the architecture falls back
to majority vote on the final round’s answers.
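A minimal sketch of what such a majority-vote fallback could look like. It assumes answers count as the same vote when they differ only in case or whitespace, and that ties go to the answer seen first. `majority_vote` is a hypothetical helper, not loomflow's implementation; the library's actual normalization and tie-breaking rules are not stated on this page.

```python
from collections import Counter


def majority_vote(answers: list[str]) -> str:
    """Return the most common answer after light normalization (hypothetical helper)."""
    if not answers:
        raise ValueError("need at least one answer")
    # Assumption: answers differing only in case or whitespace are the same vote.
    normalized = [" ".join(a.lower().split()) for a in answers]
    counts = Counter(normalized)
    # Among equal counts, most_common keeps first-encountered order,
    # so a tie resolves to the earliest answer.
    winner, _ = counts.most_common(1)[0]
    # Return the original text of the first answer that matches the winner.
    return answers[normalized.index(winner)]
```

With free-form text, exact-match voting rarely produces a clear majority, which is one reason a judge may be preferable for open-ended questions.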
rounds
| Type | int |
| Default | 2 |
Maximum debate rounds (after the independent round 0). Each round,
all debaters run in parallel with full transcript context. Must be
>= 1.
convergence_check
| Type | bool |
| Default | True |
When True (default), the architecture checks for early termination
after each round: if the pairwise similarity of all debaters’ answers
exceeds convergence_similarity, the loop terminates. Set to False for
adversarial-only debates where you want full rounds even if positions
converge.
convergence_similarity
| Type | float |
| Default | 0.85 |
Jaccard similarity threshold in [0.0, 1.0] for the convergence
check. 0.85 ≈ “essentially the same answer, possibly different
wording”, which is the empirical sweet spot. 1.0 reproduces
strict-equality behaviour. Lower values are more aggressive (they
cut cost but risk a premature exit).
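The following sketch shows how a Jaccard-based convergence check could work. It assumes answers are tokenized into lowercase whitespace-separated words and that a pair converges when its similarity is greater than or equal to the threshold (so that 1.0 stays reachable). `jaccard` and `has_converged` are hypothetical names; loomflow's real tokenization and comparison operator may differ.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two answers' lowercase word sets."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty answers are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def has_converged(answers: list[str], threshold: float = 0.85) -> bool:
    """True when every pair of answers meets the threshold (hypothetical helper)."""
    # Assumption: >= rather than > so that threshold=1.0 can be satisfied.
    return all(jaccard(x, y) >= threshold for x, y in combinations(answers, 2))
```

For example, "adopt the harness" and "adopt the harness now" share 3 of 4 distinct words, giving a similarity of 0.75. That pair would not count as converged at the default 0.85 but would at 0.7.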
debater_instructions
| Type | str \| None |
| Default | None (uses built-in default) |
Override the debaters’ role prompt. The default frames each round as either independent (round 0) or as defending/updating positions in light of the prior transcript.
judge_instructions
| Type | str \| None |
| Default | None (uses built-in default) |
Override the judge’s prompt. The default reads the full transcript and synthesises the final answer with brief justification.
Methods
declared_workers
def declared_workers(self) -> dict[str, Agent]:
    workers = {f"debater_{i}": d for i, d in enumerate(self._debaters)}
    if self._judge is not None:
        workers["judge"] = self._judge
    return workers
Each debater is keyed debater_0, debater_1, …; the judge is keyed judge when present.
run
- Round 0 (independent). All debaters answer the original question simultaneously, with no awareness of each other. Run in parallel. Emits debate.round_started, debate.response per debater.
- Rounds 1..rounds (debate). Each debater receives the original question + full transcript. They defend or update their position. All debaters in a round run in parallel.
- Convergence check (when convergence_check=True). Computes pairwise Jaccard similarity between the latest round’s answers. If all pairs exceed convergence_similarity → early-exit. Emits debate.converged.
- Synthesise. Either judge.run(synthesis_prompt) or majority vote on the final round’s answers.
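The steps above can be sketched as a small asyncio loop. This is an illustration under stated assumptions, not the code in loomflow/architecture/debate.py: agents are modelled as plain async callables (the hypothetical `Debater` type), the transcript format is invented, the convergence check runs only after debate rounds (not after round 0), the vote is exact-match, and event emission is omitted.

```python
import asyncio
from collections import Counter
from itertools import combinations
from typing import Awaitable, Callable, Optional

# Hypothetical stand-in for an agent: takes a prompt, returns an answer.
Debater = Callable[[str], Awaitable[str]]


def _jaccard(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return 1.0 if not (sa or sb) else len(sa & sb) / len(sa | sb)


def _render(question: str, transcript: list[list[str]]) -> str:
    # Assumed transcript layout; the real prompt format is not documented here.
    lines = [f"Question: {question}"]
    for r, round_answers in enumerate(transcript):
        for i, answer in enumerate(round_answers):
            lines.append(f"[round {r}] debater_{i}: {answer}")
    return "\n".join(lines)


async def run_debate(
    question: str,
    debaters: list[Debater],
    judge: Optional[Debater] = None,
    rounds: int = 2,
    threshold: float = 0.85,
) -> str:
    # Round 0: independent answers, all debaters in parallel.
    answers = list(await asyncio.gather(*(d(question) for d in debaters)))
    transcript = [answers]
    for _ in range(rounds):
        # Debate round: every debater sees the question plus the full transcript.
        prompt = _render(question, transcript)
        answers = list(await asyncio.gather(*(d(prompt) for d in debaters)))
        transcript.append(answers)
        # Convergence check: stop early once every pair is similar enough.
        if all(_jaccard(x, y) >= threshold for x, y in combinations(answers, 2)):
            break
    # Synthesise: the judge reads the transcript, else exact-match majority vote.
    if judge is not None:
        return await judge(_render(question, transcript))
    return Counter(answers).most_common(1)[0][0]
```

Calling `asyncio.run(run_debate("question", [debater_a, debater_b]))` with two async functions that return the same text ends after one debate round, because the convergence check passes immediately.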
When debate pays off
- High-stakes decisions. Pricing, hiring, security policy, architecture choices.
- Adversarial review. Red-team / blue-team style argumentation.
- Domain disagreement is informative. Different specialists with different priors should disagree, and the disagreement is the signal.
When NOT to use
- Tasks with a clear right answer (math, code). Debate is wasteful.
- Time-sensitive interactive use (chatbots). Too slow.
- Cost-sensitive bulk processing. 3–5× single-agent cost.
Example
from loomflow import Agent
from loomflow.architecture import MultiAgentDebate
from loomflow.team import Team

# Three debaters with distinct stances; one uses a different model for diversity.
optimist = Agent("Argue for the positive case.", model="claude-opus-4-7")
skeptic = Agent("Argue for the cautious case.", model="claude-opus-4-7")
analyst = Agent("Argue from the data.", model="gpt-4o")

# The judge reads the full transcript and produces the final decision.
cio = Agent(
    "You are the CIO. Read the full debate transcript and pick the "
    "best decision with one paragraph of justification.",
    model="claude-opus-4-7",
)

team = Team.debate(
    debaters=[optimist, skeptic, analyst],
    judge=cio,
    rounds=2,
    convergence_similarity=0.85,
    model="claude-opus-4-7",
)

# Top-level await: run this in an async context (a notebook, or inside an async function).
result = await team.run(
    "Should we adopt agent harnesses for our customer support stack?"
)
Source
loomflow/architecture/debate.py
Convergence ≠ correctness. Debaters may converge on a wrong
answer if they share priors. Pair with diverse models and prompts
so blind-spot triangulation is real. For best-of-N with a
different-model critic,
ActorCritic is cheaper.