
MultiAgentDebate

from loomflow.architecture import MultiAgentDebate

N debaters argue; an optional judge synthesizes. Based on Du et al. 2023, "Improving Factuality and Reasoning in Language Models through Multiagent Debate", and Liang et al. 2023 (divergent thinking via debate). Production patterns appear in AutoGen GroupChat and CAMEL.

For the conceptual page see Multi-Agent Debate.


Class signature

class MultiAgentDebate:
    name: str = "debate"

    def __init__(
        self,
        *,
        debaters: list[Agent],
        judge: Agent | None = None,
        rounds: int = 2,
        convergence_check: bool = True,
        convergence_similarity: float = 0.85,
        debater_instructions: str | None = None,
        judge_instructions: str | None = None,
    ) -> None: ...

Constructor parameters

debaters

Type: list[Agent]
Default: required

The debating agents. Must contain at least 2; raises ValueError otherwise, since a debate with one participant is just a single-agent run. For best results, use diverse models and prompts so blind-spot triangulation is real.

judge

Type: Agent | None
Default: None

Optional synthesising agent. Reads the full debate transcript and produces the final answer. When None, the architecture falls back to majority vote on the final round’s answers.
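As a rough illustration of the no-judge fallback, a majority vote over the final round's answers could look like the sketch below. This is not the code in loomflow/architecture/debate.py: the function name majority_vote, the normalisation (strip and lowercase), and the tie-breaking rule (first answer seen wins) are all assumptions made for the example.

```python
from collections import Counter


def majority_vote(answers: list[str]) -> str:
    """Pick the most common answer (hypothetical helper, not the library's).

    Assumptions: answers are compared after stripping whitespace and
    lowercasing, and ties go to the answer that appeared first.
    """
    if not answers:
        raise ValueError("need at least one answer to vote on")
    normalised = [a.strip().lower() for a in answers]
    # Counter.most_common orders equal counts by first occurrence.
    winner, _ = Counter(normalised).most_common(1)[0]
    # Return the original wording of the first answer that matches.
    return answers[normalised.index(winner)]
```

Exact-match voting like this only works well for short, categorical answers; for free-form answers a judge is usually the more reliable way to synthesise.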

rounds

Type: int
Default: 2

Maximum debate rounds (after the independent round 0). Each round, all debaters run in parallel with full transcript context. Must be >= 1.

convergence_check

Type: bool
Default: True

When True (default), the architecture checks for early termination after each round: if the similarity between every pair of debaters' answers exceeds convergence_similarity, the loop terminates. Set to False for adversarial-only debates where you want the full number of rounds even if positions converge.

convergence_similarity

Type: float
Default: 0.85

Jaccard similarity threshold in [0.0, 1.0] for the convergence check. The default of 0.85 ≈ “essentially the same answer, possibly different wording” and is the empirical sweet spot. 1.0 reproduces strict-equality behaviour. Lower values are more aggressive: they cut cost but risk a premature exit.
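To make the threshold concrete, here is a minimal sketch of a Jaccard-based convergence check. It is not the library's implementation: the helper names jaccard and has_converged, the whitespace tokenisation, and the lowercasing are assumptions. The comparison uses >= so that a threshold of 1.0 behaves like strict equality, which is also an assumption about how the boundary is handled.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| over lowercase word sets.

    Assumption: tokens are whitespace-separated words; the real
    tokenisation in the library may differ.
    """
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty answers count as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def has_converged(answers: list[str], threshold: float = 0.85) -> bool:
    """True when every pair of answers meets the threshold."""
    return all(
        jaccard(x, y) >= threshold for x, y in combinations(answers, 2)
    )
```

Under these assumptions, "a b c d" and "a b c e" share 3 of 5 distinct words, giving a similarity of 0.6, so they would not count as converged at the default threshold.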

debater_instructions

Type: str | None
Default: None (uses built-in default)

Override the debaters’ role prompt. The default frames each round as either independent (round 0) or as defending/updating positions in light of the prior transcript.

judge_instructions

Type: str | None
Default: None (uses built-in default)

Override the judge’s prompt. The default reads the full transcript and synthesises the final answer with brief justification.


Methods

declared_workers

def declared_workers(self) -> dict[str, Agent]:
    workers = {f"debater_{i}": d for i, d in enumerate(self._debaters)}
    if self._judge is not None:
        workers["judge"] = self._judge
    return workers

Each debater is keyed debater_0, debater_1, …; the judge is keyed judge when present.

run

  1. Round 0 (independent). All debaters answer the original question simultaneously, with no awareness of each other. Run in parallel. Emits debate.round_started, debate.response per debater.
  2. Rounds 1..rounds (debate). Each debater receives the original question + full transcript. They defend or update their position. All debaters in a round run in parallel.
  3. Convergence check (when convergence_check=True). Computes pairwise Jaccard similarity between the latest round's answers. If all pairs exceed convergence_similarity, the loop exits early and emits debate.converged.
  4. Synthesise. Either judge.run(synthesis_prompt) or majority vote on the final round’s answers.
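The four steps above can be sketched in plain asyncio. This is a simplified illustration, not the actual run method: debaters are modelled as hypothetical async callables that take a prompt and return a string, the transcript format is invented, event emission is omitted, the convergence check is assumed to run only after debate rounds (not after round 0), and the synthesis step is reduced to an exact-match majority vote.

```python
import asyncio
from collections import Counter
from typing import Awaitable, Callable, Optional

# Hypothetical stand-in for an agent: prompt in, answer out.
Debater = Callable[[str], Awaitable[str]]


async def run_debate(
    question: str,
    debaters: list[Debater],
    rounds: int = 2,
    converged: Optional[Callable[[list[str]], bool]] = None,
) -> tuple[str, list[list[str]]]:
    """Return (final answer, transcript); transcript[r][i] is debater i in round r."""
    # Step 1: round 0, every debater answers independently and in parallel.
    transcript = [list(await asyncio.gather(*(d(question) for d in debaters)))]

    # Step 2: debate rounds, each debater sees the question plus transcript.
    for _ in range(rounds):
        history = "\n".join(
            f"[round {r}] debater_{i}: {answer}"
            for r, answers in enumerate(transcript)
            for i, answer in enumerate(answers)
        )
        prompt = f"{question}\n\nTranscript so far:\n{history}"
        transcript.append(
            list(await asyncio.gather(*(d(prompt) for d in debaters)))
        )
        # Step 3: optional early exit once the latest answers agree.
        if converged is not None and converged(transcript[-1]):
            break

    # Step 4: synthesise; here a plain majority vote stands in for the judge.
    final_answer = Counter(transcript[-1]).most_common(1)[0][0]
    return final_answer, transcript
```

Passing converged=None corresponds to convergence_check=False in this sketch: all rounds run regardless of agreement.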

When debate pays off

  • High-stakes decisions. Pricing, hiring, security policy, architecture choices.
  • Adversarial review. Red-team / blue-team style argumentation.
  • Domain disagreement is informative. Different specialists with different priors should disagree, and the disagreement is the signal.

When NOT to use

  • Tasks with a clear right answer (math, code). Debate is wasteful.
  • Time-sensitive interactive use (chatbots). Too slow.
  • Cost-sensitive bulk processing. Debate costs 3–5× a single-agent run.

Example

from loomflow import Agent
from loomflow.architecture import MultiAgentDebate
from loomflow.team import Team

# Three debaters with different stances; one uses a different model for diversity.
optimist = Agent("Argue for the positive case.", model="claude-opus-4-7")
skeptic = Agent("Argue for the cautious case.", model="claude-opus-4-7")
analyst = Agent("Argue from the data.", model="gpt-4o")

# The judge reads the transcript and produces the final answer.
cio = Agent(
    "You are the CIO. Read the full debate transcript and pick the "
    "best decision with one paragraph of justification.",
    model="claude-opus-4-7",
)

team = Team.debate(
    debaters=[optimist, skeptic, analyst],
    judge=cio,
    rounds=2,
    convergence_similarity=0.85,
    model="claude-opus-4-7",
)

# `await` must run inside an async context (a coroutine or a notebook cell).
result = await team.run(
    "Should we adopt agent harnesses for our customer support stack?"
)

Source

loomflow/architecture/debate.py

Convergence ≠ correctness. Debaters may converge on a wrong answer if they share priors. Pair with diverse models and prompts so blind-spot triangulation is real. For best-of-N with a different-model critic, ActorCritic is cheaper.
