`MultiAgentDebate`


from loomflow.architecture import MultiAgentDebate

N debaters argue; an optional judge synthesises the result. See Du et al. 2023, "Improving Factuality and Reasoning in Language Models through Multiagent Debate", and Liang et al. 2023 (divergent thinking via debate). Production patterns appear in AutoGen GroupChat and CAMEL.

For the conceptual page see Multi-Agent Debate.

Class signature


class MultiAgentDebate:
    name: str = "debate"
 
    def __init__(
        self,
        *,
        debaters: list[Agent],
        judge: Agent | None = None,
        rounds: int = 2,
        convergence_check: bool = True,
        convergence_similarity: float = 0.85,
        debater_instructions: str | None = None,
        judge_instructions: str | None = None,
    ) -> None: ...

Constructor parameters

`debaters`


Type	`list[Agent]`
Default	required

The debating agents. Must contain at least 2, otherwise ValueError is raised; a debate with one participant is just a single-agent run. For best results, use diverse models and prompts so that blind-spot triangulation is real.

`judge`


Type	`Agent | None`
Default	`None`

Optional synthesising agent. Reads the full debate transcript and produces the final answer. When None, the architecture falls back to majority vote on the final round’s answers.
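How the fallback tallies votes is not specified on this page. The following is a minimal sketch only, assuming answers are compared after trimming, whitespace collapsing, and lowercasing, and that ties go to the earliest answer. `majority_vote` is a hypothetical helper, not part of loomflow.

```python
from collections import Counter


def majority_vote(answers: list[str]) -> str:
    """Return the most common answer; ties go to the earliest one seen."""
    if not answers:
        raise ValueError("need at least one answer")
    # Assumption: two answers match if they are equal after normalisation.
    normalised = [" ".join(a.split()).lower() for a in answers]
    counts = Counter(normalised)
    # max() returns the first maximal key; Counter keeps insertion order.
    winner = max(counts, key=counts.get)
    # Return the original text of the first answer that matches the winner.
    return answers[normalised.index(winner)]
```

Exact-match voting of this kind only works well for short or categorical answers; for free-text answers a judge is likely the better choice.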

`rounds`


Type	`int`
Default	`2`

Maximum debate rounds (after the independent round 0). Each round, all debaters run in parallel with full transcript context. Must be >= 1.

`convergence_check`


Type	`bool`
Default	`True`

When True (default), the architecture checks for early termination after each round: if every pair of debaters’ answers reaches the convergence_similarity threshold, the loop terminates. Set to False for adversarial-only debates where you want the full number of rounds even if positions converge.

`convergence_similarity`


Type	`float`
Default	`0.85`

Jaccard similarity threshold in [0.0, 1.0] for the convergence check. The default of 0.85 corresponds to “essentially the same answer, possibly different wording” and is the empirical sweet spot. 1.0 reproduces strict-equality behaviour. Lower values exit more aggressively, which cuts cost but risks premature exit.
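To make the threshold concrete, here is a minimal sketch of a pairwise Jaccard check. The tokenisation (lowercase word tokens) and the use of `>=` for the comparison are assumptions, and `jaccard` and `has_converged` are hypothetical names; the library's own implementation may differ.

```python
import re
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over the sets of lowercase word tokens."""
    set_a = set(re.findall(r"\w+", a.lower()))
    set_b = set(re.findall(r"\w+", b.lower()))
    if not set_a and not set_b:
        return 1.0  # two empty answers count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def has_converged(answers: list[str], threshold: float = 0.85) -> bool:
    """True when every pair of answers meets the threshold."""
    # Assumption: >= is used so that a threshold of 1.0 can still be met.
    return all(jaccard(x, y) >= threshold for x, y in combinations(answers, 2))
```

Under this tokenisation, "a b c" and "a b d" share 2 of 4 distinct tokens, giving a similarity of 0.5, well below the default threshold. Note that a score of 1.0 here means identical token sets, not identical strings.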

`debater_instructions`


Type	`str | None`
Default	`None` (uses built-in default)

Override the debaters’ role prompt. The default frames each round as either independent (round 0) or as defending/updating positions in light of the prior transcript.

`judge_instructions`


Type	`str | None`
Default	`None` (uses built-in default)

Override the judge’s prompt. The default reads the full transcript and synthesises the final answer with brief justification.

Methods

`declared_workers`


def declared_workers(self) -> dict[str, Agent]:
    workers = {f"debater_{i}": d for i, d in enumerate(self._debaters)}
    if self._judge is not None:
        workers["judge"] = self._judge
    return workers

Each debater is keyed debater_0, debater_1, …; the judge is keyed judge when present.

`run`

Round 0 (independent). All debaters answer the original question in parallel, with no awareness of each other. Emits debate.round_started and one debate.response per debater.
Rounds 1..rounds (debate). Each debater receives the original question plus the full transcript and defends or updates its position. All debaters in a round run in parallel.
Convergence check (when convergence_check=True). Computes pairwise Jaccard similarity between the latest round’s answers. If all pairs reach convergence_similarity, the loop exits early and emits debate.converged.
Synthesise. Either judge.run(synthesis_prompt) or majority vote on the final round’s answers.
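The steps above can be sketched as a small asyncio loop. This is an illustration under stated assumptions, not the library's code: debaters are modelled as plain async callables, the transcript format is invented, the convergence check is a caller-supplied predicate run after each debate round, and event emission and the judge are omitted. `debate`, `Debater`, `always_yes`, and `contrarian` are hypothetical names.

```python
import asyncio
from collections import Counter
from typing import Awaitable, Callable, Optional

# Hypothetical stand-in for an agent: an async function from prompt to answer.
Debater = Callable[[str], Awaitable[str]]


async def debate(
    question: str,
    debaters: list[Debater],
    rounds: int = 2,
    converged: Optional[Callable[[list[str]], bool]] = None,
) -> tuple[str, list[list[str]]]:
    """Run round 0 plus up to `rounds` debate rounds; return (answer, transcript)."""
    # Round 0: every debater answers independently, in parallel.
    answers = list(await asyncio.gather(*(d(question) for d in debaters)))
    transcript = [answers]

    for _ in range(rounds):
        # Debate round: each debater sees the question plus the full transcript.
        history = "\n".join(
            f"[round {r}] debater_{i}: {a}"
            for r, round_answers in enumerate(transcript)
            for i, a in enumerate(round_answers)
        )
        prompt = f"{question}\n\nTranscript so far:\n{history}"
        answers = list(await asyncio.gather(*(d(prompt) for d in debaters)))
        transcript.append(answers)
        # Optional early exit once the latest answers agree.
        if converged is not None and converged(answers):
            break

    # No judge in this sketch: fall back to an exact-match majority vote.
    final = Counter(answers).most_common(1)[0][0]
    return final, transcript


async def always_yes(prompt: str) -> str:
    return "yes"


async def contrarian(prompt: str) -> str:
    # Disagrees when answering alone, concedes once it sees a transcript.
    return "yes" if "Transcript so far" in prompt else "no"


final, transcript = asyncio.run(
    debate(
        "Ship it?",
        [always_yes, always_yes, contrarian],
        converged=lambda xs: len(set(xs)) == 1,
    )
)
```

With these stubs the contrarian concedes in round 1, the predicate fires, and the loop stops one round short of the configured maximum.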

When debate pays off

High-stakes decisions. Pricing, hiring, security policy, architecture choices.
Adversarial review. Red-team / blue-team style argumentation.
Domain disagreement is informative. Different specialists with different priors should disagree, and the disagreement is the signal.

When NOT to use

Tasks with a clear right answer (math, code). Debate is wasteful.
Time-sensitive interactive use (chatbots). Too slow.
Cost-sensitive bulk processing. Debate costs 3–5× a single-agent run.

Example


import asyncio

from loomflow import Agent
from loomflow.architecture import MultiAgentDebate
from loomflow.team import Team

# Three debaters with different framings; the analyst also uses a different model.
optimist = Agent("Argue for the positive case.", model="claude-opus-4-7")
skeptic = Agent("Argue for the cautious case.", model="claude-opus-4-7")
analyst = Agent("Argue from the data.", model="gpt-4o")

# The judge reads the transcript and makes the final call.
cio = Agent(
    "You are the CIO. Read the full debate transcript and pick the "
    "best decision with one paragraph of justification.",
    model="claude-opus-4-7",
)

team = Team.debate(
    debaters=[optimist, skeptic, analyst],
    judge=cio,
    rounds=2,
    convergence_similarity=0.85,
    model="claude-opus-4-7",
)


async def main() -> None:
    # team.run is awaited, so it has to be called from inside a coroutine.
    result = await team.run(
        "Should we adopt agent harnesses for our customer support stack?"
    )
    print(result)


asyncio.run(main())

Source

loomflow/architecture/debate.py

Convergence ≠ correctness. Debaters may converge on a wrong answer if they share priors. Pair with diverse models and prompts so blind-spot triangulation is real. For best-of-N with a different-model critic, ActorCritic is cheaper.