`TreeOfThoughts`


```python
from loomflow.architecture import TreeOfThoughts, ThoughtNode
```

Branching exploration with per-node evaluation. Based on Yao et al. (2023), “Tree of Thoughts: Deliberate Problem Solving with Large Language Models.”

For the conceptual page see Tree of Thoughts.

Class signature


```python
class TreeOfThoughts:
    name: str = "tree-of-thoughts"

    def __init__(
        self,
        *,
        branch_factor: int = 3,
        max_depth: int = 3,
        beam_width: int = 2,
        solved_threshold: float = 1.0,
        min_score: float = 0.0,
        parallel: bool = True,
        proposer_prompt: str | None = None,
        evaluator_prompt: str | None = None,
    ) -> None: ...
```

Constructor parameters

`branch_factor`


Type	`int`
Default	`3`

Number of candidate “thoughts” the proposer generates per frontier node at each depth. Higher values widen the search but cost more model calls. Must be >= 1.

`max_depth`


Type	`int`
Default	`3`

Maximum depth of the search tree. Root is depth 0; the tree grows to max_depth levels. Must be >= 1. The “best leaf wins” output selection picks the highest-scoring node at any depth.

`beam_width`


Type	`int`
Default	`2`

Top-k candidates kept per depth. Higher values widen the beam and cost more LLM calls. Must be >= 1. With branch_factor=3, beam_width=2, max_depth=3, expect at most about 2 × (3 + 1) × 3 = 24 model calls; min_score pruning and early exit can reduce that count.

`solved_threshold`


Type	`float`
Default	`1.0`

Score in [0.0, 1.0] at or above which a candidate triggers early exit. The matching candidate becomes the final answer. The default 1.0 effectively disables early exit: only a candidate with the maximum score of 1.0 (a max-confidence “this is the solution”) triggers it.

`min_score`


Type	`float`
Default	`0.0`

Score floor in [0.0, 1.0] below which a candidate is dropped regardless of beam capacity. Lets bad branches die quickly instead of riding along just because the beam has room. 0.0 reproduces legacy behavior (no floor); typical production values are 0.2–0.4.
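As an illustration of how the floor interacts with the beam, here is a minimal sketch of the pruning rule described above. It is not loomflow's implementation; `Candidate` and `prune` are hypothetical names introduced for this example.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """Hypothetical stand-in for a scored thought."""
    content: str
    score: float


def prune(candidates: list[Candidate], beam_width: int, min_score: float = 0.0) -> list[Candidate]:
    """Drop candidates below the floor, then keep the top `beam_width` by score."""
    survivors = [c for c in candidates if c.score >= min_score]
    # list.sort is stable, so tied candidates keep their original order.
    survivors.sort(key=lambda c: c.score, reverse=True)
    return survivors[:beam_width]


scored = [Candidate("a", 0.9), Candidate("b", 0.1), Candidate("c", 0.05)]

# With no floor, "b" rides along only because the beam has room for it.
no_floor = prune(scored, beam_width=2, min_score=0.0)

# With a floor of 0.3, "b" is dropped and the beam is left partly empty.
with_floor = prune(scored, beam_width=2, min_score=0.3)
```

With the floor set, the next depth expands one node instead of two, which is where the cost saving comes from.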

`parallel`


Type	`bool`
Default	`True`

When True, proposer + evaluator calls within a level run concurrently via anyio.create_task_group. This is a pure speedup: the branch_factor × beam_width independent calls run in parallel on the wall clock instead of sequentially. Set to False for deterministic test ordering or when your provider has tight rate limits.
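The difference between the two modes can be sketched with the standard library alone. Note the swap: this example uses `asyncio.gather` rather than the `anyio.create_task_group` the library is documented to use, purely to stay dependency-free. `fake_evaluate` and `score_level` are hypothetical stand-ins, not loomflow functions.

```python
import asyncio


async def fake_evaluate(candidate: str) -> float:
    """Hypothetical stand-in for one evaluator model call."""
    await asyncio.sleep(0.01)  # simulate network latency
    return min(1.0, len(candidate) / 10)


async def score_level(candidates: list[str], parallel: bool) -> list[float]:
    if parallel:
        # All calls are in flight at once; gather returns results in input order.
        return list(await asyncio.gather(*(fake_evaluate(c) for c in candidates)))
    # Sequential: each call finishes before the next one starts.
    return [await fake_evaluate(c) for c in candidates]


candidates = ["3 + 7", "8 * 3", "(7 - 1) * 8 / 2"]
parallel_scores = asyncio.run(score_level(candidates, parallel=True))
sequential_scores = asyncio.run(score_level(candidates, parallel=False))
```

Both modes produce the same scores; only wall-clock time and the order in which calls reach the provider differ.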

`proposer_prompt`


Type	`str | None`
Default	`None` (uses built-in default)

Override the proposer’s system prompt. The default asks for branch_factor candidate next thoughts toward solving the problem.

`evaluator_prompt`


Type	`str | None`
Default	`None` (uses built-in default)

Override the evaluator’s system prompt. The default asks for a score in [0.0, 1.0] representing how promising the candidate branch is.

Methods

`declared_workers`

Returns {}. TreeOfThoughts is single-agent and declares no workers.

`run`

1. Root. Depth-0 node containing the problem. No model call.
2. For each depth up to max_depth:
   a. Expand. For every frontier node, the proposer generates branch_factor candidates (parallel across nodes when parallel=True).
   b. Evaluate. The evaluator scores each candidate in [0.0, 1.0].
   c. Prune. Drop candidates with score < min_score. Keep the top beam_width by score as the next frontier.
   d. Early exit. If any candidate scores >= solved_threshold, emit tot.solved and use that branch.
3. Best leaf wins. The highest-scoring node across the whole tree becomes the final answer.

Per-depth events: tot.depth_started, tot.expanded, tot.scored, tot.pruned, tot.frontier_updated. Optional final events: tot.solved or tot.best_leaf.
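The loop described in the steps above can be sketched as a small synchronous beam search. This is an illustrative sketch under stated assumptions, not loomflow's code: `Node`, `tree_search`, `propose`, and `evaluate` are hypothetical names, the proposer and evaluator are deterministic toy functions instead of model calls, the root is assumed to score 0.0, and event emission is omitted.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable, Optional


@dataclass
class Node:
    """Hypothetical stand-in with the same fields as ThoughtNode."""
    id: str
    parent_id: Optional[str]
    content: str
    score: float
    depth: int


def tree_search(
    problem: str,
    propose: Callable[[Node, int], list[str]],
    evaluate: Callable[[str], float],
    *,
    branch_factor: int = 3,
    max_depth: int = 3,
    beam_width: int = 2,
    solved_threshold: float = 1.0,
    min_score: float = 0.0,
) -> tuple[Node, list[Node]]:
    ids = count(1)
    root = Node("n0", None, problem, 0.0, 0)  # step 1: no model call
    nodes, frontier = [root], [root]
    for depth in range(1, max_depth + 1):
        # a + b. Expand every frontier node and score each candidate.
        candidates = [
            Node(f"n{next(ids)}", parent.id, thought, evaluate(thought), depth)
            for parent in frontier
            for thought in propose(parent, branch_factor)
        ]
        nodes.extend(candidates)  # keep the full tree, not just the beam
        # c. Prune: apply the score floor, then keep the top beam_width.
        kept = sorted(
            (c for c in candidates if c.score >= min_score),
            key=lambda c: c.score,
            reverse=True,
        )[:beam_width]
        # d. Early exit on any candidate at or above the threshold.
        solved = [c for c in candidates if c.score >= solved_threshold]
        if solved:
            return max(solved, key=lambda c: c.score), nodes
        if not kept:
            break  # every branch fell below min_score
        frontier = kept
    # Step 3: best leaf wins across the whole tree.
    return max(nodes, key=lambda n: n.score), nodes


TARGET = "abc"  # toy goal: spell this string one character at a time


def propose(parent: Node, k: int) -> list[str]:
    prefix = "" if parent.parent_id is None else parent.content
    return [prefix + ch for ch in "abc"[:k]]


def evaluate(thought: str) -> float:
    matched = 0
    for got, want in zip(thought, TARGET):
        if got != want:
            break
        matched += 1
    return matched / len(TARGET)


best, nodes = tree_search("spell abc", propose, evaluate)
```

In this toy run the search exits early at depth 3 when the candidate `"abc"` reaches the default `solved_threshold` of 1.0.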

`ThoughtNode`


```python
class ThoughtNode(BaseModel):
    id: str
    parent_id: str | None
    content: str
    score: float
    depth: int
```

Tree node. Stored on session.metadata["nodes"] after the run for introspection. The full tree is preserved (not just the surviving beam) so you can analyze pruning decisions.
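Because every node carries a `parent_id`, the reasoning chain behind any node can be rebuilt by walking parent links. The sketch below assumes the stored nodes can be iterated as a flat collection; the exact container shape under `session.metadata["nodes"]` is not specified here. It uses a plain dataclass as a stand-in for `ThoughtNode`, and `path_to_root` plus the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Node:
    """Stand-in with the same fields as ThoughtNode (assumption for this example)."""
    id: str
    parent_id: Optional[str]
    content: str
    score: float
    depth: int


def path_to_root(nodes: Iterable[Node], leaf_id: str) -> list[Node]:
    """Return the chain of nodes from the root down to `leaf_id`."""
    by_id = {n.id: n for n in nodes}
    path: list[Node] = []
    current = by_id.get(leaf_id)
    while current is not None:
        path.append(current)
        # The root has parent_id None, which ends the walk.
        current = by_id.get(current.parent_id) if current.parent_id is not None else None
    path.reverse()
    return path


# Hand-made sample tree; contents and scores are illustrative only.
nodes = [
    Node("n0", None, "Game of 24 with [3, 7, 8, 1]", 0.0, 0),
    Node("n1", "n0", "7 - 1 = 6", 0.6, 1),
    Node("n2", "n0", "3 + 7 = 10", 0.2, 1),
    Node("n3", "n1", "6 * 3 = 18", 0.7, 2),
    Node("n4", "n1", "6 + 8 = 14", 0.3, 2),
]

best = max(nodes, key=lambda n: n.score)
chain = path_to_root(nodes, best.id)
```

The same lookup table makes it easy to list pruned siblings at each depth when analyzing why a branch was dropped.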

Cost model

Per depth: roughly frontier_size × (branch_factor + 1) model calls. The formula counts one proposer call per frontier node plus one evaluator call per candidate; the exact count depends on how calls are batched by prompt design. Total across full depth: O(beam_width × max_depth × (branch_factor + 1)).

Real numbers for branch_factor=3, beam_width=2, max_depth=4: ~30 LLM calls vs ~5 for ReAct. Reserve for tasks where ReAct’s straight-line trajectory commits too early and produces wrong answers.
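The arithmetic can be checked with a small helper. This is a back-of-the-envelope sketch: `estimate_calls` is a hypothetical function, and the tighter figure assumes the first depth expands only the root (frontier size 1) while every later depth expands a full beam.

```python
def estimate_calls(branch_factor: int, beam_width: int, max_depth: int) -> tuple[int, int]:
    """Return (upper_bound, tighter_estimate) of model calls for a full-depth run."""
    per_node = branch_factor + 1  # calls attributed to each expanded frontier node
    # Upper bound from the cost model: a full beam at every depth.
    upper = beam_width * max_depth * per_node
    # Tighter: depth 1 expands only the root, later depths expand beam_width nodes.
    tighter = per_node * (1 + beam_width * (max_depth - 1))
    return upper, tighter


depth_3 = estimate_calls(branch_factor=3, beam_width=2, max_depth=3)
depth_4 = estimate_calls(branch_factor=3, beam_width=2, max_depth=4)
```

For max_depth=4 the two figures are 32 and 28, which brackets the ~30 quoted above. Early exit and min_score pruning only lower the count.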

Example


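```python
import asyncio

from loomflow import Agent
from loomflow.architecture import TreeOfThoughts

agent = Agent(
    "Solve the puzzle by exploring multiple lines of reasoning.",
    model="claude-opus-4-7",
    architecture=TreeOfThoughts(
        branch_factor=3,        # candidates proposed per frontier node
        beam_width=2,           # candidates kept per depth
        max_depth=4,
        solved_threshold=0.95,  # exit early on a near-certain answer
        min_score=0.3,          # drop weak branches even if the beam has room
    ),
)


async def main() -> None:
    # agent.run is awaitable, so call it from inside an event loop.
    result = await agent.run("Game of 24 with [3, 7, 8, 1].")
    print(result)


asyncio.run(main())
```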

Or by string:


```python
agent = Agent("...", model="...", architecture="tree-of-thoughts")
```

Source

loomflow/architecture/tree_of_thoughts.py

Not for general task automation. ToT costs 5–10× as much as ReAct, so don’t default to it. Reach for it when ReAct’s straight-line trajectory commits too early and produces wrong answers in your domain (puzzles, math, planning, constraint satisfaction).
