Tree of Thoughts

Branching exploration with per-node evaluation. Yao et al. 2023 , Tree of Thoughts: Deliberate Problem Solving with Large Language Models.

Useful for combinatorial reasoning, multi-step planning, math (Game of 24), puzzle solving. Anywhere a single straight-shot ReAct trajectory would commit too early.

Pattern (BFS beam search)


              proposer (×branch_factor)         evaluator
   prompt ──► [t1, t2, t3]  ──score──►  [0.9, 0.4, 0.7]
                                              │
                                         keep top beam_width
                                         drop below min_score
                                              │
                                              ▼
                                         [t1, t3]   ←── frontier for depth 2
                                              │
                                         (repeat to max_depth)
                                              │
                                              ▼
                                       best leaf wins

Root is the problem statement.
For each level up to max_depth: a. Expand. For every frontier node, the proposer generates branch_factor candidate “thoughts” (next steps toward a solution). b. Evaluate. The evaluator scores each candidate 0–1 (how promising is this branch?). c. Prune. Keep only the top beam_width scored candidates. d. Early exit. If any candidate scores >= solved_threshold, stop early and use that branch.
Best leaf wins. The highest-scoring leaf across the whole tree becomes the final answer.

Usage


from loomflow import Agent
from loomflow.architecture import TreeOfThoughts
 
agent = Agent(
    "Solve the puzzle by exploring multiple lines of reasoning.",
    model="claude-opus-4-7",
    architecture=TreeOfThoughts(
        branch_factor=3,        # candidates per node
        beam_width=2,           # keep top-2 each depth
        max_depth=4,
        solved_threshold=0.95,  # early-exit if any candidate hits this
        min_score=0.3,          # drop branches below this
    ),
)
result = await agent.run("Game of 24 with [3, 7, 8, 1].")

Or by string:


agent = Agent("...", model="...", architecture="tree-of-thoughts")

Cost

LLM calls per depth: roughly frontier_size × (branch_factor + 1) . Propose + evaluate.
Total: at full depth, O(branch_factor^depth) proposals before pruning, then beam_width × max_depth after.
Real numbers: branch=3, beam=2, depth=4 → ~30 LLM calls vs ~5 for ReAct. Reserve for tasks where ReAct’s first commit is wrong.

When ToT pays off

Math, puzzles, planning. Game of 24, crosswords, mini Sudoku, constraint satisfaction.
Combinatorial generation. Story branches, counterfactual exploration.
Backtracking-friendly problems. Situations where committing to the wrong first move is hard to recover from in ReAct.

Not for general task automation. ToT is 5–10× ReAct cost. Don’t default to it. Reach for it when ReAct’s straight-line trajectory commits too early and produces wrong answers in your domain.