Skip to Content
DocsArchitecturesTree of Thoughts

Tree of Thoughts

Branching exploration with per-node evaluation. Yao et al. 2023 , Tree of Thoughts: Deliberate Problem Solving with Large Language Models.

Useful for combinatorial reasoning, multi-step planning, math (Game of 24), puzzle solving. Anywhere a single straight-shot ReAct trajectory would commit too early.

proposer (×branch_factor) evaluator prompt ──► [t1, t2, t3] ──score──► [0.9, 0.4, 0.7] keep top beam_width drop below min_score [t1, t3] ←── frontier for depth 2 (repeat to max_depth) best leaf wins
  1. Root is the problem statement.
  2. For each level up to max_depth: a. Expand. For every frontier node, the proposer generates branch_factor candidate “thoughts” (next steps toward a solution). b. Evaluate. The evaluator scores each candidate 0–1 (how promising is this branch?). c. Prune. Keep only the top beam_width scored candidates. d. Early exit. If any candidate scores >= solved_threshold, stop early and use that branch.
  3. Best leaf wins. The highest-scoring leaf across the whole tree becomes the final answer.

Usage

from loomflow import Agent from loomflow.architecture import TreeOfThoughts agent = Agent( "Solve the puzzle by exploring multiple lines of reasoning.", model="claude-opus-4-7", architecture=TreeOfThoughts( branch_factor=3, # candidates per node beam_width=2, # keep top-2 each depth max_depth=4, solved_threshold=0.95, # early-exit if any candidate hits this min_score=0.3, # drop branches below this ), ) result = await agent.run("Game of 24 with [3, 7, 8, 1].")

Or by string:

agent = Agent("...", model="...", architecture="tree-of-thoughts")

Cost

  • LLM calls per depth: roughly frontier_size × (branch_factor + 1) . Propose + evaluate.
  • Total: at full depth, O(branch_factor^depth) proposals before pruning, then beam_width × max_depth after.
  • Real numbers: branch=3, beam=2, depth=4 → ~30 LLM calls vs ~5 for ReAct. Reserve for tasks where ReAct’s first commit is wrong.

When ToT pays off

  • Math, puzzles, planning. Game of 24, crosswords, mini Sudoku, constraint satisfaction.
  • Combinatorial generation. Story branches, counterfactual exploration.
  • Backtracking-friendly problems. Situations where committing to the wrong first move is hard to recover from in ReAct.

Not for general task automation. ToT is 5–10× ReAct cost. Don’t default to it. Reach for it when ReAct’s straight-line trajectory commits too early and produces wrong answers in your domain.

Last updated on