Reflexion
```python
from loomflow.architecture import Reflexion
```
Verbal reinforcement learning via memory, after Shinn et al. (2023), Reflexion: Language Agents with Verbal Reinforcement Learning. After each attempt, an evaluator scores the output. When the score falls below the threshold, a reflector produces a single-sentence “lesson”: written advice the agent can read on its next attempt.
For the conceptual page, see Reflexion.
Class signature
```python
# Signature only; Architecture and VectorStore are loomflow types.
class Reflexion:
    name: str = "reflexion"

    def __init__(
        self,
        *,
        base: Architecture | None = None,
        max_attempts: int = 3,
        threshold: float = 0.8,
        evaluator_prompt: str | None = None,
        reflector_prompt: str | None = None,
        lessons_block_name: str = "reflexion_lessons",
        lesson_store: VectorStore | None = None,
        top_k_lessons: int = 5,
    ) -> None: ...
```
Constructor parameters
base
| Type | Architecture \| None |
| Default | None (resolves to ReAct()) |
The architecture used for each attempt. Defaults to ReAct.
Reflexion-of-Supervisor, Reflexion(base=Supervisor(workers=...)), is a team that learns delegation patterns across attempts. See Recursive composition.
max_attempts
| Type | int |
| Default | 3 |
Maximum attempts before giving up. Each attempt runs the base
architecture; below-threshold attempts trigger a lesson. Must be
>= 1.
threshold
| Type | float |
| Default | 0.8 |
Evaluator score (in [0.0, 1.0]) at or above which the loop
terminates. Lower it for more permissive evaluation; raise it for stricter evaluation.
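To make the termination rule concrete, here is a minimal sketch that assumes a test-pass-rate evaluator, where the score is the fraction of passing tests. The helpers pass_rate and should_stop are hypothetical illustrations, not loomflow API.

```python
def pass_rate(passed: int, total: int) -> float:
    """Hypothetical evaluator signal: fraction of tests passing, in [0.0, 1.0]."""
    if total <= 0 or not 0 <= passed <= total:
        raise ValueError("need 0 <= passed <= total and total > 0")
    return passed / total


def should_stop(score: float, threshold: float = 0.8) -> bool:
    """Documented rule: stop once the score is at or above the threshold."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0.0, 1.0]")
    return score >= threshold


score = pass_rate(passed=17, total=20)     # 0.85
print(should_stop(score))                  # True: clears the default 0.8
print(should_stop(score, threshold=0.9))   # False: a strict threshold asks for another attempt
```

The comparison is inclusive, so a score exactly equal to the threshold ends the loop.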
evaluator_prompt
| Type | str \| None |
| Default | None (uses built-in default) |
Override the evaluator’s system prompt. The default asks for a
numeric score in [0, 1] plus a brief justification. Provide your
own when you have a domain-specific rubric (test-pass rate, factual
accuracy against ground truth, etc.).
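When you write your own rubric, it helps to make the score trivially extractable from the evaluator’s reply. The sketch below is hypothetical: RUBRIC is an example string you might pass as evaluator_prompt, and parse_score shows one way such a reply could be parsed. It is not loomflow’s parser, which this page does not describe.

```python
import re

# Hypothetical domain rubric you might pass as evaluator_prompt=RUBRIC.
RUBRIC = (
    "Score the answer against the reference facts. "
    "Reply with a line 'SCORE: <number between 0 and 1>' "
    "followed by one sentence of justification."
)

# Matches e.g. 'SCORE: 0.75', 'SCORE: 1', 'SCORE: .5'.
_SCORE_RE = re.compile(r"SCORE:\s*([0-9]*\.?[0-9]+)")


def parse_score(reply: str) -> float:
    """Hypothetical parser: take the first 'SCORE:' value and clamp it to [0, 1]."""
    match = _SCORE_RE.search(reply)
    if match is None:
        # Treat an unparseable reply as a failing score so the loop reflects.
        return 0.0
    return min(1.0, max(0.0, float(match.group(1))))


print(parse_score("SCORE: 0.75\nMisses one date."))  # 0.75
```

Falling back to 0.0 on a malformed reply is a design choice of this sketch: it errs toward another attempt rather than accepting an unscored answer.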
reflector_prompt
| Type | str | None |
| Default | None (uses built-in default) |
Override the reflector’s system prompt. The default asks for a single-sentence lesson the agent can apply on its next attempt. Lessons are deliberately short, because long lessons erode the context-economy advantage Reflexion has over plain retry.
lessons_block_name
| Type | str |
| Default | "reflexion_lessons" |
Name of the working-memory block that lessons are written to in
monotonic-block mode (that is, when no lesson_store is wired). The
agent’s seed messages include this block on every subsequent
attempt.
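In monotonic-block mode every lesson is kept and re-surfaced on each attempt. A minimal sketch of that behaviour, assuming lessons are plain strings appended to a named block and rendered into the seed context; MonotonicLessons and its render format are hypothetical, not loomflow’s working-memory API.

```python
from dataclasses import dataclass, field


@dataclass
class MonotonicLessons:
    """Hypothetical stand-in for a working-memory block that only grows."""
    name: str = "reflexion_lessons"
    lessons: list[str] = field(default_factory=list)

    def append(self, lesson: str) -> None:
        # Lessons are never evicted, so every later attempt sees all of them.
        self.lessons.append(lesson.strip())

    def render(self) -> str:
        # The block as it might appear in the next attempt's seed messages.
        lines = [f"[{self.name}]"]
        lines += [f"{i}. {text}" for i, text in enumerate(self.lessons, start=1)]
        return "\n".join(lines)


block = MonotonicLessons()
block.append("Cite a source for every statistic.")
block.append("Check the word limit before submitting.")
print(block.render())
```

Because the block only grows, its size scales with the number of failed attempts, which is the context bloat that lesson_store is meant to avoid.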
lesson_store
| Type | VectorStore \| None |
| Default | None |
When provided, switches Reflexion to selective recall mode: lessons are stored as embedded chunks in the vector store, and before each attempt the top-k most relevant lessons for the current prompt are retrieved and surfaced. Avoids context bloat as lessons accumulate.
```python
from loomflow import HashEmbedder
from loomflow.architecture import Reflexion
from loomflow.vectorstore import InMemoryVectorStore

# Selective recall: lessons are embedded and stored in an in-memory vector store.
architecture = Reflexion(
    lesson_store=InMemoryVectorStore(embedder=HashEmbedder()),
    top_k_lessons=5,
)
```
For cross-process learning, swap InMemoryVectorStore for
PostgresVectorStore or any other persistent backend.
top_k_lessons
| Type | int |
| Default | 5 |
Number of relevant lessons surfaced per attempt when lesson_store
is wired. Must be >= 1. Ignored in monotonic-block mode (where all
lessons are surfaced).
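Selective recall amounts to nearest-neighbour search over embedded lessons. The sketch below uses a toy bag-of-words hashing embedder and cosine similarity to return the top_k lessons for a prompt. It only illustrates the idea and is not how HashEmbedder or InMemoryVectorStore are implemented; embed, recall, and DIM are hypothetical.

```python
import hashlib
import math

DIM = 64  # hypothetical embedding width


def embed(text: str) -> list[float]:
    """Toy hashing embedder: bucket each lowercase word, then L2-normalise."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        bucket = int(hashlib.sha256(word.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def recall(prompt: str, lessons: list[str], top_k: int = 5) -> list[str]:
    """Return the top_k lessons ranked by cosine similarity to the prompt."""
    if top_k < 1:
        raise ValueError("top_k must be >= 1")
    query = embed(prompt)
    scored = [
        # Vectors are unit length, so the dot product is the cosine similarity.
        (sum(q * v for q, v in zip(query, embed(lesson))), lesson)
        for lesson in lessons
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [lesson for _, lesson in scored[:top_k]]
```

In monotonic-block mode, by contrast, there is no ranking step: every stored lesson is surfaced, which is why top_k_lessons is ignored there.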
Methods
declared_workers
Returns {}. Reflexion declares no workers of its own; any workers belong to the base architecture and are exposed through the base’s own declared_workers (e.g. with Reflexion(base=Supervisor(...)), the supervisor’s workers are exposed via the supervisor’s interface).
run
For each attempt:
- Emit reflexion.attempt_started.
- Selective recall (when lesson_store is wired): query for lessons relevant to the current prompt and write the top-k into the working-memory block. Emit reflexion.lessons_recalled.
- Reset session messages so the base re-runs seed_context, which picks up lessons from memory.working() automatically.
- Run base.run(session, deps, prompt) and forward all events.
- If the base run is interrupted, terminate.
- Evaluate: one model call. If score ≥ threshold, terminate.
- Reflect: one model call producing a lesson. Persist it to lesson_store (selective mode) or append it to the monotonic block.
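The steps above can be sketched in plain Python. This is a simplified model, not loomflow’s implementation: run_base, evaluate, and reflect are hypothetical callables standing in for the base architecture and the two model calls, events are plain strings, lessons use monotonic-block mode, and selective recall and interrupt handling are omitted.

```python
from typing import Callable

# Hypothetical signatures standing in for the base run and the two model calls.
RunBase = Callable[[str, list[str]], str]    # (prompt, lessons) -> output
Evaluate = Callable[[str, str], float]       # (prompt, output) -> score
Reflect = Callable[[str, str, float], str]   # (prompt, output, score) -> lesson


def reflexion_loop(
    prompt: str,
    run_base: RunBase,
    evaluate: Evaluate,
    reflect: Reflect,
    max_attempts: int = 3,
    threshold: float = 0.8,
) -> tuple[str, list[str], list[str]]:
    """Simplified attempt/evaluate/reflect cycle; returns (output, lessons, events)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    lessons: list[str] = []
    events: list[str] = []
    output = ""
    for attempt in range(1, max_attempts + 1):
        events.append(f"reflexion.attempt_started:{attempt}")
        # Fresh attempt: only the accumulated lessons carry over.
        output = run_base(prompt, list(lessons))
        score = evaluate(prompt, output)  # one evaluator call
        if score >= threshold:
            break  # good enough, stop early
        lessons.append(reflect(prompt, output, score))  # one reflector call
    return output, lessons, events


# Fake components: the base only succeeds once it has seen a lesson.
def fake_base(prompt: str, lessons: list[str]) -> str:
    return "answer with sources" if lessons else "answer"


def fake_evaluate(prompt: str, output: str) -> float:
    return 0.9 if "sources" in output else 0.3


def fake_reflect(prompt: str, output: str, score: float) -> str:
    return "Cite sources for every claim."


out, lessons, events = reflexion_loop(
    "Summarise the study.", fake_base, fake_evaluate, fake_reflect
)
print(out)      # answer with sources
print(lessons)  # ['Cite sources for every claim.']
```

This sketch also reflects after a failing final attempt, following the step list literally; whether loomflow does the same on the last attempt is an assumption here.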
Cost model
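Each attempt costs one base run plus one evaluator call, plus one reflector call when the score is below threshold.
For a 3-attempt run, that is roughly 1× to 3× the base cost (plus the small evaluator and reflector calls), depending on how early
the evaluator passes. With a typical test pass/fail evaluator: about 1× when the first attempt passes,
about 2× when passing on attempt 2, and about 3× when all 3 attempts are needed.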
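Applying the per-attempt cost model: a run that stops on attempt k makes k base runs, k evaluator calls, and k − 1 reflector calls, while a run where every attempt falls short makes one reflector call per attempt. The helper below is a hypothetical back-of-the-envelope calculator; it counts calls, not tokens.

```python
from typing import Optional


def call_counts(pass_on: Optional[int], max_attempts: int = 3) -> dict[str, int]:
    """Calls made under the per-attempt cost model.

    pass_on: the attempt number that met the threshold, or None if none did.
    """
    attempts = max_attempts if pass_on is None else pass_on
    if not 1 <= attempts <= max_attempts:
        raise ValueError("pass_on must be between 1 and max_attempts")
    # Every below-threshold attempt triggers one reflector call.
    failed = attempts if pass_on is None else attempts - 1
    return {"base": attempts, "evaluator": attempts, "reflector": failed}


for outcome in (1, 2, None):
    print(outcome, call_counts(outcome))
```

The base column dominates in practice, since each base run may itself involve many model and tool calls.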
When Reflexion pays off
- Tasks with a clear evaluator signal (test pass/fail, JSON schema match, factual correctness against ground truth).
- Repeated runs of the same problem class, where lessons compound.
- Cases where the alternative is shipping wrong answers to users; avoiding that pays back the extra calls.
Example
```python
from loomflow import Agent, HashEmbedder
from loomflow.architecture import Reflexion, Supervisor
from loomflow.vectorstore import InMemoryVectorStore

# researcher, writer, and reviewer are worker agents defined elsewhere.
agent = Agent(
    "Manage the article pipeline.",
    model="claude-opus-4-7",
    architecture=Reflexion(
        base=Supervisor(workers={
            "researcher": researcher,
            "writer": writer,
            "reviewer": reviewer,
        }),
        max_attempts=3,
        threshold=0.85,
        lesson_store=InMemoryVectorStore(embedder=HashEmbedder()),
    ),
)
```
Source
loomflow/architecture/reflexion.py
Without a meaningful evaluator signal, Reflexion does nothing useful. The lesson
generator (the reflector) only fires when the evaluator’s score is below threshold. Provide
an evaluator_prompt that returns a meaningful score for your
domain (test runner, regex match, structured-output validity, etc.);
otherwise the loop just retries without learning.