`Reflexion`


```python
from loomflow.architecture import Reflexion
```

Verbal reinforcement learning via memory, after Shinn et al. 2023, Reflexion: Language Agents with Verbal Reinforcement Learning. After each attempt, an evaluator scores the output. If the score is below the threshold, a reflector produces a single-sentence “lesson”: written advice the agent can read on its next attempt.

For the conceptual overview, see Reflexion.

Class signature


```python
class Reflexion:
    name: str = "reflexion"

    def __init__(
        self,
        *,
        base: Architecture | None = None,
        max_attempts: int = 3,
        threshold: float = 0.8,
        evaluator_prompt: str | None = None,
        reflector_prompt: str | None = None,
        lessons_block_name: str = "reflexion_lessons",
        lesson_store: VectorStore | None = None,
        top_k_lessons: int = 5,
    ) -> None: ...
```

Constructor parameters

`base`


Type	`Architecture | None`
Default	`None` (resolves to `ReAct()`)

The architecture used for each attempt. Defaults to `ReAct()`. Reflexion-of-Supervisor, i.e. `Reflexion(base=Supervisor(workers=...))`, is a team that learns delegation patterns across attempts. See Recursive composition.

`max_attempts`


Type	`int`
Default	`3`

Maximum attempts before giving up. Each attempt runs the base architecture; below-threshold attempts trigger a lesson. Must be >= 1.

`threshold`


Type	`float`
Default	`0.8`

Evaluator score (in [0.0, 1.0]) at or above which the loop terminates. Lower it for more permissive evaluation; raise it for stricter evaluation.
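A minimal sketch of how `max_attempts` and `threshold` interact, assuming nothing about loomflow’s internals: `validate_settings` and `attempt_passes` are hypothetical helpers, and the range check on `threshold` is an assumption (the docs only state that scores lie in [0.0, 1.0]). The point it illustrates is the inclusive comparison: a score exactly equal to `threshold` ends the loop.

```python
def validate_settings(max_attempts: int, threshold: float) -> None:
    """Hypothetical check mirroring Reflexion's documented constraints."""
    # Documented: max_attempts must be >= 1.
    if max_attempts < 1:
        raise ValueError(f"max_attempts must be >= 1, got {max_attempts}")
    # Assumption: threshold should fall inside the evaluator's [0.0, 1.0] score range.
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must be in [0.0, 1.0], got {threshold}")


def attempt_passes(score: float, threshold: float = 0.8) -> bool:
    """True when an evaluator score ends the loop (the comparison is inclusive)."""
    return score >= threshold


validate_settings(max_attempts=3, threshold=0.8)
print(attempt_passes(0.8))   # True: equal to the threshold counts as passing
print(attempt_passes(0.79))  # False: triggers a reflection
```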

`evaluator_prompt`


Type	`str | None`
Default	`None` (uses built-in default)

Override the evaluator’s system prompt. The default asks for a numeric score in [0, 1] plus a brief justification. Provide your own when you have a domain-specific rubric (test-pass rate, factual accuracy against ground truth, etc.).
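When you write your own `evaluator_prompt`, it helps to ask for a reply that is easy to parse. The sketch below assumes a reply that leads with the numeric score and follows with a one-sentence justification; `TEST_RUBRIC` and `parse_evaluation` are hypothetical, and how loomflow itself parses evaluator output is not documented here.

```python
import re

# Hypothetical domain rubric you might pass as evaluator_prompt.
TEST_RUBRIC = (
    "Score the answer from 0 to 1 as the fraction of the unit tests it passes. "
    "Reply with the score first, then one sentence of justification."
)


def parse_evaluation(reply: str) -> tuple[float, str]:
    """Split an evaluator reply into (score, justification).

    Assumes the reply leads with a number, e.g. "0.6 - two of five tests fail."
    """
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        # No score found: treat it as a failing attempt so a lesson is produced.
        return 0.0, reply.strip()
    score = min(max(float(match.group()), 0.0), 1.0)  # clamp into [0, 1]
    justification = reply[match.end():].lstrip(" \t\n-:").strip()
    return score, justification


print(parse_evaluation("0.6 - two of five tests still fail."))
# (0.6, 'two of five tests still fail.')
```

You would pass the rubric as `Reflexion(evaluator_prompt=TEST_RUBRIC)`. Clamping into [0, 1] and scoring an unparseable reply as 0.0 are choices of this sketch: they keep a malformed evaluation from ending the loop early.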

`reflector_prompt`


Type	`str | None`
Default	`None` (uses built-in default)

Override the reflector’s system prompt. The default asks for a single-sentence lesson the agent can apply on its next attempt. Lessons are deliberately short: long lessons erode the context-economy advantage Reflexion has over a plain retry.
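If a custom reflector prompt occasionally produces several sentences, one option is to trim the output to its first sentence before storing it. This is a hypothetical post-processing helper, not something loomflow is documented to do; the 200-character cap is an arbitrary assumption.

```python
import re


def first_sentence(text: str, max_chars: int = 200) -> str:
    """Reduce reflector output to a single short sentence (hypothetical helper)."""
    text = " ".join(text.split())  # collapse newlines and repeated whitespace
    # Split after the first '.', '!' or '?' that is followed by whitespace.
    sentence = re.split(r"(?<=[.!?])\s", text, maxsplit=1)[0]
    return sentence[:max_chars]  # assumed cap; truncation may cut mid-word


print(first_sentence("Run the tests before answering.\nAlso check edge cases."))
# Run the tests before answering.
```

The sentence split is naive: an abbreviation such as “e.g.” followed by a space ends the sentence early, which is usually acceptable for short lessons.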

`lessons_block_name`


Type	`str`
Default	`"reflexion_lessons"`

Name of the working-memory block that lessons are written to in monotonic-block mode (that is, when no `lesson_store` is wired). The agent’s seed messages include this block on every subsequent attempt.
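In monotonic-block mode, lessons only accumulate, and every stored lesson is surfaced on each attempt. The toy `WorkingMemory` class below is a hypothetical stand-in for loomflow’s working memory, meant only to show that append-only behavior and why the block grows with every failed attempt.

```python
class WorkingMemory:
    """Toy stand-in for named working-memory blocks (hypothetical, not loomflow's API)."""

    def __init__(self) -> None:
        self.blocks: dict[str, list[str]] = {}

    def append(self, block: str, lesson: str) -> None:
        # Monotonic: lessons are only ever appended, never replaced or evicted.
        self.blocks.setdefault(block, []).append(lesson)

    def render(self, block: str) -> str:
        # Every stored lesson is surfaced, so the rendered block grows with each failure.
        return "\n".join(f"- {lesson}" for lesson in self.blocks.get(block, []))


memory = WorkingMemory()
memory.append("reflexion_lessons", "Cite a source for every statistic.")
memory.append("reflexion_lessons", "Keep the summary under 100 words.")
print(memory.render("reflexion_lessons"))
```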

`lesson_store`


Type	`VectorStore | None`
Default	`None`

When provided, switches Reflexion to selective recall mode: lessons are stored as embedded chunks in the vector store, and before each attempt the top-k most relevant lessons for the current prompt are retrieved and surfaced. This avoids context bloat as lessons accumulate.


```python
from loomflow import HashEmbedder
from loomflow.architecture import Reflexion
from loomflow.vectorstore import InMemoryVectorStore

architecture = Reflexion(
    lesson_store=InMemoryVectorStore(embedder=HashEmbedder()),
    top_k_lessons=5,
)
```

For cross-process learning, swap InMemoryVectorStore for PostgresVectorStore or any other persistent backend.

`top_k_lessons`


Type	`int`
Default	`5`

Number of relevant lessons surfaced per attempt when lesson_store is wired. Must be >= 1. Ignored in monotonic-block mode (where all lessons are surfaced).
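Selective recall amounts to embedding each lesson once and ranking the stored lessons by similarity to the current prompt. The sketch below uses a toy hashed bag-of-words embedding with cosine similarity; `hash_embed` and `LessonStore` are hypothetical and are not loomflow’s `HashEmbedder` or `InMemoryVectorStore`, whose internals may differ.

```python
import hashlib
import math
import re


def hash_embed(text: str, dim: int = 256) -> list[float]:
    """Toy hashed bag-of-words embedding, L2-normalised (not loomflow's HashEmbedder)."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(token.encode()).digest()
        vec[int.from_bytes(digest[:8], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class LessonStore:
    """Minimal in-memory lesson store with top-k cosine recall (hypothetical)."""

    def __init__(self) -> None:
        self.items: list[tuple[str, list[float]]] = []

    def add(self, lesson: str) -> None:
        self.items.append((lesson, hash_embed(lesson)))  # embed once, at write time

    def top_k(self, prompt: str, k: int = 5) -> list[str]:
        if k < 1:
            raise ValueError("k must be >= 1")  # mirrors top_k_lessons >= 1
        query = hash_embed(prompt)
        # Both vectors are unit length, so the dot product is the cosine similarity.
        scored = [(sum(q * e for q, e in zip(query, emb)), lesson) for lesson, emb in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [lesson for _, lesson in scored[:k]]


store = LessonStore()
store.add("Validate SQL queries against the schema before answering.")
store.add("Cite a source for every statistic.")
print(store.top_k("Write SQL queries for the sales schema", k=1))
```

Because only `k` lessons reach the prompt, context cost stays flat no matter how many lessons the store holds, which is the trade-off against monotonic-block mode.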

Methods

`declared_workers`

Returns `{}`. Workers belonging to the base architecture are exposed through the base’s own `declared_workers`; for example, `Reflexion(base=Supervisor(...))` exposes the supervisor’s workers through the supervisor’s interface, not through Reflexion’s.

`run`

For each attempt:

1. Emit `reflexion.attempt_started`.
2. Selective recall (when `lesson_store` is wired): query for lessons relevant to THIS prompt and write the top-k into the working-memory block. Emit `reflexion.lessons_recalled`.
3. Reset the session messages so the base re-runs `seed_context`, which picks up lessons from `memory.working()` automatically.
4. Run `base.run(session, deps, prompt)`, forwarding all events.
5. If the base run was interrupted, terminate.
6. Evaluate (one model call). If score ≥ `threshold`, terminate.
7. Reflect (one model call) to produce a lesson. Persist it to `lesson_store` (selective mode) or append it to the monotonic block.
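The steps above can be sketched as a plain loop. This is a simplified model under stated assumptions, not loomflow’s implementation: `run_base`, `evaluate` and `reflect` are hypothetical callables standing in for the base architecture and the two model calls, lessons are passed as a list (monotonic-block mode), and interruption handling, session resets and event forwarding are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopResult:
    output: str
    score: float
    attempts: int
    lessons: list[str] = field(default_factory=list)
    events: list[str] = field(default_factory=list)


def reflexion_loop(
    prompt: str,
    run_base: Callable[[str, list[str]], str],  # hypothetical: (prompt, lessons) -> output
    evaluate: Callable[[str, str], float],      # hypothetical: (prompt, output) -> score
    reflect: Callable[[str, str, float], str],  # hypothetical: (prompt, output, score) -> lesson
    max_attempts: int = 3,
    threshold: float = 0.8,
) -> LoopResult:
    lessons: list[str] = []
    events: list[str] = []
    output, score = "", 0.0
    for attempt in range(1, max_attempts + 1):
        events.append("reflexion.attempt_started")
        # Fresh run each attempt: the base sees the prompt plus every lesson so far.
        output = run_base(prompt, list(lessons))
        score = evaluate(prompt, output)  # the evaluator model call
        if score >= threshold:
            return LoopResult(output, score, attempt, lessons, events)
        lessons.append(reflect(prompt, output, score))  # the reflector model call
    return LoopResult(output, score, max_attempts, lessons, events)


result = reflexion_loop(
    "Summarise the report.",
    run_base=lambda prompt, lessons: f"draft using {len(lessons)} lesson(s)",
    evaluate=lambda prompt, output: 0.9 if "1 lesson" in output else 0.5,
    reflect=lambda prompt, output, score: "Lead with the key finding.",
)
print(result.attempts, result.lessons)  # 2 ['Lead with the key finding.']
```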

Cost model

Each attempt costs one base run plus one evaluator call, plus one reflector call when the score falls below the threshold. For a 3-attempt run this comes to roughly 1.5–3× the base cost, depending on how often the evaluator passes. In a typical use case (a test pass/fail evaluator), expect about 1.5× when the run passes on attempt 2 and 3× when it needs all 3 attempts.
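The per-attempt formula can be turned into a call count. This sketch counts model calls only (not tokens) and assumes the reflector also runs after a failing final attempt, as the per-attempt formula implies; `reflexion_calls` is a hypothetical helper, and `base_calls` is however many model calls one run of your base architecture makes.

```python
from typing import Optional


def reflexion_calls(base_calls: int, passed_on: Optional[int], max_attempts: int = 3) -> int:
    """Model calls for one Reflexion run (hypothetical helper).

    base_calls: model calls made by one run of the base architecture.
    passed_on:  1-based attempt whose score met the threshold, or None if none did.
    """
    if passed_on is None:
        attempts = max_attempts
        reflections = max_attempts  # assumption: the final failing attempt is also reflected on
    else:
        attempts = passed_on
        reflections = passed_on - 1  # every attempt before the passing one failed
    return attempts * base_calls + attempts + reflections  # base runs + evaluators + reflectors


for outcome in (1, 2, None):
    print(outcome, reflexion_calls(base_calls=4, passed_on=outcome))
```

For example, a base that makes 4 calls per run costs 5 calls when the first attempt passes, 11 when the second passes, and 18 when all three attempts fail.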

When Reflexion pays off

- Tasks with a clear evaluator signal (test pass/fail, JSON schema match, factual correctness against ground truth).
- Repeated runs of the same problem class, where lessons compound.
- Cases where the alternative is shipping wrong answers to users; there, the extra calls pay for themselves.
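To make “clear evaluator signal” concrete, here is a deterministic check of the kind listed above: a score for structured output based on which required fields are present with the expected type. `json_fields_score` is a hypothetical helper and is not wired into Reflexion, whose evaluator is a model call configured through `evaluator_prompt`; it only shows what a score with a meaningful gradient looks like.

```python
import json


def json_fields_score(output: str, required: dict[str, type]) -> float:
    """Share (0..1) of required fields present with the expected type; invalid JSON scores 0."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(data, dict):
        return 0.0
    if not required:
        return 1.0
    hits = sum(1 for key, typ in required.items() if isinstance(data.get(key), typ))
    return hits / len(required)


schema = {"title": str, "word_count": int}
print(json_fields_score('{"title": "Q3 results", "word_count": 120}', schema))  # 1.0
print(json_fields_score('{"title": "Q3 results"}', schema))                     # 0.5
print(json_fields_score("Sure! Here is the JSON:", schema))                     # 0.0
```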

Example


```python
from loomflow import Agent, HashEmbedder
from loomflow.architecture import Reflexion, Supervisor
from loomflow.vectorstore import InMemoryVectorStore

# researcher, writer and reviewer are worker agents defined elsewhere.
agent = Agent(
    "Manage the article pipeline.",
    model="claude-opus-4-7",
    architecture=Reflexion(
        base=Supervisor(workers={
            "researcher": researcher,
            "writer":     writer,
            "reviewer":   reviewer,
        }),
        max_attempts=3,
        threshold=0.85,
        lesson_store=InMemoryVectorStore(embedder=HashEmbedder()),
    ),
)
```

Source

loomflow/architecture/reflexion.py

Reflexion is only as useful as its evaluator. The reflector fires only when the evaluator’s score is below `threshold`, so provide an `evaluator_prompt` that yields a meaningful score for your domain (test-runner results, regex match, structured-output validity, etc.); otherwise the loop just retries without learning.