Reflexion

from loomflow.architecture import Reflexion

Verbal reinforcement learning via memory, from Shinn et al. 2023, Reflexion: Language Agents with Verbal Reinforcement Learning. After each attempt, an evaluator scores the output. When the score falls below the threshold, a reflector produces a single-sentence “lesson”: written advice the agent can read on its next attempt.

For the conceptual page see Reflexion.


Class signature

class Reflexion:
    name: str = "reflexion"

    def __init__(
        self,
        *,
        base: Architecture | None = None,
        max_attempts: int = 3,
        threshold: float = 0.8,
        evaluator_prompt: str | None = None,
        reflector_prompt: str | None = None,
        lessons_block_name: str = "reflexion_lessons",
        lesson_store: VectorStore | None = None,
        top_k_lessons: int = 5,
    ) -> None: ...

Constructor parameters

base

Type: Architecture | None
Default: None (resolves to ReAct())

The architecture used for each attempt. Defaults to ReAct. Reflexion-of-Supervisor, written Reflexion(base=Supervisor(workers=...)), is a team that learns delegation patterns across attempts. See Recursive composition.

max_attempts

Type: int
Default: 3

Maximum attempts before giving up. Each attempt runs the base architecture; below-threshold attempts trigger a lesson. Must be >= 1.

threshold

Type: float
Default: 0.8

Evaluator score (in [0.0, 1.0]) at or above which the loop terminates. Lower for permissive evaluation, higher for strict.

evaluator_prompt

Type: str | None
Default: None (uses built-in default)

Override the evaluator’s system prompt. The default asks for a numeric score in [0, 1] plus a brief justification. Provide your own when you have a domain-specific rubric (test-pass rate, factual accuracy against ground truth, etc.).
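As an illustration of a domain-specific rubric, the sketch below defines a hypothetical evaluator prompt for a test-pass-rate rubric, plus a helper that extracts a score in [0, 1] from a reply. The reply format (`score: <number>` on the first line), the `TEST_RUBRIC_PROMPT` name, and the `parse_score` helper are all assumptions made for this example; they do not describe how loomflow parses evaluator output.

```python
import re

# Hypothetical rubric text. It could be passed as
# Reflexion(evaluator_prompt=TEST_RUBRIC_PROMPT).
TEST_RUBRIC_PROMPT = (
    "You are grading a code change against its test suite.\n"
    "Reply with 'score: <number between 0 and 1>' on the first line, "
    "where the number is the fraction of tests that pass.\n"
    "Then give a one-sentence justification."
)


def parse_score(reply: str) -> float:
    """Extract the first 'score: x' value and clamp it to [0.0, 1.0].

    Assumes the reply format requested by TEST_RUBRIC_PROMPT above.
    """
    match = re.search(r"score:\s*([0-9]*\.?[0-9]+)", reply, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("no score found in evaluator reply")
    return min(1.0, max(0.0, float(match.group(1))))
```

Asking for the score in a fixed position keeps the rubric easy to check by hand, and clamping guards against a reply that drifts outside the expected range.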

reflector_prompt

Type: str | None
Default: None (uses built-in default)

Override the reflector’s system prompt. The default asks for a single-sentence lesson the agent can apply on its next attempt. Lessons are deliberately short. Long lessons defeat the context-economy advantage over plain retry.

lessons_block_name

Type: str
Default: "reflexion_lessons"

Name of the working-memory block that lessons are written to in monotonic-block mode (when no lesson_store is wired). The agent’s seed messages include this block on every subsequent attempt.

lesson_store

Type: VectorStore | None
Default: None

When provided, switches Reflexion to selective recall mode: lessons are stored as embedded chunks in the vector store, and before each attempt the top-k most relevant lessons for the current prompt are retrieved and surfaced. Avoids context bloat as lessons accumulate.

from loomflow import HashEmbedder
from loomflow.architecture import Reflexion
from loomflow.vectorstore import InMemoryVectorStore

architecture = Reflexion(
    lesson_store=InMemoryVectorStore(embedder=HashEmbedder()),
    top_k_lessons=5,
)

For cross-process learning, swap InMemoryVectorStore for PostgresVectorStore or any other persistent backend.

top_k_lessons

Type: int
Default: 5

Number of relevant lessons surfaced per attempt when lesson_store is wired. Must be >= 1. Ignored in monotonic-block mode (where all lessons are surfaced).
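To make the difference between the two modes concrete, here is a minimal, library-free sketch. It uses a toy bag-of-words embedding and cosine similarity as stand-ins for a real embedder and vector store; `embed`, `cosine`, and `recall_lessons` are hypothetical names for this example and are not part of loomflow.

```python
from __future__ import annotations

import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding (an assumption; real stores use an embedder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall_lessons(
    lessons: list[str], prompt: str, top_k: int | None = None
) -> list[str]:
    """Return every lesson (monotonic mode, top_k=None) or only the
    top_k lessons most similar to the prompt (selective recall)."""
    if top_k is None:
        return list(lessons)
    if top_k < 1:
        raise ValueError("top_k must be >= 1")
    query = embed(prompt)
    ranked = sorted(lessons, key=lambda l: cosine(embed(l), query), reverse=True)
    return ranked[:top_k]
```

With `top_k=None` the surfaced context grows with every stored lesson; with a fixed `top_k` it stays bounded no matter how many lessons accumulate, which is the trade-off the two modes expose.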


Methods

declared_workers

Returns {}. The base architecture’s own workers are reflected through its own declared_workers (e.g. Reflexion(base=Supervisor(...)) exposes the supervisor’s workers via the supervisor’s interface).

run

For each attempt:

  1. Emit reflexion.attempt_started.
  2. Selective recall (when lesson_store is wired): query for lessons relevant to THIS prompt and write the top-k into the working memory block. Emit reflexion.lessons_recalled.
  3. Reset session messages so the base re-runs seed_context, which picks up lessons from memory.working() automatically.
  4. Run base.run(session, deps, prompt). Forward all events.
  5. If interrupted by base, terminate.
  6. Evaluate. One model call. If score ≥ threshold → terminate.
  7. Reflect. One model call producing a lesson. Persist to lesson_store (selective mode) or append to the monotonic block.
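The control flow of the steps above can be sketched in plain Python. This is a simplified illustration, not the code in reflexion.py: it omits event emission, session resets, interrupts, and selective recall, and the `attempt`, `evaluate`, and `reflect` callables are hypothetical stand-ins for the base architecture, the evaluator call, and the reflector call.

```python
from __future__ import annotations

from typing import Callable


def reflexion_loop(
    prompt: str,
    attempt: Callable[[str, list[str]], str],   # stand-in for base.run
    evaluate: Callable[[str, str], float],      # stand-in for the evaluator call
    reflect: Callable[[str, str, float], str],  # stand-in for the reflector call
    max_attempts: int = 3,
    threshold: float = 0.8,
) -> tuple[str, float, list[str]]:
    """Run up to max_attempts attempts, accumulating one lesson per failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    lessons: list[str] = []
    output, score = "", 0.0
    for _ in range(max_attempts):
        # Each attempt sees every lesson gathered so far (monotonic mode).
        output = attempt(prompt, list(lessons))
        score = evaluate(prompt, output)
        if score >= threshold:
            break  # good enough: stop without reflecting
        lessons.append(reflect(prompt, output, score))
    return output, score, lessons
```

Note that a below-threshold final attempt still produces a lesson in this sketch, matching the per-attempt description of the reflector, so the lesson remains available to later runs.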

Cost model

Each attempt costs one base run, plus one evaluator call, plus one reflector call when the score falls below threshold. A 3-attempt run costs roughly 1.5–3× the base, depending on how often the evaluator passes. In the typical use case (a test pass/fail evaluator), expect about 1.5× when passing on attempt 2 and 3× when all 3 attempts are needed.
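The per-attempt formula can be turned into a small call counter. This is a rough sketch under the assumption that every below-threshold attempt (including the last one) triggers exactly one reflector call; `reflexion_calls` is a hypothetical helper, and it counts calls rather than tokens or dollars, so it will not map directly onto cost multipliers.

```python
from __future__ import annotations


def reflexion_calls(passed_on: int | None, max_attempts: int = 3) -> dict[str, int]:
    """Count calls for a run that passes on attempt `passed_on`.

    passed_on=None means no attempt ever reaches the threshold.
    """
    attempts = max_attempts if passed_on is None else passed_on
    if not 1 <= attempts <= max_attempts:
        raise ValueError("passed_on must be between 1 and max_attempts")
    # One reflection per failed attempt; the passing attempt does not reflect.
    reflections = attempts if passed_on is None else attempts - 1
    return {
        "base_runs": attempts,
        "evaluator_calls": attempts,
        "reflector_calls": reflections,
    }
```

For example, `reflexion_calls(1)` shows that a first-attempt pass adds only a single evaluator call on top of the base run.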


When Reflexion pays off

  • Tasks with a clear evaluator signal (test pass/fail, JSON schema match, factual correctness against ground truth).
  • Repeated runs of the same problem class, where lessons compound.
  • Cases where the alternative is shipping wrong answers to users, so the extra calls pay for themselves.

Example

from loomflow import Agent, HashEmbedder
from loomflow.architecture import Reflexion, Supervisor
from loomflow.vectorstore import InMemoryVectorStore

# researcher, writer, and reviewer are worker agents defined elsewhere.
agent = Agent(
    "Manage the article pipeline.",
    model="claude-opus-4-7",
    architecture=Reflexion(
        base=Supervisor(workers={
            "researcher": researcher,
            "writer": writer,
            "reviewer": reviewer,
        }),
        max_attempts=3,
        threshold=0.85,
        lesson_store=InMemoryVectorStore(embedder=HashEmbedder()),
    ),
)

Source

loomflow/architecture/reflexion.py

Without a meaningful evaluator, Reflexion does nothing useful. The reflector only fires when the evaluator’s score is below threshold. Provide an evaluator_prompt that yields a meaningful score for your domain (test runner, regex match, structured-output validity, etc.); otherwise the loop just retries without learning.
