
Reflexion

Verbal reinforcement learning via memory. Shinn et al. 2023, Reflexion: Language Agents with Verbal Reinforcement Learning.

After each attempt, an evaluator scores the output. If the score is below the threshold, a reflector produces a single-sentence “lesson”: written advice the agent can read on its next attempt.

    ┌──────────── attempt loop (max_attempts) ───────────┐
    │                                                    │
    │ prompt ──► [recall lessons] ──► base.run() ──► evaluator
    │                                                    │
    │                                           score < threshold?
    │                                                    │
    │                                              yes ──┴── no ──► output
    │                                               │
    │                                           reflector ──► lesson
    │                                                           │
    └────────────────────────────────────── persist ────────────┘
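The loop in the diagram can be sketched in plain Python. This is an illustration of the control flow only, not loomflow's implementation: `reflexion_loop`, `run`, `reflector`, and the toy functions are hypothetical names, and lessons are kept in a simple list rather than a real memory block or vector store.

```python
from typing import Callable, List, Tuple


def reflexion_loop(
    prompt: str,
    run: Callable[[str, List[str]], str],         # hypothetical stand-in for base.run()
    evaluator: Callable[[str], float],            # returns a score, here in [0, 1]
    reflector: Callable[[str, str, float], str],  # returns a one-sentence lesson
    max_attempts: int = 3,
    threshold: float = 0.85,
) -> Tuple[str, List[str]]:
    """Illustrative attempt loop; assumptions are noted in the comments."""
    lessons: List[str] = []
    output = ""
    for _ in range(max_attempts):
        output = run(prompt, lessons)   # earlier lessons are visible to the attempt
        score = evaluator(output)
        if score >= threshold:          # reflection only happens when score < threshold
            break
        lessons.append(reflector(prompt, output, score))  # persist the lesson
    return output, lessons


# Toy components: the "agent" only answers correctly once it has seen a lesson.
def toy_run(prompt: str, lessons: List[str]) -> str:
    return "42" if lessons else "41"


def toy_evaluator(output: str) -> float:
    return 1.0 if output == "42" else 0.0


def toy_reflector(prompt: str, output: str, score: float) -> str:
    return f"'{output}' was rejected; re-check the arithmetic."


output, lessons = reflexion_loop("What is 6 * 7?", toy_run, toy_evaluator, toy_reflector)
print(output, len(lessons))  # the second attempt passes after one lesson
```

In this sketch a failed final attempt still records a lesson, which is what makes later runs of the same problem class benefit; whether the library does the same is an assumption.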

Lesson storage modes

Two storage modes for the persisted lessons:

  • Monotonic block (legacy default). Every lesson is appended to memory.<lessons_block_name> and shown to the agent on every subsequent attempt. Simple but bloats context as lessons accumulate.
  • Selective recall (recommended). Pass a VectorStore as lesson_store=. Lessons are stored as embedded chunks; before each attempt, only the top-k lessons most relevant to the current task are retrieved and surfaced. This avoids context bloat and keeps advice scoped to where it applies.
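The difference between the two modes can be illustrated with a toy store. This is a sketch under stated assumptions: `ToyLessonStore`, `embed`, and `cosine` are hypothetical names, and the bag-of-words "embedding" is a stand-in for whatever embedder a real VectorStore uses.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real store would use a proper embedder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyLessonStore:
    """Hypothetical store used only to contrast the two recall modes."""

    def __init__(self) -> None:
        self._items: List[Tuple[Counter, str]] = []

    def add(self, lesson: str) -> None:
        self._items.append((embed(lesson), lesson))

    def all(self) -> List[str]:
        # Monotonic block: every lesson is returned on every attempt.
        return [lesson for _, lesson in self._items]

    def top_k(self, task: str, k: int = 2) -> List[str]:
        # Selective recall: only the k lessons most similar to the task.
        query = embed(task)
        ranked = sorted(self._items, key=lambda item: cosine(query, item[0]), reverse=True)
        return [lesson for _, lesson in ranked[:k]]


store = ToyLessonStore()
store.add("validate json output against the schema before returning")
store.add("sql queries need an explicit limit clause")
store.add("cite the source url for every factual claim")

recalled = store.top_k("return json output matching the schema", k=1)
print(recalled)          # only the JSON-related lesson is surfaced
print(len(store.all()))  # the monotonic view would surface all three
```

With three lessons the saving is trivial, but the monotonic view grows with every failed attempt while the top-k view stays at a fixed size.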

Usage

    from loomflow import Agent, HashEmbedder
    from loomflow.architecture import Reflexion
    from loomflow.vectorstore import InMemoryVectorStore

    agent = Agent(
        "Solve the puzzle.",
        model="claude-opus-4-7",
        architecture=Reflexion(
            max_attempts=3,
            threshold=0.85,
            lesson_store=InMemoryVectorStore(embedder=HashEmbedder()),
        ),
    )

    # Top-level await works in a notebook or async REPL;
    # in a script, call this from inside an async function.
    result = await agent.run("...")

For cross-session learning, swap InMemoryVectorStore for PostgresVectorStore and the lessons persist across processes.

Wrapping any base architecture

Reflexion(base=...) defaults to ReAct but accepts any architecture. A particularly effective combination is Reflexion wrapping Supervisor, where the team learns across attempts which worker handles which intent best:

    from loomflow import Agent, HashEmbedder
    from loomflow.architecture import Reflexion, Supervisor
    from loomflow.vectorstore import InMemoryVectorStore

    agent = Agent(
        "...",
        model="claude-opus-4-7",
        architecture=Reflexion(
            # Any architecture can serve as the base; here it is a Supervisor team.
            base=Supervisor(workers={"researcher": ..., "writer": ...}),
            max_attempts=3,
            threshold=0.85,
            lesson_store=InMemoryVectorStore(embedder=HashEmbedder()),
        ),
    )

See Recursive composition.

When Reflexion pays off

  • Tasks with a clear evaluator signal (test pass/fail, JSON schema match, factual correctness against ground truth).
  • Repeated runs of the same problem class. Lessons compound.
  • Cases where wrong answers are expensive. Reflexion costs roughly 1.5–3× a plain ReAct run, which pays back when the alternative is shipping wrong answers to users.

Without an evaluator, Reflexion does nothing useful: the reflector only fires when evaluator(output) < threshold. Pass an evaluator that returns a meaningful score for your domain (test runner, regex match, structured-output validity, etc.).
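As an illustration of the structured-output case, an evaluator can be an ordinary function that maps the output to a score. The sketch below is a hypothetical example: `json_fields_evaluator` and the required field names are made up for illustration, and how the function is handed to Reflexion is not shown here, so check the API reference for the exact parameter.

```python
import json
from typing import Iterable


def json_fields_evaluator(
    output: str,
    required: Iterable[str] = ("title", "summary", "tags"),  # hypothetical schema
) -> float:
    """Return a score in [0, 1]: 0.0 for invalid JSON or a non-object,
    otherwise the fraction of required keys that are present."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(data, dict):
        return 0.0
    keys = list(required)
    if not keys:
        return 1.0  # nothing required, so any JSON object passes
    present = sum(1 for key in keys if key in data)
    return present / len(keys)


complete = json_fields_evaluator('{"title": "a", "summary": "b", "tags": []}')
partial = json_fields_evaluator('{"title": "a", "summary": "b"}')
broken = json_fields_evaluator("not json")
print(complete, partial, broken)
```

With threshold=0.85, the output missing one of three fields scores about 0.67 and would trigger a reflection, while the complete object passes. A graded score like this gives the reflector more to work with than a bare pass/fail.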
