Reflexion
Verbal reinforcement learning via memory. Shinn et al. 2023, Reflexion: Language Agents with Verbal Reinforcement Learning.
After each attempt, an evaluator scores the output. If the score falls below the threshold, a reflector produces a single-sentence “lesson”: written advice the agent can read on its next attempt.
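The loop described above can be sketched in plain Python. This is a minimal illustration, not loomflow's implementation: `reflexion_loop` and the `run`, `evaluator`, and `reflector` callables (including their signatures) are hypothetical stand-ins for the base architecture, the scoring function, and the lesson writer.

```python
from typing import Callable, List, Tuple


def reflexion_loop(
    task: str,
    run: Callable[[str, List[str]], str],         # hypothetical: (task, lessons) -> output
    evaluator: Callable[[str], float],            # hypothetical: output -> score in [0, 1]
    reflector: Callable[[str, str, float], str],  # hypothetical: (task, output, score) -> lesson
    max_attempts: int = 3,
    threshold: float = 0.85,
) -> Tuple[str, List[str]]:
    """Retry until the score reaches the threshold or attempts run out."""
    lessons: List[str] = []
    output = ""
    for _ in range(max_attempts):
        output = run(task, lessons)  # the agent sees lessons from earlier attempts
        score = evaluator(output)
        if score >= threshold:       # good enough: return without reflecting
            break
        # Below threshold: write one lesson and keep it for the next attempt.
        lessons.append(reflector(task, output, score))
    return output, lessons


# Toy stand-ins: the "agent" only answers correctly once it has a lesson to read.
def toy_run(task: str, lessons: List[str]) -> str:
    return "4" if lessons else "5"


def toy_evaluator(output: str) -> float:
    return 1.0 if output == "4" else 0.0


def toy_reflector(task: str, output: str, score: float) -> str:
    return f"Answer '{output}' scored {score}; recompute before answering."


final_output, learned = reflexion_loop(
    "What is 2 + 2?", toy_run, toy_evaluator, toy_reflector
)
```

In this toy run the first attempt fails, one lesson is written, and the second attempt passes, so the loop stops with attempts to spare.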
┌─────────── attempt loop (max_attempts) ───────────┐
│                                                   │
│  prompt ──► [recall lessons] ──► base.run() ──► evaluator
│                                                     │
│                                            score < threshold?
│                                                     │
│                                               yes ──┴── no ──► output
│                                                │
│                                            reflector ──► lesson
│                                                │
└────────────────────────────── persist ─────────┘

Lesson storage modes
Persisted lessons can be stored in one of two modes:
- Monotonic block (legacy default). Every lesson is appended to memory.<lessons_block_name> and shown to the agent on every subsequent attempt. Simple, but bloats context as lessons accumulate.
- Selective recall (recommended). Pass a VectorStore as lesson_store. Lessons are stored as embedded chunks; before each attempt, only the top-k most relevant lessons for the current task are retrieved and surfaced. This avoids context bloat and keeps each lesson scoped to the tasks where it applies.
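The difference between the two modes can be illustrated with a toy retrieval sketch. It uses a bag-of-words `Counter` as a stand-in embedding and cosine similarity for ranking; the function names here are hypothetical and this is not how HashEmbedder or any loomflow VectorStore is implemented.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: a sparse bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def monotonic_block(lessons: List[str]) -> List[str]:
    """Monotonic mode: every stored lesson is shown on every attempt."""
    return list(lessons)


def selective_recall(task: str, lessons: List[str], k: int = 2) -> List[str]:
    """Selective mode: only the k lessons most similar to the task are shown."""
    query = embed(task)
    ranked = sorted(lessons, key=lambda l: cosine(query, embed(l)), reverse=True)
    return ranked[:k]


stored = [
    "sort the list before binary search",
    "validate json output against the schema",
    "close the file handle after reading",
]
task = "return json output that matches the schema"

shown_monotonic = monotonic_block(stored)             # all three lessons
shown_selective = selective_recall(task, stored, k=1)  # only the JSON lesson
```

With a real embedder the ranking would reflect semantic similarity rather than shared words, but the shape of the trade-off is the same: the monotonic block grows with every lesson, while selective recall is capped at k.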
Usage
from loomflow import Agent, HashEmbedder
from loomflow.architecture import Reflexion
from loomflow.vectorstore import InMemoryVectorStore

agent = Agent(
    "Solve the puzzle.",
    model="claude-opus-4-7",
    architecture=Reflexion(
        max_attempts=3,   # stop after three attempts
        threshold=0.85,   # reflect whenever the evaluator score is below this
        # Passing a vector store enables selective recall of lessons.
        lesson_store=InMemoryVectorStore(embedder=HashEmbedder()),
    ),
)
# Top-level await: run this inside a coroutine (or an async REPL/notebook).
result = await agent.run("...")

For cross-session learning, swap InMemoryVectorStore for
PostgresVectorStore and the lessons persist across processes.
Wrapping any base architecture
Reflexion(base=...) defaults to ReAct but accepts any architecture.
The killer combination is Reflexion wrapping a Supervisor. Across
attempts, the team learns which worker handles each intent best:
from loomflow import Agent, HashEmbedder
from loomflow.architecture import Reflexion, Supervisor
from loomflow.vectorstore import InMemoryVectorStore

agent = Agent(
    "...",
    model="claude-opus-4-7",
    architecture=Reflexion(
        # Wrap the whole worker team so lessons apply to routing decisions.
        base=Supervisor(workers={"researcher": ..., "writer": ...}),
        max_attempts=3,
        threshold=0.85,
        lesson_store=InMemoryVectorStore(embedder=HashEmbedder()),
    ),
)

When Reflexion pays off
- Tasks with a clear evaluator signal (test pass/fail, JSON schema match, factual correctness against ground truth).
- Repeated runs of the same problem class. Lessons compound.
- Settings where wrong answers are expensive. Reflexion costs roughly 1.5–3× as much as ReAct, which pays back when the alternative is shipping wrong answers to users.
Without an evaluator, Reflexion does nothing useful. The reflector
only fires when evaluator(output) < threshold, so pass an
evaluator that returns a meaningful score for your domain (test
runner, regex match, structured-output validity, etc.).
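As an illustration of the structured-output case, here is a minimal evaluator sketch. It assumes the evaluator is a plain callable that takes the output as a string and returns a float in [0, 1]; that signature, the function name `json_fields_evaluator`, and the field names are assumptions for this example, so check the loomflow API for the exact interface.

```python
import json
from typing import Iterable


def json_fields_evaluator(
    output: str,
    required: Iterable[str] = ("title", "summary"),  # hypothetical field names
) -> float:
    """Return 0.0 for invalid JSON, else the fraction of required fields present."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return 0.0  # not parseable at all
    if not isinstance(data, dict):
        return 0.0  # valid JSON, but not an object
    fields = list(required)
    if not fields:
        return 1.0  # nothing required, so any JSON object passes
    present = sum(1 for name in fields if name in data)
    return present / len(fields)


full_score = json_fields_evaluator('{"title": "a", "summary": "b"}')
partial_score = json_fields_evaluator('{"title": "a"}')
invalid_score = json_fields_evaluator("not json")
```

A graded score like this gives the threshold something to bite on: with threshold=0.85, an output missing one of two required fields scores 0.5 and triggers a reflection, while a complete one scores 1.0 and is returned.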