Examples

The repo’s examples/ directory ships nineteen runnable scripts. Each exercises one feature end-to-end against the framework, with no external dependencies beyond the model API. They use Loom’s own loader / vector store / agent / workflow constructs, with nothing pulled in from outside the framework.


# .env should contain OPENAI_API_KEY=sk-...
# Agent + retrieval + memory:
python examples/01_rag_pdf.py                       # default backend (unstructured)
python examples/01_rag_pdf.py --backend docling     # alt PDF backend
python examples/02_specialist_debate.py
python examples/03_multi_user_sessions.py
python examples/04_structured_outputs.py
python examples/05_memory_showcase.py
 
# Workflow primitives:
python examples/06_workflow_chain.py                # no API key required
python examples/07_workflow_route.py
python examples/08_workflow_loop.py
python examples/09_workflow_as_tool.py
python examples/10_workflow_architecture.py
python examples/11_workflow_custom_step.py
 
# Production observability — no API key required:
python examples/12_audit_log.py
python examples/13_telemetry.py
 
# Reasoning effort across providers (ANTHROPIC_API_KEY):
python examples/14_effort_dial.py
 
# Declarative config (no API key required):
python examples/15_config_file.py
 
# Shared workspace + prompt caching + living plan:
python examples/16_shared_workspace.py     # OPENAI_API_KEY
python examples/17_prompt_caching.py       # ANTHROPIC_API_KEY or OPENAI_API_KEY
python examples/18_living_plan.py          # no API key required
python examples/19_workspace_lifecycle.py  # no API key required

Each example ships a graceful skip when OPENAI_API_KEY isn’t set: it prints a hint and exits 0, so a make examples-style runner doesn’t fail.
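The skip behaviour described above can be sketched with the standard library alone. This is an illustrative pattern, not the code the examples actually ship; the helper names missing_keys, skip_if_missing, and main are hypothetical.

```python
import os


def missing_keys(required, env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def skip_if_missing(required, env=None):
    """Print a hint and return True when the example should be skipped."""
    missing = missing_keys(required, env)
    if missing:
        print(f"Skipping: set {', '.join(missing)} (e.g. in .env) to run this example.")
        return True
    return False


def main(env=None):
    if skip_if_missing(["OPENAI_API_KEY"], env):
        return 0  # exit code 0, so a batch runner keeps going
    # ... the real example body would run here ...
    return 0
```

A script would finish with sys.exit(main()), so a missing key produces a hint and a zero exit status rather than a traceback.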

01. RAG over PDFs

Single-agent RAG over a folder of PDFs. Loader → chunker → vector store → retriever-as-tool → agent. About 100 lines.


examples/data/general/
  company_handbook.pdf
  engineering_guide.pdf
  security_policy.pdf
  support_runbook.pdf
        │
        ▼  load_pdf(pdf, backend="unstructured" | "docling")
    Document(content=<markdown>)
        │
        ▼  RecursiveChunker(chunk_size=600).split(...)
    list[Chunk]
        │
        ▼  ChromaVectorStore.add(chunks)
    indexed collection 'general_docs_<backend>'
        │
        ▼  @tool search_docs(query): wraps store.search(query, k=4)
    Agent(model="gpt-4.1-mini", tools=[search_docs])

The backend is picked at the CLI: --backend unstructured (default, Apache 2.0, what LangChain wraps) or --backend docling (MIT, IBM Research, 2026 best-in-class on native PDFs). Each backend lands in its own Chroma collection / persist directory (general_docs_<backend>), so swapping backends doesn’t require manual cache busting.
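A minimal sketch of that CLI switch and the per-backend collection name, using only argparse. The flag name, its two values, and the general_docs_<backend> pattern come from the text above; the function names parse_args and collection_name are assumptions made for illustration and may not match the example script.

```python
import argparse

BACKENDS = ("unstructured", "docling")


def parse_args(argv=None):
    """Parse the --backend flag; 'unstructured' is the default."""
    parser = argparse.ArgumentParser(description="RAG over PDFs (sketch)")
    parser.add_argument("--backend", choices=BACKENDS, default="unstructured")
    return parser.parse_args(argv)


def collection_name(backend):
    """One collection per backend, so indices never mix."""
    return f"general_docs_{backend}"
```

Because the backend is part of the collection name, chunks produced by one parser are never served from an index built by the other.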


pip install 'loomflow[loader-pdf,vectorstore-chroma,openai]'           # default
pip install 'loomflow[loader-pdf-docling,vectorstore-chroma,openai]'   # for --backend docling

The example imports load_pdf directly to surface the backend choice. The auto-dispatch load(pdf) also works (uses the unstructured default). Pick whichever fits your code style.

See: End-to-end RAG tutorial for a guided walkthrough · PDF loader for the backend / strategy reference.

Read: examples/01_rag_pdf.py

02. Specialist debate

Five domain specialists (IT / physics / medicine / finance / law), each with their own folder of PDFs and their own Chroma collection, composed via Team.debate(...) with a synthesising judge agent.


  examples/data/it/         examples/data/physics/    ...
    it_runbook.pdf            physics_notes.pdf       ...
        │                         │
        ▼                         ▼
  Chroma 'it_docs'         Chroma 'physics_docs'      ...
        │                         │
        ▼                         ▼
  search_it_docs           search_physics_docs        ...
        │                         │
        ▼                         ▼
  Agent (IT tech)          Agent (Physicist)         ...

  Team.debate(
    debaters=[it, phys, med, fin, law],
    judge=Agent("...synthesis judge..."),
    rounds=1,
  )

See: Multi-Agent Debate.

Read: examples/02_specialist_debate.py

03. Multi-user sessions

Multi-user namespacing + conversation continuity on one shared Agent + InMemoryMemory. Demonstrates that user_id is a hard partition (Alice’s history never surfaces in Bob’s recall) and that reusing session_id rehydrates prior turns as real chat history. Also shows tools reading scope via get_run_context().
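To make the partitioning idea concrete, here is a toy store keyed by (user_id, session_id). It is a sketch of the concept only: PartitionedMemory and its methods are hypothetical and are not the InMemoryMemory API.

```python
from collections import defaultdict


class PartitionedMemory:
    """Toy memory where user_id is a hard partition and session_id groups turns."""

    def __init__(self):
        self._turns = defaultdict(list)  # keyed by (user_id, session_id)

    def append(self, user_id, session_id, role, text):
        self._turns[(user_id, session_id)].append((role, text))

    def history(self, user_id, session_id):
        """Prior turns for one session, returned in order as chat history."""
        return list(self._turns.get((user_id, session_id), []))

    def recall(self, user_id, needle):
        """Search only this user's turns; other users are never scanned."""
        needle = needle.lower()
        hits = []
        for (uid, _session), turns in self._turns.items():
            if uid != user_id:
                continue
            hits.extend(text for _role, text in turns if needle in text.lower())
        return hits
```

Reusing the same session_id returns the earlier turns, while a recall for one user cannot reach another user's text because the user_id check happens before any content is inspected.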

See: Memory · Tools.

Read: examples/03_multi_user_sessions.py

04. Structured outputs

Type-safe structured outputs. Define a Pydantic BaseModel, pass it as output_schema=, get a validated typed instance back on result.parsed. Demonstrates schema-driven extraction (a MeetingSummary with nested ActionItems, ISO dates, sentiment enum) from a raw meeting transcript.


from typing import Literal
 
from pydantic import BaseModel
from loomflow import Agent
 
class ActionItem(BaseModel):
    owner: str
    description: str
    due_date: str | None
 
class MeetingSummary(BaseModel):
    title: str
    attendees: list[str]
    decisions: list[str]
    actions: list[ActionItem]
    sentiment: Literal["positive", "neutral", "negative"]
 
agent = Agent("Extract a structured summary.", model="gpt-4.1-mini")
# transcript is the raw meeting transcript (a str); run this inside an async function.
result = await agent.run(transcript, output_schema=MeetingSummary)
 
summary: MeetingSummary = result.parsed   # validated, typed

Read: examples/04_structured_outputs.py

05. Memory showcase

Every memory backend behind one parameter. Walks through inmemory / sqlite / chroma / postgres / redis (Postgres/Redis skip gracefully without a DSN), demonstrates profile(user_id=) / forget(user_id=) / export(user_id=) GDPR ops, and shows the Consolidator extracting structured facts from raw chat episodes. The memory= parameter is the only thing that changes between backends.

See: Memory · GDPR ops.

Read: examples/05_memory_showcase.py

Workflow primitives

Each file is small (50–200 lines) and demonstrates one workflow pattern in isolation. Read them in order. Each builds on the previous one’s vocabulary.

06. Linear chain (no LLM)

Linear Workflow.chain([...]) of plain async functions. The simplest possible workflow shape. No LLM involved, no API key required. Touches RunContext propagation, WorkflowResult.visited, per_step introspection.
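The shape of a linear chain can be sketched in plain asyncio. This is not Loom's Workflow.chain implementation; run_chain and ChainResult are hypothetical stand-ins that mimic the visited and per_step ideas mentioned above.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class ChainResult:
    output: object
    visited: list = field(default_factory=list)   # step names, in run order
    per_step: dict = field(default_factory=dict)  # step name -> its output


async def run_chain(steps, value):
    """Feed each step's output into the next and record what ran."""
    result = ChainResult(output=value)
    for step in steps:
        value = await step(value)
        result.visited.append(step.__name__)
        result.per_step[step.__name__] = value
    result.output = value
    return result


async def normalise(text):
    return text.strip().lower()


async def tokenise(text):
    return text.split()


async def count(tokens):
    return len(tokens)
```

Running asyncio.run(run_chain([normalise, tokenise, count], "  Hello Workflow World ")) yields an output of 3, with the three step names recorded in visited.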

See: Workflow.chain.

Read: examples/06_workflow_chain.py

07. Classify + dispatch

Workflow.route(classifier, {"a": agent_a, ...}). Classify the question with a tiny model, dispatch to a specialist Agent. Demonstrates “Agent as a workflow node” composition with developer-controlled branching.
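The classify-then-dispatch pattern looks roughly like this when written without the framework. The keyword classifier stands in for the tiny model, the two specialists stand in for Agents, and route is a hypothetical helper rather than Workflow.route itself.

```python
import asyncio


async def classify(question):
    # Stand-in for a small classifier model: plain keyword matching.
    return "billing" if "invoice" in question.lower() else "technical"


async def billing_specialist(question):
    return f"[billing] {question}"


async def technical_specialist(question):
    return f"[technical] {question}"


BRANCHES = {"billing": billing_specialist, "technical": technical_specialist}


async def route(classifier, branches, question):
    """Classify once, then hand the question to exactly one branch."""
    label = await classifier(question)
    if label not in branches:
        raise KeyError(f"classifier returned unknown label: {label!r}")
    return label, await branches[label](question)
```

The branching table is ordinary Python data, which is what makes the dispatch developer-controlled: the classifier only chooses a key, and the code decides what each key means.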

See: Workflow.route · Composition.

Read: examples/07_workflow_route.py

08. Refinement loop (cycles)

Refinement loop with cycles: draft → review → judge → (revise → review → ... → END). Shows add_router with END sentinels, max_visits_per_node safety cap, and graceful cap-exceeded handling via try/except RuntimeError + the in-place state dict.
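A framework-free sketch of a cyclic graph with an END sentinel and a per-node visit cap. The names END and max_visits_per_node echo the text above; run_graph, the node functions, and the state keys are assumptions made for illustration and do not reproduce Loom's graph builder.

```python
END = object()  # sentinel: a router returns this to stop the run


def run_graph(nodes, routers, start, state, max_visits_per_node=3):
    """Run nodes until a router returns END; nodes mutate state in place."""
    visits = {}
    current = start
    while current is not END:
        visits[current] = visits.get(current, 0) + 1
        if visits[current] > max_visits_per_node:
            raise RuntimeError(f"node {current!r} exceeded {max_visits_per_node} visits")
        nodes[current](state)
        current = routers[current](state)
    return state


def draft(state):
    state["revisions"] = 0


def review(state):
    state["score"] = state["revisions"]  # toy scoring: each revision adds a point


def judge(state):
    state["approved"] = state["score"] >= state["target"]


def revise(state):
    state["revisions"] += 1


NODES = {"draft": draft, "review": review, "judge": judge, "revise": revise}
ROUTERS = {
    "draft": lambda s: "review",
    "review": lambda s: "judge",
    "judge": lambda s: END if s["approved"] else "revise",
    "revise": lambda s: "review",
}
```

Because the state dict is mutated in place, a caller that catches the RuntimeError still holds the best draft produced before the cap was hit.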

See: Explicit graph builder, the cycles section.

Read: examples/08_workflow_loop.py

09. Workflow as tool

wf.as_tool() is the opposite composition direction: an open-ended customer-support Agent has a deterministic refund workflow available as a tool. The unified audit log shows the agent’s tool_call and the workflow’s per-step entries under one user_id.

See: Composition, Direction 2.

Read: examples/09_workflow_as_tool.py

10. Architecture inside a workflow

Agent with architecture="self-refine" inside a workflow chain. Demonstrates that workflow shape and agent architecture are orthogonal axes. The architecture is encapsulated inside the agent step; the workflow doesn’t see the internal draft → critique → refine iteration.

See: Architectures · Composition.

Read: examples/10_workflow_architecture.py

11. Custom step wrapping an Agent

An Agent wrapped in a custom async def step, for when “just call agent.run(prev_output)” isn’t enough: multi-field prompt formatting, capturing RunResult metadata (tokens, turns) into workflow state, and post-processing the agent’s output.

See: The @step decorator.

Read: examples/11_workflow_custom_step.py

Production observability

The next two examples exercise the framework’s observability spine. Neither requires an API key, and both run with in-memory backends so you can inspect the captured data directly.

12. Audit log (HMAC-signed, JSONL on disk)

Builds an Agent + a Workflow with a shared FileAuditLog, runs both, and inspects what was written. Five things this example covers:

Two backends behind one protocol. InMemoryAuditLog for tests and notebooks, FileAuditLog for production.
Per-user_id filtering. audit.query(user_id="alice") is a partitioned read, not a payload scan.
HMAC tamper detection. verify_signature(entry, secret=) returns False for any mutation of the canonical payload, and for the wrong secret. Catches both tampering and secret-rotation mistakes.
Restart recovery. A fresh FileAuditLog against the same path scans the existing JSONL and resumes the seq counter, so new entries don’t collide.
Dict-config form for verbatim capture (0.9.36+). Hand Agent or Workflow an audit_log={"scope_full": True, ...} dict and the resolver builds the right backend, wrapped in FullTranscriptAuditLog. Prompts, outputs, and full tool-result bodies all land in the log. The default is compliance-friendly: truncated prompts, no outputs recorded.

No API keys required. Uses EchoModel.
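The tamper-detection idea can be illustrated with Python's hmac module. This sketch assumes SHA-256 and a sorted-key JSON canonical form; Loom's actual canonical payload and its verify_signature implementation may differ, and the functions canonical, sign, and verify are hypothetical.

```python
import hashlib
import hmac
import json


def canonical(entry):
    """Stable byte form of an entry, excluding the signature itself."""
    payload = {k: v for k, v in entry.items() if k != "signature"}
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(entry, secret):
    """Return a copy of the entry with an HMAC-SHA256 signature attached."""
    digest = hmac.new(secret, canonical(entry), hashlib.sha256).hexdigest()
    return {**entry, "signature": digest}


def verify(entry, secret):
    """True only if the payload is unchanged and the secret matches."""
    expected = hmac.new(secret, canonical(entry), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, entry.get("signature", ""))
```

Any change to a signed field, or a check made with a different secret, produces a different digest, so both cases fail verification in the same way.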

See: Audit log attribution.

Read: examples/12_audit_log.py

13. Telemetry (four sinks, no collector required)

Runs the same scripted agent against four different telemetry sinks that ship with Loom. No OpenTelemetry SDK, no collector deploy.

InMemoryTelemetry. Accumulates CapturedSpan / CapturedMetric records in lists. Assert on them in tests directly.
ConsoleTelemetry. Prints span lines (with nested-trace indentation) and metric lines to a stream as they happen. “Tail my agent in dev.”
FileTelemetry. Append-only JSONL on disk. Each line is a structured record with parent_span_id linkage. Queryable offline with jq.
MultiTelemetry. Fan-out. Watch live in stderr AND assert on the in-memory side after.

Also demonstrates histogram-vs-counter auto-dispatch. Metric names ending in _ms / _seconds / _bytes become histograms; everything else becomes a counter. One emit_metric() API, the right instrument under the hood.
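The suffix rule above is simple enough to sketch directly. TinyTelemetry and instrument_for are hypothetical names for illustration; only the three suffixes and the emit_metric name come from the text.

```python
HISTOGRAM_SUFFIXES = ("_ms", "_seconds", "_bytes")


def instrument_for(name):
    """Pick the instrument kind from the metric name's suffix."""
    return "histogram" if name.endswith(HISTOGRAM_SUFFIXES) else "counter"


class TinyTelemetry:
    """Toy sink: histograms keep every sample, counters keep a running sum."""

    def __init__(self):
        self.counters = {}
        self.histograms = {}

    def emit_metric(self, name, value=1):
        if instrument_for(name) == "histogram":
            self.histograms.setdefault(name, []).append(value)
        else:
            self.counters[name] = self.counters.get(name, 0) + value
```

With this rule, emit_metric("llm_latency_ms", 120) records a sample in a histogram, while emit_metric("tool_calls") increments a counter.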

Uses ScriptedModel so the run is deterministic. No API key. For production, swap the sink for OTelTelemetry; the agent code doesn’t change.

See: Telemetry.

Read: examples/13_telemetry.py

Reasoning effort

14. Effort dial across providers (0.9.36+)

Runs the same hard reasoning question (the classic 3L / 5L jug puzzle, minimum-steps variant) at each effort tier on Claude Opus 4.7, the only regime that honours the full enum, including xhigh and max.

Five things this demonstrates:

Dict-config form for model=. Set the model name, default effort, and strict_effort together in one dict: model={"name": "claude-opus-4-7", "effort": "medium"}. Same shape philosophy as audit_log={...}. Read it once, configure it once.
Equivalent explicit kwargs. Agent(model="claude-opus-4-7", effort="medium") does the same thing. Top-level kwargs win when both forms are present, so you can layer environment overrides on top of a shared config dict.
The dial actually moves cost / latency. Token usage printed at each tier. Output tokens grow with effort because the model spends more on internal thinking.
Per-call override. run(..., effort="high") wins over the agent default for that specific run. Lets one Agent serve cheap chit-chat and occasional deep reasoning without spinning up two.
strict_effort=True. Wiring effort to claude-haiku-3-5 (which doesn’t support thinking) raises EffortNotSupportedError. The wiring mistake surfaces immediately. The default warn-and-drop behaviour would let it pass silently. Same dict shape: model={"name": "claude-haiku-3-5", "effort": "high", "strict_effort": True}.
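The precedence rules in the list above (top-level kwarg over dict config, per-call over agent default, strict versus warn-and-drop) can be sketched as plain functions. Everything here is a stand-in: the function names are hypothetical, the supports_effort flag is supplied by the caller, and the locally defined EffortNotSupportedError only borrows the name of the error mentioned in the text.

```python
import warnings


class EffortNotSupportedError(Exception):
    """Local stand-in named after the error the text mentions."""


def resolve_agent_effort(model, effort=None):
    """Accept a model name or a dict config; a top-level effort kwarg wins."""
    cfg = {"name": model} if isinstance(model, str) else dict(model)
    if effort is not None:
        cfg["effort"] = effort
    return cfg


def effort_for_run(agent_cfg, effort=None):
    """A per-call effort overrides the agent default for that run only."""
    return effort if effort is not None else agent_cfg.get("effort")


def apply_effort(cfg, supports_effort):
    """Raise in strict mode; otherwise warn and drop the unsupported setting."""
    if cfg.get("effort") is None or supports_effort:
        return cfg
    if cfg.get("strict_effort"):
        raise EffortNotSupportedError(f"{cfg['name']} does not support effort")
    warnings.warn(f"{cfg['name']} does not support effort; dropping it")
    return {k: v for k, v in cfg.items() if k != "effort"}
```

The strict branch turns a silent configuration mistake into an immediate failure, which is the behaviour the last list item describes.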

Requires ANTHROPIC_API_KEY. Opus 4.7 is the showcase model because it accepts the full enum. The dial works on any reasoning model; this example just uses the one that lets every tier through without clamping.

See: Reasoning effort · Agent reference: effort.

Read: examples/14_effort_dial.py

Declarative config

15. Build an Agent from a TOML / dict config (0.9.37+)

Writes a complete agent.toml, builds an Agent from it with Agent.from_config(path), and asserts every backend resolved to the expected concrete class. Then does the same with Agent.from_dict(cfg) to show the in-memory form. Runs offline against EchoModel. No API key needed.

Five things this demonstrates:

Agent.from_config(path). Reads a TOML file. One declaration covers model, memory, runtime, telemetry, audit log, permissions, budget, architecture, effort, skills, and MCP servers.
Agent.from_dict(cfg). Same shape, parsed in memory. Useful when your config comes from Pydantic BaseSettings, a YAML file you already parsed, or a service-config endpoint.
Backend tables. [memory] / [runtime] / [telemetry] / [audit_log] / [permissions] / [budget] each go through the same resolver Agent(...) uses for kwargs. String shorthand works too: telemetry = "memory", permissions = "strict".
Arrays of tables. [[skills]] loads skill bundles by path. [[mcp]] connects to MCP servers (stdio or HTTP transport).
Kwarg override. Real callables (tools, hooks, secret stores, retry policies) come through Python kwargs since TOML can’t express them. When a kwarg and a config entry collide, the kwarg wins. That’s the per-environment override path.

See: Config file (TOML / dict) · Agent reference: from_config.

Read: examples/15_config_file.py

Multi-agent workspace

16. Shared notebook for multi-agent teams (0.9.39+)

A research team — four specialists plus a synthesizer — collaborates on one question through a shared notebook. Each specialist runs in parallel and writes exactly one findings note. None of them sees the others’ transcripts. The synthesizer then reads everyone’s notes via list_notes / read_note and writes the final recommendation into the same notebook.

What this shows:

No history sharing. Specialists run hermetically. Only their curated notes cross between agents, not raw transcripts.
Synthesizer context stays small. It reads N ~100-token notes instead of N full transcripts.
Auto-attribution. Each agent’s notes are tagged with its team role because the author identity is baked into the workspace tools’ closure. The agent never types its own name.
Filesystem-mounted. WORKSPACE.md regenerates atomically. You can cat it during or after the run.
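The auto-attribution point can be illustrated with a closure: the author is bound when the tool is created, so the caller never passes it. Notebook and make_write_note are hypothetical names for this sketch; only list_notes and read_note are named in the text, and their real signatures may differ.

```python
class Notebook:
    """Toy shared notebook: an append-only list of attributed notes."""

    def __init__(self):
        self._notes = []

    def add(self, author, title, body):
        self._notes.append({"author": author, "title": title, "body": body})

    def list_notes(self):
        """Cheap index view: (position, author, title) without bodies."""
        return [(i, n["author"], n["title"]) for i, n in enumerate(self._notes)]

    def read_note(self, index):
        return dict(self._notes[index])


def make_write_note(notebook, author):
    """Build a write tool with the author baked into its closure."""

    def write_note(title, body):
        notebook.add(author, title, body)

    return write_note
```

Each specialist receives its own write_note from make_write_note(notebook, role), so a note's author field reflects which tool instance wrote it rather than anything the agent typed.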

Uses gpt-4.1-mini, so the demo is cheap (~$0.02 per run). Requires OPENAI_API_KEY.

See: Workspace.

Read: examples/16_shared_workspace.py

17. Prompt caching (0.9.41+)

Runs the same big system prompt twice, back to back, with prompt_caching=True. The second run shows non-zero cached_tokens_in and a markedly lower cost_usd. Live cost evidence, not a claim.

Covers the boolean form on both Anthropic and OpenAI, the dict form for advanced control (ttl="1h", cache_key), and the per-provider behavior (Anthropic injects cache_control markers; OpenAI is automatic and the flag buys accurate accounting plus routing).

Requires ANTHROPIC_API_KEY or OPENAI_API_KEY.

See: Prompt caching.

Read: examples/17_prompt_caching.py

18. Living plan (0.9.42+)

Walks a living_plan=True agent through the TodoWrite discipline: commit a plan with plan_write, do work with a tool, rewrite the plan to mark the step done with a finding, emit a final message. After the run it inspects the workspace and confirms the plan mirrored to a kind="plan" note.

Also shows pre-seeding: pass a constructed LivingPlan so the next run starts with a plan already in place. Uses a ScriptedModel, so it runs offline. No API key.

See: Living plan.

Read: examples/18_living_plan.py

19. Workspace lifecycle (0.10.0+)

Exercises all eight v0.10 workspace features in one offline script: namespacing, versioning, archive, questions, semantic search (with a tiny deterministic stub embedder), citation tracking plus outcome attribution, relevance-aware search, and citation-aware prune().

Fully offline. Uses InMemoryWorkspace. No API key.

See: Workspace lifecycle.

Read: examples/19_workspace_lifecycle.py

Sample data

The PDF-based examples (01, 02) generate small sample PDFs on first run (via reportlab) and cache them under examples/data/. The on-disk Chroma indices are also cached, so subsequent runs only re-execute the agent loop against the model.

Examples 06–11 ship with the workflow primitive overhaul. They exercise the developer-controlled DAG side of the framework, the peer to the LLM-controlled Agent. Each is meant to be readable end-to-end in under five minutes. For more focused snippets see Recipes; for the conceptual overview see Workflow.