End-to-end RAG tutorial

We’ll build a RAG agent that answers questions over a folder of PDFs. ~30 lines of real code; no LangChain.

What we’ll wire up


docs/
  company_handbook.pdf
  engineering_guide.pdf
  security_policy.pdf
  support_runbook.pdf
        │
        ▼  loomflow.loader.load(...)
    Document(content=<markdown>)
        │
        ▼  RecursiveChunker(chunk_size=600).split(...)
    list[Chunk]
        │
        ▼  ChromaVectorStore.add(chunks)   (persisted on disk)
    indexed collection
        │
        ▼  @tool search_docs(query): wraps store.search(query, k=4)
    Agent(model="gpt-4.1-mini", tools=[search_docs])

Install


pip install 'loomflow[loader-pdf,vectorstore-chroma,openai]'
export OPENAI_API_KEY=sk-...

The whole script

Index the corpus once


import asyncio
from pathlib import Path
from loomflow.memory.embedder import OpenAIEmbedder
from loomflow.vectorstore import ChromaVectorStore
from loomflow.loader import RecursiveChunker, load_pdf
 
CORPUS = Path("./docs")
PDF_BACKEND = "unstructured"   # or "docling" — see /docs/rag/loaders
 
async def index():
    # One persist directory per backend so chunks from different
    # extraction pipelines don't silently mix in the same collection.
    store = ChromaVectorStore.local(
        f"./chroma-db-{PDF_BACKEND}",
        embedder=OpenAIEmbedder("text-embedding-3-small"),
        collection=f"company_docs_{PDF_BACKEND}",
    )
    if await store.count() > 0:
        return store
 
    chunker = RecursiveChunker(chunk_size=600, chunk_overlap=50)
    for pdf in CORPUS.glob("*.pdf"):
        doc = load_pdf(str(pdf), backend=PDF_BACKEND)
        chunks = chunker.split(doc.content, source=str(pdf))
        await store.add(chunks)
    return store

A few production notes:

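if await store.count() > 0: return store makes the indexer idempotent. Re-running the script doesn’t re-embed. The check is all-or-nothing, though: once the collection holds any chunks, PDFs added later (or skipped by a crashed first run) are not indexed until the collection is rebuilt.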
source=str(pdf) lands in each chunk’s metadata so you can cite the source filename in answers.
For larger corpora swap ChromaVectorStore for PostgresVectorStore or FAISSVectorStore.
The auto-dispatching loader (from loomflow.loader import load) also works: load(pdf) resolves to the unstructured backend with the default fast strategy. Call load_pdf directly when you need to set backend=, strategy=, or languages=. See PDF loader.
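To make chunk_size and chunk_overlap concrete, here is a minimal sketch of overlap-based splitting in plain Python. This is not loomflow's RecursiveChunker, whose internals aren't shown in this tutorial: split_with_overlap is a hypothetical helper, it uses a fixed sliding window rather than recursive separator-based splitting, and it counts characters, whereas the real chunker may count tokens.

```python
def split_with_overlap(
    text: str, chunk_size: int = 600, chunk_overlap: int = 50
) -> list[str]:
    """Hypothetical fixed-window splitter (not loomflow's implementation).

    Each chunk starts chunk_size - chunk_overlap characters after the
    previous one, so neighbouring chunks share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text, so we
        # don't emit a trailing chunk that is pure overlap.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "".join(chr(97 + i % 26) for i in range(1300))
    pieces = split_with_overlap(sample)
    print([len(p) for p in pieces])
```

The overlap is there so that a sentence straddling a chunk boundary still appears whole in at least one chunk, at the cost of embedding a few characters twice.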

Wire the retriever as a tool


from loomflow import Agent, tool
 
def make_agent(store):
    @tool
    async def search_docs(query: str) -> str:
        """Search the company handbook, engineering guide, security
        policy, and support runbook. Returns the top 4 most relevant
        chunks with their source filenames."""
        hits = await store.search(query, k=4)
        formatted = []
        for h in hits:
            source = h.chunk.metadata.get("source", "unknown")
            formatted.append(f"[{source}]\n{h.chunk.content}")
        return "\n\n---\n\n".join(formatted)
 
    return Agent(
        instructions=(
            "You are a research assistant for the company. Use the "
            "search_docs tool to find relevant passages. ALWAYS cite "
            "the source filename in brackets when you use a fact."
        ),
        model="gpt-4.1-mini",
        tools=[search_docs],
    )

The retriever is just a regular @tool. The agent loop dispatches it like any other.
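Because the tool body is plain Python, the formatting step can be exercised without a vector store or an API key. A minimal sketch: Chunk and Hit are hypothetical stand-ins that only mirror the attributes the tool reads (h.chunk.content and h.chunk.metadata), not loomflow's real classes, and the sample text is made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    # Stand-in for the chunk objects returned by store.search (assumption).
    content: str
    metadata: dict = field(default_factory=dict)


@dataclass
class Hit:
    # Stand-in for a search hit; only the .chunk attribute is modelled.
    chunk: Chunk


def format_hits(hits: list[Hit]) -> str:
    """Same formatting as search_docs: '[source]' header, then the text,
    with hits separated by a '---' rule."""
    formatted = []
    for h in hits:
        source = h.chunk.metadata.get("source", "unknown")
        formatted.append(f"[{source}]\n{h.chunk.content}")
    return "\n\n---\n\n".join(formatted)


if __name__ == "__main__":
    demo = [
        Hit(Chunk("placeholder passage A", {"source": "docs/support_runbook.pdf"})),
        Hit(Chunk("placeholder passage B")),  # no metadata -> "unknown"
    ]
    print(format_hits(demo))
```

The bracketed filename at the top of each passage is what lets the model follow the "cite the source filename in brackets" instruction.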

Run it


async def main():
    store = await index()
    agent = make_agent(store)
 
    result = await agent.run("What's the on-call rotation policy?")
    print(result.output)
 
asyncio.run(main())

What you get for free

Replay-correct. Wrap with runtime=SqliteRuntime("./journal.db") and crashed runs resume.
Multi-tenant. Pass user_id= to agent.run(); conversation memory partitions automatically.
Streaming. Swap agent.run() for agent.stream() to get per-chunk events.
Audit log. Pass audit_log=FileAuditLog(...) and every search_docs call lands in audit.jsonl with HMAC signatures.

Add diversity + filters

For short-tail queries whose top results are near-duplicates of each other, add MMR diversity:


hits = await store.search(query, k=8, diversity=0.4)
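MMR (maximal marginal relevance) greedily picks results that are relevant to the query but dissimilar to what has already been picked. How loomflow maps diversity= onto MMR's trade-off weight isn't shown here; the sketch below assumes diversity is the weight on the redundancy penalty (so relevance is weighted by 1 - diversity). cosine and mmr_select are hypothetical helpers, and the vectors are toy 2-D values.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr_select(
    query_vec: list[float],
    doc_vecs: list[list[float]],
    k: int,
    diversity: float = 0.4,
) -> list[int]:
    """Return indices of up to k documents chosen by greedy MMR.

    Assumption: score = (1 - diversity) * relevance
                        - diversity * max similarity to already-picked docs.
    """
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    remaining = list(range(len(doc_vecs)))
    selected: list[int] = []

    while remaining and len(selected) < k:
        def score(i: int) -> float:
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,  # nothing picked yet, so no penalty
            )
            return (1 - diversity) * relevance[i] - diversity * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    query = [1.0, 0.0]
    docs = [[1.0, 0.5], [1.0, 0.55], [1.0, -0.6]]  # docs 0 and 1 are near-duplicates
    print(mmr_select(query, docs, k=2, diversity=0.0))
    print(mmr_select(query, docs, k=2, diversity=0.4))
```

With diversity=0.0 this reduces to plain top-k by relevance; raising it trades a little relevance for results that cover different parts of the corpus.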

To restrict the search to one source:


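hits = await store.search(
    query, k=4,
    # Must equal the stored metadata value. The indexer stores
    # str(pdf), and pathlib drops the leading "./" from CORPUS.
    filter={"source": str(CORPUS / "security_policy.pdf")},
)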
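One thing to watch with source filters: the filter value has to equal the string the indexer stored, and pathlib normalizes away a leading "./". The sketch below shows the string that source=str(pdf) produces for a file under Path("./docs"). It uses PurePosixPath so the result is the same on every OS (the tutorial's Path would use backslashes on Windows), and matches is a hypothetical helper that assumes the store compares metadata by exact equality.

```python
from pathlib import PurePosixPath

corpus = PurePosixPath("./docs")

# What the indexer's source=str(pdf) yields for a file found under corpus:
stored_source = str(corpus / "security_policy.pdf")
print(stored_source)  # docs/security_policy.pdf


def matches(metadata: dict, flt: dict) -> bool:
    """Hypothetical exact-match metadata filter (an assumption about how
    the vector store evaluates filter=)."""
    return all(metadata.get(key) == value for key, value in flt.items())


chunk_metadata = {"source": stored_source}

# The literal with "./" does not equal the stored string...
print(matches(chunk_metadata, {"source": "./docs/security_policy.pdf"}))  # False
# ...while building the value the same way the indexer did does.
print(matches(chunk_metadata, {"source": str(corpus / "security_policy.pdf")}))  # True
```

Building the filter value from the same Path expression the indexer used avoids having to remember the normalized form.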

Per-domain RAG with multi-agent debate

The framework’s examples/02_specialist_debate.py builds five domain specialists (IT / physics / medicine / finance / law), each with their own folder of PDFs and their own Chroma collection, composed via Team.debate(...) with a synthesising judge. Worth a read once you’ve got the basics.

Embedder cost. text-embedding-3-small is ~$0.02 per million tokens; a 50-page PDF is ~30K tokens, so embedding one such PDF costs ~$0.0006, and four PDFs of that size come to ~$0.0024. Storage + recall are essentially free. The agent’s LLM calls dominate the bill.
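The arithmetic behind that estimate, as a small sketch using only the figures quoted above. embedding_cost is a hypothetical helper, and treating all four PDFs as roughly 50 pages each is an assumption; plug in your own token counts.

```python
PRICE_PER_MILLION_TOKENS = 0.02  # USD, the text-embedding-3-small figure quoted above
TOKENS_PER_PDF = 30_000          # rough size of a 50-page PDF, as quoted above


def embedding_cost(
    num_pdfs: int,
    tokens_per_pdf: int = TOKENS_PER_PDF,
    price_per_million: float = PRICE_PER_MILLION_TOKENS,
) -> float:
    """One-off cost in USD of embedding num_pdfs documents."""
    return num_pdfs * tokens_per_pdf * price_per_million / 1_000_000


if __name__ == "__main__":
    print(f"one PDF:   ${embedding_cost(1):.4f}")   # $0.0006
    print(f"four PDFs: ${embedding_cost(4):.4f}")   # $0.0024 (assumes ~50 pages each)
```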