
End-to-end RAG tutorial

We’ll build a RAG agent that answers questions over a folder of PDFs. ~30 lines of real code; no LangChain.

What we’ll wire up

    docs/
      company_handbook.pdf
      engineering_guide.pdf
      security_policy.pdf
      support_runbook.pdf
        ▼  loomflow.loader.load(...)
    Document(content=<markdown>)
        ▼  RecursiveChunker(chunk_size=600).split(...)
    list[Chunk]
        ▼  ChromaVectorStore.add(chunks)   (persisted on disk)
    indexed collection
        ▼  @tool search_docs(query): wraps store.search(query, k=4)
    Agent(model="gpt-4.1-mini", tools=[search_docs])

Install

    pip install 'loomflow[loader-pdf,vectorstore-chroma,openai]'
    export OPENAI_API_KEY=sk-...

The whole script

Index the corpus once

    import asyncio
    from pathlib import Path

    from loomflow.memory.embedder import OpenAIEmbedder
    from loomflow.vectorstore import ChromaVectorStore
    from loomflow.loader import RecursiveChunker, load_pdf

    CORPUS = Path("./docs")
    PDF_BACKEND = "unstructured"  # or "docling"; see /docs/rag/loaders


    async def index():
        # One persist directory per backend so chunks from different
        # extraction pipelines don't silently mix in the same collection.
        store = ChromaVectorStore.local(
            f"./chroma-db-{PDF_BACKEND}",
            embedder=OpenAIEmbedder("text-embedding-3-small"),
            collection=f"company_docs_{PDF_BACKEND}",
        )
        if await store.count() > 0:
            return store

        chunker = RecursiveChunker(chunk_size=600, chunk_overlap=50)
        for pdf in CORPUS.glob("*.pdf"):
            doc = load_pdf(str(pdf), backend=PDF_BACKEND)
            chunks = chunker.split(doc.content, source=str(pdf))
            await store.add(chunks)
        return store
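To make chunk_size=600 and chunk_overlap=50 concrete, here is a minimal sliding-window sketch. It is not loomflow's RecursiveChunker: a recursive chunker typically also tries to break on separators such as paragraphs and sentences, and this page does not say whether its sizes are counted in characters or tokens. The helper name sliding_window_chunks is hypothetical and characters are assumed as the unit.

```python
def sliding_window_chunks(text: str, chunk_size: int = 600, chunk_overlap: int = 50) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters. Hypothetical helper,
    not part of loomflow; the unit is characters by assumption.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# 1500 characters of repeating digits, so overlaps are easy to inspect.
sample = "".join(str(i % 10) for i in range(1500))
pieces = sliding_window_chunks(sample)
print([len(p) for p in pieces])  # [600, 600, 400]
```

The overlap means the last 50 characters of one chunk reappear at the start of the next, so a sentence that straddles a boundary is still retrievable in one piece from at least one chunk.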

A few production notes:

  • The if await store.count() > 0: return store guard makes the indexer idempotent: re-running the script doesn’t re-embed. It only checks that the collection is non-empty, so PDFs added to docs/ afterwards are not indexed on a later run.
  • source=str(pdf) lands in each chunk’s metadata so you can cite the source filename in answers.
  • For larger corpora, swap ChromaVectorStore for PostgresVectorStore or FAISSVectorStore.
  • The auto-dispatch import, from loomflow.loader import load, also works: load(pdf) resolves to the unstructured backend with the default fast strategy. Call load_pdf directly when you need to set backend=, strategy=, or languages=. See PDF loader.
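The store.count() > 0 guard treats the index as all-or-nothing. If you need new or changed PDFs to be picked up, one possible refinement is a per-file manifest of content hashes. The sketch below uses only the standard library; the manifest file and the function names are hypothetical, and the actual load, split, and add calls are left to the caller.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def _read_manifest(manifest_path: Path) -> dict:
    """Load the name -> digest mapping, or an empty one if no manifest exists yet."""
    return json.loads(manifest_path.read_text()) if manifest_path.exists() else {}


def files_needing_index(corpus: Path, manifest_path: Path) -> list[Path]:
    """List PDFs whose digest is missing from, or differs from, the manifest."""
    seen = _read_manifest(manifest_path)
    return [
        pdf for pdf in sorted(corpus.glob("*.pdf"))
        if seen.get(pdf.name) != file_digest(pdf)
    ]


def mark_indexed(pdf: Path, manifest_path: Path) -> None:
    """Record a file's current digest once its chunks were stored successfully."""
    seen = _read_manifest(manifest_path)
    seen[pdf.name] = file_digest(pdf)
    manifest_path.write_text(json.dumps(seen, indent=2))
```

In index(), the loop would then iterate over files_needing_index(CORPUS, manifest) and call mark_indexed(pdf, manifest) after each successful store.add. A changed file would also need its old chunks removed first; how to delete by source is not covered on this page.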

Wire the retriever as a tool

    from loomflow import Agent, tool


    def make_agent(store):
        @tool
        async def search_docs(query: str) -> str:
            """Search the company handbook, engineering guide, security
            policy, and support runbook. Returns the top 4 most relevant
            chunks with their source filenames."""
            hits = await store.search(query, k=4)
            formatted = []
            for h in hits:
                source = h.chunk.metadata.get("source", "unknown")
                formatted.append(f"[{source}]\n{h.chunk.content}")
            return "\n\n---\n\n".join(formatted)

        return Agent(
            instructions=(
                "You are a research assistant for the company. Use the "
                "search_docs tool to find relevant passages. ALWAYS cite "
                "the source filename in brackets when you use a fact."
            ),
            model="gpt-4.1-mini",
            tools=[search_docs],
        )

The retriever is just a regular @tool. The agent loop dispatches it like any other.
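Because the tool body is plain Python, its formatting step can be exercised offline, without an API key or a vector store. The sketch below pulls that step into a hypothetical format_hits helper and feeds it stand-in objects; FakeChunk and FakeHit are invented for illustration and model only the two attributes the tool reads (chunk.content and chunk.metadata). The passage strings are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class FakeChunk:
    """Stand-in for a stored chunk; only the fields the tool reads."""
    content: str
    metadata: dict = field(default_factory=dict)


@dataclass
class FakeHit:
    """Stand-in for one search result wrapping a chunk."""
    chunk: FakeChunk


def format_hits(hits) -> str:
    """Render each hit as a bracketed source line plus its text, joined by '---' rules."""
    formatted = []
    for h in hits:
        source = h.chunk.metadata.get("source", "unknown")
        formatted.append(f"[{source}]\n{h.chunk.content}")
    return "\n\n---\n\n".join(formatted)


hits = [
    FakeHit(FakeChunk("first passage", {"source": "docs/support_runbook.pdf"})),
    FakeHit(FakeChunk("second passage")),  # no metadata, so the source falls back to "unknown"
]
print(format_hits(hits))
```

The bracketed source line is what lets the model follow the instruction to cite filenames, so it is worth checking that every chunk actually carries a source value.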

Run it

    async def main():
        store = await index()
        agent = make_agent(store)
        result = await agent.run("What's the on-call rotation policy?")
        print(result.output)


    asyncio.run(main())

What you get for free

  • Replay-correct. Wrap with runtime=SqliteRuntime("./journal.db") and crashed runs resume.
  • Multi-tenant. Pass user_id= to agent.run(); conversation memory partitions automatically.
  • Streaming. Swap agent.run() for agent.stream() to get per-chunk events.
  • Audit log. Pass audit_log=FileAuditLog(...) and every search_docs call lands in audit.jsonl with HMAC signatures.
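For readers unfamiliar with HMAC-signed logs, the idea behind the last bullet can be sketched with the standard library. This is not FileAuditLog's on-disk format, which this page does not show; the record layout, the function names, and the demo key are all assumptions made for illustration.

```python
import hashlib
import hmac
import json


def sign_record(record: dict, key: bytes) -> str:
    """Serialize a record to one JSON line carrying an HMAC-SHA256 signature."""
    # Canonical serialization so the same record always signs to the same bytes.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    sig = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return json.dumps({"payload": payload, "sig": sig})


def verify_line(line: str, key: bytes) -> bool:
    """Recompute the signature over the payload and compare in constant time."""
    entry = json.loads(line)
    expected = hmac.new(key, entry["payload"].encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, entry["sig"])


key = b"demo-key-not-for-production"
line = sign_record({"tool": "search_docs", "query": "on-call rotation"}, key)
print(verify_line(line, key))      # True

tampered = line.replace("on-call", "off-call")
print(verify_line(tampered, key))  # False
```

A signature per line means an edited or forged entry fails verification for anyone who holds the key, which is what makes the log useful as evidence and not just as debugging output.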

Add diversity + filters

For short-tail queries that hit the same chunks, add MMR diversity:

hits = await store.search(query, k=8, diversity=0.4)
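MMR (maximal marginal relevance) picks results greedily, trading relevance to the query against similarity to results already picked. The sketch below is a plain-Python illustration of that technique, not loomflow's implementation. It assumes the diversity value is the weight on the redundancy penalty; how the library maps diversity=0.4 internally is not stated on this page. The vectors are toy values.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query: list[float], candidates: list[list[float]], k: int, diversity: float) -> list[int]:
    """Return indices of up to k candidates chosen by maximal marginal relevance.

    score(c) = (1 - diversity) * sim(query, c) - diversity * max(sim(c, p) for picked p)
    """
    remaining = list(range(len(candidates)))
    picked: list[int] = []
    while remaining and len(picked) < k:
        def score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            # Redundancy is 0 for the first pick because nothing is selected yet.
            redundancy = max((cosine(candidates[i], candidates[j]) for j in picked), default=0.0)
            return (1 - diversity) * relevance - diversity * redundancy
        best = max(remaining, key=score)
        picked.append(best)
        remaining.remove(best)
    return picked


query = [1.0, 0.0, 0.0]
candidates = [
    [1.0, 0.5, 0.0],   # most relevant
    [1.0, 0.52, 0.0],  # near-duplicate of the first
    [1.0, 0.0, 0.6],   # slightly less relevant, different direction
]
print(mmr_select(query, candidates, k=2, diversity=0.0))  # [0, 1]
print(mmr_select(query, candidates, k=2, diversity=0.4))  # [0, 2]
```

With diversity at 0 the near-duplicate wins the second slot on relevance alone; at 0.4 its overlap with the first pick costs more than its small relevance edge, so the third candidate is chosen instead.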

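To restrict the search to one source, pass the same string that was stored as source at index time. pathlib drops the leading ./ from Path("./docs"), so build the value from CORPUS instead of typing the path by hand:

    hits = await store.search(
        query,
        k=4,
        filter={"source": str(CORPUS / "security_policy.pdf")},
    )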
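The filter value has to equal the stored source string, and that string can be checked with pathlib alone. The sketch below uses PurePosixPath so the output is the same on any OS, and it assumes the filter is an exact string match, which this page does not state.

```python
from pathlib import PurePosixPath

corpus = PurePosixPath("./docs")

# This mirrors what str(pdf) yields for a file found via CORPUS.glob("*.pdf"):
# pathlib normalizes away the leading "./" when the path is constructed.
stored_source = str(corpus / "security_policy.pdf")

print(stored_source)                                  # docs/security_policy.pdf
print(stored_source == "./docs/security_policy.pdf")  # False
```

On Windows, str() of a concrete Path uses backslashes, so deriving the filter value from the same Path object used at index time is the safer habit.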

Per-domain RAG with multi-agent debate

The framework’s examples/02_specialist_debate.py builds five domain specialists (IT / physics / medicine / finance / law), each with their own folder of PDFs and their own Chroma collection, composed via Team.debate(...) with a synthesising judge. Worth a read once you’ve got the basics.

Embedder cost. text-embedding-3-small is ~$0.02 per million tokens, and a 50-page PDF is ~30K tokens, so embedding one such PDF once costs ~$0.0006; four PDFs of that size come to ~$0.0024. Storage and recall are essentially free. The agent’s LLM calls dominate the bill.
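The arithmetic behind that estimate, as a short sketch. The price and the tokens-per-PDF figure are the ones quoted above; treating all four files in docs/ as roughly 50 pages each is an assumption.

```python
PRICE_PER_MILLION_TOKENS = 0.02  # USD, figure quoted above for text-embedding-3-small
TOKENS_PER_PDF = 30_000          # rough figure quoted above for a 50-page PDF
NUM_PDFS = 4                     # the four files in docs/, assumed to be similar in size


def embedding_cost(tokens: int, price_per_million: float = PRICE_PER_MILLION_TOKENS) -> float:
    """Return the one-time embedding cost in USD for a given token count."""
    return tokens / 1_000_000 * price_per_million


per_pdf = embedding_cost(TOKENS_PER_PDF)
corpus_total = embedding_cost(TOKENS_PER_PDF * NUM_PDFS)
print(f"per PDF: ${per_pdf:.4f}")       # per PDF: $0.0006
print(f"corpus:  ${corpus_total:.4f}")  # corpus:  $0.0024
```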
