End-to-end RAG tutorial
We’ll build a RAG agent that answers questions over a folder of PDFs. ~30 lines of real code; no LangChain.
What we’ll wire up
docs/
  company_handbook.pdf
  engineering_guide.pdf
  security_policy.pdf
  support_runbook.pdf
        │
        ▼ loomflow.loader.load(...)
Document(content=<markdown>)
        │
        ▼ RecursiveChunker(chunk_size=600).split(...)
list[Chunk]
        │
        ▼ ChromaVectorStore.add(chunks) (persisted on disk)
indexed collection
        │
        ▼ @tool search_docs(query): wraps store.search(query, k=4)
Agent(model="gpt-4.1-mini", tools=[search_docs])

Install
pip install 'loomflow[loader-pdf,vectorstore-chroma,openai]'
export OPENAI_API_KEY=sk-...

The whole script
Index the corpus once
import asyncio
from pathlib import Path

from loomflow.memory.embedder import OpenAIEmbedder
from loomflow.vectorstore import ChromaVectorStore
from loomflow.loader import RecursiveChunker, load_pdf

CORPUS = Path("./docs")
PDF_BACKEND = "unstructured"  # or "docling"; see /docs/rag/loaders


async def index():
    # One persist directory per backend so chunks from different
    # extraction pipelines don't silently mix in the same collection.
    store = ChromaVectorStore.local(
        f"./chroma-db-{PDF_BACKEND}",
        embedder=OpenAIEmbedder("text-embedding-3-small"),
        collection=f"company_docs_{PDF_BACKEND}",
    )
    if await store.count() > 0:
        return store
    chunker = RecursiveChunker(chunk_size=600, chunk_overlap=50)
    for pdf in CORPUS.glob("*.pdf"):
        doc = load_pdf(str(pdf), backend=PDF_BACKEND)
        chunks = chunker.split(doc.content, source=str(pdf))
        await store.add(chunks)
    return store

A few production notes:
- if await store.count() > 0: return store makes the indexer idempotent. Re-running the script doesn’t re-embed. The check is all-or-nothing, so PDFs added after the first run are not picked up until the collection is rebuilt.
- source=str(pdf) lands in each chunk’s metadata so you can cite the source filename in answers.
- For larger corpora swap ChromaVectorStore for PostgresVectorStore or FAISSVectorStore.
- The auto-dispatch from loomflow.loader import load also works: load(pdf) resolves to the unstructured backend with the default fast strategy. Calling load_pdf directly is what you want when you need to set backend= / strategy= / languages=. See PDF loader.
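To make the chunk_size=600 and chunk_overlap=50 settings concrete, here is a minimal sketch of overlapping chunking in plain Python. It is a simplification, not loomflow's RecursiveChunker: it cuts fixed character windows and ignores paragraph or sentence boundaries, and it assumes sizes are counted in characters, which the tutorial does not state. The function name sliding_chunks is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int = 600, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Illustration only: a real recursive chunker would prefer to cut
    at paragraph or sentence boundaries before falling back to this.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Synthetic 1500-character input so the overlap is easy to check.
sample = "".join(str(i % 10) for i in range(1500))
chunks = sliding_chunks(sample)
print([len(c) for c in chunks])
print(chunks[0][-50:] == chunks[1][:50])
```

Each window starts 550 characters after the previous one, so the last 50 characters of a chunk are repeated at the start of the next. That repetition is what keeps a sentence that straddles a boundary retrievable from at least one chunk.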
Wire the retriever as a tool
from loomflow import Agent, tool


def make_agent(store):
    @tool
    async def search_docs(query: str) -> str:
        """Search the company handbook, engineering guide, security
        policy, and support runbook. Returns the top 4 most relevant
        chunks with their source filenames."""
        hits = await store.search(query, k=4)
        formatted = []
        for h in hits:
            source = h.chunk.metadata.get("source", "unknown")
            formatted.append(f"[{source}]\n{h.chunk.content}")
        return "\n\n---\n\n".join(formatted)

    return Agent(
        instructions=(
            "You are a research assistant for the company. Use the "
            "search_docs tool to find relevant passages. ALWAYS cite "
            "the source filename in brackets when you use a fact."
        ),
        model="gpt-4.1-mini",
        tools=[search_docs],
    )

The retriever is just a regular @tool. The agent loop dispatches
it like any other.
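To see exactly what string the model receives from search_docs, the formatting step can be run on its own against stand-in objects. The Chunk and Hit dataclasses below are hypothetical stand-ins that only mirror the attributes the tool reads (h.chunk.content and h.chunk.metadata); they are not loomflow's classes, and the passage text is placeholder data.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical stand-in exposing the two attributes the tool reads."""
    content: str
    metadata: dict = field(default_factory=dict)


@dataclass
class Hit:
    """Hypothetical stand-in for one search result."""
    chunk: Chunk
    score: float


def format_hits(hits: list[Hit]) -> str:
    # Same formatting as the body of search_docs: a bracketed source
    # line, the chunk text, and a '---' separator between results.
    formatted = []
    for h in hits:
        source = h.chunk.metadata.get("source", "unknown")
        formatted.append(f"[{source}]\n{h.chunk.content}")
    return "\n\n---\n\n".join(formatted)


hits = [
    Hit(Chunk("first passage text", {"source": "docs/support_runbook.pdf"}), 0.82),
    Hit(Chunk("second passage text"), 0.71),  # no metadata: falls back to "unknown"
]
print(format_hits(hits))
```

With no hits the function returns an empty string. Depending on the model, returning an explicit message such as "no matching passages" in that case may make it less likely that the agent answers without evidence.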
Run it
async def main():
    store = await index()
    agent = make_agent(store)
    result = await agent.run("What's the on-call rotation policy?")
    print(result.output)


asyncio.run(main())

What you get for free
- Replay-correct. Wrap with runtime=SqliteRuntime("./journal.db") and crashed runs resume.
- Multi-tenant. Pass user_id= to agent.run(); conversation memory partitions automatically.
- Streaming. Swap agent.run() for agent.stream() to get per-chunk events.
- Audit log. Pass audit_log=FileAuditLog(...) and every search_docs call lands in audit.jsonl with HMAC signatures.
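For readers unfamiliar with HMAC-signed logs, the idea behind the last item can be sketched with the standard library alone. This is a generic illustration, not loomflow's record format: the field names (record, hmac), the JSON canonicalisation, and the helper names sign_record and verify_line are all assumptions made for the example.

```python
import hashlib
import hmac
import json


def _canonical(record: dict) -> bytes:
    # Stable serialisation so signer and verifier hash identical bytes.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_record(record: dict, key: bytes) -> str:
    """Return one JSONL line carrying the record and its HMAC-SHA256 tag."""
    sig = hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    return json.dumps({"record": record, "hmac": sig}, sort_keys=True)


def verify_line(line: str, key: bytes) -> bool:
    """Recompute the tag for a logged line and compare in constant time."""
    entry = json.loads(line)
    expected = hmac.new(key, _canonical(entry["record"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, entry["hmac"])


key = b"demo-key-not-for-production"  # hypothetical key for the example
line = sign_record({"tool": "search_docs", "query": "on-call rotation"}, key)
tampered = line.replace("on-call rotation", "payroll data")

print(verify_line(line, key))
print(verify_line(tampered, key))
```

The point of the signature is that anyone who edits a logged line without the key cannot produce a matching tag, so tampering is detectable when the log is verified.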
Add diversity + filters
For short-tail queries that hit the same chunks, add MMR diversity:
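hits = await store.search(query, k=8, diversity=0.4)

To restrict the search to one source:

hits = await store.search(
    query, k=4,
    # Assuming filters match metadata exactly: the indexer stored
    # str(pdf), and pathlib drops the leading "./" from Path("./docs"),
    # so the stored value is "docs/security_policy.pdf".
    filter={"source": "docs/security_policy.pdf"},
)

Per-domain RAG with multi-agent debate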
The framework’s examples/02_specialist_debate.py builds five
domain specialists (IT / physics / medicine / finance / law), each
with their own folder of PDFs and their own Chroma collection,
composed via Team.debate(...) with a synthesising judge. Worth a
read once you’ve got the basics.
Embedder cost. text-embedding-3-small is ~$0.02 per million
tokens; a 50-page PDF is ~30K tokens, so embedding one such PDF
costs ~$0.0006, and a four-PDF corpus of that size stays well under
a cent. Storage + recall are essentially free. The agent’s
LLM calls dominate the bill.
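The embedding estimate is simple arithmetic and easy to rerun for your own corpus. The sketch below uses only the two figures quoted above (the per-million-token price and the ~30K tokens per 50-page PDF); the four-PDF total assumes, hypothetically, that every file in docs/ is about that size.

```python
PRICE_PER_MILLION_TOKENS = 0.02   # USD, the text-embedding-3-small figure quoted above
TOKENS_PER_50_PAGE_PDF = 30_000   # rough estimate quoted above


def embedding_cost(total_tokens: int, price_per_million: float = PRICE_PER_MILLION_TOKENS) -> float:
    """One-off cost in USD of embedding total_tokens tokens."""
    return total_tokens / 1_000_000 * price_per_million


one_pdf = embedding_cost(TOKENS_PER_50_PAGE_PDF)
# Assumption for illustration: four PDFs of roughly 50 pages each.
four_pdfs = embedding_cost(4 * TOKENS_PER_50_PAGE_PDF)

print(f"one 50-page PDF: ${one_pdf:.4f}")
print(f"four such PDFs:  ${four_pdfs:.4f}")
```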