Embedders
An embedder turns text into a vector. The framework ships four:
| Class | Default model | When to use |
|---|---|---|
| OpenAIEmbedder | text-embedding-3-small | Production. Auto-picked if OPENAI_API_KEY is set. |
| VoyageEmbedder | voyage-3 | Production. Best-in-class for retrieval as of 2026. |
| CohereEmbedder | embed-english-v3.0 | Production. Strong multilingual support. |
| HashEmbedder | n/a | Tests / zero-key dev. Deterministic. Quality is “fine for smoke tests, bad for retrieval”. |
from loomflow import HashEmbedder
from loomflow.memory.embedder import CohereEmbedder, OpenAIEmbedder, VoyageEmbedder
embedder = OpenAIEmbedder() # text-embedding-3-small (default)
embedder = OpenAIEmbedder("text-embedding-3-large")
embedder = VoyageEmbedder("voyage-3")
embedder = CohereEmbedder(api_key="...")
embedder = HashEmbedder() # zero-key, deterministic
Default selection
When you don’t pass embedder= explicitly:
1. OpenAIEmbedder("text-embedding-3-small), if OPENAI_API_KEY is set in the env (or via your Secrets adapter).
2. HashEmbedder(), otherwise.
This is intentional: a fresh pip install loomflow works without
any keys, and adding OPENAI_API_KEY automatically upgrades you to a
real embedder.
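The selection rule above can be sketched as a small function. This is an illustration only: `pick_default_embedder` is a hypothetical name rather than part of loomflow, it returns a description of the choice instead of constructing an embedder, and it ignores the Secrets adapter path.

```python
import os
from typing import Mapping, Optional


def pick_default_embedder(env: Optional[Mapping[str, str]] = None) -> str:
    """Describe which embedder the default-selection rule would choose.

    Hypothetical helper for illustration; it only inspects OPENAI_API_KEY.
    """
    env = os.environ if env is None else env
    # Assumption: an empty value is treated the same as an unset key.
    if env.get("OPENAI_API_KEY"):
        return 'OpenAIEmbedder("text-embedding-3-small")'
    return "HashEmbedder()"


print(pick_default_embedder({}))
print(pick_default_embedder({"OPENAI_API_KEY": "sk-example"}))
```

Passing the environment as an argument keeps the rule easy to test without mutating `os.environ`.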
The Embedder protocol
Any class with two properties and two methods satisfies the protocol; no inheritance is required:
from typing import Protocol, runtime_checkable
@runtime_checkable
class Embedder(Protocol):
    name: str  # model name for telemetry / logs
    dimensions: int  # vector size
    async def embed(self, text: str) -> list[float]: ...
    async def embed_batch(self, texts: list[str]) -> list[list[float]]: ...
embed_batch is the hot path: Memory and VectorStore call it
when adding many chunks. Real implementations should batch the
network call; the slow fallback is [await self.embed(t) for t in texts].
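To make the structural check and the slow fallback concrete, here is a minimal sketch. The protocol is redeclared locally so the snippet runs without loomflow installed; `ToyEmbedder` and its length-based vectors are invented for illustration and are not part of the framework.

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class Embedder(Protocol):  # local copy of the protocol, for a standalone run
    name: str
    dimensions: int

    async def embed(self, text: str) -> list[float]: ...
    async def embed_batch(self, texts: list[str]) -> list[list[float]]: ...


class ToyEmbedder:  # hypothetical; note there is no base class
    name = "toy-v0"
    dimensions = 2

    async def embed(self, text: str) -> list[float]:
        # Placeholder "embedding": character count and word count.
        return [float(len(text)), float(len(text.split()))]

    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
        # Slow fallback: one embed() call per text, no real batching.
        return [await self.embed(t) for t in texts]


toy = ToyEmbedder()
print(isinstance(toy, Embedder))  # structural check enabled by runtime_checkable
vectors = asyncio.run(toy.embed_batch(["hello world", "hi"]))
print(vectors)
```

Note that `isinstance` only checks that the attributes and methods exist; it does not verify signatures or that the methods are coroutines.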
Custom embedder
class CustomEmbedder:
    name: str = "custom-v1"
    dimensions: int = 768
    def __init__(self, api_key: str) -> None:
        self._client = ...  # your client
    async def embed(self, text: str) -> list[float]:
        return await self._client.embed(text)
    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
        return await self._client.embed_batch(texts)
That’s it. Pass an instance to Memory(embedder=...) or
VectorStore(embedder=...).
See Custom embedder recipe for a full Cohere-via-third-party-SDK example.
Why batching matters
Memory.remember(episode) calls embed(episode.text) once. When you
index a corpus of 10K chunks with VectorStore.add(chunks), however,
the framework calls embed_batch(chunks), and the batched API
round trip is what makes ingestion practical:
| Embedder | 10K chunks (batched) | 10K chunks (per-call) |
|---|---|---|
| OpenAIEmbedder | ~30s | ~15min |
| VoyageEmbedder | ~25s | ~12min |
| HashEmbedder | ~50ms | ~50ms (no network) |
Implement embed_batch properly even if it’s just sugar over
embed; the framework expects it.
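The gap between the two columns comes mostly from the number of network round trips. The sketch below counts requests against a stand-in client. `FakeClient`, `embed_many`, the `max_batch` limit of 100, and the dummy vectors are all assumptions for illustration; real providers set their own batch limits.

```python
import asyncio


class FakeClient:
    """Stand-in for a provider SDK; counts simulated network round trips."""

    def __init__(self) -> None:
        self.round_trips = 0

    async def embed_many(self, texts: list[str]) -> list[list[float]]:
        self.round_trips += 1  # one request, regardless of len(texts)
        return [[float(len(t))] for t in texts]  # dummy 1-dim vectors


async def embed_batched(
    client: FakeClient, texts: list[str], max_batch: int = 100
) -> list[list[float]]:
    """Send texts in slices of at most max_batch items per request."""
    vectors: list[list[float]] = []
    for start in range(0, len(texts), max_batch):
        vectors.extend(await client.embed_many(texts[start:start + max_batch]))
    return vectors


async def embed_per_call(client: FakeClient, texts: list[str]) -> list[list[float]]:
    """Send one request per text: the slow path."""
    vectors: list[list[float]] = []
    for text in texts:
        vectors.extend(await client.embed_many([text]))
    return vectors


chunks = [f"chunk {i}" for i in range(10_000)]
batched_client, per_call_client = FakeClient(), FakeClient()
batched = asyncio.run(embed_batched(batched_client, chunks))
per_call = asyncio.run(embed_per_call(per_call_client, chunks))
print(batched_client.round_trips, per_call_client.round_trips)
```

Both paths return the same vectors in the same order; only the request count differs, which is where per-request latency adds up.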
Pinning the embedder
The embedder used at add() MUST match the embedder used at
search(). Mismatched embedders silently produce near-zero similarity:
import os
# Wrong: first run uses OpenAI, second run uses Hash
store = ChromaVectorStore.local("./db")  # auto-picks OpenAI
await store.add(chunks)
del store
del os.environ["OPENAI_API_KEY"]
store = ChromaVectorStore.local("./db")  # auto-picks Hash
hits = await store.search("query")  # near-zero scores
Pin the embedder explicitly to avoid this:
embedder = OpenAIEmbedder("text-embedding-3-small")
store = ChromaVectorStore.local("./db", embedder=embedder)
Dimensions. text-embedding-3-small is 1536 dims, text-embedding-3-large is
3072, voyage-3 is 1024, embed-english-v3.0 is 1024, and
HashEmbedder defaults to 384. Switching dimensions invalidates the
existing index, so you must re-embed.
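One way to catch a mismatch early is to record the embedder's name and dimensions next to the index and compare them each time the index is opened. The sketch below shows how you might do that in your own code. `IndexMeta`, `check_embedder`, and `EmbedderMismatchError` are hypothetical names, and nothing here implies that loomflow or Chroma performs this check for you.

```python
from dataclasses import dataclass


class EmbedderMismatchError(ValueError):
    """Raised when an index was built with a different embedder."""


@dataclass(frozen=True)
class IndexMeta:
    """Embedder details recorded when the index was first populated."""

    embedder_name: str
    dimensions: int


def check_embedder(meta: IndexMeta, name: str, dimensions: int) -> None:
    """Fail loudly instead of silently returning near-zero similarity."""
    if dimensions != meta.dimensions:
        raise EmbedderMismatchError(
            f"index has {meta.dimensions}-dim vectors, embedder produces "
            f"{dimensions}; re-embed the corpus"
        )
    if name != meta.embedder_name:
        # Same size, different model: vectors are still not comparable.
        raise EmbedderMismatchError(
            f"index was built with {meta.embedder_name!r}, got {name!r}"
        )


meta = IndexMeta("text-embedding-3-small", 1536)
check_embedder(meta, "text-embedding-3-small", 1536)  # same embedder: passes

try:
    # "hash" is a placeholder name for the fallback embedder (assumption).
    check_embedder(meta, "hash", 384)
except EmbedderMismatchError as exc:
    print(exc)
```

Comparing the name as well as the size matters because two models can share a dimension count, such as voyage-3 and embed-english-v3.0 at 1024, while producing incompatible vector spaces.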