Hybrid recall (0.9.27+)
Every Memory backend exposes recall_scored(), a hybrid retrieval
method that returns episodes paired with the score components
used to rank them: BM25 lexical match, vector cosine, and (when a
reranker is wired in) a reranker score. The shape is designed so
that adding rerankers, MMR diversification, or score-threshold
filters later doesn’t break the protocol.

    matches = await agent.memory.recall_scored(
        "postgres replication failure",
        user_id="alice",
        alpha=0.5,  # 0=BM25 only, 1=vector only, 0.5=balanced (RRF)
    )
    for m in matches:
        print(m.episode.input, m.score, m.bm25_score, m.vector_score)

For the conceptual model behind episodes vs facts see Episode vs Fact.
Why hybrid
Plain vector recall (cosine over embeddings) misses queries that hinge on rare keywords (acronyms, IDs, proper nouns, error codes) because embeddings smooth those tokens out. Plain BM25 misses synonyms and paraphrases that share no surface tokens with the stored text.
Hybrid retrieval runs both rankers and fuses the results. Loom
uses Reciprocal Rank Fusion (RRF), a widely used fusion algorithm
that scores by rank position only and ignores raw score magnitudes.
That makes it robust when the two inputs are incommensurable, as
with cosine ∈ [-1, 1] and BM25 ∈ [0, ∞).
| Query shape | Pure vector | Pure BM25 | Hybrid (RRF) |
|---|---|---|---|
| "How do I improve database speed?" | strong (paraphrase match) | weak (no shared terms) | strong |
| "INV-2026-0042" | weak (embedding smooths IDs) | strong (exact token match) | strong |
| "Tell me about replicating postgres" | medium | medium | strong (both signals reinforce) |
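The rank-only fusion described above can be sketched in a few lines. This is an illustrative version, not Loom's implementation: the function name `rrf_fuse`, the constant `k=60` (the value commonly used with RRF), and the way `alpha` weights the two lists are all assumptions.

```python
from collections import defaultdict


def rrf_fuse(bm25_ranked, vector_ranked, alpha=0.5, k=60):
    """Fuse two ranked lists of IDs with weighted Reciprocal Rank Fusion.

    alpha follows the convention used on this page:
    0.0 = BM25 only, 1.0 = vector only.
    Assumption: k=60 and the linear alpha weighting are illustrative choices.
    """
    fused = defaultdict(float)
    # Each list contributes weight / (k + rank); raw scores are never used.
    for rank, doc_id in enumerate(bm25_ranked, start=1):
        fused[doc_id] += (1.0 - alpha) / (k + rank)
    for rank, doc_id in enumerate(vector_ranked, start=1):
        fused[doc_id] += alpha / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


bm25_ranked = ["ep_ids", "ep_pg", "ep_lunch"]
vector_ranked = ["ep_pg", "ep_speed", "ep_ids"]
fused = rrf_fuse(bm25_ranked, vector_ranked)
fused_ids = [doc_id for doc_id, _ in fused]
```

An item that sits near the top of both lists (`ep_pg` here) outranks one that tops a single list, which is the "both signals reinforce" case in the table.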
Memory.recall_scored(...)

    async def recall_scored(
        self,
        query: str,
        *,
        kind: str = "episodic",
        limit: int = 5,
        time_range: tuple[datetime, datetime] | None = None,
        user_id: str | None = None,
        alpha: float = 0.5,
    ) -> list[EpisodeMatch]: ...

Same filtering semantics as recall() (user_id partition,
time_range, kind), but returns scored matches instead of bare
episodes.
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | required | The query text to match against episode inputs / outputs. |
| kind | str | "episodic" | Episode kind filter (typically "episodic"). |
| limit | int | 5 | Max matches returned. |
| time_range | tuple[datetime, datetime] \| None | None | Optional (start, end) UTC range. |
| user_id | str \| None | None | Multi-tenant partition. The anonymous bucket is the implicit default. Cross-tenant leaks are forbidden by the protocol. |
| alpha | float | 0.5 | Weights the lexical vs vector mix in backends that compute both: 0.0 = pure BM25, 1.0 = pure vector, 0.5 = balanced RRF. Backends that don’t compute one of the two ignore alpha. |
Returns: list[EpisodeMatch], sorted by score descending.
EpisodeMatch

    class EpisodeMatch(BaseModel):
        episode: Episode
        score: float
        vector_score: float | None = None
        bm25_score: float | None = None
        rerank_score: float | None = None

| Field | Type | Description |
|---|---|---|
| episode | Episode | The recalled episode (raw, unmodified). |
| score | float | Final fused score the backend used for ranking. Higher is better. Range and meaning are backend-defined. |
| vector_score | float \| None | Cosine-similarity component, in [-1, 1]. None when the backend didn’t compute embeddings (no embedder configured, or pure-lexical recall). |
| bm25_score | float \| None | BM25 lexical-match component. None when the backend didn’t compute a BM25 ranking (e.g. shim-only backends). |
| rerank_score | float \| None | Optional cross-encoder / LLM reranker score, computed AFTER the initial fused ranking. None when no reranker was configured. |
Adding new score components is backward-compatible. New fields
are declared with None defaults, and Pydantic ignores unknown
fields by default, so the protocol can grow (cross-encoder
rerankers, MMR diversifiers, learning-to-rank scores) without
breaking existing backends.

    from loomflow import EpisodeMatch  # exported top-level

What each backend does
| Backend | Hybrid implementation |
|---|---|
| InMemoryMemory | Native BM25. Replaces the prior substring-match-then-recency behaviour. Lexically-matching episodes rank ahead of unrelated recent ones. Empty/zero-match queries fall back to recency with neutral scores so callers always get something useful. |
| VectorMemory | Native BM25 + cosine + RRF. Reuses the same RRF math as loomflow.vectorstore.search_hybrid. vector_score AND bm25_score populated. alpha=0 collapses to BM25-only. |
| ChromaMemory | Shim via default_recall_scored. Wraps recall() results with neutral score 1.0. Native plumbing of Chroma’s distance into vector_score is a future revision. |
| PostgresMemory | Shim via default_recall_scored. |
| RedisMemory | Shim via default_recall_scored. |
| SqliteMemory | Shim via default_recall_scored. |
| AutoExtractMemory | Pass-through. Preserves the inner backend’s score breakdown when present, else wraps with neutral scores. |
| LazyMemory | Pass-through. Preserves the inner backend’s score breakdown. |
The protocol contract holds across every backend; native plumbing
of provider-specific scores into vector_score for Chroma /
Postgres / Redis can land later as additive changes.
default_recall_scored helper

    from loomflow.memory import default_recall_scored

A minimal fallback that wraps each Episode from a backend’s
recall() in an EpisodeMatch with a neutral score=1.0. Used by
backends without native hybrid scoring so the protocol stays
coherent.
If you’re writing a custom Memory backend, the easiest path is:

    from loomflow.memory import default_recall_scored


    class MyMemory:
        async def recall(self, query, **filters):
            # Your existing recall(); its keyword filters are elided here.
            ...

        async def recall_scored(self, query, **kwargs):
            # Delegate to the shim until native scores are plumbed in.
            return await default_recall_scored(self, query, **kwargs)

That gives you a working recall_scored while you decide whether
to plumb your backend’s native distance / score into vector_score.
Worked example: InMemoryMemory native BM25

    import asyncio
    from datetime import UTC, datetime  # datetime.UTC needs Python 3.11+

    from loomflow import Agent, Episode, InMemoryMemory


    async def main() -> None:
        memory = InMemoryMemory()
        agent = Agent("...", model="gpt-4.1-mini", memory=memory)

        # Seed three episodes for alice
        for text in [
            "Postgres replication is configured asynchronously.",
            "Today's lunch was great.",
            "We migrated to logical replication on the primary.",
        ]:
            await memory.remember(
                Episode(
                    user_id="alice",
                    kind="episodic",
                    input="...",
                    output=text,
                    created_at=datetime.now(UTC),
                )
            )

        matches = await memory.recall_scored(
            "postgres replication", user_id="alice", limit=3,
        )
        for m in matches:
            print(f"{m.score:.3f} bm25={m.bm25_score:.3f} → {m.episode.output[:50]}")
        # 1.123 bm25=1.123 → Postgres replication is configured asynchronously.
        # 0.412 bm25=0.412 → We migrated to logical replication on the primary.
        # 0.000 bm25=0.000 → Today's lunch was great.


    asyncio.run(main())

The lexically-matching episodes rank ahead of the unrelated “lunch” episode, a real recall-quality upgrade over the prior substring-then-recency behaviour.
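To see why the ordering comes out this way, the sketch below scores the same three texts with a minimal BM25. The tokenizer, the parameters `k1=1.5` and `b=0.75`, and the IDF variant are assumptions; Loom's own choices are not stated here, so the numbers will not match the output above, only the ordering.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase word tokens; Loom's tokenizer may differ.
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue  # term absent from this document
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalisation: longer documents need more hits.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "Postgres replication is configured asynchronously.",
    "Today's lunch was great.",
    "We migrated to logical replication on the primary.",
]
scores = bm25_scores("postgres replication", docs)
```

The first text matches both query terms, including the rarer "postgres", the third matches only "replication", and the lunch text shares no terms, so it scores exactly zero.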
Composing rerankers / MMR / score thresholds
The EpisodeMatch shape is what makes downstream layers
straightforward. A score-threshold filter:

    matches = await memory.recall_scored("...", user_id="alice", limit=20)
    # 0.6 is illustrative; score ranges are backend-defined, so calibrate per backend.
    strong = [m for m in matches if m.score >= 0.6]

A cross-encoder reranker (using whatever scoring library you like):

    from loomflow import EpisodeMatch
    from sentence_transformers import CrossEncoder

    ranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    query = "..."
    matches = await memory.recall_scored(query, user_id="alice", limit=50)
    texts = [m.episode.output for m in matches]
    # Score each (query, text) pair with the cross-encoder.
    rerank_scores = ranker.predict([(query, t) for t in texts])
    reranked = sorted(
        [
            EpisodeMatch(
                episode=m.episode,
                score=float(rerank_scores[i]),
                vector_score=m.vector_score,
                bm25_score=m.bm25_score,
                rerank_score=float(rerank_scores[i]),
            )
            for i, m in enumerate(matches)
        ],
        key=lambda m: m.score,
        reverse=True,
    )[:5]

Once a backend ships a built-in reranker hook, the same pattern becomes a one-line opt-in. The protocol is ready.
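MMR diversification, the third layer named in this section's heading, can be composed the same way. The sketch below is a generic greedy Maximal Marginal Relevance selector, not a Loom API. It assumes the caller has an embedding for each candidate (EpisodeMatch as documented here does not expose one), and the names `mmr_select` and `lam` are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(candidates, top_k=3, lam=0.7):
    """Greedy MMR over (item_id, relevance, embedding) tuples.

    lam=1.0 ranks by relevance alone; lower values favour diversity.
    """
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < top_k:
        def mmr_value(cand):
            _, relevance, emb = cand
            # Penalise similarity to the closest already-selected item.
            redundancy = max((cosine(emb, s[2]) for s in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=mmr_value)
        selected.append(best)
        remaining.remove(best)
    return [item_id for item_id, _, _ in selected]


candidates = [
    ("replication-a", 0.90, [1.0, 0.0]),
    ("replication-b", 0.88, [0.99, 0.05]),  # near-duplicate of replication-a
    ("failover", 0.70, [0.0, 1.0]),
]
picked = mmr_select(candidates, top_k=2)
relevance_only = mmr_select(candidates, top_k=2, lam=1.0)
```

With the default `lam`, the near-duplicate is skipped in favour of the less relevant but distinct "failover" item. With `lam=1.0` the selection falls back to plain relevance order.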
Why this matters competitively. Before 0.9.27, Loom’s recall was
cosine + token-overlap fallback only, which was weaker than Zep
(BM25 + vector + graph BFS + reranker) and CrewAI (composite + deep mode).
After 0.9.27, InMemoryMemory and VectorMemory ship native BM25
hybrid; the protocol shape is ready for a reranker / MMR /
cross-encoder layer when someone needs one. The bi-temporal +
auto-extract + multi-tenant + hybrid-recall combination is unique
to Loom in the open-source field.