
Hybrid recall (0.9.27+)

Every Memory backend exposes recall_scored(), a hybrid retrieval method that returns episodes paired with the score components used to rank them: BM25 lexical match, vector cosine, and (when a reranker is wired in) a reranker score. The return shape is designed so that rerankers, MMR diversification, and score-threshold filters can be added later without breaking the protocol.

    matches = await agent.memory.recall_scored(
        "postgres replication failure",
        user_id="alice",
        alpha=0.5,  # 0 = BM25 only, 1 = vector only, 0.5 = balanced (RRF)
    )
    for m in matches:
        print(m.episode.input, m.score, m.bm25_score, m.vector_score)

For the conceptual model behind episodes vs facts see Episode vs Fact.


Why hybrid

Plain vector recall (cosine over embeddings) misses queries that hinge on rare keywords (acronyms, IDs, proper nouns, error codes) that embeddings smooth out. Plain BM25 misses synonyms and paraphrases that share no surface tokens.

Hybrid retrieval runs both rankers and fuses the results. Loom uses Reciprocal Rank Fusion (RRF), the field-standard algorithm that scores by rank position only and ignores raw score magnitudes. That makes it robust when the two scales are incommensurable: cosine lies in [-1, 1], while BM25 lies in [0, ∞).

| Query shape | Pure vector | Pure BM25 | Hybrid (RRF) |
| --- | --- | --- | --- |
| “How do I improve database speed?” | strong (paraphrase match) | weak (no shared terms) | strong |
| “INV-2026-0042” | weak (embedding smooths IDs) | strong (exact token match) | strong |
| “Tell me about replicating postgres” | medium | medium | strong (both signals reinforce) |
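The rank-only fusion described above can be sketched in a few lines. This is a minimal illustration, not Loom's actual code: the function name weighted_rrf, the episode ids, the constant k=60 (a commonly used RRF default), and the way alpha weights the two lists are all assumptions made for the example.

```python
from collections import defaultdict


def weighted_rrf(bm25_ranked, vector_ranked, alpha=0.5, k=60):
    """Fuse two ranked lists of ids using reciprocal rank only.

    Assumption: alpha weights the vector list and (1 - alpha) the BM25
    list, so alpha=0 is BM25-only and alpha=1 is vector-only.
    """
    fused = defaultdict(float)
    for rank, doc_id in enumerate(bm25_ranked, start=1):
        fused[doc_id] += (1.0 - alpha) / (k + rank)
    for rank, doc_id in enumerate(vector_ranked, start=1):
        fused[doc_id] += alpha / (k + rank)
    # Highest fused score first; raw BM25 / cosine magnitudes never enter.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical ids, each list ordered best-first by its own ranker.
bm25_ranked = ["ep_invoice", "ep_replication", "ep_lunch"]
vector_ranked = ["ep_replication", "ep_speed", "ep_invoice"]

fused = weighted_rrf(bm25_ranked, vector_ranked, alpha=0.5)
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

An id that places well in both lists (here ep_replication) ends up ahead of ids that appear in only one, which is the "both signals reinforce" row of the table.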

Memory.recall_scored(...)

    async def recall_scored(
        self,
        query: str,
        *,
        kind: str = "episodic",
        limit: int = 5,
        time_range: tuple[datetime, datetime] | None = None,
        user_id: str | None = None,
        alpha: float = 0.5,
    ) -> list[EpisodeMatch]: ...

Same filtering semantics as recall() (user_id partition, time_range, kind), but returns scored matches instead of bare episodes.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | required | The query text to match against episode inputs / outputs. |
| kind | str | "episodic" | Episode kind filter (typically "episodic"). |
| limit | int | 5 | Max matches returned. |
| time_range | tuple[datetime, datetime] \| None | None | Optional (start, end) UTC range. |
| user_id | str \| None | None | Multi-tenant partition. The anonymous bucket is the implicit default. Cross-tenant leaks are forbidden by the protocol. |
| alpha | float | 0.5 | Weights the lexical vs vector mix in backends that compute both: 0.0 = pure BM25, 1.0 = pure vector, 0.5 = balanced RRF. Backends that don’t compute one of the two ignore alpha. |

Returns: list[EpisodeMatch], sorted by score descending.


EpisodeMatch

    class EpisodeMatch(BaseModel):
        episode: Episode
        score: float
        vector_score: float | None = None
        bm25_score: float | None = None
        rerank_score: float | None = None

| Field | Type | Description |
| --- | --- | --- |
| episode | Episode | The recalled episode (raw, unmodified). |
| score | float | Final fused score the backend used for ranking. Higher is better. Range and meaning are backend-defined. |
| vector_score | float \| None | Cosine-similarity component, in [-1, 1]. None when the backend didn’t compute embeddings (no embedder configured, or pure-lexical recall). |
| bm25_score | float \| None | BM25 lexical-match component. None when the backend didn’t compute a BM25 ranking (e.g. shim-only backends). |
| rerank_score | float \| None | Optional cross-encoder / LLM reranker score, computed AFTER the initial fused ranking. None when no reranker was configured. |
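Because every component except score may be None, calling code should check before formatting or comparing them. The sketch below uses a hypothetical MatchStandIn dataclass that only mirrors the score fields listed above, so it runs without loomflow installed; the helper names are likewise made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MatchStandIn:
    """Hypothetical stand-in mirroring EpisodeMatch's score fields."""
    score: float
    vector_score: Optional[float] = None
    bm25_score: Optional[float] = None
    rerank_score: Optional[float] = None


def populated_signals(match) -> dict:
    """Return only the score components the backend actually computed."""
    names = ("vector_score", "bm25_score", "rerank_score")
    return {
        name: getattr(match, name)
        for name in names
        if getattr(match, name) is not None
    }


def format_match(match) -> str:
    """Render the fused score plus whichever components are present."""
    parts = [f"score={match.score:.3f}"]
    parts += [f"{name}={value:.3f}" for name, value in populated_signals(match).items()]
    return " ".join(parts)


shim_match = MatchStandIn(score=1.0)  # shape a shim backend would return
hybrid_match = MatchStandIn(score=0.016, vector_score=0.82, bm25_score=3.4)

print(format_match(shim_match))    # score=1.000
print(format_match(hybrid_match))  # score=0.016 vector_score=0.820 bm25_score=3.400
```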

Adding new score components is backward-compatible: Pydantic ignores unknown fields by default, and each new component is declared with a None default, so the protocol can grow (cross-encoder rerankers, MMR diversifiers, learned-to-rank scores) without breaking existing backends.

from loomflow import EpisodeMatch # exported top-level

What each backend does

| Backend | Hybrid implementation |
| --- | --- |
| InMemoryMemory | Native BM25. Replaces the prior substring-match-then-recency behaviour. Lexically-matching episodes rank ahead of unrelated recent ones. Empty/zero-match queries fall back to recency with neutral scores so callers always get something useful. |
| VectorMemory | Native BM25 + cosine + RRF. Reuses the same RRF math as loomflow.vectorstore.search_hybrid. vector_score AND bm25_score populated. alpha=0 collapses to BM25-only. |
| ChromaMemory | Shim via default_recall_scored. Wraps recall() results with neutral score 1.0. Native plumbing of Chroma’s distance into vector_score is a future revision. |
| PostgresMemory | Shim via default_recall_scored. |
| RedisMemory | Shim via default_recall_scored. |
| SqliteMemory | Shim via default_recall_scored. |
| AutoExtractMemory | Pass-through. Preserves the inner backend’s score breakdown when present, else wraps with neutral scores. |
| LazyMemory | Pass-through. Preserves the inner backend’s score breakdown. |

The protocol contract holds across every backend; native plumbing of provider-specific scores into vector_score for Chroma / Postgres / Redis can land later as additive changes.


default_recall_scored helper

from loomflow.memory import default_recall_scored

A no-op fallback that wraps each Episode from a backend’s recall() in an EpisodeMatch with a neutral score=1.0. Used by backends without native hybrid scoring so the protocol stays coherent.

If you’re writing a custom Memory backend, the easiest path is:

    from loomflow.memory import default_recall_scored


    class MyMemory:
        async def recall(self, query, *, ...):
            ...

        async def recall_scored(self, query, **kwargs):
            # Delegate to the shim: wraps recall() results with neutral scores.
            return await default_recall_scored(self, query, **kwargs)

That gives you a working recall_scored while you decide whether to plumb your backend’s native distance / score into vector_score.
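To make the shim's behaviour concrete, here is a rough sketch of what a helper like default_recall_scored plausibly does, based only on the description above. It is not the library's source: EpisodeStandIn, MatchStandIn, ToyMemory, and wrap_recall_as_scored are hypothetical names, and dropping alpha before calling recall() is an assumption drawn from the note that shim backends ignore it.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class EpisodeStandIn:
    """Hypothetical stand-in for Episode (only the field used here)."""
    output: str


@dataclass
class MatchStandIn:
    """Hypothetical stand-in mirroring EpisodeMatch."""
    episode: EpisodeStandIn
    score: float
    vector_score: Optional[float] = None
    bm25_score: Optional[float] = None
    rerank_score: Optional[float] = None


async def wrap_recall_as_scored(memory, query, *, alpha=0.5, **filters):
    """Wrap plain recall() results with a neutral score of 1.0.

    Assumption: alpha is accepted for signature compatibility and ignored,
    since a shim backend computes neither BM25 nor vector scores.
    """
    episodes = await memory.recall(query, **filters)
    return [MatchStandIn(episode=ep, score=1.0) for ep in episodes]


class ToyMemory:
    """Toy backend: case-insensitive substring recall, no scoring."""

    def __init__(self, episodes):
        self._episodes = list(episodes)

    async def recall(self, query, *, limit=5, **_unused_filters):
        hits = [ep for ep in self._episodes if query.lower() in ep.output.lower()]
        return hits[:limit]

    async def recall_scored(self, query, **kwargs):
        return await wrap_recall_as_scored(self, query, **kwargs)


memory = ToyMemory([
    EpisodeStandIn("Postgres replication is configured asynchronously."),
    EpisodeStandIn("Today's lunch was great."),
    EpisodeStandIn("We migrated to logical replication on the primary."),
])
matches = asyncio.run(memory.recall_scored("replication", limit=2, alpha=0.3))
for m in matches:
    print(m.score, m.vector_score, m.bm25_score, m.episode.output)
```

Every match comes back with score 1.0 and None components, so ordering is whatever recall() produced.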


Worked example: InMemoryMemory native BM25

    import asyncio
    from datetime import UTC, datetime

    from loomflow import Agent, Episode, InMemoryMemory


    async def main() -> None:
        memory = InMemoryMemory()
        agent = Agent("...", model="gpt-4.1-mini", memory=memory)

        # Seed three episodes for alice
        for text in [
            "Postgres replication is configured asynchronously.",
            "Today's lunch was great.",
            "We migrated to logical replication on the primary.",
        ]:
            await memory.remember(
                Episode(
                    user_id="alice",
                    kind="episodic",
                    input="...",
                    output=text,
                    created_at=datetime.now(UTC),
                )
            )

        matches = await memory.recall_scored(
            "postgres replication",
            user_id="alice",
            limit=3,
        )
        for m in matches:
            print(f"{m.score:.3f} bm25={m.bm25_score:.3f} → {m.episode.output[:50]}")


    asyncio.run(main())

    # Example output (exact scores depend on the corpus):
    # 1.123 bm25=1.123 → Postgres replication is configured asynchronously.
    # 0.412 bm25=0.412 → We migrated to logical replication on the primary.
    # 0.000 bm25=0.000 → Today's lunch was great.

The lexically matching episodes rank ahead of the unrelated “lunch” episode, a real recall-quality upgrade over the prior substring-then-recency behaviour.
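To see why the ranking comes out this way, the following is a self-contained sketch of textbook Okapi BM25 over the same three sentences. It is not InMemoryMemory's implementation: the tokenizer, the k1=1.5 and b=0.75 defaults, and the non-negative IDF variant are conventional choices assumed for the example, so the numbers it prints will not match the scores shown above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumerics (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            # Rare terms get a larger weight; the +1 keeps the IDF non-negative.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            # Length normalisation: long documents are penalised slightly.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "Postgres replication is configured asynchronously.",
    "Today's lunch was great.",
    "We migrated to logical replication on the primary.",
]
scores = bm25_scores("postgres replication", docs)
for doc, score in sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True):
    print(f"{score:.3f}  {doc}")
```

The first sentence matches both query terms (including the rarer "postgres"), the third matches only "replication", and the lunch sentence matches nothing and scores zero.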


Composing rerankers / MMR / score thresholds

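The EpisodeMatch shape is what makes downstream layers straightforward. A score-threshold filter is a one-liner; because the range of score is backend-defined (shim backends return 1.0 for every match), choose the cutoff per backend:

    matches = await memory.recall_scored("...", user_id="alice", limit=20)
    strong = [m for m in matches if m.score >= 0.6]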

A cross-encoder reranker (using whatever scoring library you like):

    from loomflow import EpisodeMatch
    from sentence_transformers import CrossEncoder

    ranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

    # Over-fetch candidates, then let the cross-encoder pick the final top 5.
    matches = await memory.recall_scored("...", user_id="alice", limit=50)
    texts = [m.episode.output for m in matches]
    rerank_scores = ranker.predict([("...", t) for t in texts])

    reranked = sorted(
        [
            EpisodeMatch(
                episode=m.episode,
                score=float(rerank_scores[i]),  # reranker score becomes the final score
                vector_score=m.vector_score,
                bm25_score=m.bm25_score,
                rerank_score=float(rerank_scores[i]),
            )
            for i, m in enumerate(matches)
        ],
        key=lambda m: m.score,
        reverse=True,
    )[:5]

Once a backend ships a built-in reranker hook, the same pattern becomes a one-line opt-in. The protocol is ready.
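MMR diversification composes the same way. The sketch below is one possible approach, not something Loom ships: it runs greedy Maximal Marginal Relevance over already-scored matches, using token-set Jaccard overlap as the similarity measure because EpisodeMatch exposes no embeddings. MatchStandIn, mmr_select, and the lam trade-off parameter are hypothetical names, and the example assumes scores roughly in [0, 1] so relevance and redundancy are comparable.

```python
import re
from dataclasses import dataclass


@dataclass
class MatchStandIn:
    """Hypothetical stand-in: text plays m.episode.output, score plays m.score."""
    text: str
    score: float


def token_set(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mmr_select(matches, limit=3, lam=0.7):
    """Greedy MMR: trade relevance against redundancy with already-picked items.

    lam=1.0 keeps the pure score order; lower values favour diversity.
    """
    tokens = [token_set(m.text) for m in matches]
    remaining = list(range(len(matches)))
    selected = []
    while remaining and len(selected) < limit:
        def mmr_value(i):
            # Redundancy is the highest overlap with anything already selected.
            redundancy = max(
                (jaccard(tokens[i], tokens[j]) for j in selected), default=0.0
            )
            return lam * matches[i].score - (1 - lam) * redundancy

        best = max(remaining, key=mmr_value)
        selected.append(best)
        remaining.remove(best)
    return [matches[i] for i in selected]


candidates = [
    MatchStandIn("Postgres replication is configured asynchronously.", 0.90),
    MatchStandIn("Postgres replication is configured asynchronously on replicas.", 0.85),
    MatchStandIn("We migrated to logical replication on the primary.", 0.60),
]
diverse = mmr_select(candidates, limit=2, lam=0.5)
for m in diverse:
    print(f"{m.score:.2f}  {m.text}")
```

With lam=0.5 the near-duplicate second candidate is skipped in favour of the lower-scored but distinct third one; with lam=1.0 the selection reverts to plain score order.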

Why this matters competitively: before 0.9.27, Loom’s recall was cosine plus a token-overlap fallback only, which was weaker than Zep (BM25 + vector + graph BFS + reranker) and CrewAI (composite + deep mode). After 0.9.27, InMemoryMemory and VectorMemory ship native BM25 hybrid, and the protocol shape is ready for a reranker / MMR / cross-encoder layer when someone needs one. The bi-temporal + auto-extract + multi-tenant + hybrid-recall combination is unique to Loom in the open-source field.
