Hybrid recall (0.9.27+)

Every Memory backend exposes recall_scored(), a hybrid retrieval method that returns episodes paired with the score components used to rank them: BM25 lexical match, vector cosine, and (when a reranker is wired) a reranker score. It is built so that adding rerankers, MMR diversification, or score-threshold filters later doesn’t break the protocol.


matches = await agent.memory.recall_scored(
    "postgres replication failure",
    user_id="alice",
    alpha=0.5,        # 0=BM25 only, 1=vector only, 0.5=balanced (RRF)
)
for m in matches:
    print(m.episode.input, m.score, m.bm25_score, m.vector_score)

For the conceptual model behind episodes vs facts see Episode vs Fact.

Why hybrid

Plain vector recall (cosine over embeddings) misses queries with rare keywords (acronyms, IDs, proper nouns, error codes) that embeddings smooth out. Plain BM25 misses synonyms and paraphrases that share no surface tokens.

Hybrid retrieval runs both rankers and fuses the results. Loom uses Reciprocal Rank Fusion (RRF), the field-standard algorithm, which scores by rank position only and ignores raw score magnitudes. That makes it robust when the two scales are incommensurable: cosine ∈ [-1, 1] and BM25 ∈ [0, ∞).

Query shape	Pure vector	Pure BM25	Hybrid (RRF)
“How do I improve database speed?”	strong (paraphrase match)	weak (no shared terms)	strong
“INV-2026-0042”	weak (embedding smooths IDs)	strong (exact token match)	strong
“Tell me about replicating postgres”	medium	medium	strong (both signals reinforce)
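
The fusion step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Loom’s implementation: `rrf_fuse` is a hypothetical name, the constant `k=60` is a commonly used RRF default, and the way `alpha` weights the two rank lists is an assumption made for the example.

```python
def rrf_fuse(bm25_ranked, vector_ranked, alpha=0.5, k=60):
    """Fuse two ranked lists of ids with weighted Reciprocal Rank Fusion.

    Only rank positions are used, never raw scores, so BM25 and cosine
    values do not need to be on comparable scales.
    alpha=0.0 keeps only the BM25 list, alpha=1.0 only the vector list
    (an assumed weighting, chosen to mirror the documented alpha semantics).
    """
    fused = {}
    for rank, doc_id in enumerate(bm25_ranked, start=1):
        fused[doc_id] = fused.get(doc_id, 0.0) + (1.0 - alpha) / (k + rank)
    for rank, doc_id in enumerate(vector_ranked, start=1):
        fused[doc_id] = fused.get(doc_id, 0.0) + alpha / (k + rank)
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# "b" is near the top of both lists, so it wins the balanced fusion.
ranking = rrf_fuse(["a", "b", "c"], ["b", "c", "a"], alpha=0.5)
print([doc_id for doc_id, _ in ranking])
```

Note that fused RRF values are small (on the order of `1 / k`), which is one reason a fixed score cutoff tuned for one backend may not transfer to another.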

`Memory.recall_scored(...)`


async def recall_scored(
    self,
    query: str,
    *,
    kind: str = "episodic",
    limit: int = 5,
    time_range: tuple[datetime, datetime] | None = None,
    user_id: str | None = None,
    alpha: float = 0.5,
) -> list[EpisodeMatch]: ...

Same filtering semantics as recall() (user_id partition, time_range, kind), but returns scored matches instead of bare episodes.

Parameter	Type	Default	Description
`query`	`str`	required	The query text to match against episode inputs / outputs.
`kind`	`str`	`"episodic"`	Episode kind filter (typically `"episodic"`).
`limit`	`int`	`5`	Max matches returned.
`time_range`	`tuple[datetime, datetime] \| None`	`None`	Optional `(start, end)` UTC range.
`user_id`	`str \| None`	`None`	Multi-tenant partition. The anonymous bucket is the implicit default. Cross-tenant leaks are forbidden by the protocol.
`alpha`	`float`	`0.5`	Weights the lexical vs vector mix in backends that compute both: `0.0` = pure BM25, `1.0` = pure vector, `0.5` = balanced RRF. Backends that don’t compute one of the two ignore `alpha`.

Returns: list[EpisodeMatch], sorted by score descending.
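
Building the `time_range` tuple is easy to get wrong with naive datetimes. The helper below is a small sketch of one way to produce a timezone-aware `(start, end)` UTC pair; `last_n_days` is a hypothetical name and is not part of the library.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def last_n_days(days: int, now: Optional[datetime] = None) -> tuple[datetime, datetime]:
    """Return a (start, end) pair of timezone-aware UTC datetimes."""
    end = now if now is not None else datetime.now(timezone.utc)
    if end.tzinfo is None:
        # Reject naive datetimes so the range is unambiguous.
        raise ValueError("expected a timezone-aware datetime")
    end = end.astimezone(timezone.utc)
    return (end - timedelta(days=days), end)


# Intended use (needs a configured agent, so it is left as a comment):
# matches = await agent.memory.recall_scored(
#     "postgres replication failure",
#     user_id="alice",
#     time_range=last_n_days(7),
#     limit=10,
# )
```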

`EpisodeMatch`


class EpisodeMatch(BaseModel):
    episode: Episode
    score: float
    vector_score: float | None = None
    bm25_score: float | None = None
    rerank_score: float | None = None

Field	Type	Description
`episode`	`Episode`	The recalled episode (raw, unmodified).
`score`	`float`	Final fused score the backend used for ranking. Higher is better. Range and meaning are backend-defined.
`vector_score`	`float \| None`	Cosine-similarity component, in `[-1, 1]`. `None` when the backend didn’t compute embeddings (no embedder configured, or pure-lexical recall).
`bm25_score`	`float \| None`	BM25 lexical-match component. `None` when the backend didn’t compute a BM25 ranking (e.g. shim-only backends).
`rerank_score`	`float \| None`	Optional cross-encoder / LLM reranker score, computed AFTER the initial fused ranking. `None` when no reranker was configured.

Adding new score components is backward-compatible. Pydantic ignores unknown fields by default, and each new component is declared with a None default, so the protocol can grow (cross-encoder rerankers, MMR diversifiers, learned-to-rank scores) without breaking existing backends.
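
The compatibility argument can be checked with two stand-in models. This sketch assumes Pydantic’s default `extra` handling (ignore); `OldMatch`, `NewMatch`, and the `mmr_score` field are hypothetical names for illustration, and the `episode` field is omitted to keep the example self-contained.

```python
from typing import Optional

from pydantic import BaseModel


class OldMatch(BaseModel):
    """Stand-in for a consumer built against the current score fields."""
    score: float
    vector_score: Optional[float] = None
    bm25_score: Optional[float] = None


class NewMatch(OldMatch):
    """Stand-in for a later revision that adds one more component."""
    mmr_score: Optional[float] = None  # hypothetical future field


# A newer producer sends an extra field: the old model drops it silently.
old = OldMatch(**{"score": 0.42, "bm25_score": 1.3, "mmr_score": 0.9})

# An older producer omits the new field: the new model defaults it to None.
new = NewMatch(**{"score": 0.42})

print(old.bm25_score, new.mmr_score)
```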


from loomflow import EpisodeMatch    # exported top-level

What each backend does

Backend	Hybrid implementation
`InMemoryMemory`	Native BM25. Replaces the prior substring-match-then-recency behaviour. Lexically-matching episodes rank ahead of unrelated recent ones. Empty/zero-match queries fall back to recency with neutral scores so callers always get something useful.
`VectorMemory`	Native BM25 + cosine + RRF. Reuses the same RRF math as `loomflow.vectorstore.search_hybrid`. `vector_score` AND `bm25_score` populated. `alpha=0` collapses to BM25-only.
`ChromaMemory`	Shim via `default_recall_scored`. Wraps `recall()` results with neutral score `1.0`. Native plumbing of Chroma’s distance into `vector_score` is a future revision.
`PostgresMemory`	Shim via `default_recall_scored`.
`RedisMemory`	Shim via `default_recall_scored`.
`SqliteMemory`	Shim via `default_recall_scored`.
`AutoExtractMemory`	Pass-through. Preserves the inner backend’s score breakdown when present, else wraps with neutral scores.
`LazyMemory`	Pass-through. Preserves the inner backend’s score breakdown.

The protocol contract holds across every backend; native plumbing of provider-specific scores into vector_score for Chroma / Postgres / Redis can land later as additive changes.
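
Because the component fields are `None` when a backend did not compute them, calling code can tell a native hybrid result from a shim result without knowing which backend produced it. The sketch below uses a dataclass stand-in for `EpisodeMatch` (episode field omitted); `MatchStandIn` and `score_provenance` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MatchStandIn:
    """Minimal stand-in mirroring the score fields of EpisodeMatch."""
    score: float
    vector_score: Optional[float] = None
    bm25_score: Optional[float] = None
    rerank_score: Optional[float] = None


def score_provenance(match) -> str:
    """Describe which score components a match actually carries."""
    if match.rerank_score is not None:
        return "reranked"
    has_vector = match.vector_score is not None
    has_bm25 = match.bm25_score is not None
    if has_vector and has_bm25:
        return "hybrid"
    if has_bm25:
        return "lexical"
    if has_vector:
        return "vector"
    # No components at all: the score is a neutral placeholder, not a ranking signal.
    return "unscored"


print(score_provenance(MatchStandIn(score=1.0)))
```

A caller might, for example, skip score-threshold filtering whenever the result is `"unscored"`, since a neutral `1.0` carries no ranking information.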

`default_recall_scored` helper


from loomflow.memory import default_recall_scored

A no-op fallback that wraps each Episode from a backend’s recall() in an EpisodeMatch with a neutral score=1.0. Used by backends without native hybrid scoring so the protocol stays coherent.
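
To make the behaviour concrete, here is a sketch of what such a shim could look like. It is not the library’s source: `recall_scored_shim`, `ShimMatch`, and `ListBackend` are hypothetical stand-ins, and dropping `alpha` before delegating to `recall()` is an assumption.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ShimMatch:
    """Stand-in for EpisodeMatch: neutral score, no component scores."""
    episode: Any
    score: float
    vector_score: Optional[float] = None
    bm25_score: Optional[float] = None


async def recall_scored_shim(backend, query, **kwargs):
    # Assumption: plain recall() does not accept alpha, so it is discarded.
    kwargs.pop("alpha", None)
    episodes = await backend.recall(query, **kwargs)
    # Order is whatever recall() returned; every match gets the neutral 1.0.
    return [ShimMatch(episode=ep, score=1.0) for ep in episodes]


class ListBackend:
    """Toy backend that returns its stored items, ignoring the query."""

    def __init__(self, episodes):
        self._episodes = list(episodes)

    async def recall(self, query, *, limit=5, **_filters):
        return self._episodes[:limit]


matches = asyncio.run(
    recall_scored_shim(ListBackend(["e1", "e2", "e3"]), "anything", limit=2, alpha=0.5)
)
print(matches)
```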

If you’re writing a custom Memory backend, the easiest path is:


from loomflow.memory import default_recall_scored
 
class MyMemory:
    async def recall(self, query, *, ...):
        ...
 
    async def recall_scored(self, query, **kwargs):
        return await default_recall_scored(self, query, **kwargs)

That gives you a working recall_scored while you decide whether to plumb your backend’s native distance / score into vector_score.
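
If your store already returns a distance per hit, plumbing it into `vector_score` is mostly a conversion. The sketch below assumes the store reports cosine distance defined as `1 - cosine similarity` (check your store’s documentation, as distance definitions vary); `NativeMatch` and `matches_from_cosine_distances` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class NativeMatch:
    """Stand-in for EpisodeMatch with a populated vector_score."""
    episode: Any
    score: float
    vector_score: Optional[float] = None
    bm25_score: Optional[float] = None


def matches_from_cosine_distances(episodes, distances):
    """Convert (episode, cosine distance) pairs into scored matches."""
    if len(episodes) != len(distances):
        raise ValueError("episodes and distances must have the same length")
    matches = []
    for episode, distance in zip(episodes, distances):
        # distance in [0, 2] maps to similarity in [-1, 1].
        similarity = 1.0 - distance
        matches.append(
            NativeMatch(episode=episode, score=similarity, vector_score=similarity)
        )
    # Keep the documented contract: highest score first.
    matches.sort(key=lambda m: m.score, reverse=True)
    return matches


print(matches_from_cosine_distances(["a", "b"], [0.5, 0.25]))
```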

Worked example: InMemoryMemory native BM25


from datetime import UTC, datetime
from loomflow import Agent, InMemoryMemory, Episode
 
memory = InMemoryMemory()
agent = Agent("...", model="gpt-4.1-mini", memory=memory)
 
# Seed three episodes for alice
for text in [
    "Postgres replication is configured asynchronously.",
    "Today's lunch was great.",
    "We migrated to logical replication on the primary.",
]:
    await memory.remember(
        Episode(
            user_id="alice",
            kind="episodic",
            input="...",
            output=text,
            created_at=datetime.now(UTC),
        )
    )
 
matches = await memory.recall_scored(
    "postgres replication", user_id="alice", limit=3,
)
for m in matches:
    print(f"{m.score:.3f}  bm25={m.bm25_score:.3f}  → {m.episode.output[:50]}")
# 1.123  bm25=1.123  → Postgres replication is configured asynchronously.
# 0.412  bm25=0.412  → We migrated to logical replication on the primary.
# 0.000  bm25=0.000  → Today's lunch was great.

The lexically-matching episodes rank ahead of the unrelated “lunch” episode, a real recall-quality upgrade over the prior substring-then-recency behaviour.
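
For intuition about why the ranking comes out this way, here is a textbook Okapi BM25 sketch run over the same three texts. It is not InMemoryMemory’s code: the tokenizer and the `k1` / `b` constants are assumptions, so the absolute values need not match the scores printed above; only the ordering is the point.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-word characters (an assumed tokenizer)."""
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document, in input order."""
    if not docs:
        return []
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs or 1.0
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
        score = 0.0
        for term in tokenize(query):
            tf = term_freq[term]
            if tf == 0:
                continue
            # Rare terms ("postgres") get a larger idf than common ones.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


docs = [
    "Postgres replication is configured asynchronously.",
    "Today's lunch was great.",
    "We migrated to logical replication on the primary.",
]
for doc, score in zip(docs, bm25_scores("postgres replication", docs)):
    print(f"{score:.3f}  {doc}")
```

The first text matches both query terms, the third matches only the more common one, and the lunch text matches neither.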

Composing rerankers / MMR / score thresholds

The EpisodeMatch shape keeps downstream layers simple. A score-threshold filter (choose the cutoff per backend, since score ranges are backend-defined):


matches = await memory.recall_scored("...", user_id="alice", limit=20)
strong = [m for m in matches if m.score >= 0.6]
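
Since score ranges are backend-defined, a fixed cutoff such as `0.6` may not carry over from one backend to another. One possible workaround, sketched here under that assumption, is to min-max normalize scores within the result set before thresholding; `filter_by_normalized_score` is a hypothetical helper and works on any objects with a `.score` attribute.

```python
from types import SimpleNamespace


def filter_by_normalized_score(matches, threshold=0.6):
    """Keep matches whose min-max normalized score is >= threshold."""
    if not matches:
        return []
    scores = [m.score for m in matches]
    low, high = min(scores), max(scores)
    if high == low:
        # All scores equal (e.g. neutral 1.0 from a shim): nothing to separate.
        return list(matches)
    return [m for m in matches if (m.score - low) / (high - low) >= threshold]


# Small RRF-style scores that a fixed 0.6 cutoff would reject entirely.
sample = [SimpleNamespace(score=s) for s in (0.0164, 0.0161, 0.0150)]
print(len(filter_by_normalized_score(sample)))
```

The trade-off is that the filter becomes relative: the top result always survives, even when nothing in the set is a good match.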

A cross-encoder reranker (using whatever scoring library you like):


from loomflow import EpisodeMatch
from sentence_transformers import CrossEncoder

# Load the cross-encoder once; it scores (query, text) pairs.
ranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
 
matches = await memory.recall_scored("...", user_id="alice", limit=50)
texts = [m.episode.output for m in matches]
rerank_scores = ranker.predict([("...", t) for t in texts])
 
reranked = sorted(
    [
        EpisodeMatch(
            episode=m.episode,
            score=float(rerank_scores[i]),
            vector_score=m.vector_score,
            bm25_score=m.bm25_score,
            rerank_score=float(rerank_scores[i]),
        )
        for i, m in enumerate(matches)
    ],
    key=lambda m: m.score,
    reverse=True,
)[:5]

Once a backend ships a built-in reranker hook, the same pattern becomes a one-line opt-in. The protocol is ready.
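
MMR diversification can be layered on in the same way. The sketch below is a generic greedy Maximal Marginal Relevance selector, not a Loom API: `mmr_select` and `jaccard` are hypothetical names, token-overlap similarity stands in for embedding cosine, and relevance values are assumed to be scaled to roughly `[0, 1]` so they are comparable with the similarity term.

```python
def jaccard(a, b):
    """Token-overlap similarity in [0, 1]; a stand-in for embedding cosine."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def mmr_select(candidates, k=5, lam=0.7, similarity=jaccard):
    """Greedily pick up to k (text, relevance) pairs.

    Each step maximizes lam * relevance - (1 - lam) * redundancy, where
    redundancy is the highest similarity to anything already selected.
    """
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < k:

        def mmr_value(item):
            text, relevance = item
            redundancy = max(
                (similarity(text, chosen_text) for chosen_text, _ in selected),
                default=0.0,
            )
            return lam * relevance - (1.0 - lam) * redundancy

        best = max(remaining, key=mmr_value)
        selected.append(best)
        remaining.remove(best)
    return selected


candidates = [
    ("postgres replication lag alert", 1.0),
    ("postgres replication lag alert again", 0.95),
    ("redis cache eviction policy", 0.6),
]
print(mmr_select(candidates, k=2, lam=0.5))
```

With matches from `recall_scored`, the candidate list could be built as `[(m.episode.output, m.score) for m in matches]` after rescaling the scores; with `lam=1.0` the selector reduces to plain top-k by relevance.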

Why this matters competitively. Before 0.9.27, Loom’s recall was cosine + token-overlap fallback only, which was weaker than Zep (BM25 + vector + graph BFS + reranker) and CrewAI (composite + deep mode). After 0.9.27, InMemoryMemory and VectorMemory ship native BM25 hybrid, and the protocol shape is ready for a reranker / MMR / cross-encoder layer when someone needs one. The bi-temporal + auto-extract + multi-tenant + hybrid-recall combination is unique to Loom in the open-source field.