Indexes as Basins of Attraction
How the geometric view reveals what the mechanical view cannot
Most explanations of search indexes treat them as lookup structures. You have a query, the index finds matching entries, scores them, and returns the top results. This is mechanically correct but geometrically impoverished.
A better lens: the index is an energy landscape. Your query is a starting position. Scoring is a potential function. The results are where you settle.
This isn't just metaphor. Basins of attraction — a core concept from dynamical systems — explain things the mechanical view can't: why similar queries find similar things, why some queries are unstable, and why adding or removing an entry changes what other queries find.
The Mechanical View (What We Already Covered)
Previous articles covered the mechanics: how FTS5 tokenizes queries, builds match lists, applies BM25 scoring, and combines scores with recency. We traced the full pipeline from query string to ranked results.
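As a reference point for the landscape discussion that follows, here is a minimal BM25 sketch in Python. This is not FTS5's implementation (FTS5 adds column weighting and its own tokenization); the corpus representation and token lists are illustrative.

```python
import math
from collections import Counter

def bm25_score(query_tokens, doc_tokens, corpus, k1=1.2, b=0.75):
    """Plain BM25 for one document. `corpus` is a list of token lists;
    it supplies the corpus-level statistics (IDF, average length)."""
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n
    tf = Counter(doc_tokens)
    score = 0.0
    for t in query_tokens:
        df = sum(1 for d in corpus if t in d)            # document frequency
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1)  # smoothed, non-negative IDF
        denom = tf[t] + k1 * (1 - b + b * len(doc_tokens) / avgdl)
        score += idf * tf[t] * (k1 + 1) / denom
    return score

corpus = [
    ["memory", "retrieval", "mechanism"],
    ["memory", "cache"],
    ["log", "history"],
]
# Matching both query tokens outranks matching only one:
q = ["memory", "retrieval"]
assert bm25_score(q, corpus[0], corpus) > bm25_score(q, corpus[1], corpus)
```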
What that view leaves out: why retrieval behaves the way it does. Why does query A find document X but a slightly different query B finds something entirely different? Why does adding a new document change what old queries find? The mechanical view can describe what happened, but not predict it.
Systems theory provides the missing layer.
The Reframe: From Lookup to Landscape
Imagine the space of all possible queries as a surface. Each point on this surface represents a query — a specific combination of tokens. Height corresponds to relevance score: peaks are where matching documents score highest, valleys are where nothing matches well.
Your query is a starting point. The retrieval process "flows downhill" from that starting point into the nearest basin. (One convention keeps the metaphor consistent: take the potential to be the negative of the score, so that relevance peaks are potential wells and flowing downhill means moving toward higher relevance.) The basin's floor is your top-k result set — the attractor you converge to.
This reframe changes what questions you can ask:
- Mechanical view: "What documents match these tokens?"
- Landscape view: "What basin does this query land in, and how stable is that basin?"
The mechanical question yields an answer. The landscape question yields geometric intuition.
Basins and Attractors
In dynamical systems, an attractor is a set of states the system tends toward. The basin of attraction is all starting points that flow to that attractor.
For search indexes, the mapping is direct:
- Attractor = a top-k result set that queries converge to
- Basin = the region in query-space where all queries find that same result set
- Potential function = the scoring formula (BM25, BM25+recency, etc.)
Consider a query like "memory retrieval mechanism." It tokenizes to ["memory", "retrieval", "mechanism"]. The scoring function assigns heights. The nearest peak — the documents that maximize score — becomes the attractor.
Now consider a variant: "memory retrieval system." The tokens differ, but if "system" and "mechanism" both appear in similar documents, the query lands in the same basin. Same attractor. The mechanical view treats these as different queries that happen to return similar results. The landscape view says: of course they return similar results — they're in the same basin.
This is why similar queries find similar things. It's geometric, not coincidental.
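The same-basin claim can be made concrete with a toy corpus and a deliberately crude scoring rule (raw token overlap stands in for BM25; the document names and tokens are made up). The ranking step plays the role of the downhill flow:

```python
# Toy corpus: document names and tokens are illustrative, not from any real index.
docs = {
    "dialogue": {"memory", "retrieval", "mechanism", "scoring"},
    "logging":  {"history", "log", "persistence"},
    "caching":  {"memory", "cache", "eviction"},
}

def ranked(query_tokens, k=1):
    # Token overlap stands in for BM25; the flow-to-attractor shape is the same.
    overlap = lambda name: len(set(query_tokens) & docs[name])
    return sorted(docs, key=overlap, reverse=True)[:k]

# Two different queries, one basin: "system" matches nothing extra,
# so both queries converge to the same attractor.
assert ranked(["memory", "retrieval", "mechanism"]) == ["dialogue"]
assert ranked(["memory", "retrieval", "system"]) == ["dialogue"]
```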
Watersheds: Where Queries Become Unstable
Basins have boundaries. In hydrology, a watershed is the ridge dividing two drainage basins. Rain that falls on one side flows to one river; rain that falls on the other side flows to a different river.
In search, watersheds are the unstable regions of query-space — where tiny changes send you to different attractors.
Consider a query near a watershed: "chat conversation." The tokens ["chat", "conversation"] might find documents about dialogue systems. But add one token — "chat conversation history" — and you might land in a different basin entirely, finding documents about logging and persistence.
The mechanical view sees this as "the query changed." The landscape view sees this as "the query crossed a watershed."
Watersheds are where retrieval is most fragile. A user issuing a query near a boundary has no way of knowing it. One extra word, one synonym swapped, and the attractor changes completely. Understanding basin topology reveals where these instabilities live.
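A watershed crossing can be staged in a few lines. In this sketch the corpus is contrived to sit near a boundary, and raw term frequency stands in for the full scoring function; real BM25 adds IDF and length normalization, but the geometry is the same:

```python
from collections import Counter

# A toy corpus sitting near a watershed. Document names and token
# counts are invented to place the boundary where we want it.
docs = {
    "dialogue": ["chat", "chat", "conversation", "turns"],
    "logging":  ["conversation", "history", "history", "history", "log"],
}

def top1(query):
    # Raw term-frequency scoring; Counter returns 0 for absent tokens.
    score = lambda name: sum(Counter(docs[name])[t] for t in query)
    return max(docs, key=score)

assert top1(["chat", "conversation"]) == "dialogue"
# One extra token crosses the ridge into the neighboring basin:
assert top1(["chat", "conversation", "history"]) == "logging"
```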
Index Mutation as Landscape Reshaping
Add a document to the index. You haven't just added a new destination — you've reshaped the landscape.
In mechanical terms: new entries appear in the inverted index, new token frequencies affect IDF weights, the BM25 scores change.
In landscape terms: new peaks emerge, existing basins deform, watersheds shift.
This connects directly to bifurcation. A parameter crosses a threshold, and the system's destinations fundamentally change. Here, the parameter is the index content. The threshold is where adding (or removing) an entry changes which basins exist and where their boundaries lie.
Sometimes the change is minor: a small hill appears, pulling nearby queries slightly. Sometimes it's major: a new peak becomes the dominant attractor for an entire region of query-space, and queries that once found one thing now find something else entirely.
The mechanical view can tell you after the fact that scores changed. The landscape view predicts the qualitative shift: where basins will deform, where new attractors emerge, where existing attractors disappear.
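The reshaping is easy to demonstrate. In this sketch (toy corpus, made-up document names, raw term-frequency scoring), an existing query's attractor changes the moment one document is added:

```python
from collections import Counter

def top1(query, corpus):
    # Raw term frequency stands in for the full scoring function.
    score = lambda name: sum(Counter(corpus[name])[t] for t in query)
    return max(corpus, key=score)

corpus = {
    "overview": ["wake", "budget", "overview"],
    "costs":    ["cost", "model", "pricing"],
}
query = ["wake", "budget"]
assert top1(query, corpus) == "overview"

# Adding one document raises a new peak that captures the existing query.
corpus["deep-dive"] = ["wake", "wake", "budget", "budget", "mechanism"]
assert top1(query, corpus) == "deep-dive"
```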
Retrieval Trajectory Walk-Through
Let's trace a concrete trajectory through the landscape:
Query: "wake budget"
- Tokenize → ["wake", "budget"]
- Each token defines a potential surface — heights where documents containing that token score higher
- Combined potential function (BM25) merges the surfaces: intersection peaks where both tokens appear, lower hills where only one appears
- Starting from the query point, flow downhill in the combined landscape
- Converge to the nearest attractor: the top-k documents about the wake budget mechanism
Now a variant: "wake budget cost"
- Tokenize → ["wake", "budget", "cost"]
- The combined surface differs — "cost" introduces its own potential peaks
- Two possibilities:
- Same basin: if "cost" appears in the same documents as "wake budget," the attractor shifts slightly but doesn't change
- Different basin: if "cost" is associated with different documents, a new peak may dominate
The question "does this query change the results?" is really "does this token move the query across a watershed?" The mechanical view can't answer this without computing scores. The landscape view lets you reason about the shape.
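The watershed question reduces to comparing attractors. A sketch, with a hypothetical `ranked` function and made-up document ids; results are compared as sets, since staying in the same basin means reaching the same result set even if ordering within it jitters:

```python
def crosses_watershed(q1, q2, ranked, k=3):
    """True if the two queries converge to different attractors.

    `ranked(query, k)` is any retrieval function returning top-k
    document ids.
    """
    return set(ranked(q1, k)) != set(ranked(q2, k))

# A toy ranked() over a three-document corpus, scored by token overlap.
docs = {
    "d1": {"wake", "budget", "mechanism"},
    "d2": {"wake", "budget", "cost", "pricing"},
    "d3": {"unrelated"},
}

def ranked(query, k):
    overlap = lambda name: len(set(query) & docs[name])
    # Ties resolve by insertion order (a toy detail; real scorers break ties).
    return sorted(docs, key=overlap, reverse=True)[:k]

# At k=1 the extra token crosses a watershed; at k=2 both queries
# land in the same basin.
assert crosses_watershed(["wake", "budget"], ["wake", "budget", "cost"], ranked, k=1)
assert not crosses_watershed(["wake", "budget"], ["wake", "budget", "cost"], ranked, k=2)
```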
What the Lens Reveals
Four concrete insights the geometric view provides:
1. Why small query changes sometimes cause large result shifts
Because the query crossed a watershed. In basin interiors, small moves stay in the same basin. Near basin boundaries, small moves cross into neighboring basins. Fragility is a geometric property.
2. Why indexes exhibit locality
In landscape terms: nearby queries tend to land in the same basin. In mechanical terms: similar token sets tend to match similar documents. The landscape makes this natural. Token overlap creates basin overlap.
3. Why adding entries changes existing query behavior
Every entry reshapes the landscape. Add enough entries and the topology itself changes — new attractors emerge, old ones shift or disappear. This isn't a side effect; it's the geometric nature of retrieval. You're not just storing documents; you're engineering a query-space landscape.
4. Why pruning matters
Remove a document and you're doing landscape engineering in reverse. You're removing peaks, shifting watersheds, potentially changing which attractors exist. Pruning for size efficiency isn't just about capacity — it's about preserving basin topology.
The Design Question The Mechanical View Can't Ask
From mechanics: "How do I build an efficient index?"
From landscapes: "What kind of query-space do I want to create?"
This is a fundamentally different design orientation. You're not just storing and retrieving. You're building a terrain. The scoring function shapes that terrain. The entries populate it with attractors. The query process flows downward into basins.
Good index design becomes landscape architecture. You want:
- Well-formed basins: regions where related queries converge to the same attractor
- Clear watersheds: stable boundaries that don't shift unpredictably
- Predictable topography: adding entries should deform basins gracefully, not catastrophically
You cannot ask these questions from the mechanical view alone. The landscape lens makes them visible.
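One way to make "deform gracefully, not catastrophically" measurable is to probe the index with a fixed query set and compare top-k result sets before and after a mutation. Everything in this sketch (the probe queries, the `ranked` functions, the document ids) is hypothetical:

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def basin_stability(queries, ranked_before, ranked_after, k=5):
    """Mean top-k overlap across probe queries; 1.0 means no basin moved."""
    return sum(
        jaccard(ranked_before(q, k), ranked_after(q, k)) for q in queries
    ) / len(queries)

# Hypothetical probe: one of two queries saw an attractor shift.
before = lambda q, k: {"a": ["d1", "d2"], "b": ["d3", "d4"]}[q]
after  = lambda q, k: {"a": ["d1", "d2"], "b": ["d3", "d5"]}[q]
assert abs(basin_stability(["a", "b"], before, after) - 2 / 3) < 1e-9
```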
Closing
The connection to bifurcation is direct: indexes are dynamical systems in disguise. Every query is a trajectory. Every result set is an attractor. Every index change is a potential bifurcation point where the topology reorganizes.
This doesn't replace the mechanical understanding. Scores still matter. Tokens still matter. But adding the geometric layer explains what mechanics cannot: why retrieval behaves the way it does, not just how.
And it opens new questions. What scoring functions produce well-formed basins? How do we detect watersheds in practice? How do we design indexes where attractors correspond to semantic clusters rather than lexical coincidences?
The landscape view doesn't answer these questions. It makes them possible to ask.