IVFPQSnapIndex

IVFPQSnapIndex(dim: int, nlist: int, M: int, K: int = 256, seed: int = 0, normalized: bool = False, use_rht: bool = False, keep_full_precision: bool = False, use_opq: bool = False)

Bases: FreezableIndex

Inverted-file + residual Product Quantization.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `dim` | `int` | Embedding dimension. | *required* |
| `nlist` | `int` | Number of coarse clusters. A typical value is 4 · √N. Must be ≥ 2 and ≤ the training sample size. | *required* |
| `M` | `int` | Number of PQ subspaces. Must divide `dim` (or `pdim` when `use_rht=True`). | *required* |
| `K` | `int` | Centroids per subspace. 2 ≤ K ≤ 256. | `256` |
| `seed` | `int` | Random seed. | `0` |
| `normalized` | `bool` | When True, inputs are assumed unit-length and no per-vector norm is stored. | `False` |
| `use_rht` | `bool` | Off by default, for the same rationale as `PQSnapIndex`. | `False` |
| `keep_full_precision` | `bool` | When True, keep a copy of the stored vectors (float16 since v0.7) so that `search()` can re-score candidates via `rerank_candidates`. | `False` |
| `use_opq` | `bool` | When True, learn an orthogonal OPQ-P rotation (Ge et al., 2013) during `fit()` and apply it to both corpus and queries before the coarse k-means and the residual PQ. Balances per-subspace variance, typically +0.5–2 pp recall at the same bytes/vec. Mutually exclusive with `use_rht`. | `False` |
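
A minimal construction sketch; the import path is a placeholder (adjust it to wherever the class lives in your install), and the corpus size is illustrative:

```python
import numpy as np

from yourpkg import IVFPQSnapIndex  # placeholder import path -- adjust to your install

dim = 768
n_corpus = 1_000_000

index = IVFPQSnapIndex(
    dim=dim,
    nlist=int(4 * np.sqrt(n_corpus)),  # the 4 * sqrt(N) rule of thumb -> 4000 coarse clusters
    M=96,                              # 96 divides 768, so 8 dims per PQ subspace
    K=256,
    keep_full_precision=True,          # needed later for search(..., rerank_candidates=...)
    use_opq=True,                      # mutually exclusive with use_rht
)
```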

close

close() -> None

Release the lazy thread pool, if one was created.

Safe to call multiple times. Useful when an index is being torn down explicitly (e.g., long-lived workers cycling indices) — Python's GC will also reclaim the executor when the index goes out of scope, but explicit cleanup avoids worker threads lingering past their last useful query.
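
A teardown sketch for code that cycles indices explicitly, continuing the import from the construction sketch above:

```python
index = IVFPQSnapIndex(dim=768, nlist=1024, M=96)
try:
    ...  # fit / add_batch / search_batch as usual
finally:
    index.close()  # idempotent; releases the lazy thread pool if search_batch ever created one
```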

fit

fit(training_vectors: NDArray[float32], kmeans_iters: int = 15) -> None

Train coarse centroids and residual codebooks.
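
A training sketch, continuing the construction example above; `corpus_vectors` is a hypothetical `(N, dim)` float32 array:

```python
rng = np.random.default_rng(0)

# Train on a representative sample of the corpus; nlist must not exceed the sample size.
sample = rng.choice(n_corpus, size=200_000, replace=False)
index.fit(corpus_vectors[sample], kmeans_iters=15)
```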

add_batch

add_batch(ids: list[Any], vectors: NDArray[float32]) -> None

Append a batch. Re-sorts the whole corpus by cluster id to preserve the contiguous layout — O(N) per call, so bulk-ingest before search is the intended pattern.

Encoding is chunked so peak transient memory stays bounded regardless of batch size. `_id_to_row` is rebuilt once at the end via NumPy bulk operations rather than a Python dict comprehension, avoiding an O(N) interpreter pass for large batches.
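
A bulk-ingest sketch under the same assumptions (`corpus_ids` is a hypothetical list of application-level ids, aligned with `corpus_vectors`):

```python
# Ingest in a few large batches: every add_batch() call re-sorts the whole corpus
# by cluster id, so many small calls pay that O(N) cost over and over.
chunk = 250_000
for start in range(0, n_corpus, chunk):
    index.add_batch(
        ids=corpus_ids[start : start + chunk],
        vectors=corpus_vectors[start : start + chunk],  # float32, shape (chunk, dim)
    )
```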

delete

delete(id: Any) -> bool

Remove a vector by id. O(N) — rebuilds the contiguous layout.

search

search(query: NDArray[float32], k: int = 10, nprobe: int | None = None, rerank_candidates: int | None = None, filter_ids: set[Any] | None = None) -> list[tuple[Any, float]]

Approximate top-k via IVF probing + residual PQ ADC.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `query` | `NDArray[float32]` | Query vector. | *required* |
| `k` | `int` | Number of neighbours to return. | `10` |
| `nprobe` | `int \| None` | Number of coarse clusters to visit. | ``max(1, nlist // 16)`` |
| `rerank_candidates` | `int \| None` | When set, the IVF-PQ pass returns the top `rerank_candidates` instead of the top `k`; those candidates are re-scored against the stored full-precision vectors (kept as float16 since v0.7 to halve the disk + RAM footprint; the rerank matmul itself still runs in float32 because `q_pre` is float32 and NumPy type promotion widens the result), and the top `k` of the reranked set is returned. Lifts recall toward the float32 brute-force ceiling at the cost of one `(rerank_candidates, dim_eff) @ (dim_eff,)` matmul per query — typically <1 ms even at large `nprobe`. Requires `keep_full_precision=True` at index construction. Must be ≥ `k`. | `None` |
| `filter_ids` | `set \| None` | When provided, restrict results to ids in the set. The implementation is cluster- and pool-aware: the probe ranking is restricted to clusters that contain at least one filtered row (so sparse filters skip clusters entirely), and the row-level mask is applied before the top-k / rerank-pool selection (so the rerank candidate pool is drawn from the filtered subset, not from the unfiltered probe output). Unknown ids in `filter_ids` are silently dropped; an entirely-unknown filter returns `[]`. A very sparse filter may need a larger `nprobe` to surface `k` hits. | `None` |
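
A query-time sketch, continuing the example above; the tenant id scheme is illustrative:

```python
query = corpus_vectors[0]  # any float32 vector of shape (dim,)

# Plain ANN search over the default max(1, nlist // 16) probed clusters.
hits = index.search(query, k=10)

# Higher recall: take 200 candidates from the IVF-PQ pass, re-score them against
# the stored full-precision copies (requires keep_full_precision=True), keep the best 10.
hits = index.search(query, k=10, nprobe=64, rerank_candidates=200)

# Restrict results to one tenant's ids; unknown ids in the set are silently ignored.
tenant_ids = {f"doc-{i}" for i in range(1_000)}
hits = index.search(query, k=10, filter_ids=tenant_ids)

for doc_id, score in hits:
    print(doc_id, score)
```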

search_batch

search_batch(queries: NDArray[float32], k: int = 10, nprobe: int | None = None, num_threads: int = 1, filter_ids: set[Any] | None = None) -> list[list[tuple[Any, float]]]

Approximate top-k for a batch of queries.

Throughput-oriented sibling of search(). Two things change:

  • Coarse probe + LUT build run as one BLAS call each across the whole batch instead of B per-query matmuls. At B = 128, M = 192, K = 256 this alone is ~5× faster than looping search().
  • Per-query gather + scoring can optionally fan out over num_threads worker threads. Unlike single-query threading (which competes with NumPy's internal BLAS pool), batch-level threading hands each thread a whole query's worth of work, so Python overhead is amortised over the query and the speedup actually shows up.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `queries` | `NDArray[float32]` | Shape `(B, dim)`. Need not be normalized. | *required* |
| `k` | `int` | As in ``search()``. | `10` |
| `nprobe` | `int \| None` | As in ``search()``. | ``max(1, nlist // 16)`` |
| `num_threads` | `int` | Worker threads for per-query scoring. `1` is sequential (still benefits from the batched coarse probe + LUT build). `> 1` engages a lazily created `ThreadPoolExecutor`. Validate against your machine's core count before going above 4. | `1` |
| `filter_ids` | `set \| None` | Optional id whitelist shared by every query in the batch (typical use: tenant / partition scoping). Same cluster- and pool-aware semantics as ``search()``. The filter is resolved once per batch, not per query. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `list[list[tuple[Any, float]]]` | A list of length B; each entry is the per-query top-k list of ``(id, score)`` pairs, in the same format as a single ``search()`` result. Queries with zero norm return ``[]`` for that slot. |
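
A batch-query sketch under the same assumptions; batch size and thread count are illustrative:

```python
queries = corpus_vectors[:128]  # shape (B, dim); need not be normalized

results = index.search_batch(queries, k=10, nprobe=32, num_threads=4)

assert len(results) == len(queries)
for per_query in results:
    if per_query:                         # zero-norm queries come back as []
        top_id, top_score = per_query[0]  # best (id, score) pair for this query
```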