IVFPQSnapIndex

IVFPQSnapIndex(dim: int, nlist: int, M: int, K: int = 256, seed: int = 0, normalized: bool = False, use_rht: bool = False, keep_full_precision: bool = False, use_opq: bool = False)

Bases: FreezableIndex

Inverted-file + residual Product Quantization.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `dim` | `int` | Embedding dimension. | *required* |
| `nlist` | `int` | Number of coarse clusters. A typical value is 4 · √N. Must be ≥ 2 and ≤ the training sample size. | *required* |
| `M` | `int` | Number of PQ subspaces. Must divide `dim` (or `pdim` when `use_rht=True`). | *required* |
| `K` | `int` | Centroids per subspace. 2 ≤ K ≤ 256. | `256` |
| `seed` | `int` | Random seed. | `0` |
| `normalized` | `bool` | When True, inputs are assumed unit-length and no per-vector norm is stored. | `False` |
| `use_rht` | `bool` | Off by default, for the same rationale as `PQSnapIndex`. | `False` |
| `keep_full_precision` | `bool` | When True, keep a copy of the stored vectors (float16 since v0.7) so that `search()` can re-score candidates via `rerank_candidates`. | `False` |
| `use_opq` | `bool` | When True, learn an orthogonal OPQ-P rotation (Ge et al., 2013) during `fit()` and apply it to both corpus and queries before the coarse k-means and the residual PQ. Balances per-subspace variance, typically +0.5–2 pp recall at the same bytes/vec. Mutually exclusive with `use_rht`. | `False` |
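
A minimal construction sketch; the import path is a placeholder (adjust it to wherever the class lives in your install), and the corpus size is illustrative:

```python
import numpy as np

from yourpkg import IVFPQSnapIndex  # placeholder import path -- adjust to your install

dim = 768
n_corpus = 1_000_000

index = IVFPQSnapIndex(
    dim=dim,
    nlist=int(4 * np.sqrt(n_corpus)),  # the 4 * sqrt(N) rule of thumb -> 4000 coarse clusters
    M=96,                              # 96 divides 768, so 8 dims per PQ subspace
    K=256,
    keep_full_precision=True,          # needed later for search(..., rerank_candidates=...)
    use_opq=True,                      # mutually exclusive with use_rht
)
```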

close

close() -> None

Release the lazy thread pool, if one was created.

Safe to call multiple times. Useful when an index is being torn down explicitly (e.g., long-lived workers cycling indices) — Python's GC will also reclaim the executor when the index goes out of scope, but explicit cleanup avoids worker threads lingering past their last useful query.
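
A teardown sketch for code that cycles indices explicitly, continuing the import from the construction sketch above:

```python
index = IVFPQSnapIndex(dim=768, nlist=1024, M=96)
try:
    ...  # fit / add_batch / search_batch as usual
finally:
    index.close()  # idempotent; releases the lazy thread pool if search_batch ever created one
```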

fit

fit(training_vectors: NDArray[float32], kmeans_iters: int = 15) -> None

Train coarse centroids and residual codebooks.
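
A training sketch, continuing the construction example above; `corpus_vectors` is a hypothetical `(N, dim)` float32 array:

```python
rng = np.random.default_rng(0)

# Train on a representative sample of the corpus; nlist must not exceed the sample size.
sample = rng.choice(n_corpus, size=200_000, replace=False)
index.fit(corpus_vectors[sample], kmeans_iters=15)
```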

add_batch

add_batch(ids: list[Any], vectors: NDArray[float32]) -> None

Append a batch. Re-sorts the whole corpus by cluster id to preserve the contiguous layout — O(N) per call, so bulk-ingest before search is the intended pattern.

Encoding is chunked so peak transient memory stays bounded regardless of batch size. `_id_to_row` is rebuilt once at the end via NumPy bulk operations rather than a Python dict comprehension, avoiding an O(N) interpreter pass for large batches.
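
A bulk-ingest sketch under the same assumptions (`corpus_ids` is a hypothetical list of application-level ids, aligned with `corpus_vectors`):

```python
# Ingest in a few large batches: every add_batch() call re-sorts the whole corpus
# by cluster id, so many small calls pay that O(N) cost over and over.
chunk = 250_000
for start in range(0, n_corpus, chunk):
    index.add_batch(
        ids=corpus_ids[start : start + chunk],
        vectors=corpus_vectors[start : start + chunk],  # float32, shape (chunk, dim)
    )
```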

delete

delete(id: Any) -> bool

Remove a vector by id. O(N) — rebuilds the contiguous layout.

search

search(query: NDArray[float32], k: int = 10, nprobe: int | None = None, rerank_candidates: int | None = None, filter_ids: set[Any] | None = None) -> list[tuple[Any, float]]

Approximate top-k via IVF probing + residual PQ ADC.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `query` | `NDArray[float32]` | Query vector. | *required* |
| `k` | `int` | Number of neighbours to return. | `10` |
| `nprobe` | `int \| None` | Number of coarse clusters to visit. | ``max(1, nlist // 16)`` |
| `rerank_candidates` | `int \| None` | When set, the IVF-PQ pass returns the top `rerank_candidates` instead of the top `k`; those candidates are re-scored against the stored full-precision vectors (kept as float16 since v0.7 to halve the disk + RAM footprint; the rerank matmul itself still runs in float32 because `q_pre` is float32 and NumPy type promotion widens the result), and the top `k` of the reranked set is returned. Lifts recall toward the float32 brute-force ceiling at the cost of one `(rerank_candidates, dim_eff) @ (dim_eff,)` matmul per query — typically <1 ms even at large `nprobe`. Requires `keep_full_precision=True` at index construction. Must be ≥ `k`. | `None` |
| `filter_ids` | `set \| None` | When provided, restrict results to ids in the set. The implementation is cluster- and pool-aware: the probe ranking is restricted to clusters that contain at least one filtered row (so sparse filters skip clusters entirely), and the row-level mask is applied before the top-k / rerank-pool selection (so the rerank candidate pool is drawn from the filtered subset, not from the unfiltered probe output). Unknown ids in `filter_ids` are silently dropped; an entirely-unknown filter returns `[]`. A very sparse filter may need a larger `nprobe` to surface `k` hits. | `None` |
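
A query-time sketch, continuing the example above; the tenant id scheme is illustrative:

```python
query = corpus_vectors[0]  # any float32 vector of shape (dim,)

# Plain ANN search over the default max(1, nlist // 16) probed clusters.
hits = index.search(query, k=10)

# Higher recall: take 200 candidates from the IVF-PQ pass, re-score them against
# the stored full-precision copies (requires keep_full_precision=True), keep the best 10.
hits = index.search(query, k=10, nprobe=64, rerank_candidates=200)

# Restrict results to one tenant's ids; unknown ids in the set are silently ignored.
tenant_ids = {f"doc-{i}" for i in range(1_000)}
hits = index.search(query, k=10, filter_ids=tenant_ids)

for doc_id, score in hits:
    print(doc_id, score)
```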

search_batch

search_batch(queries: NDArray[float32], k: int = 10, nprobe: int | None = None, num_threads: int = 1, filter_ids: set[Any] | None = None) -> list[list[tuple[Any, float]]]

Approximate top-k for a batch of queries.

Throughput-oriented sibling of search(). Two things change:

  • Coarse probe + LUT build run as one BLAS call each across the whole batch instead of B per-query matmuls. At B = 128, M = 192, K = 256 this alone is ~5× faster than looping search().
  • Per-query gather + scoring can optionally fan out over num_threads worker threads. Unlike single-query threading (which competes with NumPy's internal BLAS pool), batch-level threading hands each thread a whole query's worth of work, so Python overhead is amortised over the query and the speedup actually shows up.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `queries` | `NDArray[float32]` | Shape `(B, dim)`. Need not be normalized. | *required* |
| `k` | `int` | As in ``search()``. | `10` |
| `nprobe` | `int \| None` | As in ``search()``. | ``max(1, nlist // 16)`` |
| `num_threads` | `int` | Worker threads for per-query scoring. `1` is sequential (still benefits from the batched coarse probe + LUT build). `> 1` engages a lazily created `ThreadPoolExecutor`. Validate against your machine's core count before going above 4. | `1` |
| `filter_ids` | `set \| None` | Optional id whitelist shared by every query in the batch (typical use: tenant / partition scoping). Same cluster- and pool-aware semantics as ``search()``. The filter is resolved once per batch, not per query. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `list[list[tuple[Any, float]]]` | A list of length B; each entry is the per-query top-k list of ``(id, score)`` pairs, in the same format as a single ``search()`` result. Queries with zero norm return ``[]`` for that slot. |
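
A batch-query sketch under the same assumptions; batch size and thread count are illustrative:

```python
queries = corpus_vectors[:128]  # shape (B, dim); need not be normalized

results = index.search_batch(queries, k=10, nprobe=32, num_threads=4)

assert len(results) == len(queries)
for per_query in results:
    if per_query:                         # zero-norm queries come back as []
        top_id, top_score = per_query[0]  # best (id, score) pair for this query
```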