IVFPQSnapIndex¶
Sub-linear search at scale, with optional float16 rerank. An inverted-file
coarse partition sits on top of residual PQ, so each query scans only
about nprobe / nlist of the corpus. With keep_full_precision=True and
rerank_candidates set, the top candidates are re-scored against the stored
float16 vectors, which recovers almost all of the recall lost to PQ.
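To make the second stage concrete, here is a minimal sketch of the rerank step. This is an illustration only, not snapvec internals: the PQ distances are random stand-ins, and the float16 cache is synthetic.

import numpy as np

rng = np.random.default_rng(0)
n, dim, k, n_rerank = 10_000, 384, 10, 100

cache_f16 = rng.standard_normal((n, dim)).astype(np.float16)  # stored rerank cache
query = rng.standard_normal(dim).astype(np.float32)
pq_dist = rng.random(n)                      # stand-in for ADC distances from probed lists

pool = np.argsort(pq_dist)[:n_rerank]        # candidate pool from the PQ stage
exact = cache_f16[pool].astype(np.float32) @ query   # re-score exactly in float32
topk = pool[np.argsort(-exact)[:k]]          # final top-k after rerank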
Headline numbers¶
On BGE-small / FIQA (N = 57,638, dim = 384):
- recall@10 = 0.977 at 441 us / query
- recall@10 = 0.998 at 1021 us / query
That is 5.8x faster than v0.6 at identical recall, and past the PQ-only recall ceiling of 0.929.
When to use¶
- N >= 100k -- below that, full-scan PQSnapIndex is comparable and simpler.
- Latency budget in the sub-millisecond range.
- Recall target >= 0.97.
Basic usage¶
import numpy as np
from snapvec import IVFPQSnapIndex
corpus = np.random.randn(100_000, 384).astype(np.float32)
query = np.random.randn(384).astype(np.float32)
idx = IVFPQSnapIndex(
    dim=384,
    nlist=512,                   # coarse lists; see the Sizing table below
    M=16,                        # PQ subquantizers
    K=256,                       # centroids per subquantizer
    keep_full_precision=True,    # store float16 copies for reranking
    seed=0,
)
idx.fit(corpus[:20_000])         # train coarse + PQ codebooks on a sample
idx.add_batch(list(range(100_000)), corpus)
hits = idx.search(query, k=10, nprobe=32, rerank_candidates=100)
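Continuing the example above, here is a hedged sweep over a few operating points. It assumes search() returns an iterable of hit ids (unpack first if it returns (id, score) pairs) and uses exact inner-product top-10 on the raw float32 corpus as ground truth; swap in L2 if your index scores by distance.

import time

truth = set(np.argsort(-(corpus @ query))[:10])   # exact top-10, assumed IP metric

for nprobe, rerank in [(8, None), (32, None), (64, 100), (256, 200)]:
    t0 = time.perf_counter()
    hits = idx.search(query, k=10, nprobe=nprobe, rerank_candidates=rerank)
    us = (time.perf_counter() - t0) * 1e6
    recall = len(truth & set(hits)) / 10          # assumes hits are ids
    print(f"nprobe={nprobe:>3} rerank={rerank!s:>4} recall@10={recall:.3f} {us:.0f} us")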
Sizing¶
| Parameter | Guidance |
|---|---|
| nlist | 4 * sqrt(N); clamp between 32 and 65536 |
| nprobe | Start at nlist // 16; sweep to tune recall vs latency |
| Training set size | >= 30 * nlist rows (FAISS rule of thumb) |
| rerank_candidates | None for PQ-only; 100 for strong recall lift |
| keep_full_precision | True to enable rerank_candidates |
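The sizing rules above, expressed as code. suggest_params is a hypothetical helper written for this page, not part of snapvec:

import math

def suggest_params(n_vectors: int) -> dict:
    # 4 * sqrt(N), clamped to [32, 65536] per the table above.
    nlist = min(max(32, int(4 * math.sqrt(n_vectors))), 65_536)
    return {
        "nlist": nlist,
        "nprobe_start": max(1, nlist // 16),   # starting point for the sweep
        "min_train_rows": 30 * nlist,          # FAISS rule of thumb
    }

print(suggest_params(100_000))
# {'nlist': 1264, 'nprobe_start': 79, 'min_train_rows': 37920}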
Operating points (FIQA, BGE-small, N=57k)¶
| nprobe | rerank_candidates | recall@10 | latency |
|---|---|---|---|
| 8 | None | 0.85 | 180 us |
| 32 | None | 0.92 | 340 us |
| 64 | 100 | 0.977 | 441 us |
| 256 | 200 | 0.998 | 1021 us |
See the benchmarks page for the full sweep and reproduction instructions.
OPQ rotation¶
use_opq=True enables a learned orthogonal rotation during fit()
that balances variance across PQ subspaces, improving recall at the same
bytes per vector. The training cost is one eigendecomposition of the
(dim, dim) covariance (~50 ms and ~270 MB peak on N=57k training samples
at dim=384). The per-query cost is one extra (1, dim) @ (dim, dim)
matmul (~2 us), negligible next to the rest of the search pipeline.
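For intuition, here is a hedged sketch of the eigenvalue-allocation idea behind such a rotation (the classic parametric OPQ initialization); snapvec's actual fit() procedure may differ:

import numpy as np

rng = np.random.default_rng(0)
train = rng.standard_normal((20_000, 384)).astype(np.float32)

def balanced_rotation(X, M):
    dim = X.shape[1]
    sub = dim // M                           # dims per PQ subspace
    Xc = X - X.mean(axis=0)
    cov = (Xc.T @ Xc) / len(Xc)              # (dim, dim) covariance
    w, V = np.linalg.eigh(cov)               # eigenvalues, ascending
    order = np.argsort(w)[::-1]              # largest variance first
    w, V = w[order], V[:, order]
    # Greedily assign each eigen-direction to the subspace with the
    # smallest running log-variance product, balancing the products.
    buckets = [[] for _ in range(M)]
    log_prod = np.zeros(M)
    for i in range(dim):
        m = min((b for b in range(M) if len(buckets[b]) < sub),
                key=lambda b: log_prod[b])
        buckets[m].append(i)
        log_prod[m] += np.log(max(w[i], 1e-12))
    perm = [i for b in buckets for i in b]
    return V[:, perm]                        # orthogonal (dim, dim) rotation

R = balanced_rotation(train, M=48)
rotated = train @ R                          # what PQ would then encode

In snapvec itself, enabling it is just a constructor flag: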
idx = IVFPQSnapIndex(
    dim=384, nlist=512, M=48, K=256,
    normalized=True, use_opq=True, seed=0,
)
idx.fit(train_sample)            # learns the rotation along with the codebooks
idx.add_batch(ids, corpus)
OPQ pays off when the subspace dimension (dim / M) is at least 4; at
dim=384, M=48 gives 8-dim subspaces. At M = dim / 2 the subspaces are
2-dimensional, there is no room left to redistribute variance, and the
recall gain collapses. See the benchmarks page for the measured
recall-vs-M table on BEIR FIQA.
use_opq is mutually exclusive with use_rht: both are rotations, but
OPQ's is learned from the data while RHT's is fixed and random. Pick one.
File format¶
On-disk extension: .snpi. Magic bytes SNPI; format version 4 as of
v0.9.0, which adds the float16 rerank cache.
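A quick integrity check against the magic bytes; "index.snpi" is a placeholder path, and the save/load API itself is not covered on this page:

# Verify the 4-byte magic of a saved index file.
with open("index.snpi", "rb") as f:
    assert f.read(4) == b"SNPI", "not a snapvec IVFPQ index file"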