IVFPQSnapIndex¶
Sub-linear search at scale, with optional float16 rerank. An inverted-file
coarse partition sits on top of residual PQ, so each query scans only
about nprobe / nlist of the corpus. With keep_full_precision=True and
rerank_candidates set, the top candidates are re-scored against the stored
float16 vectors, which recovers almost all of the recall lost to PQ.
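To make the second stage concrete, here is a minimal sketch of the rerank step. This is an illustration only, not snapvec internals: the PQ distances are random stand-ins, and the float16 cache is synthetic.

import numpy as np

rng = np.random.default_rng(0)
n, dim, k, n_rerank = 10_000, 384, 10, 100

cache_f16 = rng.standard_normal((n, dim)).astype(np.float16)  # stored rerank cache
query = rng.standard_normal(dim).astype(np.float32)
pq_dist = rng.random(n)                      # stand-in for ADC distances from probed lists

pool = np.argsort(pq_dist)[:n_rerank]        # candidate pool from the PQ stage
exact = cache_f16[pool].astype(np.float32) @ query   # re-score exactly in float32
topk = pool[np.argsort(-exact)[:k]]          # final top-k after rerank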
Headline numbers¶
On BGE-small / FIQA (N = 57,638, dim = 384):
- recall@10 = 0.977 at 441 us / query
- recall@10 = 0.998 at 1021 us / query
That is 5.8x faster than v0.6 at identical recall, and past the PQ-only recall ceiling of 0.929.
When to use¶
- N >= 100k -- below that, full-scan PQSnapIndex is comparable and simpler.
- Latency budget in the sub-millisecond range.
- Recall target >= 0.97.
Basic usage¶
import numpy as np
from snapvec import IVFPQSnapIndex
corpus = np.random.randn(100_000, 384).astype(np.float32)
query = np.random.randn(384).astype(np.float32)
idx = IVFPQSnapIndex(
    dim=384,
    nlist=512,                   # coarse lists; see the Sizing table below
    M=16,                        # PQ subquantizers
    K=256,                       # centroids per subquantizer
    keep_full_precision=True,    # store float16 copies for reranking
    seed=0,
)
idx.fit(corpus[:20_000])         # train coarse + PQ codebooks on a sample
idx.add_batch(list(range(100_000)), corpus)
hits = idx.search(query, k=10, nprobe=32, rerank_candidates=100)
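Continuing the example above, here is a hedged sweep over a few operating points. It assumes search() returns an iterable of hit ids (unpack first if it returns (id, score) pairs) and uses exact inner-product top-10 on the raw float32 corpus as ground truth; swap in L2 if your index scores by distance.

import time

truth = set(np.argsort(-(corpus @ query))[:10])   # exact top-10, assumed IP metric

for nprobe, rerank in [(8, None), (32, None), (64, 100), (256, 200)]:
    t0 = time.perf_counter()
    hits = idx.search(query, k=10, nprobe=nprobe, rerank_candidates=rerank)
    us = (time.perf_counter() - t0) * 1e6
    recall = len(truth & set(hits)) / 10          # assumes hits are ids
    print(f"nprobe={nprobe:>3} rerank={rerank!s:>4} recall@10={recall:.3f} {us:.0f} us")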
Sizing¶
| Parameter | Guidance |
|---|---|
| nlist | 4 * sqrt(N); clamp between 32 and 65536 |
| nprobe | Start at nlist // 16; sweep to tune recall vs latency |
| Training set size | >= 30 * nlist rows (FAISS rule of thumb) |
| rerank_candidates | None for PQ-only; 100 for strong recall lift |
| keep_full_precision | True to enable rerank_candidates |
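The sizing rules above, expressed as code. suggest_params is a hypothetical helper written for this page, not part of snapvec:

import math

def suggest_params(n_vectors: int) -> dict:
    # 4 * sqrt(N), clamped to [32, 65536] per the table above.
    nlist = min(max(32, int(4 * math.sqrt(n_vectors))), 65_536)
    return {
        "nlist": nlist,
        "nprobe_start": max(1, nlist // 16),   # starting point for the sweep
        "min_train_rows": 30 * nlist,          # FAISS rule of thumb
    }

print(suggest_params(100_000))
# {'nlist': 1264, 'nprobe_start': 79, 'min_train_rows': 37920}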
Operating points (FIQA, BGE-small, N=57k)¶
| nprobe | rerank_candidates | recall@10 | latency |
|---|---|---|---|
| 8 | None | 0.85 | 180 us |
| 32 | None | 0.92 | 340 us |
| 64 | 100 | 0.977 | 441 us |
| 256 | 200 | 0.998 | 1021 us |
See the benchmarks page for the full sweep and reproduction instructions.
OPQ rotation¶
use_opq=True enables a learned orthogonal rotation during fit()
that balances variance across PQ subspaces, improving recall at the same
bytes per vector. The training cost is one eigendecomposition of the
(dim, dim) covariance (~50 ms and ~270 MB peak on N=57k training samples
at dim=384). The per-query cost is one extra (1, dim) @ (dim, dim)
matmul (~2 us), negligible next to the rest of the search pipeline.
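For intuition, here is a hedged sketch of the eigenvalue-allocation idea behind such a rotation (the classic parametric OPQ initialization); snapvec's actual fit() procedure may differ:

import numpy as np

rng = np.random.default_rng(0)
train = rng.standard_normal((20_000, 384)).astype(np.float32)

def balanced_rotation(X, M):
    dim = X.shape[1]
    sub = dim // M                           # dims per PQ subspace
    Xc = X - X.mean(axis=0)
    cov = (Xc.T @ Xc) / len(Xc)              # (dim, dim) covariance
    w, V = np.linalg.eigh(cov)               # eigenvalues, ascending
    order = np.argsort(w)[::-1]              # largest variance first
    w, V = w[order], V[:, order]
    # Greedily assign each eigen-direction to the subspace with the
    # smallest running log-variance product, balancing the products.
    buckets = [[] for _ in range(M)]
    log_prod = np.zeros(M)
    for i in range(dim):
        m = min((b for b in range(M) if len(buckets[b]) < sub),
                key=lambda b: log_prod[b])
        buckets[m].append(i)
        log_prod[m] += np.log(max(w[i], 1e-12))
    perm = [i for b in buckets for i in b]
    return V[:, perm]                        # orthogonal (dim, dim) rotation

R = balanced_rotation(train, M=48)
rotated = train @ R                          # what PQ would then encode

In snapvec itself, enabling it is just a constructor flag: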
idx = IVFPQSnapIndex(
    dim=384, nlist=512, M=48, K=256,
    normalized=True, use_opq=True, seed=0,
)
idx.fit(train_sample)            # learns the rotation along with the codebooks
idx.add_batch(ids, corpus)
OPQ pays off when the subspace dimension (dim / M) is at least 4; at
dim=384, M=48 gives 8-dim subspaces. At M = dim / 2 the subspaces are
2-dimensional, there is no room left to redistribute variance, and the
recall gain collapses. See the benchmarks page for the measured
recall-vs-M table on BEIR FIQA.
use_opq is mutually exclusive with use_rht: both are rotations, but
OPQ's is learned from the data while RHT's is fixed and random. Pick one.
File format¶
On-disk extension: .snpi. Magic bytes SNPI; format version 4 as of
v0.9.0, which adds the float16 rerank cache.
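A quick integrity check against the magic bytes; "index.snpi" is a placeholder path, and the save/load API itself is not covered on this page:

# Verify the 4-byte magic of a saved index file.
with open("index.snpi", "rb") as f:
    assert f.read(4) == b"SNPI", "not a snapvec IVFPQ index file"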