PQSnapIndex

PQSnapIndex(dim: int, M: int, K: int = 256, seed: int = 0, normalized: bool = False, use_rht: bool = False, use_opq: bool = False)

Bases: FreezableIndex

Product-quantization index trained once on a corpus sample.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `dim` | `int` | Embedding dimension. | required |
| `M` | `int` | Number of subspaces. `(pdim or dim)` must be divisible by `M`. | required |
| `K` | `int` | Centroids per subspace. Must satisfy `2 ≤ K ≤ 256`. | `256` |
| `seed` | `int` | Seed for the RHT (if used) and for k-means++ initialization. | `0` |
| `normalized` | `bool` | When True, inputs are assumed to be unit-length and no per-vector norm is stored. | `False` |
| `use_rht` | `bool` | When True, apply the randomized Hadamard transform (RHT) before splitting into subspaces. Off by default: on modern embeddings it hurts PQ by destroying subspace structure. | `False` |
| `use_opq` | `bool` | When True, learn an orthogonal OPQ-P rotation (Ge et al., 2013) during `fit()` and apply it to both corpus and queries. This balances per-subspace variance and typically lifts recall@10 by 0.5 to 2 percentage points at the same bytes per vector. Mutually exclusive with `use_rht`. | `False` |
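
The constraints in the table can be sketched as a small standalone check. `check_pq_params` is a hypothetical helper written for illustration, not part of the library. It ignores the `pdim` case mentioned above, and the choice of `ValueError` is an assumption.

```python
def check_pq_params(dim: int, M: int, K: int = 256,
                    use_rht: bool = False, use_opq: bool = False) -> int:
    """Hypothetical helper mirroring the documented constraints.

    Returns the per-subspace dimension, dim // M.
    """
    if dim % M != 0:
        raise ValueError(f"dim={dim} is not divisible by M={M}")
    if not 2 <= K <= 256:
        raise ValueError(f"K={K} is outside [2, 256]")
    if use_rht and use_opq:
        raise ValueError("use_rht and use_opq are mutually exclusive")
    return dim // M


# A 768-dimensional embedding split into 96 subspaces of 8 dimensions each.
sub_dim = check_pq_params(dim=768, M=96)
```

Because `K` is capped at 256, every centroid id fits in one unsigned byte. A layout of `M` bytes per encoded vector would follow from that, but the actual storage format is not stated here.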

fit

fit(training_vectors: NDArray[float32], kmeans_iters: int = 15) -> None

Train per-subspace codebooks on `training_vectors`.
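
To make "per-subspace codebooks" concrete, here is a minimal NumPy sketch of the general product-quantization idea, not the library's implementation. The names `train_codebooks` and `encode` are hypothetical. For brevity it initializes centroids by sampling training rows rather than using k-means++, and it omits the RHT and OPQ options.

```python
import numpy as np


def train_codebooks(X, M, K, iters=15, seed=0):
    """Toy per-subspace k-means; returns codebooks of shape (M, K, dim // M)."""
    n, dim = X.shape
    assert dim % M == 0 and 2 <= K <= 256 and n >= K
    d_sub = dim // M
    rng = np.random.default_rng(seed)
    codebooks = np.empty((M, K, d_sub), dtype=np.float32)
    for m in range(M):
        sub = X[:, m * d_sub:(m + 1) * d_sub]
        # Simplification: sample K distinct training rows as initial centroids
        # (the documented index uses k-means++ init instead).
        centroids = sub[rng.choice(n, size=K, replace=False)].copy()
        for _ in range(iters):
            # Squared distances from every sub-vector to every centroid: (n, K).
            d2 = ((sub[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
            assign = d2.argmin(axis=1)
            for k in range(K):
                members = sub[assign == k]
                if len(members):  # empty clusters keep their old centroid
                    centroids[k] = members.mean(axis=0)
        codebooks[m] = centroids
    return codebooks


def encode(X, codebooks):
    """Map each vector to M centroid ids, one uint8 per subspace."""
    M, K, d_sub = codebooks.shape
    codes = np.empty((X.shape[0], M), dtype=np.uint8)
    for m in range(M):
        sub = X[:, m * d_sub:(m + 1) * d_sub]
        d2 = ((sub[:, None, :] - codebooks[m][None, :, :]) ** 2).sum(axis=-1)
        codes[:, m] = d2.argmin(axis=1)
    return codes


rng = np.random.default_rng(0)
train = rng.standard_normal((500, 16)).astype(np.float32)
codebooks = train_codebooks(train, M=4, K=8)
codes = encode(train[:10], codebooks)  # shape (10, 4), dtype uint8
```

Each subspace is clustered independently, so training cost grows linearly with `M`, and an encoded vector is just `M` small integers.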

Must be called exactly once, before the first `add` / `add_batch`. Calling `fit` a second time raises an exception, whether or not any vectors have been indexed. A second fit would silently overwrite the codebooks and, if any vectors had already been indexed, invalidate their codes.
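
The fit-once rule can be illustrated with a toy guard. `FitOnceSketch` is a hypothetical stand-in, not the real class, and `RuntimeError` is an assumption, since the exception type is not named above.

```python
class FitOnceSketch:
    """Hypothetical stand-in for an index whose fit() may run only once."""

    def __init__(self):
        self._codebooks = None

    def fit(self, training_vectors):
        if self._codebooks is not None:
            # Exception type is an assumption; the text above only says it raises.
            raise RuntimeError("fit() has already been called")
        # Placeholder for real codebook training.
        self._codebooks = list(training_vectors)


index = FitOnceSketch()
index.fit([[0.0, 1.0], [1.0, 0.0]])
try:
    index.fit([[0.5, 0.5]])
    second_fit_raised = False
except RuntimeError:
    second_fit_raised = True
```

Refusing the second call, rather than retraining, keeps previously stored codes consistent with the codebooks that produced them.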