PQSnapIndex
`PQSnapIndex(dim: int, M: int, K: int = 256, seed: int = 0, normalized: bool = False, use_rht: bool = False, use_opq: bool = False)`
Bases: `FreezableIndex`
Product-quantization index trained once on a corpus sample.
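For readers new to product quantization, here is a minimal pure-Python sketch of the general technique. It is not `PQSnapIndex`'s implementation, and the helper names `train_pq`, `encode`, and `decode` are hypothetical. The sketch assumes `dim` is divisible by `M` and uses plain random-sample initialization instead of the k-means++ init mentioned under `seed`. Each vector is split into `M` contiguous sub-vectors, and each sub-vector is replaced by the index of its nearest centroid among that subspace's `K` centroids. With `K` at most 256, each index fits in one byte, so the sketch stores codes as `bytes`. Decoding concatenates the selected centroids.

```python
import random


def _sqdist(a, b):
    # Squared Euclidean distance between two equal-length sequences.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(x, centroids):
    # Index of the centroid closest to x.
    return min(range(len(centroids)), key=lambda k: _sqdist(x, centroids[k]))


def train_pq(vectors, M, K, iters=10, seed=0):
    """Toy PQ training: independent k-means in each of M subspaces."""
    dim = len(vectors[0])
    if dim % M:
        raise ValueError("this sketch assumes dim is divisible by M")
    d = dim // M
    rng = random.Random(seed)
    codebooks = []
    for m in range(M):
        sub = [v[m * d:(m + 1) * d] for v in vectors]
        # Random-sample init (a simplification; not k-means++).
        cents = [list(c) for c in rng.sample(sub, K)]
        for _ in range(iters):
            buckets = [[] for _ in range(K)]
            for x in sub:
                buckets[_nearest(x, cents)].append(x)
            for k, members in enumerate(buckets):
                if members:  # keep the old centroid if its cluster emptied
                    cents[k] = [sum(col) / len(members) for col in zip(*members)]
        codebooks.append(cents)
    return codebooks


def encode(v, codebooks):
    # One centroid index per subspace; with K <= 256 each fits in a byte.
    d = len(codebooks[0][0])
    return bytes(_nearest(v[m * d:(m + 1) * d], cb) for m, cb in enumerate(codebooks))


def decode(code, codebooks):
    # Concatenate the selected centroids to approximate the original vector.
    out = []
    for m, k in enumerate(code):
        out.extend(codebooks[m][k])
    return out


rng = random.Random(1)
data = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
books = train_pq(data, M=4, K=16)
code = encode(data[0], books)
approx = decode(code, books)
print(len(code))  # 4: one byte per subspace
```

The storage cost per vector is `M` bytes regardless of `dim`, which is the main reason to use PQ.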
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dim` | `int` | Embedding dimension. | required |
| `M` | `int` | Number of subspaces. | required |
| `K` | `int` | Centroids per subspace. Must satisfy | `256` |
| `seed` | `int` | Seed for the RHT (if used) and for k-means++ init. | `0` |
| `normalized` | `bool` | When `True`, inputs are assumed to be unit-length and no per-vector norm is stored. | `False` |
| `use_rht` | `bool` | When `True`, prepend the randomized Hadamard transform before splitting into subspaces. Off by default: on modern embeddings it hurts PQ by destroying subspace structure. | `False` |
| `use_opq` | `bool` | When `True`, learn an orthogonal OPQ-P rotation (Ge et al., 2013) during | `False` |
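The `use_rht` option refers to a randomized Hadamard transform. A common construction multiplies by a random ±1 diagonal and then by a normalized Walsh-Hadamard matrix. The pure-Python sketch below assumes the dimension is a power of two. It does not show how `PQSnapIndex` handles other dimensions or how it derives the signs from `seed`, and `fwht` and `make_rht` are hypothetical names. The transform is orthogonal, so norms and inner products are preserved. However, a single coordinate's energy is spread across every output coordinate, which shows how it can break up the per-subspace structure that PQ relies on.

```python
import math
import random


def fwht(x):
    # Unnormalized fast Walsh-Hadamard transform; len(x) must be a power of two.
    x = list(x)
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x


def make_rht(n, seed=0):
    # Random sign flips (the diagonal D), drawn once from the seed.
    rng = random.Random(seed)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    scale = 1.0 / math.sqrt(n)  # makes H / sqrt(n) orthogonal

    def rht(v):
        # Computes (H / sqrt(n)) @ D @ v.
        return [scale * y for y in fwht([s * vi for s, vi in zip(signs, v)])]

    return rht


rht = make_rht(8, seed=0)
e1 = [1.0] + [0.0] * 7
print(rht(e1))  # every entry is +/- 1/sqrt(8)
```

Because the signs are fixed at construction, the same seed yields the same transform, so the rotation can be applied identically at training and query time.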
fit
Train per-subspace codebooks on `training_vectors`.
Must be called exactly once, before the first `add` or
`add_batch`. Calling `fit` a second time (whether or not
any vectors have been indexed) raises an error. A second fit would
silently overwrite the codebooks and, if any vectors had been
indexed, invalidate their codes.
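The fit-exactly-once contract can be sketched as a small guard. This is a hypothetical class (`FitOnceIndex`), not `PQSnapIndex`'s implementation. The exception type (`RuntimeError`) is an assumption, and so is the check that `add`/`add_batch` refuse to run before `fit`. Training and encoding are stubbed out so that only the state checks are shown.

```python
class FitOnceIndex:
    """Hypothetical sketch of a fit-exactly-once contract (not PQSnapIndex)."""

    def __init__(self):
        self._codebooks = None  # set exactly once by fit()
        self._codes = []        # one code per indexed vector

    def fit(self, training_vectors):
        if self._codebooks is not None:
            # Refitting would replace the codebooks and silently invalidate
            # every code already stored, so refuse even if none are stored yet.
            raise RuntimeError("fit() may only be called once")
        self._codebooks = self._train(training_vectors)

    def add(self, vector):
        self.add_batch([vector])

    def add_batch(self, vectors):
        if self._codebooks is None:
            raise RuntimeError("call fit() before add() or add_batch()")
        self._codes.extend(self._encode(v) for v in vectors)

    def _train(self, training_vectors):
        # Stub: a real PQ index would run per-subspace k-means here.
        return [list(v) for v in training_vectors]

    def _encode(self, vector):
        # Stub: index of the nearest stored training vector (squared L2).
        return min(
            range(len(self._codebooks)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(vector, self._codebooks[i])),
        )


idx = FitOnceIndex()
idx.fit([[0.0, 1.0], [1.0, 0.0]])
idx.add([0.9, 0.1])
try:
    idx.fit([[2.0, 2.0]])
except RuntimeError as err:
    print(err)  # fit() may only be called once
```

Failing loudly on the second call keeps the codebooks and the stored codes consistent. The alternative, silently retraining, would leave old codes pointing at centroids that no longer exist.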