Quickstart
A five-minute tour. For a complete working script, run
examples/quickstart.py end to end.
1. Build an index
SnapIndex is the simplest index: training-free scalar quantization on
top of the randomized Hadamard transform. There is no fit() step, so you
can add vectors immediately.
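To make "scalar quantization on top of the randomized Hadamard transform" concrete, here is a minimal NumPy sketch of the general technique. It is not SnapIndex's implementation. The zero-padding to a power of two, the per-vector min/max quantizer, and the helper names (fwht, rotate, quantize, dequantize) are assumptions made for illustration. Random sign flips followed by an orthonormal Hadamard transform spread each vector's energy across all coordinates, so no single coordinate dominates. That is what lets a simple per-coordinate quantizer work without any training.

```python
import numpy as np


def fwht(x: np.ndarray) -> np.ndarray:
    """Orthonormal fast Walsh-Hadamard transform over the last axis (length must be a power of two)."""
    x = np.array(x, dtype=np.float32)  # contiguous copy; updated in place below
    n = x.shape[-1]
    h = 1
    while h < n:
        # View the last axis as pairs of length-h blocks and apply the butterfly (a, b) -> (a + b, a - b).
        y = x.reshape(*x.shape[:-1], n // (2 * h), 2, h)
        a = y[..., 0, :].copy()
        b = y[..., 1, :].copy()
        y[..., 0, :] = a + b
        y[..., 1, :] = a - b
        h *= 2
    return x / np.sqrt(n)  # 1/sqrt(n) scaling makes the transform norm-preserving


def rotate(X: np.ndarray, seed: int = 0) -> np.ndarray:
    """Randomized Hadamard transform: zero-pad to a power of two, flip signs at random, then FWHT."""
    d = X.shape[-1]
    n = 1 << (d - 1).bit_length()  # e.g. 384 -> 512 (padding is an assumption of this sketch)
    signs = np.random.default_rng(seed).choice(np.array([-1.0, 1.0], dtype=np.float32), size=n)
    padded = np.zeros((*X.shape[:-1], n), dtype=np.float32)
    padded[..., :d] = X
    return fwht(padded * signs)


def quantize(R: np.ndarray, bits: int = 4):
    """Uniform scalar quantization of every coordinate to 2**bits levels, using a per-vector range."""
    levels = (1 << bits) - 1
    lo = R.min(axis=-1, keepdims=True)
    scale = (R.max(axis=-1, keepdims=True) - lo) / levels
    scale = np.where(scale == 0, 1.0, scale)  # avoid dividing by zero for constant vectors
    codes = np.clip(np.rint((R - lo) / scale), 0, levels).astype(np.uint8)
    return codes, lo, scale


def dequantize(codes: np.ndarray, lo: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Map integer codes back to approximate float coordinates."""
    return codes.astype(np.float32) * scale + lo


rng = np.random.default_rng(0)
corpus = rng.standard_normal((1_000, 384)).astype(np.float32)
query = rng.standard_normal(384).astype(np.float32)

rotated = rotate(corpus, seed=0)
codes, lo, scale = quantize(rotated, bits=4)  # codes stored one per uint8 here for simplicity
approx_scores = dequantize(codes, lo, scale) @ rotate(query, seed=0)
exact_scores = corpus @ query
```

Because the rotation is orthonormal and the padding adds only zeros, rotated inner products equal the original ones. As a result, approx_scores differ from exact_scores only by quantization error.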
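```python
import numpy as np
from snapvec import SnapIndex

# 10,000 random 384-dimensional float32 vectors as a stand-in corpus.
rng = np.random.default_rng(0)
corpus = rng.standard_normal((10_000, 384)).astype(np.float32)

# bits is the quantization bit width; no training step is needed before adding vectors.
idx = SnapIndex(dim=384, bits=4, seed=0)
idx.add_batch(list(range(10_000)), corpus)  # one id per row of corpus
```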
IDs can be any hashable value. save() serializes them as strings, so
only numeric-looking values (int, float) round-trip to their original
type. Other values (UUIDs, tuples, arbitrary objects) come back in their
str() form. See Save and load for the exact behavior.
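The rule above matters most for non-numeric ids. The function below is a hypothetical model of it. roundtrip_id is not part of snapvec, and the real parser may treat edge cases such as numeric strings or "nan" differently. The model shows which ids keep their type after a save/load cycle. If you need the original objects back, keep your own dict that maps str(id) to the object.

```python
import uuid


def roundtrip_id(value):
    """Hypothetical model of the documented rule: ids are stored via str() and only numeric text is parsed back."""
    text = str(value)
    for parse in (int, float):
        try:
            return parse(text)
        except ValueError:
            pass
    return text  # non-numeric ids stay as strings


doc_uuid = uuid.UUID("12345678-1234-5678-1234-567812345678")
for original in (42, 3.5, doc_uuid, ("a", 1)):
    restored = roundtrip_id(original)
    status = "same type" if type(restored) is type(original) else "type changed"
    print(repr(original), "->", repr(restored), f"({status})")
```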
2. Query
```python
query = rng.standard_normal(384).astype(np.float32)
hits = idx.search(query, k=10)  # the 10 best-scoring ids
for doc_id, score in hits:
    print(doc_id, score)
```
search() returns a list[tuple[id, float]] sorted by descending score, so the best match comes first.
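When tuning bits or checking how much a quantized index loses, it helps to compare its hits against exact brute-force search. The sketch below assumes the score is an inner product, since this page does not state the metric SnapIndex uses. It also relies on hypothetical helper names, exact_top_k and recall_at_k. Its results take the same list[tuple[id, float]], descending-score shape as search.

```python
import numpy as np


def exact_top_k(corpus, ids, query, k=10):
    """Brute-force inner-product top-k as [(id, score), ...], sorted by descending score."""
    scores = corpus @ query
    k = min(k, len(scores))
    top = np.argpartition(-scores, k - 1)[:k]  # indices of the k largest scores, unordered
    top = top[np.argsort(-scores[top], kind="stable")]  # order them best first
    return [(ids[i], float(scores[i])) for i in top]


def recall_at_k(approx_hits, exact_hits):
    """Fraction of the exact top-k ids that also appear in the approximate hits."""
    exact_ids = {doc_id for doc_id, _ in exact_hits}
    return sum(doc_id in exact_ids for doc_id, _ in approx_hits) / len(exact_hits)


rng = np.random.default_rng(0)
corpus = rng.standard_normal((10_000, 384)).astype(np.float32)
query = rng.standard_normal(384).astype(np.float32)
truth = exact_top_k(corpus, list(range(10_000)), query, k=10)
# With the index from step 1: recall_at_k(idx.search(query, k=10), truth)
```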
3. Persist
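save() writes atomically (to a temporary file that is then renamed over the target) and stores a CRC32 checksum, so an interrupted write cannot leave a half-written index in place.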
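The temp-file-plus-rename pattern can be sketched with the standard library alone. This illustrates the general technique, not snapvec's on-disk format. The trailing 4-byte little-endian CRC32 layout and the function names atomic_write_with_crc and read_with_crc are assumptions of the sketch.

```python
import contextlib
import os
import tempfile
import zlib


def atomic_write_with_crc(path: str, payload: bytes) -> None:
    """Write payload plus a 4-byte CRC32 trailer to a temp file, then rename it over path."""
    crc = zlib.crc32(payload).to_bytes(4, "little")
    # The temp file lives in the target's directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)), suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.write(crc)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename makes them visible
        os.replace(tmp_path, path)  # readers see either the old file or the complete new one
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)  # clean up the partial temp file on failure
        raise


def read_with_crc(path: str) -> bytes:
    """Return the payload, raising ValueError if the stored CRC32 does not match."""
    with open(path, "rb") as f:
        data = f.read()
    if len(data) < 4:
        raise ValueError(f"{path}: too short to contain a CRC32 trailer")
    payload, stored = data[:-4], int.from_bytes(data[-4:], "little")
    if zlib.crc32(payload) != stored:
        raise ValueError(f"{path}: CRC32 mismatch, file is corrupt")
    return payload
```

os.replace is atomic on POSIX when source and destination are on the same filesystem, which is why the temp file is created next to the target rather than in the system temp directory.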
Next steps
- Choosing an index -- decision tree across the four index types.
- User guide -- deep dives per index.
- Architecture -- how the compression works.