
Quickstart

A five-minute tour. For a complete working script, run examples/quickstart.py end-to-end.

1. Build an index

SnapIndex is the simplest index: it applies training-free scalar quantization on top of a randomized Hadamard transform, so no fit call is needed before adding vectors.

import numpy as np
from snapvec import SnapIndex

rng = np.random.default_rng(0)
corpus = rng.standard_normal((10_000, 384)).astype(np.float32)

idx = SnapIndex(dim=384, bits=4, seed=0)
idx.add_batch(list(range(10_000)), corpus)
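To build intuition for why no training step is required, here is a minimal NumPy sketch of the general technique: a randomized Hadamard rotation followed by uniform scalar quantization. It is not snapvec's implementation. The helper names (fwht, rotate, quantize, dequantize), the zero-padding to a power of two, and the 3-sigma clipping range are all assumptions made for illustration.

```python
import numpy as np

def fwht(x: np.ndarray) -> np.ndarray:
    """Orthonormal fast Walsh-Hadamard transform of each row (row length must be a power of two)."""
    x = np.array(x, dtype=np.float64)  # work on a copy
    n = x.shape[-1]
    h = 1
    while h < n:
        # Butterfly step: pair up blocks of width h and take their sum and difference.
        y = x.reshape(-1, n // (2 * h), 2, h)
        a = y[:, :, 0, :] + y[:, :, 1, :]
        b = y[:, :, 0, :] - y[:, :, 1, :]
        x = np.stack([a, b], axis=2).reshape(-1, n)
        h *= 2
    return x / np.sqrt(n)

def rotate(x: np.ndarray, signs: np.ndarray) -> np.ndarray:
    """Zero-pad rows to len(signs), flip signs at random, then apply the Hadamard transform."""
    padded = np.zeros((x.shape[0], signs.size))
    padded[:, : x.shape[1]] = x
    return fwht(padded * signs)

def quantize(z: np.ndarray, bits: int, clip: float) -> np.ndarray:
    """Map each coordinate in [-clip, clip] to one of 2**bits uniform buckets."""
    levels = 2 ** bits
    step = 2 * clip / levels
    return np.clip(np.floor((z + clip) / step), 0, levels - 1).astype(np.uint8)

def dequantize(codes: np.ndarray, bits: int, clip: float) -> np.ndarray:
    """Reconstruct each coordinate as the midpoint of its bucket."""
    step = 2 * clip / 2 ** bits
    return (codes + 0.5) * step - clip

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 384))
x /= np.linalg.norm(x, axis=1, keepdims=True)   # unit-norm rows

n = 1 << (384 - 1).bit_length()                 # next power of two: 512
signs = rng.choice([-1.0, 1.0], size=n)         # random diagonal of +/-1
z = rotate(x, signs)                            # norms are preserved by the rotation

clip = 3 / np.sqrt(n)                           # assumed range: about 3 standard deviations
codes = quantize(z, bits=4, clip=clip)
approx = dequantize(codes, bits=4, clip=clip)
print(np.linalg.norm(approx - z, axis=1))       # per-row reconstruction error
```

The rotation spreads each vector's energy evenly across coordinates, so a fixed, data-independent grid works reasonably well for any input. That is the sense in which the scheme is training-free.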

ids can be any hashable value. save() serializes them as strings, so only numeric-looking values (int, float) round-trip to their original type; all other values (UUIDs, tuples, arbitrary objects) come back as their str() form. See Save and load for the exact behavior.
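The round-trip rule can be pictured with a small standalone sketch. The helpers serialize_id and restore_id are hypothetical and only mimic the behavior described above (store as a string, restore as int or float when the text parses as one); they are not snapvec functions, and the library's actual parsing rules may differ.

```python
import uuid

def serialize_id(doc_id) -> str:
    # Assumption: every id is stored as its str() form.
    return str(doc_id)

def restore_id(text: str):
    # Assumption: try int first, then float; anything else stays a string.
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text

samples = [42, 3.5, uuid.UUID(int=1), ("a", 1)]
restored = [restore_id(serialize_id(s)) for s in samples]

for before, after in zip(samples, restored):
    print(type(before).__name__, "->", type(after).__name__, repr(after))
```

If non-numeric ids must keep their type across a save and load, one option is to keep your own mapping from the str() form back to the original object.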

2. Query

query = rng.standard_normal(384).astype(np.float32)
hits = idx.search(query, k=10)

for doc_id, score in hits:
    print(doc_id, score)

search returns a list[tuple[id, float]], sorted by descending score.
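Because the index is quantized, its results are approximate. One way to sanity-check them is to compare against an exact brute-force search that returns hits in the same shape. The sketch below assumes an inner-product score, which this page does not confirm, and the helpers exact_top_k and recall_at_k are hypothetical names, not part of snapvec.

```python
import numpy as np

def exact_top_k(ids, corpus, query, k):
    """Exact top-k by inner product, returned as [(id, score), ...] in descending score order."""
    scores = corpus @ query
    k = min(k, len(ids))
    top = np.argpartition(-scores, k - 1)[:k]   # unordered indices of the k best scores
    top = top[np.argsort(-scores[top])]         # order those k by descending score
    return [(ids[i], float(scores[i])) for i in top]

def recall_at_k(approx_hits, exact_hits):
    """Fraction of the exact top-k ids that also appear in the approximate hits."""
    truth = {doc_id for doc_id, _ in exact_hits}
    found = {doc_id for doc_id, _ in approx_hits}
    return len(truth & found) / max(len(truth), 1)

rng = np.random.default_rng(0)
corpus = rng.standard_normal((1000, 64)).astype(np.float32)
query = rng.standard_normal(64).astype(np.float32)

exact = exact_top_k(list(range(1000)), corpus, query, k=10)
for doc_id, score in exact:
    print(doc_id, score)
```

Passing the hits returned by an index's search call as approx_hits to recall_at_k gives a quick estimate of how much accuracy the quantization costs on your data.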

3. Persist

idx.save("my.snpv")
loaded = SnapIndex.load("my.snpv")

Writes are atomic (temp file + rename) and CRC32-checksummed.
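For readers unfamiliar with the pattern, the following standard-library sketch shows what an atomic, checksummed write generally looks like. It does not reproduce the .snpv file layout, which this page does not describe. The trailing 4-byte CRC and the helper names atomic_write and checked_read are assumptions chosen for illustration.

```python
import os
import struct
import tempfile
import zlib

def atomic_write(path: str, payload: bytes) -> None:
    """Write payload plus a trailing CRC32 to a temp file, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.write(struct.pack("<I", zlib.crc32(payload)))
            f.flush()
            os.fsync(f.fileno())      # push bytes to disk before the rename
        os.replace(tmp, path)         # atomic swap: readers see the old or the new file
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise

def checked_read(path: str) -> bytes:
    """Read a file written by atomic_write and verify its CRC32."""
    with open(path, "rb") as f:
        data = f.read()
    if len(data) < 4:
        raise ValueError("file too short to contain a checksum")
    payload, (stored,) = data[:-4], struct.unpack("<I", data[-4:])
    if zlib.crc32(payload) != stored:
        raise ValueError("CRC32 mismatch: file is corrupt or truncated")
    return payload

with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "demo.bin")
    atomic_write(target, b"index bytes")
    roundtrip = checked_read(target)
    print(roundtrip)
```

The rename guarantees that a crash mid-write leaves the previous file untouched, and the checksum catches corruption that happens after the write completes.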

Next steps