Vector databases and ANN search

A vector store holds the chunk embeddings and answers nearest-neighbor queries. Exact search is O(N) per query; Approximate Nearest Neighbor (ANN) indexes — HNSW graphs, IVF/PQ — make it sub-linear, trading a little recall for big speed and memory savings.

Key knobs: HNSW — M and efSearch (recall vs. latency) IVF — number of lists / probes Where it runs: Server — FAISS, pgvector, Pinecone, Weaviate On-device — sqlite-vec or a small flat index Index size is roughly num_chunks × dimension × bytes_per_float — a reason to chunk the report, not the raw genome.