Embeddings and similarity

An embedding model maps text to a dense vector so that semantically similar text lands nearby. Retrieval scores candidates by cosine similarity (or dot product) between the query vector and the chunk vectors.
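The scoring step can be sketched in a few lines of plain Python. This is a minimal illustration, not a production retriever: the function names `cosine_similarity` and `rank_chunks` are made up for this example, and the 3-dimensional vectors are toy stand-ins for real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors score 0
    return dot / (norm_a * norm_b)

def rank_chunks(query_vec, chunk_vecs, top_k=3):
    """Return (chunk_index, score) pairs, highest score first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy vectors standing in for real embeddings (assumed values).
query = [1.0, 0.0, 1.0]
chunks = [
    [0.9, 0.1, 0.8],    # points roughly the same way as the query
    [0.0, 1.0, 0.0],    # orthogonal to the query
    [-1.0, 0.0, -1.0],  # opposite direction
]
ranking = rank_chunks(query, chunks)
```

Here the first chunk ranks highest, the orthogonal chunk scores 0, and the opposite-direction chunk scores close to -1. A real system would typically replace the linear scan with a vector index, but the score being computed is the same.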

Choose an embedding model by:

- Quality: MTEB benchmark
- Dimension: smaller means a cheaper index and faster search
- Domain fit: a biomedical-tuned model can beat a general one on genomics jargon
- Size/latency: on mobile, this is decisive

Two invariants:

- Normalize vectors so cosine == dot product
- The embedding model used at index time and at query time must match
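One way to enforce both invariants is to build them into the index itself. The sketch below is hypothetical: `VectorIndex`, `l2_normalize`, and the model name `"model-a"` are invented for illustration and do not correspond to any particular library. It normalizes every vector on the way in, so a plain dot product equals cosine similarity, and it refuses queries embedded with a different model.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class VectorIndex:
    """Hypothetical in-memory index that remembers which model built it."""

    def __init__(self, model_name):
        self.model_name = model_name
        self.vectors = []

    def add(self, vec):
        # Invariant 1: store unit vectors so dot product == cosine similarity.
        self.vectors.append(l2_normalize(vec))

    def search(self, query_vec, query_model_name):
        # Invariant 2: the query must come from the model used at index time.
        if query_model_name != self.model_name:
            raise ValueError(
                f"index built with {self.model_name!r}, "
                f"query embedded with {query_model_name!r}"
            )
        q = l2_normalize(query_vec)
        scored = [(i, dot(q, v)) for i, v in enumerate(self.vectors)]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)

index = VectorIndex("model-a")  # assumed model name
index.add([3.0, 4.0])
index.add([4.0, -3.0])
results = index.search([6.0, 8.0], "model-a")
```

The query `[6.0, 8.0]` is a scaled copy of the first stored vector, so after normalization its score is 1 (up to floating-point rounding) even though the raw magnitudes differ. Calling `index.search(..., "model-b")` raises a `ValueError` instead of silently returning meaningless scores.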