Retrieval-Augmented Generation grounds an LLM's answer in retrieved source text instead of relying on the model's parametric memory — so answers stay current, citable, and scoped to your data.
It runs in two phases: build-time and query-time. Build the index once per document, then answer many questions against it. (Build-time is sometimes called "offline", which is confusing on mobile: here it means done ahead of time, not without a network.)
Analogy: an open-book exam. Build-time is making your study guide before the test — read the book, rip it into one-idea notecards, label each so you can find it by meaning, and organize them in a box. Query-time is taking the test — a question appears, you flip to the few relevant cards, and write your answer citing them.
BUILD-TIME (once per document — make the study guide)
1. Ingestion — load the source documents
2. Chunking — split them into retrievable units (the notecards)
3. Embedding — turn each chunk into a vector so it's findable by meaning
4. Indexing — store the vectors for fast similarity search (the card box)
QUERY-TIME (once per question — take the test)
5. Retrieval — embed the query, find the top-k most similar chunks (optionally re-rank)
6. Generation — put the retrieved chunks + the query in the prompt; the LLM answers, ideally with citations
ACROSS BOTH
7. Evaluation / observability — measure faithfulness and retrieval quality, log everything
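The build-time and query-time stages above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a production pipeline: the "embedding" is a bag-of-words count vector standing in for a real embedding model, the index is a plain list scanned linearly, and all function names (chunk, embed, build_index, retrieve, build_prompt) are hypothetical. The actual LLM call is left out; build_prompt only shows what generation would receive.

```python
import math
import re
from collections import Counter


def chunk(text, max_words=12):
    """Stage 2: split text into fixed-size word windows (the notecards)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Stage 3 stand-in: a bag-of-words count vector, NOT a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(documents):
    """Build-time (stages 1-4): run once per document."""
    index = []
    for doc_id, text in documents.items():        # stage 1: ingestion
        for n, piece in enumerate(chunk(text)):    # stage 2: chunking
            # stages 3 and 4: embed the chunk and store it (the card box)
            index.append({"id": f"{doc_id}#{n}", "text": piece, "vec": embed(piece)})
    return index


def retrieve(index, query, k=2):
    """Stage 5: embed the query and return the top-k most similar chunks."""
    qvec = embed(query)
    ranked = sorted(index, key=lambda c: cosine(qvec, c["vec"]), reverse=True)
    return ranked[:k]


def build_prompt(query, chunks):
    """Stage 6 input: retrieved chunks plus the question, labeled for citation."""
    context = "\n".join(f"[{c['id']}] {c['text']}" for c in chunks)
    return ("Answer using only the sources below and cite their ids.\n"
            f"{context}\n\nQuestion: {query}")


if __name__ == "__main__":
    docs = {
        "a": "Chunking splits documents into retrievable units.",
        "b": "Embedding turns each chunk into a vector.",
    }
    index = build_index(docs)                      # once per document
    question = "What does embedding produce?"
    print(build_prompt(question, retrieve(index, question, k=1)))  # once per question
```

Note that build_index runs once while retrieve and build_prompt run per question, which mirrors the build-time / query-time split.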
Build-time (1–4) happens ahead of time, once per document; query-time (5–6) runs live, on every question; stage 7 spans both, so you can tell a retrieval failure (the right chunk never surfaced) from a generation failure (the model mishandled a chunk it was given).
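One way to make the retrieval-failure vs generation-failure distinction concrete is to label each logged query. The sketch below assumes you have an evaluation set where, for each question, you know which chunk should have been retrieved (gold_chunk_id) and whether the final answer was judged correct; those field names and the classify_failure / summarize helpers are hypothetical, not part of any particular library.

```python
from collections import Counter


def classify_failure(retrieved_ids, gold_chunk_id, answer_is_correct):
    """Label one logged query, given the chunk that should have been retrieved."""
    if answer_is_correct:
        return "ok"
    if gold_chunk_id not in retrieved_ids:
        return "retrieval_failure"    # the right chunk never surfaced
    return "generation_failure"       # the model had the chunk and still got it wrong


def summarize(log_rows):
    """Count outcomes across a list of logged queries (hypothetical log schema)."""
    return Counter(
        classify_failure(r["retrieved_ids"], r["gold_chunk_id"], r["answer_is_correct"])
        for r in log_rows
    )
```

A high share of retrieval failures points at chunking, embedding, or top-k settings; a high share of generation failures points at the prompt or the model.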
Build-time vs query-time is about WHEN a stage runs, not WHERE. Whether a stage runs on the device or on a server is a separate decision (see On-Device & Mobile). In the genomics app, all of build-time and retrieval run on the phone, so the raw genome never leaves it; only generation might call a remote model, sending just the question plus the retrieved snippets.
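The WHEN / WHERE split can be sketched as a placement table plus a narrow outbound payload. Everything here is an illustrative assumption: the PLACEMENT mapping, the GenerationRequest type, and make_generation_request are hypothetical names, and the chunk text is placeholder data. The point is only that the object sent to a remote model carries the question and the retrieved snippet texts, nothing else from the index.

```python
from dataclasses import asdict, dataclass

# Hypothetical placement table: WHERE each stage runs, independent of WHEN it runs.
PLACEMENT = {
    "ingestion": "device",
    "chunking": "device",
    "embedding": "device",
    "indexing": "device",
    "retrieval": "device",
    "generation": "server",   # the only stage that might call out
}


@dataclass(frozen=True)
class GenerationRequest:
    """The only payload that leaves the device."""
    question: str
    snippets: tuple


def make_generation_request(question, retrieved_chunks):
    """Keep just the snippet text; drop vectors, ids, and anything else in the index."""
    return GenerationRequest(
        question=question,
        snippets=tuple(c["text"] for c in retrieved_chunks),
    )


if __name__ == "__main__":
    # Placeholder chunks standing in for on-device retrieval results.
    retrieved = [{"id": "report#3", "text": "(placeholder annotation text)", "vec": [0.1, 0.2]}]
    request = make_generation_request("What does this section mean?", retrieved)
    print(asdict(request))
```

Building the request from an explicit allow-list of fields makes it easier to audit what crosses the network boundary.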
Genomics example: ingestion loads the VCF-derived clinical report, retrieval finds the relevant variant/annotation sections, and generation answers the patient's question citing those sections.