Genomic data scale and why it matters

A human genome is ~3.2 billion base pairs; a raw WGS FASTQ is tens to hundreds of GB; a VCF is tens of MB; the prioritized clinical variants are dozens to hundreds.

This funnel is the core engineering insight for mobile RAG: you cannot and should not put the raw genome in a vector index on a phone. You retrieve over the small, meaningful end of the funnel.

If asked "how do you index millions of variants on a phone," the answer is "you don't — you index the clinically prioritized/annotated subset and the report."