A sample goes through a sequence of stages:
1. DNA is extracted from the sample
2. a sequencing machine produces raw reads
3. the reads are quality-checked
4. the reads are aligned to a reference genome
5. variants are called (positions where the sample differs from the reference)
6. variants are annotated and interpreted
7. a clinical report is generated
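As a rough illustration of stage 3, here is a minimal Python sketch that reads 4-line FASTQ records and keeps the reads whose mean base quality passes a cutoff. It assumes Phred+33 quality encoding. The cutoff of 20 and the function names are hypothetical, and real QC tools check much more (adapter content, per-position quality, duplication).

```python
# Minimal FASTQ quality check: mean Phred score per read.
# Assumes Phred+33 encoding; the cutoff of 20 is a hypothetical value.
from typing import Iterator, List, Tuple

PHRED_OFFSET = 33
MIN_MEAN_QUALITY = 20  # hypothetical threshold for illustration


def parse_fastq(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality_string) from 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        yield header[1:], seq, qual


def mean_quality(qual: str) -> float:
    """Average Phred score decoded from the ASCII quality string."""
    if not qual:
        return 0.0
    return sum(ord(c) - PHRED_OFFSET for c in qual) / len(qual)


def passing_reads(lines: List[str]) -> List[str]:
    """Return the IDs of reads whose mean quality meets the cutoff."""
    return [rid for rid, _seq, qual in parse_fastq(lines)
            if mean_quality(qual) >= MIN_MEAN_QUALITY]


example = [
    "@read1", "ACGT", "+", "IIII",  # 'I' decodes to Phred 40
    "@read2", "ACGT", "+", "####",  # '#' decodes to Phred 2
]
print(passing_reads(example))  # ['read1']
```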
Broadly, the files get smaller and more meaningful at each stage (the biggest drop comes at variant calling, since a BAM still holds every read):
FASTQ (raw, huge) → BAM (aligned) → VCF (variants) → annotated VCF →
human-readable report.
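Part of why the VCF is so much smaller than the BAM is that each variant site is a single tab-separated line. A minimal sketch, assuming a well-formed data line, that splits out the eight fixed VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The example line is made up, and `parse_vcf_line` is a hypothetical helper rather than a library API; production code would normally use an established VCF parser.

```python
# Parse the eight fixed columns of one VCF data line into a dict.
# Real files also carry '#' header lines and per-sample genotype columns,
# which this sketch ignores.
from typing import Dict

VCF_FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line: str) -> Dict[str, object]:
    fields = line.rstrip("\n").split("\t")
    record: Dict[str, object] = dict(zip(VCF_FIXED_COLUMNS, fields[:8]))
    record["POS"] = int(fields[1])        # 1-based position on the reference
    record["ALT"] = fields[4].split(",")  # a site may have several alt alleles
    # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags.
    info: Dict[str, str] = {}
    for item in fields[7].split(";"):
        if item and item != ".":
            key, _, value = item.partition("=")
            info[key] = value  # empty string for flag-type keys
    record["INFO"] = info
    return record


line = "chr1\t12345\t.\tA\tG\t50\tPASS\tDP=30;AF=0.5"
rec = parse_vcf_line(line)
print(rec["CHROM"], rec["POS"], rec["REF"], rec["ALT"], rec["INFO"])
# chr1 12345 A ['G'] {'DP': '30', 'AF': '0.5'}
```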
A RAG system almost always works at the interpreted end of this chain (annotated variants and reports), not on the raw reads.
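To make this concrete: what a RAG index stores is usually short text derived from interpreted variants, not anything read-level. A minimal sketch, assuming a hypothetical `InterpretedVariant` record whose fields stand in for whatever the annotation and interpretation steps actually emit; the gene name and variant are placeholders, not real findings.

```python
# Render one interpreted variant as a short text chunk for a RAG index.
# Field names are hypothetical stand-ins for real annotation output.
from dataclasses import dataclass


@dataclass
class InterpretedVariant:
    gene: str
    hgvs_c: str          # coding-DNA change in HGVS notation
    consequence: str     # Sequence Ontology term, e.g. "missense_variant"
    classification: str  # ACMG-style class, e.g. "Uncertain significance"


def to_chunk(v: InterpretedVariant) -> str:
    """Turn a structured variant into one human-readable sentence."""
    return (f"{v.gene} {v.hgvs_c}: {v.consequence.replace('_', ' ')}, "
            f"classified as {v.classification.lower()}.")


variant = InterpretedVariant("GENE1", "c.100A>G", "missense_variant",
                             "Uncertain significance")
print(to_chunk(variant))
# GENE1 c.100A>G: missense variant, classified as uncertain significance.
```

Keeping the chunk as plain prose, rather than raw VCF columns, lets the retriever match it against natural-language questions.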