If sequencing technology can read all 3.2 billion base pairs of the human genome, why would we choose to sequence only part of it? It’s a bit like asking: if you’re looking for a lost key in your house, why not search every room, every drawer, every pocket? The answer is simple: time, cost, and practicality.
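In genomics, we face a similar trade-off. You can sequence the entire genome (whole-genome sequencing, WGS) or only its protein-coding portion (whole-exome sequencing, WES).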
The choice between these approaches has shaped how we discover disease genes, diagnose patients, and conduct research. Let’s understand what each approach offers and when to use which one.
Before we compare WGS and WES, we need to understand what the exome is and why it’s special.
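Your genome contains about 20,000 protein-coding genes. But genes aren’t continuous stretches of DNA. Instead, each gene is split into exons, the segments that end up in the mature messenger RNA, and introns, the intervening segments that are removed.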
When cells make proteins, they transcribe the entire gene into RNA, then cut out the introns and splice the exons together. The final messenger RNA (mRNA) contains only exonic sequence, which is then translated into protein.
The collection of all exons in the genome is called the exome. It makes up only 1-2% of your total DNA (about 30-50 million base pairs out of 3.2 billion). But here’s why it’s important:
About 85% of known disease-causing mutations occur in exons.
Why? Because exons directly code for proteins. A single-base substitution in an exon can swap one amino acid for another (missense) or create a premature stop signal (nonsense), and inserting or deleting a number of bases that is not a multiple of three shifts the reading frame (frameshift). Any of these can break a protein’s function, potentially causing disease. Changes in introns or intergenic regions can affect gene regulation or RNA splicing, but they’re less likely to have dramatic effects.
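To make the three consequence types concrete, here is a minimal Python sketch that classifies a change within a coding sequence. The function names (`translate`, `classify_variant`), the 0-based coordinate convention, and the toy coding sequence are illustrative assumptions, not part of any annotation tool; real annotators also handle splicing, strand, and multiple transcripts.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon ('*')."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


def classify_variant(cds, pos, ref, alt):
    """Classify a variant at 0-based position `pos` of a coding sequence.

    `ref` and `alt` are the reference and alternate bases; an empty
    string denotes a pure insertion or deletion (a toy convention).
    """
    if cds[pos:pos + len(ref)] != ref:
        raise ValueError("reference allele does not match the sequence")
    if len(ref) != len(alt):
        # A length change that is not a multiple of 3 shifts the reading frame.
        return "frameshift" if abs(len(alt) - len(ref)) % 3 else "in-frame indel"
    mutated = cds[:pos] + alt + cds[pos + len(ref):]
    ref_protein, alt_protein = translate(cds), translate(mutated)
    if alt_protein == ref_protein:
        return "synonymous"
    if alt_protein.endswith("*") and len(alt_protein) < len(ref_protein):
        return "nonsense"
    if "*" in ref_protein and "*" not in alt_protein:
        return "stop-lost"
    return "missense"


if __name__ == "__main__":
    cds = "ATGGAAAAGTGCTAA"  # toy sequence: Met-Glu-Lys-Cys-stop
    for pos, ref, alt in [(4, "A", "T"), (5, "A", "G"), (6, "A", "T"), (6, "A", "")]:
        print(pos, ref, alt or "-", classify_variant(cds, pos, ref, alt))
```

Comparing whole translated proteins rather than single codons keeps the sketch short; it is not how production annotators work, but it is enough to show why a one-base substitution and a one-base deletion land in different categories.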
This high concentration of disease variants in exons created an opportunity: if most pathogenic variants are in just 1-2% of the genome, we could focus our sequencing there.
Before diving into details, let’s see how these approaches compare:

Figure 6-1. WES vs WGS: What Do You See? WES targets only the coding regions (exons), capturing about 1-2% of the genome where most disease-causing variants reside. WGS sequences the entire genome, including exons, introns, and regulatory regions, providing complete coverage but at higher cost and complexity.
| Feature | Whole-Exome Sequencing (WES) | Whole-Genome Sequencing (WGS) |
|---|---|---|
| Coverage | ~1-2% of genome (exons only) | 100% of genome |
| Target size | 30-50 million bases | 3.2 billion bases |
| Data per sample | ~6 GB | ~90-100 GB |
| Cost (2024) | ~$400-500 | ~$600-1,000 |
| Typical depth | 100-150X | 30-40X |
| SNVs/indels in exons | ✓ Excellent | ✓ Excellent |
| Structural variants | ✗ Mostly missed | ✓ Detected |
| Non-coding variants | ✗ Missed | ✓ Detected |
| Regulatory regions | ✗ Missed | ✓ Detected |
| Repeat expansions | ✗ Missed | ✓ Detected (especially with long reads) |
| Analysis complexity | Lower (fewer variants) | Higher (millions of variants) |
| Interpretation | Easier (focus on coding) | Harder (non-coding interpretation uncertain) |
| Diagnostic yield (rare disease) | 25-50% | 30-55% (slightly higher) |
| Best for | Known coding disorders, cost-sensitive studies | Complex cases, structural variants, discovery |
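WES doesn’t literally sequence only exons; reading exons selectively, straight from genomic DNA, would be technically challenging. Instead, it uses a clever trick called target capture or enrichment: the DNA is fragmented, biotinylated probes (baits) hybridize to the exon-containing fragments, streptavidin-coated magnetic beads pull those fragments out, and only the captured fragments are sequenced.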
Think of it like using a magnet to pick out just the metal pieces from a mixed pile of materials.

Figure 6-2. WES Process with Target Capture. After DNA extraction and fragmentation, biotinylated probes (baits) bind specifically to exonic sequences. Streptavidin-coated magnetic beads capture these probe-bound fragments, physically separating exons from introns and intergenic regions. This targeted enrichment concentrates sequencing effort on the protein-coding regions. Source: Microbe Notes

Figure 6-3. Why Target Capture Works and Why It Fails. Top: When exons are intact, capture probes bind efficiently and generate high-depth sequencing data. Bottom: Structural variants like deletions or duplications prevent probe binding, leaving these regions undetected in WES data. This fundamental limitation explains why WES misses 5-10% of pathogenic variants that WGS can detect.
Commercial capture kits such as Agilent SureSelect, Illumina TruSeq, and those from Twist Bioscience each target slightly different exon sets with varying capture efficiencies. Not all exons capture equally well: regions with extreme GC content, repetitive sequences, or complex secondary structures often have coverage gaps.
WES made its breakthrough in 2010 with a paper that became a model for the approach. Researchers were studying four affected individuals from three families with Miller syndrome, a rare developmental disorder causing facial abnormalities. Traditional gene-hunting methods had failed.
The strategy was simple but revolutionary:
They found variants in a gene called DHODH that none of the healthy family members carried. DHODH had never been linked to any human disease before. This discovery took months instead of years and cost thousands instead of millions.
This paper launched the WES era. Suddenly, researchers could identify disease genes for rare disorders by sequencing just a few affected individuals.
WES quickly became the first-line genetic test for diagnosing rare diseases. With a diagnostic yield of 25-50%, WES identifies the genetic cause in roughly one-quarter to one-half of patients with suspected genetic disorders. This is remarkable considering these patients have often undergone years of inconclusive testing.
When WES works well:
When WES comes up short:
WGS is conceptually simpler than WES: you just sequence everything.
No target capture step. No enrichment. Just sequence the whole thing.

Figure 6-4. Whole-Genome Sequencing (WGS) Workflow. WGS captures all genomic regions (coding exons, introns, regulatory elements, and intergenic sequences) without the selective enrichment step used in WES. This comprehensive approach enables detection of all variant types. Source: Microbe Notes
Structural variants: A patient with developmental delay and seizures tested negative on WES. WGS revealed a large deletion removing three exons of a neurodevelopmental gene. WES missed this because target capture requires intact DNA: when exons are deleted, there’s nothing for the baits to capture. WGS detected the deletion through the absence of reads in that region. This happens in about 5-10% of cases where WES fails.
Regulatory variants: Some forms of beta-thalassemia (reduced hemoglobin production) are caused by mutations in the regulatory regions of the HBB gene, not in the coding sequence. These would be missed by WES but are detectable with WGS.
Deep intronic variants: While WES captures exon-intron boundaries, deep intronic variants can create new splice sites or destroy existing ones. These “cryptic splice variants” can cause disease but lie beyond WES’s reach.
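The structural-variant example above hinges on spotting a region where reads are absent. A minimal Python sketch of that idea scans a per-base depth profile for runs far below the typical depth. The function name `low_depth_runs`, the 10% threshold, and the minimum run length are illustrative assumptions; real structural-variant callers also use split reads and discordant read pairs, and a heterozygous deletion would show roughly half the normal depth rather than zero.

```python
from statistics import median


def low_depth_runs(depth, frac=0.1, min_len=3):
    """Return half-open (start, end) index ranges where depth stays below
    `frac` times the median depth for at least `min_len` positions."""
    threshold = median(depth) * frac
    runs = []
    start = None
    for i, d in enumerate(depth):
        if d < threshold:
            if start is None:
                start = i  # a low-depth run begins here
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the profile.
    if start is not None and len(depth) - start >= min_len:
        runs.append((start, len(depth)))
    return runs


if __name__ == "__main__":
    # Toy profile: ~30X coverage with a four-base gap in the middle.
    profile = [30, 32, 29, 31, 0, 0, 1, 0, 30, 28, 33]
    print(low_depth_runs(profile))
```

The same scan applied to WES data would be far less informative, because depth outside the captured targets is expected to be near zero anyway.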
WGS opens up the study of variants in non-coding regions, which make up about 98% of the genome. While most non-coding sequence is functionally neutral, some regions are critically important. Enhancers and promoters control when and where genes are expressed: a mutation in an enhancer can affect a gene hundreds of thousands of bases away, causing disease without touching the gene itself.
The challenge: We’re still learning which non-coding variants matter. The human genome contains millions of non-coding variants, and distinguishing functional from neutral ones remains difficult.
Whether you’re sequencing a whole genome or just the exome, the basic workflow is similar with one key difference for WES: target capture to isolate exons. This section walks through the laboratory and computational steps that transform a blood sample into a list of genetic variants.

Figure 6-5. From Reads to Variants: The Computational Pipeline. After sequencing generates millions of reads, a series of computational steps transforms raw data into clinically actionable variants: quality control removes low-quality bases, alignment maps reads to the reference genome, variant calling identifies differences, filtering removes artifacts, and annotation adds biological context.
Library preparation is the process of converting your DNA sample into a form that sequencing machines can read. The workflow differs significantly between short-read platforms (Illumina) and long-read platforms (PacBio, Nanopore), each optimized for their specific sequencing chemistry.
Illumina library prep is the most common approach for both WGS and WES, optimized for generating billions of short, highly accurate reads.
📺 Video Resource: For a detailed walkthrough of Illumina library preparation, watch this Illumina expert tutorial which covers best practices for Nextera library prep.
Overview of the Workflow:
1. DNA Extraction and Quality Control
2. Fragmentation
3. End Repair and Adapter Ligation
4. PCR Amplification
5. Target Capture (WES Only!)
6. Quality Control Check
Key Advantages: High accuracy (>99.9%), massive throughput (billions of reads), well-established protocols
Limitations: Short reads struggle with repetitive regions and structural variants
PacBio library prep creates circular DNA templates that enable multiple reads of the same molecule for high-accuracy long reads.
📺 Video Resource: For PacBio and Nanopore library preparation workflows, watch this comparative tutorial.
Overview of the Workflow:
1. DNA Extraction and Quality Control
2. Fragmentation (Gentle!)
3. End Repair and Hairpin Adapter Ligation
4. No PCR Amplification (Usually!)
5. Quality Control Check
The HiFi Advantage: The polymerase circles the SMRTbell template 10-20 times, reading the same DNA sequence repeatedly. A consensus is computed from these multiple passes, achieving >99.9% accuracy despite long read lengths.
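To see why repeated passes help, consider a deliberately simplified model: each pass miscalls a given base independently with the same probability, and the consensus is a simple majority vote that fails only when more than half the passes are wrong. The Python sketch below computes that failure probability. The 10% per-pass error rate is an illustrative assumption, not a measured figure, and actual consensus algorithms are more sophisticated than majority voting.

```python
from math import comb


def majority_error(per_pass_error, n_passes):
    """Probability that more than half of `n_passes` independent reads of a
    base are wrong, given the same error probability on every pass."""
    p = per_pass_error
    k_min = n_passes // 2 + 1  # smallest number of wrong passes that outvotes the rest
    return sum(
        comb(n_passes, k) * p**k * (1 - p) ** (n_passes - k)
        for k in range(k_min, n_passes + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 5, 11, 15):  # odd pass counts avoid ties
        print(f"{n:2d} passes -> consensus error {majority_error(0.10, n):.2e}")
```

Even under this crude model, the consensus error falls steeply as the number of passes grows, which is the intuition behind combining many noisy reads of one molecule into a single accurate read.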
Key Advantages: Long reads (10-20 kb), high accuracy, no PCR bias, detects DNA modifications
Limitations: Lower throughput than Illumina, higher cost per base
Nanopore library prep is the simplest and fastest, enabling ultra-long reads and even portable sequencing.
📺 Video Resource: See the comparative tutorial mentioned above for Nanopore workflows.
Overview of the Workflow:
1. DNA Extraction and Quality Control
2. Minimal or No Fragmentation
3. Adapter Ligation
4. No PCR Amplification (PCR-Free)
5. Minimal Quality Control
The Nanopore Advantage: Simplest library prep, real-time sequencing (watch data generation live), ultra-long reads (>100 kb, sometimes >1 Mb), portable devices (MinION is USB-sized).
Key Advantages: Ultra-long reads, fastest library prep, portable sequencing, real-time results, native DNA sequencing
Limitations: Higher per-base error rate than Illumina/PacBio (though improving rapidly), lower throughput per run
| Feature | Illumina | PacBio HiFi | Oxford Nanopore |
|---|---|---|---|
| Library Prep Time | 4-8 hours | 3-6 hours | 10 min - 2 hours |
| Read Length | 150-300 bp | 10-20 kb | 10-100+ kb |
| Accuracy | >99.9% | >99.9% | 95-99% |
| PCR Amplification | Yes (usually) | No (usually) | No |
| Best For | WES, high-throughput WGS | Structural variants, phasing | Ultra-long reads, genome assembly |
| Fragmentation | Required | Controlled | Minimal/none |
| Special Feature | Target capture for WES | SMRTbell circular templates | Motor proteins, portability |
Bottom line: Choose your platform based on your scientific question:
The library is loaded onto a flow cell, a glass slide with millions of tiny spots where sequencing happens.
Cluster generation: Library fragments bind to oligonucleotides on the flow cell surface, then through “bridge amplification” create dense clusters of ~1,000 identical copies. Single molecules don’t produce enough signal to detect, so clusters amplify the signal.
Sequencing by Synthesis (SBS):
This produces millions of short reads simultaneously, typically 2×150 bp (paired-end sequencing, reading both ends of each fragment).
Typical output:
PacBio uses SMRT Cells containing up to 25 million tiny wells called zero-mode waveguides (ZMWs). Ideally, each well holds a single DNA polymerase molecule.
Single-Molecule Real-Time (SMRT) sequencing:
Read lengths: 15,000-20,000 bp average, with some exceeding 100,000 bp. These long, accurate reads can span entire genes, resolve structural variants, phase variants, and sequence through repetitive regions.
Nanopore sequencing threads DNA through protein pores in a membrane. As bases pass through, they disrupt an electrical current in characteristic ways, enabling ultra-long reads (>100,000 bp, sometimes >1 million bp), real-time results, and portable devices like the USB-sized MinION.
After sequencing, you have millions or billions of reads. Now comes the computational challenge: figuring out what your genome looks like and how it differs from the reference.
Each base call comes with a quality score (Q score):
FastQC checks per-base quality, sequence duplication levels, adapter contamination, and GC content distribution. Low-quality bases (typically at read ends) and adapter sequences are trimmed.
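Quality scores follow the Phred convention, Q = -10 · log10(p), where p is the estimated probability that the base call is wrong, so Q20 corresponds to a 1% error chance and Q30 to 0.1%. The Python sketch below converts between the two, decodes a FASTQ quality string (assuming the common Phred+33 ASCII encoding), and trims low-quality bases from the 3' end of a read. The function names and the Q20 trimming cutoff are illustrative assumptions, not FastQC's behavior.

```python
import math


def phred_to_error(q):
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred quality score implied by an error probability."""
    return -10 * math.log10(p)


def decode_quality(qual, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in qual]


def trim_3prime(seq, qual, min_q=20):
    """Drop trailing bases whose quality is below `min_q`."""
    scores = decode_quality(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(f"Q{q}: error probability {phred_to_error(q):.4%}")
    print(trim_3prime("ACGTAC", "IIII##"))
```

Dedicated trimming tools use sliding windows and adapter matching rather than this simple end scan, but the underlying quality arithmetic is the same.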
Your reads need to be mapped back to their original genomic positions.
Alignment tools:
The aligner compares each read to the reference genome (e.g., GRCh38 or T2T-CHM13), finds the best-matching location, and produces a BAM file (Binary Alignment Map).
Challenges: Repetitive regions create ambiguity for short reads. Structural variants can cause incorrect alignment or prevent alignment entirely.
PCR amplification during library preparation creates multiple identical copies of some original DNA molecules. Tools like Picard MarkDuplicates identify and mark duplicates based on identical mapping positions and sequences. Duplicates inflate coverage artificially and can bias variant calling.
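The core idea of duplicate marking can be sketched in a few lines of Python: group reads that map to the same position and strand, keep the best one, and flag the rest. This is a toy illustration only. The `Read` record, its fields, and the "highest summed base quality wins" rule are assumptions made for the example, and real tools work on read pairs and account for clipping.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    name: str
    chrom: str
    pos: int        # leftmost mapped position
    strand: str     # "+" or "-"
    qual_sum: int   # sum of base qualities, used to pick a representative


def mark_duplicates(reads):
    """Return the names of reads flagged as duplicates.

    Reads sharing chromosome, position, and strand are treated as copies of
    one original molecule; the highest-quality read in each group is kept.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.pos, read.strand)].append(read)

    duplicates = set()
    for group in groups.values():
        best = max(group, key=lambda r: r.qual_sum)
        duplicates.update(r.name for r in group if r is not best)
    return duplicates


if __name__ == "__main__":
    reads = [
        Read("r1", "chr1", 1000, "+", 4300),
        Read("r2", "chr1", 1000, "+", 4500),  # same start and strand as r1
        Read("r3", "chr1", 1000, "-", 4400),  # opposite strand: different group
        Read("r4", "chr2", 2500, "+", 4100),
    ]
    print(sorted(mark_duplicates(reads)))
```

Duplicates are flagged rather than deleted so that downstream tools can choose whether to ignore them.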
Sophisticated software tools use statistical models to distinguish real variants from sequencing errors.
Popular tools:
How it works: For each genome position, the variant caller stacks up all covering reads, counts bases, calculates the likelihood of a real variant versus sequencing error, considers quality scores, checks depth (20-30+ reads provide confidence), and applies statistical filters.
Example:
Output: A VCF file (Variant Call Format) listing all detected variants with position, reference/alternate bases, quality score, genotype, and read depth.
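The statistical reasoning described above can be illustrated with a toy genotype caller. For one position, it compares three hypotheses (homozygous reference, heterozygous, homozygous alternate) by asking how likely the observed bases are under each, using the per-base error probability implied by the quality score. This is a simplified sketch under stated assumptions: reads are independent, only one alternate allele is considered, and there is no prior on genotypes. Production variant callers use considerably richer models.

```python
import math

# Expected fraction of alternate-allele reads under each diploid genotype.
ALT_FRACTION = {"0/0": 0.0, "0/1": 0.5, "1/1": 1.0}


def genotype_log_likelihoods(bases, quals, ref, alt):
    """Log-likelihood of the observed bases under each genotype.

    `bases` are the base calls covering one position; `quals` are their
    Phred scores. Bases matching neither `ref` nor `alt` are ignored.
    """
    log_lik = {g: 0.0 for g in ALT_FRACTION}
    for base, q in zip(bases, quals):
        if base not in (ref, alt):
            continue
        # Clamp the error rate so the logarithms stay finite.
        err = min(max(10 ** (-q / 10), 1e-6), 0.5)
        for genotype, frac in ALT_FRACTION.items():
            # Chance of reading `alt`: a true alt read correctly, or a ref misread.
            p_alt = frac * (1 - err) + (1 - frac) * err
            log_lik[genotype] += math.log(p_alt if base == alt else 1 - p_alt)
    return log_lik


def call_genotype(bases, quals, ref, alt):
    """Pick the genotype with the highest likelihood."""
    log_lik = genotype_log_likelihoods(bases, quals, ref, alt)
    return max(log_lik, key=log_lik.get)


if __name__ == "__main__":
    # 30 reads at Q30: half support the reference A, half the alternate G.
    print(call_genotype("A" * 15 + "G" * 15, [30] * 30, "A", "G"))
    # A single G among 30 reads is better explained as a sequencing error.
    print(call_genotype("A" * 29 + "G", [30] * 30, "A", "G"))
```

The second case shows why depth matters: with 30 reads, one stray base is far more plausible as an error than as a heterozygous variant, whereas with only two or three reads the hypotheses would be hard to separate.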
Initial variant calls contain false positives. Filtering removes artifacts based on:
Result: A high-confidence variant set, ranging from thousands to millions of variants depending on whether you ran WES or WGS.
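A simple hard filter can be sketched directly against VCF data lines, whose first eight tab-separated columns are CHROM, POS, ID, REF, ALT, QUAL, FILTER, and INFO. The Python example below keeps variants that clear a minimum call quality and read depth. The thresholds (QUAL ≥ 30, DP ≥ 20) and function names are illustrative assumptions, not recommended settings; appropriate cutoffs depend on the caller and the data.

```python
def parse_info(info):
    """Parse a VCF INFO column ('DP=35;AF=0.5;DB') into a dict of strings."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value  # flag entries such as 'DB' map to ''
    return fields


def passes_filter(vcf_line, min_qual=30.0, min_depth=20):
    """Return True if a VCF data line meets the quality and depth thresholds."""
    columns = vcf_line.rstrip("\n").split("\t")
    qual_text, info_text = columns[5], columns[7]
    if qual_text == ".":  # missing quality: treat as failing
        return False
    depth = int(parse_info(info_text).get("DP", "0") or 0)
    return float(qual_text) >= min_qual and depth >= min_depth


if __name__ == "__main__":
    lines = [
        "chr1\t12345\t.\tA\tG\t812.5\t.\tDP=42;AF=0.5",
        "chr1\t22345\t.\tC\tT\t18.2\t.\tDP=35;AF=0.5",   # low call quality
        "chr2\t32345\t.\tG\tA\t250.0\t.\tDP=6;AF=1.0",   # shallow coverage
    ]
    for line in lines:
        print(line.split("\t")[1], "PASS" if passes_filter(line) else "FAIL")
```

In practice, the result is usually recorded in the FILTER column rather than by discarding lines, so that rejected calls remain available for review.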
Annotation tools (VEP, ANNOVAR, SnpEff) add biological context:
Genomic location: Is this in a gene? Which one? In an exon, intron, or intergenic region?
Functional impact:
Population frequency: How common is this in databases like gnomAD? Common variants (>1%) are usually benign.
Clinical significance: Is it in ClinVar? What’s the classification (pathogenic, benign, VUS)?
Prediction scores: CADD, REVEL scores indicate likely deleteriousness and pathogenicity.
Output: An annotated VCF file, often converted to a spreadsheet for easier interpretation.
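Once variants are annotated, prioritization is largely a matter of filtering and sorting on those annotations. The Python sketch below assumes each variant has already been reduced to a small dictionary; the field names (`gnomad_af`, `consequence`, `clinvar`, `cadd`), the set of high-impact consequences, and the example records are all hypothetical and do not match the output format of any specific annotation tool. The 1% frequency cutoff follows the rule of thumb given above.

```python
# Consequence labels treated as high impact in this toy example.
HIGH_IMPACT = {"nonsense", "frameshift", "splice_site"}


def prioritize(variants, max_af=0.01):
    """Drop common or known-benign variants, then rank the remainder.

    High-impact consequences come first; ties are broken by descending
    CADD score.
    """
    candidates = [
        v for v in variants
        if v["gnomad_af"] < max_af and v["clinvar"] != "benign"
    ]
    return sorted(
        candidates,
        key=lambda v: (v["consequence"] not in HIGH_IMPACT, -v["cadd"]),
    )


if __name__ == "__main__":
    # Made-up records for illustration only.
    annotated = [
        {"id": "var1", "gnomad_af": 0.12, "consequence": "missense", "clinvar": "benign", "cadd": 8.0},
        {"id": "var2", "gnomad_af": 0.0001, "consequence": "missense", "clinvar": "VUS", "cadd": 27.5},
        {"id": "var3", "gnomad_af": 0.0, "consequence": "nonsense", "clinvar": "not_listed", "cadd": 38.0},
        {"id": "var4", "gnomad_af": 0.004, "consequence": "synonymous", "clinvar": "not_listed", "cadd": 3.1},
    ]
    for variant in prioritize(annotated):
        print(variant["id"], variant["consequence"], variant["cadd"])
```

A ranked list like this is a starting point for human review, not a diagnosis; the final call still depends on matching candidate genes to the patient's symptoms.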
Day 1-3: DNA extraction and library preparation (WES includes exome capture)
Day 4-5: Sequencing on NovaSeq 6000, generating ~80 million read pairs for WES
Day 6-7: Bioinformatics (QC, alignment, duplicate marking, variant calling, and filtering)
Day 8-10: Interpretation (filter for rare variants, focus on genes related to symptoms, prioritize high-impact variants, check ClinVar, validate with Sanger sequencing)
Result: Genetic diagnosis made in ~10 days, compared to months or years with older approaches.

Figure 6-6. Clinical Decision Tree: WES or WGS? This flowchart guides clinical decision-making for genetic testing. Start with WES for suspected Mendelian disorders with known coding variants. Move to WGS when WES is negative, structural variants are suspected, or the phenotype is complex or atypical.
Patient: 8-year-old boy with intellectual disability, autism, and dysmorphic facial features
Testing strategy:
This deletion was missed by WES (no exons to capture) and too small for microarray. WGS provided the answer after other methods failed.
The cost gap between WES and WGS is shrinking rapidly:
| Year | WES Cost | WGS Cost | Difference |
|---|---|---|---|
| 2010 | ~$5,000 | ~$50,000 | 10X |
| 2015 | ~$1,000 | ~$5,000 | 5X |
| 2020 | ~$500 | ~$1,000 | 2X |
| 2024 | ~$400-500 | ~$600-1,000 | 1.5X |
As costs converge, the argument for WES weakens. Soon, WGS may cost the same as WES.
Storage needs:
For large biobanks, this 15X difference matters. The UK Biobank sequenced 500,000 genomes; at 90 GB each, that’s 45 petabytes, which requires serious infrastructure.
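The storage arithmetic is easy to reproduce. The short Python sketch below uses the per-sample sizes quoted in this chapter (about 6 GB for WES and 90 GB for WGS) and decimal units (1 PB = 1,000,000 GB); the function name is an illustrative assumption, and real footprints vary with file format, compression, and how many intermediate files are kept.

```python
GB_PER_PB = 1_000_000  # decimal units: 1 petabyte = 1,000,000 gigabytes


def cohort_storage_pb(n_samples, gb_per_sample):
    """Total storage for a cohort, in petabytes."""
    return n_samples * gb_per_sample / GB_PER_PB


if __name__ == "__main__":
    n = 500_000
    wgs_pb = cohort_storage_pb(n, 90)
    wes_pb = cohort_storage_pb(n, 6)
    print(f"WGS: {wgs_pb:.0f} PB, WES: {wes_pb:.0f} PB, ratio: {wgs_pb / wes_pb:.0f}X")
```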
Sequencing is fast. Interpretation is slow.
As one geneticist observed: “We went from being starved for data to drowning in it.”
There’s an interesting argument emerging: WES might be a transitional technology.
You can do WGS but initially analyze only the exonic regions, getting WES-equivalent results while keeping the full dataset for later. This gives you:
WES still has advantages:
Most WGS and WES today use short-read sequencing (Illumina), which struggles with repetitive regions, structural variants, and phasing.
Long-read sequencing (PacBio HiFi, Oxford Nanopore) is changing this, with reads of 15,000-20,000 bp (HiFi) or more than 100,000 bp (Nanopore) that can:
The T2T-CHM13 genome (the first complete human genome) was built using long reads. As long-read WGS becomes more affordable, it will likely replace short-read approaches.
Whole-Exome Sequencing:
Whole-Genome Sequencing:
The trend is clear: We’re moving toward universal WGS. For now, both approaches have their place, and the choice depends on your question, resources, and clinical context. The key insight: WES isn’t simply a subset of WGS; in practice, it’s a different experimental design with distinct strengths and limitations. Understanding when to use each approach is essential for modern geneticists and clinicians.