In 1975, Mary-Claire King and Allan Wilson published a paper that would quietly reshape how scientists think about the genome. Working with protein sequences from humans and chimpanzees, they found something that startled the field: the two species were, at the protein level, approximately 99% identical. Chimpanzees and humans share roughly the same hemoglobin, the same cytochrome c, the same albumin. By the molecular measure of their proteins, they are barely distinguishable.
Yet the differences between a human and a chimpanzee — in cognition, in language, in anatomy, in the arc of development from embryo to adult — are not subtle. Something had to explain the gap between molecular similarity and biological divergence. King and Wilson proposed that the answer lay not in the proteins themselves but in how and when those proteins are expressed — in the regulatory sequences that control gene activity across development. The genes were the same; the instructions for using them were different.
This paper planted a seed for a powerful idea that would take decades to fully germinate: if a sequence is identical across species separated by millions of years of evolution, it must be doing something critically important. Natural selection is conservative. Change costs something. Mutations that break essential functions disappear because organisms carrying them fail to reproduce. The sequences that persist unchanged across fish, mice, dogs, and humans do so because evolution has had millions of opportunities to alter them, and every time it tried, the result was worse. Unchanged means indispensable.
The converse also holds: sequences that vary freely across species are probably tolerant of change. They can be modified without consequence, and so they accumulate differences over time. This asymmetry — conservation as a signal of function, variability as a signal of flexibility — is the foundation of everything in this chapter. It gives us, without a single additional experiment, a way to read the functional importance of any nucleotide directly from the record of life’s history.
The scale of modern genomics creates a fundamental problem: we can sequence genomes faster than we can understand them. A single whole-genome sequence generates millions of variants. Even after filtering by frequency and focusing on coding regions, we’re left with thousands of candidate variants that could potentially cause disorders.
We can’t functionally test every variant. Here’s why:
Time constraints: Functional experiments take weeks to months per variant. A patient with an undiagnosed disorder needs answers in weeks or months, not years.
Cost barriers: Each functional assay costs $2,000-10,000 depending on complexity. Testing thousands of variants would cost millions of dollars per patient—completely impractical for clinical diagnostics.
Technical limitations: Many human proteins can’t be easily studied in cell culture or model organisms. Some require specific developmental contexts, cell types, or tissue environments that are difficult or impossible to recreate experimentally.
Phenotype complexity: Even when we can test a variant, interpreting the results isn’t straightforward. A protein might retain 80% of its normal function—is that enough? Does partial loss of function cause a disorder? The answer depends on the gene and the biological context.
But here’s the key insight that makes variant prioritization possible: evolution has already done millions of experiments for us.
Every species alive today represents a lineage that has survived millions of years of natural selection. Mutations that break critical biological functions tend to disappear because organisms carrying them don’t survive to reproduce. Meanwhile, mutations in less important regions—or neutral changes that don’t affect function—accumulate freely over time.
This means that if we compare a DNA sequence across many species and find that a particular nucleotide is identical (or nearly identical) in humans, chimpanzees, mice, dogs, chickens, and even fish, that nucleotide is probably doing something important. Any mutation at that position likely has functional consequences.
Conversely, if a nucleotide varies freely across species, mutations there probably don’t matter much.
This is the foundation of conservation-based variant prediction: we use evolutionary patterns—the signature of natural selection acting over millions of years—to predict which variants are likely to have functional impact without doing direct experiments.
In this chapter, you'll learn how conservation is measured from multiple sequence alignments, how scores like PhyloP, PhastCons, and GERP++ quantify it, how SIFT and PolyPhen-2 extend the logic to protein changes, and how gene-level constraint metrics (pLI, LOEUF) round out the toolkit. By the end, you'll understand how Dr. Chen, the clinical geneticist we'll follow through this chapter, can narrow 2,500 candidate variants down to perhaps 10-20 high-priority ones that warrant further investigation, making an impossible problem solvable.
Before we dive into specific tools and scores, let’s make sure the underlying logic is crystal clear.
Imagine you’re an archaeologist studying ancient texts. You find ten copies of the same manuscript from different time periods, written by scribes who copied from earlier versions. Most words vary between copies—the scribes made mistakes, used different spelling, or updated old language. But a few specific phrases appear identically in all ten manuscripts.
What does that tell you? Those identical phrases are probably important—maybe they’re religious invocations, legal terms, or critical instructions that scribes were trained never to alter. The consistency across copies, despite errors elsewhere, signals importance.
Evolution works similarly, but instead of manuscripts, we’re comparing DNA sequences across species. Instead of scribes, we have mutation and natural selection. And instead of centuries, we’re looking across millions of years.
DNA mutates constantly. In humans, each newborn carries about 70 new mutations that neither parent had. Multiply that across millions of individuals and thousands of generations, and you get an enormous amount of genetic variation.
But not all mutations persist. Natural selection acts as a filter:
For critical sequences: Mutations that break important functions reduce survival or reproduction. Organisms carrying these mutations are less likely to pass them on. Over time, such mutations disappear from the population. The result? These sequences stay nearly identical across millions of years and across many species.
For non-critical sequences: Mutations that don’t affect function—or affect it only mildly—accumulate freely. These sequences drift and diverge between species. After millions of years, they might be completely different.
This creates a clear pattern: high conservation = functional importance.
Let’s look at a real gene: INS, which encodes insulin. Insulin is critical for regulating blood sugar. Without functional insulin, you get diabetes.
If we compare the insulin protein sequence across species, we find remarkable stability.
Even after 450 million years of evolution separating humans from fish, more than half of insulin’s sequence remains unchanged. Why? Because mutations in insulin are often harmful—they cause diabetes or are lethal during development. Natural selection removes them.
Now compare this to a less critical gene—say, one involved in hair color or scent detection. Those genes vary much more freely between species because mutations there don’t usually affect survival.
When we find a new variant in a patient, we can ask: Does this variant occur at a highly conserved position?
Variant at a conserved position: This nucleotide has been preserved for millions of years. Changing it likely disrupts function. The variant is more likely to have functional impact.
Variant at a non-conserved position: This nucleotide varies naturally between species. Changing it is probably harmless. The variant is less likely to cause problems.
This doesn’t give us absolute certainty—conservation is a probability, not a guarantee—but it dramatically narrows our search space.
We can measure conservation at different scales:
Nucleotide level: Is this specific DNA base conserved across species?
Amino acid level: Is this position in the protein sequence conserved?
Structural level: Does the 3D structure of the protein remain similar?
In the next sections, we’ll explore how each of these levels of conservation is measured and applied to variant interpretation.
Now let’s get specific: how do we actually measure whether a DNA position is conserved?
The process involves three steps:
First, we need to compare the same genomic region across many species. This requires a multiple sequence alignment (MSA)—lining up DNA sequences so we can see which positions match and which differ.
Here’s a simple example. Let’s say we’re looking at a short stretch of DNA around a specific position:
Human:     ATGCGATCGAGTC
Chimp:     ATGCGATCGAGTC
Mouse:     ATGCGCTCGAGTC
Dog:       ATGCGATCGAGTC
Chicken:   ATGCCATCGAGTC
Zebrafish: ATGCCATAGAGTC
           ||||  | |||||   (| marks positions identical in all six species)
At position 6, we see:
Most species have A, but mouse has C. This position is highly conserved, meaning mutations here are rare and probably disrupt function.
Now look at position 5:
This position varies more—mammals have G, but birds and fish have C. This suggests the position is less conserved and more tolerant of variation.
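The toy alignment above can be scored programmatically. This is a minimal sketch (the sequences are the hypothetical ones from the example, and the scoring is just majority-base frequency, far simpler than what real conservation tools compute):

```python
# Toy multiple sequence alignment from the example above (hypothetical sequences).
alignment = {
    "Human":     "ATGCGATCGAGTC",
    "Chimp":     "ATGCGATCGAGTC",
    "Mouse":     "ATGCGCTCGAGTC",
    "Dog":       "ATGCGATCGAGTC",
    "Chicken":   "ATGCCATCGAGTC",
    "Zebrafish": "ATGCCATAGAGTC",
}

def column_conservation(seqs):
    """Fraction of sequences sharing the most common base at each column."""
    n = len(seqs)
    length = len(next(iter(seqs)))
    fractions = []
    for i in range(length):
        column = [s[i] for s in seqs]
        most_common = max(set(column), key=column.count)
        fractions.append(column.count(most_common) / n)
    return fractions

fracs = column_conservation(list(alignment.values()))
for pos, f in enumerate(fracs, start=1):
    print(f"position {pos}: {f:.2f}")
```

Positions 1-4 and 7, 9-13 come out fully conserved (1.00), while positions 5, 6, and 8 score lower, matching the pipe markers in the diagram.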
Real alignments use dozens to hundreds of species and sophisticated algorithms to handle insertions, deletions, and complex evolutionary relationships. Projects like the Zoonomia Project (241 mammal genomes) and the Vertebrate Genomes Project provide the data for these alignments.
Once we have an alignment, we calculate conservation scores. Several different scoring methods exist, each with slightly different approaches:
What it measures: PhyloP asks, “Is this position evolving slower or faster than expected by chance?”
How it works: PhyloP uses a statistical model of neutral evolution (no selection). It compares the observed rate of change at each position to what we’d expect under neutrality.
Score interpretation: Positive scores mean the position evolves more slowly than the neutral expectation (conserved); scores near zero are consistent with neutral evolution; negative scores mean faster-than-neutral (accelerated) evolution.
Typical ranges: roughly -20 to +10, depending on the alignment used; scores above about 2 are commonly treated as evidence of conservation, and scores above about 5 indicate strong constraint.
Example: A variant at position chr7:117,548,401 has a PhyloP score of 6.2. This means this position is in the top 0.1% most conserved positions in the genome. A mutation here is very likely to have functional consequences.
What it measures: PhastCons identifies conserved elements—stretches of DNA (not just single positions) that are conserved as a block.
How it works: Instead of scoring each position independently, PhastCons looks for regions where many adjacent positions are all conserved. It uses a hidden Markov model to segment the genome into conserved and non-conserved elements.
Score interpretation: Scores run from 0 to 1 and represent the probability that a position lies within a conserved element. Values near 1 indicate strong conservation; values near 0 indicate no evidence of constraint.
Why it’s useful: PhastCons is better at identifying regulatory elements (enhancers, promoters) because these often consist of multiple conserved transcription factor binding sites clustered together.
Example: A variant falls in a region with PhastCons score 0.95, meaning there’s a 95% probability this region is under selective constraint. Even if the specific nucleotide isn’t perfectly conserved, the region as a whole matters.
What it measures: GERP++ calculates “rejected substitutions”—how many mutations evolution has rejected at this position compared to neutral expectation.
How it works: GERP++ estimates how many substitutions would be expected at a position under neutral evolution, given the species in the alignment, and subtracts the number actually observed. The difference is the number of "rejected substitutions."
Score interpretation: Scores range from roughly -12 to +6. Positive scores indicate a deficit of substitutions (constraint); scores above about 2 are commonly treated as constrained, and higher scores indicate stronger selection.
Why it’s useful: GERP scores are intuitive—the score literally tells you how many mutations have been “rejected” by natural selection.
Example: A position has GERP++ score of 5.8. This means ~6 mutations that should have occurred by chance were removed by selection. This position is critical.
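The "rejected substitutions" idea is simple arithmetic: expected neutral substitutions minus observed ones. A sketch, with made-up numbers chosen to mirror the example:

```python
def rejected_substitutions(expected_neutral, observed):
    """GERP-style constraint: substitutions expected under neutral evolution
    minus substitutions actually observed at the position. Positive values
    mean selection removed ("rejected") mutations."""
    return expected_neutral - observed

# Hypothetical position: ~6 substitutions expected across the species tree,
# only 0.2 observed.
rs = rejected_substitutions(6.0, 0.2)
print(rs)
```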
All three scores measure conservation, but with different approaches:
| Score | Measures | Best For | Interpretation |
|---|---|---|---|
| PhyloP | Single-nucleotide conservation | Prioritizing individual variants | Positive = conserved |
| PhastCons | Conserved elements/regions | Finding regulatory regions | 0-1 probability scale |
| GERP++ | Rejected substitutions | Understanding selection strength | Number of rejected mutations |
In practice, you’ll often see all three scores reported for a variant. They usually agree—if one says “conserved,” the others do too—but occasionally they differ, which can provide nuance.
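A small helper can apply common rule-of-thumb cutoffs to all three scores at once and flag whether they agree. The cutoffs below are illustrative conventions, not official thresholds:

```python
def interpret_scores(phylop=None, phastcons=None, gerp=None,
                     phylop_cut=2.0, phastcons_cut=0.9, gerp_cut=2.0):
    """Label each available score as 'conserved' or 'not conserved'.
    Cutoffs are common rules of thumb, not fixed standards."""
    labels = {}
    if phylop is not None:
        labels["PhyloP"] = "conserved" if phylop >= phylop_cut else "not conserved"
    if phastcons is not None:
        labels["PhastCons"] = "conserved" if phastcons >= phastcons_cut else "not conserved"
    if gerp is not None:
        labels["GERP++"] = "conserved" if gerp >= gerp_cut else "not conserved"
    # All provided scores agree when they produce a single distinct label.
    labels["agreement"] = len(set(labels.values())) == 1
    return labels

result = interpret_scores(phylop=6.2, phastcons=0.95, gerp=5.8)
print(result)
```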
Here’s how Dr. Chen might use conservation scores to prioritize her 2,500 rare coding variants:
Step 1: Filter by conservation
Step 2: Combine with other information
Step 3: Focus on top candidates
This workflow doesn’t give definitive answers, but it makes the problem tractable—narrowing 2,500 candidates to maybe 20-30 that warrant detailed investigation.
Conservation scores like PhyloP and GERP work at the DNA level—they tell us if a nucleotide position is conserved. But for missense variants (variants that change one amino acid to another), we want to know specifically: Does this amino acid change disrupt protein function?
This requires protein-level analysis. The two most widely used tools are SIFT and PolyPhen-2.
A missense variant changes one amino acid in a protein to a different amino acid. For example:
Normal:  ...Leucine - Proline - Glycine  - Alanine...
Variant: ...Leucine - Proline - Arginine - Alanine...
                                ^
                                Glycine → Arginine
Whether this change matters depends on several factors: how conserved the position is across species, how chemically similar the two amino acids are (glycine is small and flexible; arginine is large and positively charged), and where the residue sits in the protein's structure.
SIFT and PolyPhen-2 integrate multiple sources of information to predict whether an amino acid substitution will be tolerated or damaging.
Core principle: SIFT uses evolutionary conservation of amino acids across species. If an amino acid position has remained the same across many species, changing it is probably harmful.
How it works:
Build alignment: Collect protein sequences from many species (orthologs—the same protein in different species)
Calculate position-specific profile: For each amino acid position, count which amino acids appear across species
Score the substitution: If the variant introduces an amino acid rarely or never seen at that position in other species, it’s probably damaging
Score interpretation: SIFT scores range from 0 to 1 and estimate the probability that a substitution is tolerated. Scores below 0.05 are called "deleterious"; scores of 0.05 or above are called "tolerated."
Example: Position 456 in protein BRCA1, Glycine → Arginine
SIFT score = 0.01 (very low) → “Deleterious”
Why? Arginine never appears at this position in any species. The substitution is unprecedented in evolution, suggesting it will break function.
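The ortholog-profile idea behind SIFT can be sketched in a few lines. The sequences and the pseudocount here are hypothetical, and real SIFT additionally normalizes against background amino acid frequencies and weights closely related sequences:

```python
from collections import Counter

def position_profile(ortholog_seqs, pos):
    """Frequency of each amino acid at a 0-based alignment column."""
    column = [seq[pos] for seq in ortholog_seqs]
    counts = Counter(column)
    return {aa: c / len(column) for aa, c in counts.items()}

def sift_like_score(profile, variant_aa, pseudocount=0.01):
    """Crude SIFT-style score: frequency of the variant amino acid at this
    position across orthologs. Unseen amino acids get a small pseudocount."""
    return profile.get(variant_aa, pseudocount)

# Hypothetical ortholog alignment where glycine (G) is invariant at column 2:
orthologs = ["MKGL", "MKGL", "MRGL", "MKGI", "MKGV"]
profile = position_profile(orthologs, 2)   # column of all 'G'
score = sift_like_score(profile, "R")      # arginine never seen at this column
print(profile, score)
```

An invariant glycine column yields a very low score for arginine, echoing the "unprecedented in evolution" logic of the BRCA1 example.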
Strengths: Simple, fast, and applicable to any protein with enough ortholog sequences; it relies purely on evolutionary evidence, so no structural data is required.
Limitations: Depends heavily on the quality and depth of the alignment; ignores protein structure; gives less reliable predictions when few orthologs exist.
Core principle: PolyPhen-2 integrates multiple features—conservation, structural information, and physicochemical properties—using a machine learning model to predict variant impact.
How it works:
Conservation analysis: Like SIFT, checks if substitution is seen in other species
Structural analysis: Maps variant to 3D protein structure (if available) and checks whether the residue is buried in the hydrophobic core or exposed on the surface, whether it lies in or near a known functional site, and whether the substitution would disrupt local packing or contacts.
Physicochemical properties: Considers changes in side-chain volume, charge, and hydrophobicity, and how well the new residue fits the local structural environment.
Machine learning prediction: Combines all features using a naïve Bayes classifier trained on known disease-causing and neutral variants
Score interpretation: Scores range from 0 to 1 and are reported in three categories: "benign" (low scores), "possibly damaging" (intermediate), and "probably damaging" (high scores near 1); the exact cutoffs depend on which training set (HumDiv or HumVar) is used.
Example: The same BRCA1 variant (Glycine → Arginine at position 456)
PolyPhen-2 score = 1.0 (“Probably damaging”)
Why? Glycine is invariant at this position across species, the substitution swaps a small, flexible residue for a large, positively charged one, and the position maps to a structured region where such a change is likely to disrupt folding or contacts.
Strengths: Integrates several independent lines of evidence; can catch structurally disruptive changes that conservation alone would miss; outputs categories that map naturally onto clinical reporting.
Limitations: Structural features require a solved or modeled structure; the classifier inherits the biases of its training variants; like SIFT, it only handles missense changes.
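To make the "naïve Bayes classifier" idea concrete, here is a toy version with three binary features. All probabilities below are invented for illustration; PolyPhen-2's actual feature set and training data are far richer:

```python
import math

# Hypothetical per-feature likelihoods P(feature | class), as if estimated
# from a labeled training set (values are made up for illustration).
likelihoods = {
    "damaging": {"unseen_in_orthologs": 0.9, "buried_residue": 0.7, "charge_change": 0.6},
    "benign":   {"unseen_in_orthologs": 0.2, "buried_residue": 0.3, "charge_change": 0.3},
}
priors = {"damaging": 0.5, "benign": 0.5}

def naive_bayes_damaging_prob(features):
    """P(damaging | features) assuming feature independence, the core
    assumption of a naive Bayes classifier."""
    log_post = {}
    for cls in ("damaging", "benign"):
        lp = math.log(priors[cls])
        for f, present in features.items():
            p = likelihoods[cls][f]
            lp += math.log(p if present else 1 - p)
        log_post[cls] = lp
    # Normalize via log-sum-exp for numerical stability.
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return math.exp(log_post["damaging"] - m) / z

p = naive_bayes_damaging_prob(
    {"unseen_in_orthologs": True, "buried_residue": True, "charge_change": True}
)
print(f"{p:.3f}")
```

With all three damaging-leaning features present, the posterior lands close to 1, mirroring how agreement across evidence types drives a "probably damaging" call.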
In practice, researchers often use both tools and look for agreement:
| SIFT | PolyPhen-2 | Interpretation |
|---|---|---|
| Deleterious | Probably damaging | High confidence: likely disruptive |
| Tolerated | Benign | High confidence: likely neutral |
| Deleterious | Benign | Conflicting: Needs further investigation |
| Tolerated | Probably damaging | Conflicting: Needs further investigation |
When tools disagree, it's often because the underlying alignments differ (different ortholog sets yield different conservation signals), because one tool has structural information the other lacks, or because the variant sits at a position where evolutionary and physicochemical evidence point in different directions.
Conflicting predictions don’t mean the tools are broken—they mean the variant is in a gray zone where evolutionary and structural evidence don’t align neatly.
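The agreement logic translates directly into code. The cutoffs below are illustrative (SIFT's 0.05 threshold is standard; PolyPhen-2's category boundaries depend on its training set):

```python
def combine_predictions(sift_score, polyphen_score,
                        sift_cut=0.05, polyphen_cut=0.85):
    """Map SIFT and PolyPhen-2 scores to an agreement-based confidence label.
    polyphen_cut is an illustrative boundary for 'probably damaging'."""
    sift_call = "Deleterious" if sift_score < sift_cut else "Tolerated"
    pph_call = "Probably damaging" if polyphen_score >= polyphen_cut else "Benign"
    if sift_call == "Deleterious" and pph_call == "Probably damaging":
        return "High confidence: likely disruptive"
    if sift_call == "Tolerated" and pph_call == "Benign":
        return "High confidence: likely neutral"
    return "Conflicting: needs further investigation"

print(combine_predictions(0.01, 1.0))
print(combine_predictions(0.01, 0.2))
```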
Let’s walk through a real diagnostic scenario:
Patient: 8-year-old with developmental delays and heart defects
Variant found: TBX5 gene, c.734G>A, p.Gly245Glu
Conservation scores: The position is highly conserved across vertebrates (strongly positive PhyloP and GERP++ values), indicating sustained selection against change at this site.
Protein prediction: SIFT calls the substitution deleterious and PolyPhen-2 calls it probably damaging; glutamate's charge and size differ sharply from glycine's.
Interpretation: Strong conservation and concordant damaging predictions converge in TBX5, an established disease gene for congenital heart defects, matching the patient's presentation.
Conclusion: This variant is very likely the cause of the patient’s condition. It should be classified as “likely pathogenic” and reported to the clinician.
This is exactly the type of prioritization that makes clinical genomics possible—conservation-based tools help us go from thousands of candidates to a small number of high-confidence answers.
Now let’s put everything together: how do conservation-based tools fit into a real diagnostic workflow?
Starting point: Whole-genome or whole-exome sequencing from a patient with an undiagnosed condition
Raw data: 4-5 million variants per genome
Goal: Identify the 1-5 variants causing the patient’s disorder
Problem: Can’t test all variants experimentally
Solution: Multi-step filtering pipeline using conservation and other criteria
Here’s a typical variant prioritization workflow used in clinical genetics labs:
1. Remove low-quality variants (likely sequencing artifacts: low read depth, poor base or mapping quality)
2. Remove common variants (population frequency above roughly 0.1-1% in databases such as gnomAD; common variants rarely cause severe rare disorders)
3. Focus on potentially functional variants (missense, nonsense, frameshift, and splice-site changes)
4. Apply conservation scores (retain variants at positions with high PhyloP/GERP++ values)
5. Prioritize genes relevant to patient's phenotype (known disease genes and pathways matching the clinical features)
6. Consider family structure (de novo variants in trios; homozygous or compound-heterozygous variants for recessive inheritance)
7. Check each candidate (ClinVar, the literature, manual inspection of the sequencing reads)
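A minimal sketch of the first five filtering steps, using hypothetical variant records (real pipelines operate on annotated VCF files, and the family-structure and manual-review steps need genotypes and human judgment):

```python
def prioritize(variants, min_qual=30, max_af=0.001, min_phylop=2.0,
               phenotype_genes=frozenset()):
    """Apply quality, frequency, consequence, conservation, and phenotype
    filters in sequence. Field names and cutoffs are illustrative."""
    functional = {"missense", "nonsense", "frameshift", "splice"}
    candidates = []
    for v in variants:
        if v["qual"] < min_qual:                      # 1. low-quality calls
            continue
        if v["allele_freq"] > max_af:                 # 2. common variants
            continue
        if v["consequence"] not in functional:        # 3. functional variants
            continue
        if v["phylop"] < min_phylop:                  # 4. conservation
            continue
        if phenotype_genes and v["gene"] not in phenotype_genes:  # 5. phenotype
            continue
        candidates.append(v)
    return candidates

# Two hypothetical variants: one conserved, rare, in a phenotype-relevant gene;
# one common and poorly conserved.
variants = [
    {"gene": "KCNQ2", "qual": 99, "allele_freq": 0.0,
     "consequence": "missense", "phylop": 6.1},
    {"gene": "OR4F5", "qual": 99, "allele_freq": 0.05,
     "consequence": "missense", "phylop": -0.3},
]
hits = prioritize(variants, phenotype_genes=frozenset({"KCNQ2", "SCN2A"}))
print([v["gene"] for v in hits])
```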
Studies of diagnostic yield in clinical exome sequencing show:
Starting variants: 20,000-30,000 (in exonic regions)
After frequency filtering: 500-1,000
After conservation filtering: 100-200
After phenotype matching: 10-30
After manual review: 2-5 reported to clinician
Diagnostic rate: ~25-50% of cases receive a molecular diagnosis
This means even with perfect filtering, about half of patients don't get answers, either because the causal variant lies in a non-coding region the exome didn't capture, because it falls in a gene not yet linked to disease, because it is a structural variant the pipeline missed, or because the condition isn't primarily genetic.
Conservation-based tools are powerful, but they’re not perfect. It’s critical to understand when they work well and when they fail.
They work well for:

- Highly conserved proteins across all life (core metabolic enzymes, ribosomal proteins, DNA replication and repair machinery)
- Coding variants in well-studied genes, where deep ortholog alignments are available
- Ancient, essential functions that selection has policed for hundreds of millions of years
Some genes evolve quickly by necessity:
Immune system genes: MHC genes, immunoglobulins, T-cell receptors
Reproductive genes: Sperm surface proteins, egg coat proteins
Example: PRDM9, which specifies recombination hotspots, is one of the fastest-evolving genes in vertebrates; its low conservation scores say nothing against its importance.
Young genes: Genes that arose recently in evolution have few or no orthologs in distant species
Example: ARHGAP11B, a human-specific gene implicated in neocortical expansion, has no orthologs in other species, so cross-species conservation scores cannot be computed meaningfully for it.
Enhancers and promoters: Regulatory elements can be fully functional yet turn over rapidly between species, so they often score as poorly conserved.
Example: The lactase persistence enhancer. The variants upstream of LCT (within an intron of MCM6) that keep lactase expressed in adulthood arose only within the last roughly 10,000 years, so these functionally critical positions show no deep conservation.
Sometimes a variant at one position is tolerated ONLY if there’s a specific change at another position.
Example: tRNA structure. A mutation that disrupts one side of a base-paired stem can be rescued by a compensatory change on the other side; each variant alone looks deleterious, yet together they are neutral. Conservation scores evaluate positions independently and miss this kind of epistasis.
False positives (predicted damaging but actually neutral): variants at conserved positions that are rescued by compensatory changes elsewhere, or positions conserved for reasons that no longer apply in the human lineage.
False negatives (predicted benign but actually damaging): variants in human-specific or recently evolved sequence, in fast-evolving gene families, or in regulatory elements with little cross-species conservation.
Practical implication: Never rely solely on conservation scores. Always integrate with population frequency data, inheritance pattern and segregation in the family, phenotype match to known gene-disease associations, ClinVar and the literature, and functional evidence where available.
Conservation scores assess individual positions, but what if we zoom out and ask: Which genes tolerate variation, and which don’t?
This question has led to gene-level constraint metrics—scores that summarize how much variation a gene tolerates overall.
Constrained genes: Accumulate very few loss-of-function variants in the population
Unconstrained genes: Accumulate loss-of-function variants freely
The key insight: Genes under strong constraint are more likely to cause disorders when mutated.
Developed by: Exome Aggregation Consortium (ExAC) → Genome Aggregation Database (gnomAD)
What it measures: The probability that a gene is intolerant to loss-of-function (LoF) variants
How it works:
If observed << expected: Gene is intolerant → High constraint → pLI close to 1
If observed ≈ expected: Gene is tolerant → Low constraint → pLI close to 0
Interpretation: pLI ≥ 0.9 indicates a LoF-intolerant gene (likely haploinsufficient); pLI ≤ 0.1 indicates a LoF-tolerant gene; intermediate values are ambiguous.
Example:
MECP2 (Rett syndrome gene): pLI = 1.0, reflecting essentially no tolerated loss-of-function variants in the population.
OR4F5 (olfactory receptor): pLI = 0.0; LoF variants accumulate freely because losing a single scent receptor has little fitness cost.
Clinical application: If a patient has a de novo LoF variant in a gene with pLI = 1.0, that variant deserves high priority. But pLI alone does not establish causality; interpretation still requires phenotype match, gene-disease validity, inheritance pattern, population frequency, and other ACMG-style evidence.
What it measures: Similar to pLI but provides a continuous score rather than a probability
How it works: LOEUF is the ratio of observed to expected loss-of-function variants in a gene, reported as the upper bound of a 90% confidence interval; the bound keeps small genes with few expected variants from looking artificially constrained.
Interpretation: Lower is more constrained. LOEUF < 0.35 indicates strong constraint; values near or above 1 indicate that LoF variants occur about as often as neutral expectation predicts.
Advantage over pLI: LOEUF provides a continuous spectrum and is more robust for genes with small numbers of variants
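The observed/expected logic behind these metrics reduces to a ratio plus a cutoff. A sketch with hypothetical counts (real LOEUF takes the upper bound of a confidence interval around the ratio rather than the raw value):

```python
def oe_ratio(observed_lof, expected_lof):
    """Observed/expected loss-of-function ratio for a gene. gnomAD's LOEUF
    is the upper bound of a 90% confidence interval around this ratio,
    which penalizes genes too small to judge reliably."""
    return observed_lof / expected_lof

def constraint_label(oe, strong_cut=0.35):
    # 0.35 is the commonly used LOEUF cutoff for strong constraint.
    return "strongly constrained" if oe < strong_cut else "not strongly constrained"

# Hypothetical gene: 2 LoF variants observed where ~40 were expected.
oe = oe_ratio(2, 40)
print(oe, constraint_label(oe))
```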
Example: A strongly constrained gene such as MECP2 has a very low LOEUF, whereas typical olfactory receptor genes such as OR4F5 score near or above 1.
Let’s walk through a complete example to see how conservation-based tools work in practice.
Patient: 3-year-old girl
Symptoms: Seizures beginning in infancy and global developmental delay, consistent with an epileptic encephalopathy
Family history: Parents unaffected, no family history of neurological disorders
Initial diagnosis: Unknown
Test ordered: Trio exome sequencing (patient + both parents)
Initial variants called: 22,458 exonic variants
Step 1 - Frequency filtering (MAF < 0.1%): Remaining: 1,247 rare variants
Step 2 - Inheritance filtering (de novo or recessive): Three de novo missense variants emerged as the leading candidates, examined in turn below.
Candidate 1: the KCNQ2 variant
Type: Missense, de novo
Conservation: Highly conserved position across vertebrates (strongly positive PhyloP and GERP++ scores)
Protein predictions: SIFT deleterious; PolyPhen-2 probably damaging
Gene constraint: Strongly intolerant of loss-of-function variation (pLI near 1, low LOEUF)
Gene function: Potassium channel, critical for neuronal excitability
Known gene-disease association: KCNQ2 mutations cause epileptic encephalopathy (matches patient phenotype!)
ClinVar: Similar variants reported as “pathogenic”
Assessment: STRONG CANDIDATE - This looks like the causative variant
Candidate 2
Type: Missense, de novo
Conservation: Poorly conserved position; the residue varies freely across mammals (low PhyloP, negative GERP++)
Protein predictions: SIFT tolerated; PolyPhen-2 benign
Gene function: Giant muscle protein
Assessment: LOW PRIORITY - Likely benign variant
Candidate 3: the SCN2A variant
Type: Missense, de novo
Conservation: Highly conserved position across vertebrates (strongly positive PhyloP and GERP++ scores)
Protein predictions: SIFT deleterious; PolyPhen-2 probably damaging
Gene function: Sodium channel, critical for action potential generation
Known gene-disease association: SCN2A mutations cause epilepsy and developmental delay (excellent match!)
Assessment: STRONG CANDIDATE - Also looks causative
Two strong candidates: KCNQ2 and SCN2A. Both fit the phenotype. Both have high conservation. Both are predicted damaging.
Next steps: Compare the patient's seizure pattern and clinical course against the phenotypes reported for KCNQ2 and SCN2A encephalopathies, search the literature and ClinVar for the exact variants, and weigh any segregation or functional evidence. Here, the ClinVar record of similar pathogenic KCNQ2 variants noted above helped tip the balance.
Final Diagnosis: KCNQ2-related epileptic encephalopathy
Lessons from This Case: Conservation and constraint scores reduced tens of thousands of variants to two serious candidates in a single analysis pass, but they could not by themselves choose between KCNQ2 and SCN2A; the final call required gene-disease knowledge, ClinVar evidence, and careful phenotype matching.
The power of evolutionary information: Conservation turns millions of years of natural selection into a free, genome-wide functional annotation; no experiment is needed to read it.
DNA-level conservation scores: PhyloP scores single positions, PhastCons identifies conserved elements, and GERP++ counts rejected substitutions; together they flag nucleotides under selective constraint.
Protein-level predictions: SIFT uses ortholog conservation of amino acids; PolyPhen-2 adds structural and physicochemical features; agreement between them strengthens confidence.
Gene-level constraint: pLI and LOEUF summarize how intolerant a whole gene is to loss-of-function variation, identifying genes likely to cause disorders when mutated.
Limitations to remember: Fast-evolving genes, human-specific sequence, regulatory elements, and compensatory changes can all defeat conservation-based prediction; these scores prioritize, they do not prove.
| Term | Definition |
|---|---|
| Conservation score | A numerical measure of how much a DNA position or amino acid has been preserved across species through evolution |
| PhyloP | Conservation score measuring whether a position evolves slower (positive score) or faster (negative score) than expected by chance |
| GERP++ (Genomic Evolutionary Rate Profiling) | Score representing the number of mutations rejected by natural selection at a given position |
| PhastCons | Score (0-1) representing the probability that a position belongs to a conserved element |
| SIFT (Sorting Intolerant From Tolerant) | Tool predicting whether an amino acid substitution is tolerated (score ≥ 0.05) or deleterious (score < 0.05) based on evolutionary conservation |
| PolyPhen-2 (Polymorphism Phenotyping v2) | Machine learning tool integrating conservation, structure, and physicochemical properties to predict variant impact |
| Missense variant | A genetic variant that changes one amino acid to another in a protein sequence |
| Multiple sequence alignment (MSA) | Arrangement of DNA or protein sequences from multiple species to identify conserved and variable positions |
| Ortholog | The same gene in different species (e.g., human insulin vs. mouse insulin) |
| pLI (Probability of Loss-of-Function Intolerance) | Score (0-1) indicating how intolerant a gene is to loss-of-function variants; scores > 0.9 suggest haploinsufficiency |
| LOEUF (Loss-of-Function Observed/Expected Upper Bound) | Ratio of observed to expected loss-of-function variants in a gene; scores < 0.35 indicate strong constraint |
| Constraint | The degree to which a gene or genomic region is intolerant to variation due to selection against deleterious mutations |
| Haploinsufficiency | Condition where one functional copy of a gene is not enough for normal function |
Explain why a highly conserved DNA position is more likely to be functionally important than a non-conserved position. What evolutionary forces create this pattern?
Compare and contrast PhyloP, GERP++, and PhastCons. When might each score be most useful?
A variant has SIFT score of 0.01 (deleterious) but PolyPhen-2 score of 0.2 (benign). What might explain this discrepancy? How would you investigate further?
Why do immune system genes and reproductive genes often have low conservation scores? Does this mean variants in these genes are harmless?
Explain the difference between pLI and LOEUF. Why might a gene have pLI = 0.98 but still occasionally have loss-of-function variants in the population?
A patient has a de novo missense variant in a gene with pLI = 0.02 and LOEUF = 1.8. How would you interpret this finding? Would you prioritize this as a causative variant?
Why might conservation-based tools fail to identify pathogenic variants in human-specific genes or recently evolved genes?
Describe a scenario where a variant at a non-conserved position could still be pathogenic. How would you identify such variants?