Human Genetics

Chapter 2. The Telomere-to-Telomere Genome Project

When the Human Genome Project (HGP) announced its completion in 2003, it was celebrated as a historic milestone. Yet the “complete” human genome wasn’t actually complete. About 8% of our genome remained unsequenced — not because scientists didn’t want to sequence it, but because the technology at the time simply couldn’t handle certain regions.

Early sequencing technologies read short fragments of DNA (around 100-200 base pairs), which worked well for most of the genome. But in highly repetitive regions — where the same sequence appears over and over — these short reads couldn’t distinguish one repeat from another, like trying to place dozens of identical puzzle pieces without knowing where each belongs.

The missing regions weren’t random leftovers. They included some of the most functionally important parts of our chromosomes: centromeres (the central structures that help separate chromosomes during cell division), telomeres (the protective caps at chromosome ends), and ribosomal DNA arrays (genes that make the machinery for protein production). These gaps limited our understanding of chromosome stability, cell division, and even how genetic diseases develop.

Figure 2-1: Why Short-Read Sequencing Failed in Repetitive Regions

Figure 2-1: Why Short-Read Sequencing Failed in Repetitive Regions. Short DNA reads (100-200 bp) cannot resolve highly repetitive genomic regions because they cannot distinguish between nearly identical repeat units. Long reads (20 kb to >100 kb) can span multiple repeats in a single read, enabling unambiguous placement and assembly of these challenging regions.

In 2022, the Telomere-to-Telomere (T2T) Consortium changed this. They produced the first truly complete human genome sequence, called T2T-CHM13, covering all 3.055 billion base pairs from one end (telomere) to the other end (telomere) of every chromosome — hence the name. This achievement added about 200 million base pairs of new sequence and revealed 1,956 previously unknown gene predictions, 99 of which are predicted to be protein coding (Nurk et al. 2022, Science).

Figure 2-2: The Telomere-to-Telomere Assembly Concept

Figure 2-2: The Telomere-to-Telomere Assembly Concept. The name “telomere-to-telomere” refers to complete, continuous DNA sequences spanning from one chromosome end (telomere) through the centromere to the other telomere, with no gaps in repetitive regions. Previous reference genomes contained numerous gaps, particularly in centromeres and other repetitive regions, while T2T assemblies provide uninterrupted sequences.

What Made T2T-CHM13 Possible?

Two major advances enabled the T2T project to succeed where earlier efforts had failed: new sequencing technologies and a special cell line.

The Technologies

The T2T Consortium combined several cutting-edge technologies:

PacBio HiFi sequencing: This technology reads much longer DNA fragments — around 20,000 base pairs (20 kb) — with high accuracy. These longer reads can span multiple repeats, making it possible to distinguish one repeat from another.
Oxford Nanopore ultra-long-read sequencing: These reads can exceed 100,000 base pairs, allowing scientists to read through even longer repetitive regions in a single pass.
Supporting methods: Additional techniques like Illumina short-read sequencing (for error correction), Hi-C (which maps how DNA folds in 3D space), Bionano optical mapping (which creates a physical map of the genome), and Strand-seq (which tracks which DNA strands came from each parent) all helped ensure accuracy.

String Graphs: A Tool for Assembly

The assembly process used a high-resolution string graph — a powerful computational approach that helped resolve the most complex repetitive regions (see Figure 2-4).

Figure 2-4: CHM13 String Graph

Figure 2-4: CHM13 String Graph. This diagram shows how the T2T-CHM13 genome was assembled using a string graph approach. Each line represents a DNA sequence, and connections show where sequences overlap. The tangled regions reveal highly repetitive areas like the ribosomal DNA arrays and centromeric satellites. Source: Nurk, S. et al. (2021). The complete sequence of a human genome. bioRxiv. https://doi.org/10.1101/2021.05.26.445798. License: CC-BY 4.0.

Unlike traditional linear reference genomes that represent a single sequence, a string graph represents DNA as a network where nodes are DNA sequences and edges show how they connect. Like a subway map showing multiple routes between stations, the graph displays all possible connections, and long reads help determine the correct path through repetitive regions.

This approach was critical for handling repeats: when you have many identical DNA sequences (like in centromeres), a linear approach gets confused about which piece connects where. The graph shows all possibilities, and ultra-long reads spanning the repeats reveal the correct connections. The final T2T-CHM13 assembly achieved an exceptionally high accuracy with an error rate of just 1 mistake per 10 million bases.

While T2T-CHM13 is presented as a linear sequence (because it represents one individual), the future of genomics is moving toward pangenome references that better capture human genetic diversity across populations.

The Special Cell Line: CHM13

The T2T project used DNA from a unique cell line called CHM13, which came from a complete hydatidiform mole — a rare type of tissue that forms when an egg without genetic material is fertilized by a sperm that duplicates its own genome. This means CHM13 has two identical copies of each chromosome from the father and none from the mother.

In a typical diploid genome, the two sets of chromosomes inherited from each parent make assembly more complex because similar but not identical sequences must be distinguished. CHM13’s duplicated genome simplified this problem: matching sequences were known to be identical. However, CHM13 lacked a Y chromosome, which remained the final missing piece.

Figure 2-3: The CHM13 Advantage for Genome Assembly

Figure 2-3: The CHM13 Advantage for Genome Assembly. In typical diploid genomes, assembly must distinguish between two similar but not identical chromosome copies (haplotypes) from each parent, creating complexity in repetitive regions. CHM13, derived from a complete hydatidiform mole, contains two identical copies of each chromosome, effectively creating a haploid-like assembly problem that greatly simplifies the resolution of complex repetitive sequences.

The First Complete Human Chromosome: X (2020)

While the full T2T-CHM13 genome was announced in 2022, the project achieved its first major milestone in 2020 by completing the human X chromosome — the first chromosome to be assembled telomere-to-telomere with no gaps (Miga et al. 2020, Nature).

Why Start with X?

The research team chose the X chromosome as their first target for several strategic reasons:

High initial continuity: In the whole-genome assembly, the X chromosome was broken in only three places — the centromere and two large segmental duplications
Well-characterized centromere: The X centromere (DXZ1) had been extensively studied, providing validation benchmarks
Medical importance: The X chromosome is disproportionately involved in genetic diseases
Technical advantage: Using the CHM13 cell line (46,XX), they had two copies of X but no Y, making X more accessible

Figure 2-5: Initial CHM13 X Chromosome Assembly

Figure 2-5: Initial CHM13 X Chromosome Assembly. (See Figure 1b) The X chromosome was initially broken at three locations: the centromere (artificially collapsed in the assembly), a 120-kb segmental duplication (DMRTC1B), and a 134-kb segmental duplication with a paralogue on chromosome 2. Black bars indicate gaps in the GRCh38 reference, and red bars show known segmental duplications. Source: Miga, K.H. et al. (2020). Telomere-to-telomere assembly of a complete human X chromosome. Nature, 585, 79-84. https://doi.org/10.1038/s41586-020-2547-7. License: CC-BY 4.0.

Finishing the Breaks

The two segmental duplications were resolved by finding ultra-long reads that completely spanned the repeats and were uniquely anchored on both sides. This allowed confident placement in the assembly.

The centromere presented a greater challenge — it’s a 3.1 Mb array of highly repetitive alpha satellite DNA where standard polishing methods failed. Initial attempts to polish the centromeric assembly actually decreased quality because reads were incorrectly placed due to the extreme repetitiveness.

The Innovation: Marker-Assisted Polishing

The team developed a novel marker-assisted polishing strategy to finish the large repetitive regions:

1. Catalogue unique markers: They identified short (21 bp), unique sequences present only once in the genome. Remarkably, even within the DXZ1 centromeric array, there was enough variation between repeat copies to create unique markers at semi-regular intervals.

Genome-wide: average spacing = 66 bp between markers
DXZ1 array:  average spacing = 2.3 kb between markers
Longest gap in DXZ1: 42 kb

2. Anchor reads precisely: These markers guided the correct placement of long reads during polishing, preventing the quality degradation that occurred with standard methods.

3. Iterative polishing: Multiple rounds of polishing were performed with Nanopore, then PacBio, then Illumina data, each improving accuracy.

Figure 2-6: Marker-Assisted Polishing Improves Assembly Quality

Figure 2-6: Marker-Assisted Polishing Improves Assembly Quality. (See Figure 1d) Example from the GAGE locus showing coverage depth before (top) and after (bottom) marker-assisted polishing. Black dots indicate primary allele coverage, red dots show secondary alleles. The uniform coverage and elimination of secondary alleles after polishing demonstrates the dramatic quality improvement achieved by this method. Source: Miga, K.H. et al. (2020). Telomere-to-telomere assembly of a complete human X chromosome. Nature, 585, 79-84. https://doi.org/10.1038/s41586-020-2547-7. License: CC-BY 4.0.

Validating the X Centromere

The 3.1 Mb DXZ1 centromeric array consists of approximately 1,408 copies of a ~2,057-bp “higher-order repeat” (HOR) unit. This canonical repeat is made of 12 divergent alpha satellite monomers (each ~171 bp) arranged in a specific order. The team validated this structure through multiple independent methods:

Experimental validation:

Pulsed-field gel electrophoresis (PFGE): Confirmed array size of 2.87 Mb ± 0.16
Droplet digital PCR (ddPCR): Counted 1,408 ± 41 copies of the 2,057-bp repeat unit
Optical mapping: Confirmed the restriction enzyme patterns matched predictions

Sequence-based validation:

PacBio HiFi reads: Showed uniform coverage across the entire array
Ultra-long Nanopore reads: Many reads spanned >100 kb of the centromere
Completeness check: All HiFi reads matching DXZ1 were explained by the assembly

Figure 2-7: Complete Structure of the 3.1 Mb X Centromere

Figure 2-7: Complete Structure of the 3.1 Mb X Centromere. The DXZ1 array consists of approximately 1,408 copies of a ~2-kb higher-order repeat unit. (a) Predicted restriction map showing the array structure. (b) Experimental PFGE Southern blots matching the in silico prediction. (c) ddPCR copy number estimates across multiple cell lines. (d) Catalogue of 33 structural variants identified within the array. Source: Miga, K.H. et al. (2020). Telomere-to-telomere assembly of a complete human X chromosome. Nature, 585, 79-84. https://doi.org/10.1038/s41586-020-2547-7. License: CC-BY 4.0.

The assembly catalogued 33 different structural variants within the DXZ1 array, including:

1,475 canonical 12-mer units (98% of all units)
Various altered units (11-mers, 22-mers, 18-mers, etc.)
One 8.1-kb LINE/L1 insertion

This level of detail was impossible with previous technologies and provides insights into centromere evolution and function.

Achievements of the Complete X

The complete X chromosome assembly:

Closed 29 gaps from GRCh38, totaling 1,147,861 bp of previous N-bases
Achieved 99.991% accuracy (estimated from BAC sequences) or 99.995% (from Illumina data)
Met the original Bermuda Standards for finished genomic sequences (>99.99% accuracy, <1 error per 10,000 bases)
Added complete sequences of medically important gene families:
- GAGE genes: 19 complete + 2 partial copies (9.5 kb each)
- CT45 genes: 6 copies
- CT47 genes: 7 copies
Revealed the complete structure of the pseudoautosomal regions (PAR1 and PAR2)
Provided the first complete view of cancer-testis gene families that were previously fragmented

Methylation Discovery

The precisely anchored ultra-long reads enabled chromosome-wide methylation mapping at single-base resolution — even across complex repeats that were previously invisible to methylation studies. This revealed several surprising patterns:

Hypomethylation in PAR regions (consistent with their escape from X-inactivation)
A 93-kb hypomethylated region within the DXZ1 centromere (chrX: 59,217,708-59,279,205) — potentially marking the site where kinetochore proteins bind during cell division
Distinct methylation bands within the DXZ4 macrosatellite array — the repeat that partitions X chromosome topology into two large superdomains

Figure 2-8: Methylation Patterns Across the Complete X Chromosome

Figure 2-8: Methylation Patterns Across the Complete X Chromosome. Nanopore sequencing captured methylation patterns across the entire chromosome. (a) Hypomethylation in PAR1, with detailed view showing unmethylated bases (blue) and methylated bases (red). (b) A 93-kb hypomethylated region within the DXZ1 centromere. (c) The DXZ4 macrosatellite array showing a transition from methylated to unmethylated regions. Source: Miga, K.H. et al. (2020). Telomere-to-telomere assembly of a complete human X chromosome. Nature, 585, 79-84. https://doi.org/10.1038/s41586-020-2547-7. License: CC-BY 4.0.

The hypomethylated centromeric region was particularly intriguing. To test whether this was unique to the X or a general centromeric feature, the team manually assembled the centromere of chromosome 8 (D8Z2, ~2.02 Mb) and found a similar hypomethylated region, suggesting this may be a conserved feature marking functional centromeres.

Impact

This 2020 achievement demonstrated that completing an entire human chromosome was possible and established the methods that would be used to complete all other chromosomes:

Ultra-long-read sequencing can span even megabase-scale repeats
Marker-assisted polishing is essential for accurate assembly of large tandem repeats
Multiple orthogonal validation methods (PFGE, ddPCR, optical mapping, HiFi reads) ensure accuracy
Nanopore methylation calling works even in highly repetitive regions

The complete X chromosome became the foundation for finishing the rest of the human genome.

What’s New in T2T-CHM13? (2022)

Building on the success of the X chromosome, the T2T Consortium completed the remaining chromosomes to produce the full T2T-CHM13 genome. This assembly includes several types of regions that were missing or incomplete in the previous reference genome (GRCh38):

Figure 2-9: Genomic Regions Revealed by T2T Assemblies

Figure 2-9: Genomic Regions Revealed by T2T Assemblies. The T2T-CHM13 genome revealed four major types of previously missing or incomplete regions: (1) Complete centromeric satellite arrays essential for chromosome segregation, (2) Segmental duplications driving structural variation and evolution, (3) Full ribosomal DNA arrays on acrocentric chromosome short arms required for ribosome biogenesis, and (4) The previously unsequenced Y chromosome heterochromatic region (Yq12) containing over 30 Mb of satellite DNA.

Figure 2-10: T2T-CHM13 Assembly Ideogram

Figure 2-10: T2T-CHM13 Assembly Ideogram. (See Figure 1a) This ideogram shows what’s new in T2T-CHM13 compared to GRCh38. Red areas highlight newly added regions, including complete centromeres and the short arms of five acrocentric chromosomes (13, 14, 15, 21, and 22). The track at the top shows that CHM13 has primarily European genetic ancestry. Source: Nurk, S. et al. (2021). The complete sequence of a human genome. bioRxiv. https://doi.org/10.1101/2021.05.26.445798. License: CC-BY 4.0.

T2T-CHM13 completed several major types of previously missing regions. Each presented unique sequencing challenges and plays distinct roles in cellular function.

Region 1: Centromeric Satellite Arrays

What Are They?

The centromere is the pinched middle section of each chromosome, made of highly repetitive DNA called satellite arrays. In humans, the most common type is alpha satellite DNA, where a short sequence (about 171 base pairs) repeats hundreds or thousands of times in tandem.

Why Do We Need Them?

During cell division, chromosomes must be pulled apart and distributed equally to daughter cells. The centromere serves as the attachment site for kinetochores, protein complexes that connect to spindle fibers and enable chromosome segregation.

Centromere dysfunction leads to chromosome missegregation, resulting in aneuploidy — cells with abnormal chromosome numbers. Most aneuploid embryos are not viable, and aneuploidy in surviving cells is associated with cancer and genetic disorders including Down syndrome.

What Did T2T-CHM13 Reveal?

T2T-CHM13 fully sequenced all centromeric regions, showing their complete structure for the first time. Each centromere has unique features:

Sizes range from ~366 kb (chromosome Y) to several megabases
Different chromosomes use different alpha satellite variants
Some centromeres (like those on chromosomes 1, 5, and 19) share similar sequences, suggesting recent exchanges of DNA between them

Region 2: Segmental Duplications

What Are They?

Segmental duplications are large DNA sequences — typically thousands to millions of base pairs long — that appear in multiple genomic locations with 90-99% sequence identity.

Why Do We Need Them?

Segmental duplications drive genetic diversity and evolution through non-allelic homologous recombination. Their high sequence similarity enables mispairing during DNA replication or recombination, generating structural variants — duplications, deletions, or rearrangements.

Some structural variants confer adaptive advantages. Copy number variation in the AMY1 gene, which encodes salivary amylase, correlates with dietary starch intake across human populations.

However, structural variants also cause disease. Certain duplications are linked to facioscapulohumeral muscular dystrophy (FSHD), a muscle-weakening disorder, and many other genetic conditions.

What Did T2T-CHM13 Reveal?

T2T-CHM13 added substantial amounts of segmental duplication sequence and corrected many errors in GRCh38. The complete assembly revealed that about 6.61% of the genome consists of segmental duplications (201.93 Mb), compared to only 5.00% that could be properly identified in GRCh38. This includes many medically relevant genes that were previously incomplete or incorrectly assembled.

Region 3: Short Arms of Acrocentric Chromosomes

What Are They?

Chromosomes 13, 14, 15, 21, and 22 are acrocentric, with centromeres positioned near one end, creating a short p arm and a much longer q arm.

The short arms contain tandem arrays of ribosomal DNA (rDNA) — genes encoding ribosomal RNA (rRNA), an essential structural and catalytic component of ribosomes. Each rDNA unit spans approximately 45,000 base pairs and is repeated dozens to hundreds of times.

Why Do We Need Them?

Ribosomes are essential for protein synthesis. The rDNA arrays on acrocentric chromosomes cluster to form the nucleolus, the subnuclear compartment where ribosome biogenesis occurs.

rDNA copy number varies substantially between individuals (CHM13 contains approximately 400 copies). Whether this variation influences protein synthesis efficiency or phenotypes such as growth, aging, or cancer susceptibility remains under investigation.

What Did T2T-CHM13 Reveal?

T2T-CHM13 provided the first complete view of the short arms of all five acrocentric chromosomes, totaling 66.1 Mb of new sequence. These regions follow a similar structure: from telomere to centromere, they contain distal repeat arrays, the rDNA array, and proximal repeat arrays including various satellite sequences.

These short arms share approximately 98.7% sequence identity, consistent with frequent intergenic exchange. This homogenization likely reflects the physical clustering of acrocentric chromosomes in the nucleolus during interphase.

Completing the Y Chromosome (2023)

Although T2T-CHM13 was a major achievement, it had one limitation: CHM13 is a 46,XX cell line, meaning it has no Y chromosome. The Y chromosome had been notoriously difficult to sequence because of its complex repeat structure, including long palindromes (sequences that read the same forwards and backwards), tandem repeats, and segmental duplications. In fact, more than half of the Y chromosome was missing from GRCh38.

In 2023, the T2T Consortium completed this final piece of the puzzle by sequencing the Y chromosome from a different genome, HG002, which is commonly used for benchmarking (Rhie et al. 2023, Nature). The resulting assembly, called T2T-Y, is 62,460,029 base pairs long with no gaps — adding over 30 million base pairs of sequence compared to GRCh38-Y.

Figure 2-11: Complete Structure of the Human Y Chromosome

Figure 2-11: Complete Structure of the Human Y Chromosome. (See Figure 1a) This comprehensive view shows alignment of T2T-Y to GRCh38-Y, locations of protein-coding genes with clusters of ampliconic genes highlighted, organization of palindromic sequences and inverted repeats, and detailed structure of centromeric and satellite DNA regions. The bottom panel shows a dotplot revealing the highly repetitive nature of this chromosome. Source: Rhie, A. et al. (2022). The complete sequence of a human Y chromosome. bioRxiv. https://doi.org/10.1101/2022.12.01.518724. License: CC0 (US Government work).

What Makes the Y Chromosome Special?

The Y chromosome carries genes critical for male development and fertility:

1. SRY: The master gene that determines male sex

2. Ampliconic genes: These genes exist in multiple copies and are important for sperm production:

TSPY: The largest gene family on the Y, with 45 protein-coding copies in HG002 (compared to only 7 in GRCh38-Y)
DAZ: Four copies involved in sperm production
RBMY: 34 copies, also critical for spermatogenesis

3. AZF regions (Azoospermia Factors): Three regions (AZFa, AZFb, AZFc) where deletions can cause male infertility

The Y Centromere

Like other chromosomes, the Y has a centromere, but T2T-Y revealed unique features:

Figure 2-12: Structure of the T2T-Y Centromere

Figure 2-12: Structure of the T2T-Y Centromere. (See Figure 1b) The Y centromere spans 366 kb and consists of the DYZ3 alpha satellite array with three different variants, no transposable elements within the main array, two regions of hypomethylation where CENP-A proteins bind, and a dotplot showing that the repeat units are 99.5-100% identical, demonstrating extreme homogeneity. Source: Rhie, A. et al. (2022). The complete sequence of a human Y chromosome. bioRxiv. https://doi.org/10.1101/2022.12.01.518724. License: CC0 (US Government work).

The centromere spans 366 kb and consists of highly similar alpha satellite repeats. Interestingly, the T2T-Y centromere shows two distinct hypomethylated regions where kinetochore proteins bind — a pattern also seen in some other chromosomes.

The Heterochromatic Region (Yq12)

Perhaps the most mysterious part of the Y chromosome is Yq12, the large heterochromatic region on the long arm that was almost entirely missing from GRCh38 (represented as a single 30+ megabase gap). T2T-Y finally revealed what’s inside: over 30 million base pairs of alternating blocks of two satellite families:

DYZ1 (HSat3): Blocks ranging from 80 kb to 1,600 kb (median 370 kb)
DYZ2 (HSat1B): Blocks ranging from 20 kb to 1,200 kb (median 230 kb)

Figure 2-13: The Mysterious Yq12 Region Revealed

Figure 2-13: The Mysterious Yq12 Region Revealed. (See Figure 1d) Fluorescence microscopy images show the Y chromosome with different colored probes. A map showing 86 large blocks alternating between DYZ2 and DYZ1, with nearly all repeat units being over 98% identical to consensus. Sequence composition shows DYZ2 contains an ancient AluY fragment. A phylogenetic tree shows that AluY fragments in HSat1B cluster together, suggesting this satellite family originated on the Y. Source: Rhie, A. et al. (2022). The complete sequence of a human Y chromosome. bioRxiv. https://doi.org/10.1101/2022.12.01.518724. License: CC0 (US Government work).

These satellite blocks show evidence of recent duplication events — some duplications span up to 5 megabases and include multiple DYZ1 and DYZ2 blocks. Interestingly, HSat1B is almost unique to the Y chromosome and the short arms of acrocentric chromosomes, while HSat3 is found on many chromosomes.

Quick Reference: Complex Genomic Regions

Region	What It Is	Why It Matters	Why It Was Hard to Sequence	Size in T2T
Centromeric Satellite Arrays	Repetitive DNA (171-bp alpha satellite repeats) at the chromosome’s central “waist”	Acts as a handle for proteins to pull chromosomes apart during cell division; errors can cause aneuploidy, cancer, or developmental disorders	Short reads couldn’t distinguish between identical repeats; long reads (20-100+ kb) finally resolved them	Varies by chromosome; Y centromere is 366 kb; X centromere is 3.1 Mb
Segmental Duplications	Large DNA chunks (thousands to millions of bp) copied across the genome with 90-99% similarity	Drive genetic diversity through structural variants; can affect traits (e.g., starch digestion) or cause disease (e.g., muscular dystrophy)	Near-identical sequences confused short-read technology; long reads distinguished them and revealed true variation	6.61% of genome (201.93 Mb) vs 5.00% in GRCh38
Short Arms of Acrocentric Chromosomes	Repetitive rDNA arrays (45,000-bp units repeated dozens to hundreds of times) on chromosomes 13, 14, 15, 21, 22	Encode ribosomal RNA for ribosomes; essential for protein synthesis; copy number varies between people (~400 in CHM13)	Repetitive arrays were indistinguishable with short reads; long reads and specialized assembly algorithms mapped complete copies	Total 66.1 Mb across all five chromosomes
Y Chromosome Heterochromatin (Yq12)	Alternating blocks of HSat1B/DYZ2 and HSat3/DYZ1 satellite repeats	Unknown function, but shows recent structural rearrangements; highly variable between individuals	Extremely long tandem repeats spanning 30+ Mb were impossible to sequence with short reads	Over 30 Mb (was a single gap in GRCh38)

Summary: Why These Regions Matter

Before the T2T project, these regions — centromeric satellite arrays, segmental duplications, acrocentric short arms, and the Y chromosome — were often called genomic “dark matter” because they remained largely invisible to sequencing technologies. They left massive gaps in earlier reference genomes like GRCh37 (hg19) and GRCh38 (hg38).

By using long-read sequencing technologies and innovative computational methods like marker-assisted polishing, the T2T Consortium finally illuminated these regions:

T2T X chromosome (2020): First complete human chromosome, established key methods
T2T-CHM13 (2022): Completed 22 autosomes and chromosome X, adding about 200 million base pairs
T2T-Y (2023): Completed the Y chromosome, adding over 30 million base pairs

Together, these assemblies were combined to create T2T-CHM13v2.0 (also called T2T-CHM13+Y), providing the first truly complete sequence of a human genome — all 24 chromosomes from telomere to telomere with no gaps.

These advances improve our understanding of:

Chromosome function: How centromeres ensure accurate cell division
Genetic diversity: How segmental duplications and structural variants contribute to differences between individuals
Cellular machinery: How rDNA arrays support protein synthesis and how variation in these arrays might affect health
Male fertility: How ampliconic genes on the Y chromosome function and how rearrangements can cause infertility
Evolution: How satellite sequences spread between chromosomes and how the Y chromosome has changed over time
Epigenetics: How methylation patterns organize across centromeres and other complex regions

These insights are already improving variant calling (identifying genetic differences) in large-scale studies, supporting precision medicine by identifying disease-causing mutations, and helping us understand male infertility and other Y-linked conditions.

Looking Forward

The T2T assemblies represent major advances, but the work continues. Because these genomes represent specific individuals (CHM13 and HG002), they don’t capture the full spectrum of human genetic diversity. That’s where projects like the Human Pangenome Reference Consortium (HPRC) come in — by assembling genomes from hundreds of individuals from different populations, researchers are building references that reflect the true diversity of humanity.

The complete human genome isn’t an ending — it’s a new beginning. With these complete references, we can now:

Detect variants we previously missed: Many structural variants in centromeres and segmental duplications were invisible before
Study chromosome function: Complete centromeres allow us to understand how they work and why they sometimes fail
Improve clinical diagnostics: Complete gene sequences help identify disease-causing mutations more accurately
Understand human diversity: Comparing complete genomes from different populations reveals how humans have adapted to different environments

As sequencing technologies continue to improve and become more affordable, we’ll be able to study these complex regions in even more detail, uncovering new insights into human health, evolution, and what makes us human.