Roughly one-tenth of the human genome remained uncharted when genomics researchers Karen Miga at the University of California, Santa Cruz, and Adam Phillippy at the National Human Genome Research Institute in Bethesda, Maryland, launched the Telomere-to-Telomere (T2T) consortium in 2019. Now, that number has dropped to zero. In a preprint published in May last year, the consortium reported the first end-to-end sequence of the human genome, adding nearly 200 million new base pairs to the widely used human consensus genome sequence known as GRCh38, and writing the final chapter of the Human Genome Project¹.
First released in 2013, GRCh38 has been a valuable tool — a scaffold on which to map sequencing reads. But it’s riddled with holes. This is largely because the widely used sequencing technology developed by Illumina, in San Diego, California, produces reads that are accurate, but short. They are not long enough to unambiguously map highly repetitive genomic sequences, including the telomeres that cap chromosome ends and the centromeres that coordinate the partitioning of newly replicated DNA during cell division.
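To make the mapping problem concrete, here is a minimal Python sketch on a hypothetical, drastically scaled-down genome. Real short reads typically run to a few hundred bases or fewer, and real aligners tolerate mismatches rather than demanding exact matches, so exact matching here is a simplifying assumption. The helper `count_placements` and the flanking sequences are invented for illustration; only the repeat unit TTAGGG, the human telomeric motif, is real.

```python
def count_placements(genome: str, read: str) -> int:
    """Count every start position where `read` matches `genome` exactly (overlaps allowed)."""
    k = len(read)
    return sum(1 for i in range(len(genome) - k + 1) if genome[i:i + k] == read)


# Hypothetical toy genome: unique left flank, 20 tandem copies of a repeat unit, unique right flank.
left = "GATTACACCGT"
unit = "TTAGGG"  # human telomeric repeat motif
right = "CAGTCAAGTC"
genome = left + unit * 20 + right

# A "short read" that falls entirely inside the repeat array.
short_read = unit * 3
# A "long read" that spans the whole array and reaches unique sequence on both sides.
long_read = left[-5:] + unit * 20 + right[:5]

print(count_placements(genome, short_read))  # 18
print(count_placements(genome, long_read))   # 1
```

A read that sits wholly inside the repeat matches many positions equally well, so there is no principled way to decide where it came from, whereas a read long enough to reach unique sequence on both sides has only one possible home.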
Long-read sequencing technologies proved to be the game-changer. Developed by Pacific Biosciences in Menlo Park, California, and Oxford Nanopore Technologies (ONT) in Oxford, UK, these technologies can sequence tens or even hundreds of thousands of bases in a single read, but, at least at the outset, not without errors. By the time the T2T team reconstructed²,³ their first individual chromosomes (X and 8) in 2020, however, Pacific Biosciences’ sequencing had advanced to the extent that T2T scientists could detect tiny variations in long stretches of repeated sequences. These subtle ‘fingerprints’ made long repetitive chromosome segments tractable, and the rest of the genome quickly fell into line. The ONT platform also captures many modifications to DNA that modulate gene expression, and T2T was able to map these ‘epigenetic tags’ genome-wide as well⁴.
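The ‘fingerprint’ idea can also be illustrated with a toy calculation. Assuming exact matching, the hypothetical helper `min_unique_length` below finds the shortest read length at which every read drawn from a repeat array has exactly one placement. Planting a few single-base variants in some copies (positions chosen arbitrarily here) lowers that threshold. This is a sketch of the principle only, not the T2T consortium’s assembly method.

```python
def min_unique_length(seq: str) -> int:
    """Smallest k for which every length-k substring of `seq` occurs exactly once."""
    for k in range(1, len(seq) + 1):
        kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        if len(set(kmers)) == len(kmers):  # no k-mer repeats, so every read of length k is unique
            return k
    return len(seq)


unit = "TTAGGG"
# A perfect tandem array: 20 identical copies, 120 bases.
perfect = unit * 20

# Hypothetical variant array: three irregularly spaced copies carry one base change (A -> C).
copies = [unit] * 20
for c in (3, 11, 16):
    copies[c] = "TTCGGG"
varied = "".join(copies)

print(min_unique_length(perfect))  # 115
print(min_unique_length(varied))
```

In the perfect array only a read nearly as long as the array itself (115 of 120 bases) is unique. The variants break that symmetry, so much shorter reads that cover a variant can be placed unambiguously. A sequencing error at a variant site would blur the fingerprint, which is why per-base accuracy mattered as much as read length.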
The T2T sequence was assembled from CHM13, a cell line with an essentially haploid genome, so the next challenge is assembling diploid genomes, in which the maternal and paternal copies of each chromosome are resolved separately. This work is being conducted in collaboration with T2T’s partner organization, the Human Pangenome Reference Consortium, which aspires to produce a more representative genome map, based on hundreds of donors from around the world. “We’re aiming to capture an average of 97% of human allelic diversity,” says Erich Jarvis, one of the consortium’s lead investigators and a geneticist at the Rockefeller University in New York City. As chair of the Vertebrate Genomes Project, Jarvis also hopes to leverage these complete genome assembly capabilities to generate full sequences for every vertebrate species on Earth. “I think within the next 10 years, we’re going to be doing telomere-to-telomere genomes routinely,” he says.
Protein structure solutions
A protein’s structure dictates its function, but that structure can be hard to measure. Major experimental and computational advances in the past two years have given researchers complementary tools for determining protein structures with unprecedented speed and resolution.