Valley Players stage “A Walk in the Woods” at Amherst church, masslive.com/living
Better Health | D
TRAVEL: A guide to visiting the National Parks, D5
THEATER: Play marks journey of clergy abuse survivor, D9
HEALTH: Nurse practitioners take on more geriatrician roles, D9
SUNDAY, JUNE 22, 2025
A Potentially Powerful Tool
Scientists trained AI to predict gene activity
By Mark Johnson
The Washington Post

Scientists led by a team at Columbia University have trained a model to predict how the genes inside a cell will drive its behavior, which could be a powerful tool with the potential to broaden our understanding of cancer and genetic diseases, and even to pave the way for cell-specific gene therapies to treat them.

The researchers trained the new artificial intelligence tool, an algorithm dubbed General Expression Transformer, or GET, using an approach similar to that used by creators of the language program ChatGPT. While ChatGPT learned the grammar of language, GET has learned underlying rules governing genes: how they are turned on or off like a light switch, or dialed up or down like a volume control. This complex process, known as gene expression, determines which proteins we make and whether we make them in the correct amounts, crucial work given that proteins play a role in virtually every action in the body - fighting disease, moving, breathing, even thinking.

Although GET is at a much earlier stage of development, it could play a similar role to AlphaFold2, the AI system that predicts the three-dimensional structure of proteins. The transformative technology was recognized with the 2024 Nobel Prize in chemistry and has now been updated to AlphaFold3.

The regulation of genes and the structure of proteins are both fundamental to life, and problems in either can trigger disease.

“Biology is being transformed into something that is a predictive science,” said Raul Rabadan, one of the authors of a paper reporting the work in the journal Nature and director of the Program for Mathematical Genomics at Columbia. “We’re seeing a revolution in biology.”

Mark Gerstein, a professor of biomedical informatics at Yale School of Medicine, who was not involved in the new study, said that for 15 to 20 years experts have been systematically trying to make predictions about gene regulation, building on a trove of carefully made datasets. The data examined all genes in specific types of human cells - for example, retinal cells or neurons - measuring, among other things, gene expression and the binding of key proteins called transcription factors.

“This is a field poised to have this type of advancement by AI,” Gerstein said.

While other scientific groups have trained models using abnormal cells such as those found in different cancers, Xi Fu, a graduate student in Rabadan’s lab, decided to train GET using information from cells in normal human tissue. The training used data from more than 1.3 million cells, spanning 213 different types found in the human body.

Rabadan’s team found that they could omit one cell type from the data - for example, astrocytes, which are found in the central nervous system - and the model could make accurate predictions about astrocytes based on what it had learned from all of the other cells.

Mike Pazin, a program director at the National Human Genome Research Institute, part of the National Institutes of Health, said learning about one cell type and then making predictions about another is an especially daunting challenge.

“In some ways, it’s like if I handed somebody a bunch of books in English, and said: ‘Okay, now this is in Russian. What does it say?’ And they say: ‘Ah. I understand grammar, syntax and words. I’m going to make predictions about this even though it is written in a different language.’ I would be like, ‘Wow, is that possible?’”

The work described in the Nature paper “directly tackles one of biology’s major challenges: understanding how the same genome can drive such diverse behaviors in different cell types,” said Jian Ma, professor of computational biology and director of the Center for AI-Driven Biomedical Research in the School of Computer Science at Carnegie Mellon University. Ma, who was not involved in the new study, commented by email.

All 30 trillion or so cells in the human body contain the same complete set of DNA. “Yet each cell type - be it a neuron, a muscle cell, or a skin cell - expresses a unique set of genes,” Ma said.

Humans have about 20,000 genes, some of which may be turned on in a retinal cell but off in a skin cell. While “much of this regulatory grammar remains poorly understood,” Ma said, “the GET model takes an important step toward decoding this language.”

Understanding the language of gene regulation holds the potential for great benefit to human health, said Yang E. Li, assistant professor in the departments of neurosurgery and genetics at Washington University School of Medicine in St. Louis.

“We want to learn the grammar and prioritize the key players in different cell types,” Li said, “because many human diseases are caused by a disruption of that grammar.”

Scientists hope the model will help in other ways, such as in the development of gene therapies to correct a mutation - an error in the genetic code that harms a specific kind of cell. Such therapies have to be precisely designed so that they fix the cells harmed by a disease without disrupting other cell types.

“We can design gene therapies that deliver a gene that is only expressed in one cell type, and not in another,” Rabadan said.

Being able to predict which genes are turned on, off, up or down in different cells could help determine the cell of origin for a disease.

A model that makes accurate predictions about gene regulation also holds the promise of lessening one of the more grueling tasks in science: deciding which of a massive number of possible experiments are the ones most likely to answer the researcher’s question.

A cancer, for example, may contain more than 1,000 mutations in the genome that have developed after conception. The effects of most of these mutations are unknown, Rabadan said. That leaves scientists with the enormous task of determining where to start.

“The number of potential genetic combinations is more than the number of atoms in the universe,” Rabadan said. “What are the ones that are relevant?”
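The astrocyte test Rabadan’s team ran follows a familiar pattern in machine learning: every example of one category is kept out of training, and the model is then judged only on that unseen category. The short Python sketch below illustrates just that data split. The toy records, feature values and the function name `hold_out_cell_type` are hypothetical, and the sketch does not reproduce GET or the study’s actual data pipeline.

```python
from collections import Counter

# Hypothetical toy records: (cell_type, features). These are illustrative
# values only, not the study's data or its real input format.
records = [
    ("astrocyte", [0.2, 0.9]),
    ("neuron", [0.7, 0.1]),
    ("retinal", [0.4, 0.5]),
    ("astrocyte", [0.3, 0.8]),
    ("skin", [0.9, 0.2]),
]


def hold_out_cell_type(data, held_out):
    """Split records so that one cell type never appears in the training set."""
    train = [r for r in data if r[0] != held_out]
    test = [r for r in data if r[0] == held_out]
    return train, test


train, test = hold_out_cell_type(records, "astrocyte")
# Training sees only the other cell types; evaluation uses only astrocytes.
print(Counter(cell for cell, _ in train))
print(len(test))
```

Because no astrocyte record reaches training, any accurate prediction on the test set has to come from patterns learned in other cell types, which is the point Pazin calls an especially daunting challenge.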
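Rabadan’s comparison with the number of atoms in the universe can be checked with rough arithmetic. The Python sketch below rests on a simplifying assumption that is not from the study: each of 1,000 mutations is treated as simply present or absent, giving 2**1000 possible combinations, which is then compared against the commonly cited rough estimate of about 10**80 atoms in the observable universe.

```python
# Simplifying assumption (not from the study): each of 1,000 mutations is
# either present or absent, so the number of combinations is 2**1000.
combinations = 2 ** 1000

# Commonly cited rough estimate of atoms in the observable universe.
atoms_estimate = 10 ** 80

digits = len(str(combinations))
print(f"2**1000 has {digits} digits")  # 302 digits, about 1.07e301
print(combinations > atoms_estimate)  # True
```

Under that assumption the count is roughly 1.07 × 10^301, more than 10^221 times the atom estimate; real tumors are more complicated, so the figure only illustrates the order of magnitude.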