Generative Artificial Intelligence Designs DNA Sequences to Turn Genes On and Off

Introduction

Nestled within our genomes are tiny sequences with immense power to control nearby genes. Known as cis-regulatory elements (CREs), these DNA sequences can turn adjacent genes on or off. Recently, researchers at Yale School of Medicine, the Jackson Laboratory, and the Broad Institute of MIT and Harvard developed a generative artificial intelligence method that designs novel regulatory elements to precisely control how genes are expressed in cells.

The Importance of Cis-Regulatory Elements

CREs play a crucial role in regulating gene expression. By functioning as molecular switches, they determine whether a gene is activated or silenced in a given cell type. This specificity is essential for the proper functioning of organisms, ensuring that genes are expressed only where and when they are needed. Understanding and manipulating these elements has significant implications for biology and medicine, especially for targeted gene therapies.

CODA Platform Development

The new artificial intelligence platform, called Computational Optimization of DNA Activity (CODA), uses deep learning to generate novel DNA sequences that function as synthetic CREs. Much as well-known tools like DALL-E and ChatGPT learn from large collections of images and text, CODA is trained on large datasets of natural regulatory elements, allowing it to create sequences that effectively turn genes on or off in specific cell types. “This project essentially asks, ‘Can we learn to read and write the code for these regulatory elements?’” explains Steven Reilly, PhD, assistant professor of genetics at Yale School of Medicine and one of the study’s lead authors.

Potential Applications in Gene Therapy

Controlling how genes are expressed in certain cell types could one day significantly improve gene therapies. These therapies have the potential to override disease-causing mutations, but more effective methods are needed to deliver treatments directly to affected cells, such as the specific neurons that fail in Parkinson's disease or immune cells harboring HIV. The CODA platform could help target gene therapies to diseased cells more precisely, avoiding side effects in healthy parts of the body.

Promising Results and Future Directions

The researchers tested the AI-designed regulatory elements in lab-grown blood, liver, and brain cells and found that, in many cases, the synthetic elements were more cell-type specific than any known natural sequences. Subsequent tests in live zebrafish and mice showed that these sequences also worked to activate test genes in specific cell types in the animals. In one case, an engineered regulatory element activated a reporter gene only in a very specific layer of cells in the mouse brain, despite having been delivered throughout the animal's body.

Conclusion

The ability to design DNA sequences that control gene expression with high precision opens new frontiers in biomedical research. The CODA platform represents a significant advance, combining artificial intelligence and molecular biology to create tools that modulate gene expression in unprecedented ways. “Evolution may never have intended to build a large driver for an Alzheimer's drug, but that doesn't mean it can't exist,” says Reilly. In future studies, the researchers plan to expand the use of CODA to develop targeted gene therapies for a variety of genetic diseases, potentially overcoming the limitations imposed by natural evolution.
