CASUS Institute Seminar

Using the DNA language model GROVER to parse sequence and epigenetic effects on genome stability

Dr. Pierre M. Joubert, Postdoctoral Researcher in the Biomedical Genomics Group at Dresden University of Technology as part of an HZDR/CASUS Open Project

Abstract of the talk// Organisms must protect their genomes against instability to ensure faithful DNA replication. Both the epigenome and the DNA sequence affect instability, but their relative contributions remain unclear. To disentangle these contributions, Pierre and his colleagues combined the DNA language model GROVER with a panel of epigenomic features and an interpretable machine learning approach.

They show that GROVER can be used to model double-strand break (DSB) sensitivity from the DNA sequence alone. However, a model trained on epigenomic data outperforms this sequence-only model, showing that the epigenome encodes cell-type-specific information that is useful for modeling DSB sensitivity. Combining the two models yields the best performance, indicating that the sequence and the epigenome contain complementary information for modeling genome instability. By analyzing this combined model, the scientists identified histone marks that provide non-redundant information that cannot be learned from the sequence and may shape the cell-type specificity of DSB sensitivity.

Integrating these histone marks directly within the GROVER architecture results in a model that performs similarly to the full-epigenome model and may increase the model’s ability to interpret the genome in a cell-type-specific manner. The work demonstrates that DNA language models, combined with interpretable machine learning techniques, can be used to deepen our understanding of how instability is shaped by the genome’s sequence.

CV// Pierre M. Joubert is a postdoctoral researcher at the Center for Molecular and Cellular Bioengineering (CMCB) at TU Dresden, working within the Biomedical Genomics Group led by Dr. Anna Poetsch. He is employed through the Helmholtz-Zentrum Dresden-Rossendorf (HZDR) and the Center for Advanced Systems Understanding (CASUS). Pierre studied at the University of Washington in Seattle (USA) and earned his PhD at the University of California, Berkeley (USA).

Pierre will give his talk live in Görlitz. Interested scientists from Görlitz and beyond are kindly invited to attend in person. The event is organized in a hybrid format, so those who cannot be in Görlitz can join via Zoom. Please request the login details via contact@casus.science.

Venue// CASUS – Center for Advanced Systems Understanding, Conrad-Schiedt-Str. 20, D-02826 Görlitz, Germany

Date// 9 July 2025, 2:00 pm