Atomistic simulation software CP2K enables AI models

New usage guide introduces the software suite to a broad audience

The CP2K open-source package is among the top three most widely used research software suites worldwide for simulating the behavior of atoms and molecules. Among other applications, CP2K plays an important role in generating data used to train artificial intelligence (AI)-based models that determine molecular energies and forces. Since its beginnings in 2002, the range of the package’s methods and functions has grown steadily – also thanks to contributions from the Center for Advanced Systems Understanding (CASUS) at Helmholtz-Zentrum Dresden-Rossendorf (HZDR). Together with colleagues from Germany, Switzerland, the UK and Canada, the CASUS team has now summarized the current status in an overview article. The paper, published in The Journal of Physical Chemistry B (DOI: https://doi.org/10.1021/acs.jpcb.5c05851), focuses on the practical application of CP2K and is directed at new users from theoretical chemistry, materials science and neighboring fields.

The CP2K package contains apps and algorithms for simulating the behavior of atoms and molecules based on “first principles”. This means that it is built exclusively on fundamental physical models and does not require additional data, for example from experiments. It uses classical and quantum mechanical approaches to calculate both the static properties and the dynamic behavior of individual atoms or molecules in gases or solutions, as well as of large crystal lattices or two-dimensional materials. “The aim of CP2K is to predict average properties, such as those that arise in statistical mechanics and thermodynamics, of any substance made of interacting electrons and nuclei,” says Dr. Frederick Stein, one of the paper’s main contributors and research scientist at CASUS. “A distinguishing feature of CP2K is that from a wide variety of different classical and quantum mechanical energy and force methods users can choose and combine the methods suitable for their needs.”

Additional information:

Dr. Andreas Knüpfer

Research Team Leader
Center for Advanced Systems Understanding (CASUS) at HZDR

Press contact:

Dr. Martin Laqua

Officer Communications, Press and Public Relations
Center for Advanced Systems Understanding (CASUS) at HZDR

Two layers of graphitic carbon nitride stacked on top of each other (carbon atoms in gray, nitrogen atoms in blue). Variations of this polymer are being investigated for their suitability as photocatalysts (hydrogen atoms that are ready to react in white). Thomas Kühne’s team uses CP2K to calculate the properties of various two-dimensional materials. Source: J. Pototschnig/CASUS

“We can see that interest in CP2K has grown tremendously in recent years,” says Prof. Thomas D. Kühne, Director of CASUS and leader of the “Theory of Complex Systems” research team. “If a scientific tool is in such high demand we see it as our responsibility to maintain and expand the suite.” Besides implementing new features, most of them requested by the research community, the CASUS CP2K team also provides user support, contributes to publications on the software’s use, promotes it at conferences and workshops, and advises other developers on implementing new features.

In materials science, simulations are an indispensable tool for screening a large number of theoretically possible materials to identify a few promising candidates that can then be investigated experimentally. However, these types of simulations are computationally demanding. They require both high-performance computing hardware and software tailored to the hardware and the task to be computed. CP2K calculations have long accounted for a significant proportion of the total computing time at many large supercomputing centers. “The software is highly scalable allowing efficient calculations with tens of thousands of central processing units or thousands of graphics processing units simultaneously,” says co-author Dr. Johann Pototschnig, member of CASUS’ CP2K team. “It is also optimized with regard to the physical models to be simulated and their algorithms. For example, one can reduce compute time by choosing the most efficient methods from both the classical or quantum mechanical realm.”

Providing data needed to train AI models

CP2K is widely used to generate high-quality electronic-structure data for training AI models in atomistic and materials science. Such data are primarily generated computationally using dedicated software packages such as CP2K. Software of this kind is therefore a prerequisite for advanced AI methods in theoretical chemistry and materials design.

Beyond that, the CP2K package includes various AI surrogate models: machine-learned approximations of computationally expensive mappings that CP2K would otherwise compute explicitly. Instead of solving the full quantum-mechanical problem at every step, these AI surrogate models produce approximate results at a fraction of the computational cost. Using AI surrogates inside CP2K thus extends the accessible time and length scales. Only the combination of large-scale simulations and AI models makes it possible to tackle the most complex computations.

The ability to generate high-quality data for training AI models is certainly one reason for the increased interest in CP2K. This interest mainly stems from user groups previously not connected to CP2K. The new overview paper aims to make it easier for newcomers to get started with the complex topic of atomistic simulations in general and with running them in CP2K in particular. The publication introduces the underlying methods and covers all capabilities of CP2K. Unlike other papers that focus on individual theoretical or application aspects, it specifically aims to provide an overview that includes aspects of practical use. The effort was initiated by the CASUS team, which invited many other individuals and groups who contribute to CP2K.

Dr. Andreas Knüpfer, leader of the CASUS Scientific Computing Core, says, “This overview paper on CP2K reflects our department’s mission to support cutting-edge research in various fields with our expertise in computational science. At the same time, it is important to make the results accessible and enable their transfer to other fields and application areas. Ultimately, this also benefits CP2K itself, because only with a broad community that is welcoming to new members it can maintain the leading role it currently holds.”

________________________________________________________

Publication

M. Iannuzzi et al.: The CP2K Program Package Made Simple, in The Journal of Physical Chemistry B, 2026 (DOI: 10.1021/acs.jpcb.5c05851)

________________________________________________________

About the Center for Advanced Systems Understanding

CASUS was founded in 2019 in Görlitz, Germany, and pursues data-intensive interdisciplinary systems research in such diverse disciplines as earth systems research, systems biology and materials research. The goal of CASUS is to create digital images of complex systems of unprecedented fidelity to reality with innovative methods from mathematics, theoretical systems research, simulations as well as data and computer science to give answers to urgent societal questions. The founding partners of CASUS are the Helmholtz-Zentrum Dresden-Rossendorf (HZDR), the Helmholtz Centre for Environmental Research in Leipzig (UFZ), the Max Planck Institute of Molecular Cell Biology and Genetics in Dresden (MPI-CBG), the Technical University of Dresden (TUD) and the University of Wrocław (UWr). CASUS, managed as an institute of the HZDR, is funded by the German Federal Ministry of Research, Technology and Space (BMFTR) and the Saxon State Ministry for Science, Culture and Tourism (SMWK).
