The Alliance for National High Performance Computing (NHR Alliance) was founded two years ago. During its development phase, it was affiliated with the DFN Association (German National Research and Education Network). Prof. Dr. Thomas Kühne, Founding Director of the Center for Advanced Systems Understanding (CASUS) and spokesperson of the NHR Nutzungsausschuss (usage committee), explains in an interview recently published in the DFN bulletin what progress has been made so far and describes the current challenges in high-performance computing.
The Alliance for National High Performance Computing was launched just two years ago. Where does it stand today?
The NHR Alliance was initiated with the aim of providing all researchers throughout Germany with uniform, simple, and fair access to valuable computing resources. A lot of ground has been covered since its foundation.
It all started with a competition. Various data centers applied to join the NHR Alliance, and naturally, each institution tried to present itself in the best possible light. Once the selection of data centers had been completed, the opposite process began, namely growing together as an alliance and establishing a culture of cooperation – for example, in training young scientists or tackling large infrastructure projects. And that has worked out very well so far.
One example of successful collaboration is the Atomistic Simulation Center, in which three NHR centers – the Paderborn Center for Parallel Computing, the NHR-FAU in Erlangen, and the Zuse Institute Berlin – have joined forces to pool their resources and cover atomistic simulations for the application areas of physics, chemistry, and the life sciences.
How is access to computing resources provided within the network?
We have jointly established NHR-wide allocation guidelines with uniform quality standards. Initially, each of the nine NHR centers had its own, very different application and allocation procedure – shaped by its specific subject and application areas – that users and reviewers had to navigate. A milestone for HPC compute time applications is, therefore, our electronic allocation portal JARDS – Joint Application Review and Dispatch Service – which we recently implemented across the NHR Alliance. It not only enables researchers to submit applications easily and centrally, but also gives both them and the reviewers transparent insight into the allocation process. All applications and their assessment are subject to a science-led peer review process.
What are the tasks of the NHR Nutzungsausschuss in this process?
The Nutzungsausschuss draws up the rules of procedure for application selection and monitors their implementation according to uniform quality standards. In JARDS, reviewers can now compare the history of previously submitted and new applications and thus assess, for example, how novel a proposal is.
So I can apply with JARDS today – and when exactly can I expect to start working?
One of the demands of the Strategy Committee, which was set up in 2019 by the Joint Science Conference (GWK) as an autonomous and independent body, was indeed to shorten the application cycle massively. We have now established quarterly application periods, which is a great success; it does not get much faster than that.
From the review process to the start of compute time, it is already a round-the-clock operation for the data centers; preparations begin as soon as approval is granted, so there is constant pressure. However, the central allocation system lets us spread the load, and the pool of reviewers is now larger: everyone who applies to the NHR Alliance for compute time is also expected to write reviews – it is a give and take. But still, after one application is before the next, so to speak.
What else does the Nutzungsausschuss cover?
One overarching task is to monitor the areas of application in the NHR, as the centers have a strong tendency to develop specialized profiles. In consultation with all centers, we ensure that sufficient resources are available for all HPC-relevant application areas. This also includes the necessary hardware and, above all, specialized computing architectures. If necessary, the Nutzungsausschuss steps in to rebalance.
Another important criterion is the proportionality of the requested resources. If one application is massively more expensive than another but just as good, its approval may well come at the expense of other applications; in extreme cases, other applications may not be granted at all. These factors must be weighed and balanced in the review process, which is why, for the time being, those reviews are conducted by highly experienced scientists.
Another task is balancing the computing load. If we experience acute bottlenecks or a data center is overloaded, we have the option of reallocating jobs to another NHR data center if that serves the interests of the NHR system as a whole. So far, this has not been necessary because we procured very powerful new high-performance computers right from the start. However, when individual systems reach the end of their service life – usually after around five years – we will have to consider reallocations.
Bigger, faster, better: You once described today’s supercomputers as time machines.
Just as astronomy can look billions of light years into the past with ever better telescopes, computational science can de facto look into the future with ever larger computing resources. Complex simulations that would take many years on a standard desktop computer become feasible, which is a decisive step forward, especially in a highly competitive environment such as science.
However, the entire infrastructure for supercomputers is highly customized and, to put it mildly, not exactly economical in terms of energy consumption. Quite a few people wonder what added value this provides and whether we could simply wait another 20 years. The answer is yes, we could. But we do not want to and cannot wait 20 years to solve urgent problems such as climate change or sustainable energy supply. Computer simulations and modeling are therefore assuming an increasingly important role today.
You have just touched on the subject of energy consumption: How important is green IT in HPC?
On the one hand, there is the exciting question of resources and, on the other, very basic monetary aspects. Green IT is a huge issue for the simple reason that high-performance computers consume a lot of electricity and therefore generate large costs. In the NHR in particular, we see that modernizing data centers consistently involves innovative cooling concepts that can reduce power consumption. Christian Plessl (editor’s note: Paderborn University) and I received the Paderborn University Research Award in 2019 for our project “Green IT: Exact computations with inexact but energy-efficient computers”. The idea behind it was to compute at lower precision in an energy-efficient way – that is, to work with approximations – and then compensate for the inaccuracies with innovative, fault-tolerant algorithms.
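To illustrate the general principle only – this is a generic sketch of mixed-precision computing with iterative correction, not the actual method from the award-winning project – consider solving a linear system: the expensive solve is done in single precision, and a few cheap correction steps in double precision recover an accurate result.

```python
import numpy as np

def mixed_precision_solve(A, b, iters=5):
    """Solve A x = b by computing in low precision (float32) and then
    correcting the result with iterative refinement in float64.
    Purely illustrative; a production code would reuse one factorization
    instead of calling np.linalg.solve repeatedly."""
    A32 = A.astype(np.float32)                                   # cheap, low-precision copy
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - A @ x                                            # residual in full precision
        dx = np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)
        x += dx                                                  # correct the approximate solution
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((500, 500)) + 500 * np.eye(500)      # well-conditioned test matrix
    b = rng.standard_normal(500)
    x = mixed_precision_solve(A, b)
    print("residual norm:", np.linalg.norm(A @ x - b))
```

The correction loop converges as long as the low-precision solve is a reasonable approximation, so most of the arithmetic can run on cheaper, more energy-efficient hardware.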
Another aspect of Green IT is sustainability. The success of the NHR Alliance has made access to computing resources very easy. When resources were still extremely limited, users had to be very careful to use them wisely. Today, we need to sensitize users so that resources are not wasted on nonsensical or unnecessary simulations that do not provide any added value. I am not so sure the community likes to hear that. Young researchers in particular, who only know the straightforward NHR access, should also get a feel for computing costs. That is why some HPC centers have started disclosing the power consumption of a job, which usually creates quite a wow effect in terms of CO2 emissions, because it can easily be the equivalent of several transatlantic flights.
The NHR Alliance brings together HPC data centers in the Tier 2 performance class. You have also worked at Tier 1 data centers. Is the separation of the two tiers still appropriate and up to date?
If the question is whether Tier 1 systems make sense, I would definitely say yes. Maximum computing power is ultimately also a question of power consumption – at the end of the day, that is the limiting factor. Consequently, there is no way around accelerator architectures, which in current flagship computers usually means GPUs. There are only a few ways to achieve maximum computing power and explore the limits of high-performance computing. New methodologies are needed there, and that is an important issue for the future.
For example, we are testing whether new algorithms can scale on accelerator architectures. We are currently doing this at our Tier 1 data centers. But if we only had those, it would be very difficult to also cover the majority of application areas. Truth be told, only a few applications absolutely have to run on Tier 1 systems. Good examples are lattice quantum chromodynamics for calculations in particle and nuclear physics, or the atomistic simulations needed in materials science and theoretical chemistry.
In absolute terms, they are the largest consumers of supercomputer resources. However, both also have a high demand for Tier 2 resources – so they are in the overlap zone. And this area has definitely been growing. Nowadays, Tier 2 data centers take on the majority of all computing jobs and can respond much more flexibly to the requirements of different application areas – in terms of training people, but also in terms of adapting computing resources.
You are even using quantum computers for your research. How are they utilized in HPC?
That is a paradigm shift to an entirely new type of HPC whose applications are not yet widespread. In my field of research, quantum mechanics, we are dealing with calculating expectation values. To do so, we use supercomputers with a large number of parallel computing cores or quantum computers. However, only part of the entire simulation, namely the function evaluation, takes place on the quantum computer; the variational optimization runs on a conventional HPC system, and both processes alternate. Therefore, the most successful algorithms in quantum computing are hybrid algorithms. This naturally raises the question of how we can combine an HPC system with a possible quantum computer. The obvious solution is to set up this technology at an HPC center, because the corresponding high-performance computers for the classical part of these hybrid algorithms are already in place.
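As a rough illustration of such a hybrid loop – a toy example that simulates the quantum part classically and is not tied to any particular hardware or to the speaker’s own codes – the “quantum” side evaluates the expectation value of a small Hamiltonian for given parameters, while a conventional optimizer on the classical side proposes the next parameters.

```python
import numpy as np
from scipy.optimize import minimize

# Toy one-qubit Hamiltonian; its exact ground-state energy is -sqrt(1.25).
H = np.array([[1.0, 0.5],
              [0.5, -1.0]])

def expectation(theta):
    """'Quantum' part of the hybrid loop: prepare the parameterized state
    |psi(theta)> = cos(theta/2)|0> + sin(theta/2)|1> and measure <psi|H|psi>.
    On real hardware this evaluation would run on the quantum processor;
    here it is simulated classically for illustration."""
    psi = np.array([np.cos(theta[0] / 2), np.sin(theta[0] / 2)])
    return float(psi @ H @ psi)

# Classical part of the hybrid loop: a conventional optimizer (running on
# the HPC side) varies the parameters based on the measured energies.
result = minimize(expectation, x0=[0.1], method="COBYLA")
print("variational energy:", result.fun)
print("exact ground state:", np.linalg.eigvalsh(H)[0])
```

The two sides alternate until the energy converges, which is exactly the division of labor that makes co-locating quantum hardware with an existing HPC system attractive.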
Do you see possible applications for quantum computers in the medium term?
It is a very exciting future technology that should definitely be used in some form. But in order to be prepared for it tomorrow, scientists need to be trained for it today, and that is why we need access to quantum computers today. However, gaining access is far more difficult than with conventional HPC systems. The infrastructure required and the associated investments are considerable; you do not just go out and buy a quantum computer. Most architectures operate at extremely low temperatures, which requires a tremendous cooling effort.
This undoubtedly calls for cooperative research. We are, therefore, moving more and more towards subscription models in which quantum computers are operated by the manufacturer and billed according to compute time. However, this is a potentially dangerous development, because regular university research groups simply cannot afford it on their own budget or with existing funding instruments. Forgoing the experimental testing of quantum computers would, in turn, be a major competitive disadvantage for Germany. That is why we need to find a different approach to quantum computers. I believe that establishing them at an HPC center is at least worth considering. But in the foreseeable future, quantum computers will probably not be used on a large scale in HPC centers – that is just my personal opinion. What we are already working with extensively, however – especially in the field of atomistic simulations – is artificial intelligence (AI).
What role does AI play in HPC?
This highly topical issue keeps us very busy in the Nutzungsausschuss because it entails many changes. The proportion of AI components in HPC applications has risen rapidly, and no end is in sight. In our current procurements, we are responding to the growing proportion of AI with the necessary hardware, mainly specialized accelerator architectures.
Can you provide an application example?
Until a few years ago, we carried out highly complex quantum mechanical simulations on HPC architectures. Today, such simulation data – a large number of atomistic configurations stored in databases – are used to train so-called surrogate models. These models can predict quantum mechanical solutions quite accurately; they ultimately replace lengthy, complicated simulation procedures and enable entirely new types of simulations. Admittedly, these surrogate models require a lot of HPC computing resources as well and are, therefore, an NHR topic. Still, in terms of scalability, they provide veritable leaps in time and length scales that would not be possible with direct simulation.
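A minimal sketch of the surrogate idea, using a toy pair potential in place of the expensive quantum mechanical reference and a generic kernel regression as the surrogate (the models actually used in atomistic simulation are far more elaborate machine-learned potentials):

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def reference_energy(r):
    """Stand-in for the expensive quantum mechanical calculation: a simple
    Lennard-Jones-like pair potential evaluated at interatomic distance r.
    In practice the training energies would come from full simulations and
    the inputs would be complete atomistic descriptors."""
    return 4.0 * ((1.0 / r) ** 12 - (1.0 / r) ** 6)

rng = np.random.default_rng(0)
r_train = rng.uniform(0.9, 2.5, size=200)      # sampled "configurations"
E_train = reference_energy(r_train)            # expensive reference data

# Surrogate model: kernel ridge regression mapping configuration -> energy.
surrogate = KernelRidge(kernel="rbf", gamma=10.0, alpha=1e-6)
surrogate.fit(r_train.reshape(-1, 1), E_train)

# The trained surrogate now replaces the costly reference calculation.
r_test = np.linspace(1.0, 2.4, 5).reshape(-1, 1)
print("surrogate :", surrogate.predict(r_test))
print("reference :", reference_energy(r_test.ravel()))
```

Once trained, the surrogate is evaluated in microseconds rather than hours, which is what opens up the much longer time and length scales mentioned above.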
What are the current challenges in high-performance computing?
One aspect relates to big data: Computing resources are increasing exponentially, and in HPC simulations, we generate vast amounts of data, which in turn are analyzed and used for further research. Certain simulations are carried out across several data centers with the corresponding hardware resources located at different sites. We, therefore, depend on a fast network such as the X-WiN, which connects our NHR data centers.
But that is only part of the solution: we must keep the amount of transferred data as small as possible. We achieve this by processing the calculated datasets on the spot. Ideally, we analyze the data during the simulation or immediately afterward – in other words, we already receive pre-processed data, which lowers the transfer volume significantly. Of course, it would be even better if we did not have to transfer any data at all but could archive it locally for the long term. However, long-term archiving is not yet a classic NHR topic – on the data side, we are working with the National Research Data Infrastructure (NFDI).
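The principle of such in-situ analysis can be sketched with a toy simulation loop (entirely hypothetical, not NHR production code): the quantity of interest is accumulated while the simulation runs, so only a small summary ever needs to be transferred or archived.

```python
import numpy as np

# Toy "simulation": random particle motion in a periodic box. Instead of
# writing every configuration to disk and shipping it to another site, the
# analysis (here a density profile along z) is updated on the fly.
rng = np.random.default_rng(0)
n_particles, n_steps, n_bins = 10_000, 1_000, 50
positions = rng.uniform(0.0, 1.0, size=(n_particles, 3))

density_profile = np.zeros(n_bins)                   # reduced result kept in memory
for step in range(n_steps):
    positions += 0.001 * rng.standard_normal(positions.shape)  # toy dynamics
    positions %= 1.0                                            # periodic boundaries
    hist, _ = np.histogram(positions[:, 2], bins=n_bins, range=(0.0, 1.0))
    density_profile += hist                          # in-situ update of the analysis

density_profile /= n_steps
print("values to transfer:", density_profile.size,
      "instead of", n_particles * 3 * n_steps, "raw coordinates")
```

The raw trajectory never leaves the data center; only the pre-processed result travels over the network.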
What has the NHR Alliance achieved so far? Does Germany now have a better international standing?
By all means! But I am not talking about the media-friendly TOP500 list of high-performance computers. If you use it as a reference, the Tier 1 computers are right at the top. But if you look at the aggregated performance of HPC systems in Germany, the large NHR centers are very prominent. The majority of research on important questions in fields such as catalyst development, sustainable systems, or energy materials is increasingly being carried out on Tier 2 systems. With our application system, you can gain access to excellent computing resources within three months and conduct your research at the highest international level right here in Germany. This is precisely what makes us competitive, and the NHR Alliance has flung that door wide open.
The interview was conducted by Maimona Id (DFN-Verein).
About Thomas D. Kühne
Current positions// Prof. Dr. Thomas Kühne is Professor of Computational Systems Science at the Dresden University of Technology (TUD) and, since 2023, Founding Director of CASUS – the Center for Advanced Systems Understanding in Görlitz, Saxony. He is the Vice Chairman of the Paderborn Center for Parallel Computing (PC2) and of the recently founded Center for Sustainable Systems Design (CSSD). Furthermore, he is the Chairman of the NHR Atomistic Simulation Center and the NHR Center for Computational Physics, and a member of the DFG Review Board. The computer scientist is a co-author of the open-source simulation package CP2K. His main research interests are the development of novel numerical methods and algorithms for chemical and physical processes and their implementation in the form of computer programs.
CV// In September 2018, Thomas D. Kühne took over the Chair for Theoretical Chemistry at Paderborn University, where he had held the professorship for Theoretical Interface Chemistry since 2014. From 2010 until his move to Paderborn, he was a junior professor for Theoretical Chemistry at the Johannes Gutenberg University Mainz. After studying computer science and computational sciences and completing his doctorate in theoretical physics at ETH Zurich in 2008, Kühne worked for a year as a postdoc at Harvard University. His research to date has focused primarily on the investigation of complex systems in condensed phases using computational methods, in particular aqueous systems such as water interfaces and biologically relevant reactions in aqueous solution. Kühne has published more than 150 papers in scientific journals and holds a Starting Grant from the European Research Council.