Google Summer of Code 2021

How to Get in Touch

Before You Apply

Feel free to contact us before you formally apply! This is a great way to get to know us better and receive answers to any questions you might have. There are two ways to contact us:

  1. You can contact the mentors of the idea you are interested in. Their e-mail addresses are included in the respective idea descriptions. Please include a [GSoC] prefix in your e-mail subject.
  2. You can join our Mattermost chat:
    To sign in you need one of the following:
  • An account at one of the educational or science institutions supported by Helmholtz AAI (using the Helmholtz AAI option)
  • A GitHub account (using the GitHub option)
  • An HZDR employee account (using the HZDR option)

I Have a Project Idea That Is Not Listed Here!

At CASUS we welcome contributions from the outside. If you believe your idea fits with our software projects (see below) or our research disciplines, please contact one of our GSoC administrators.

Applying at CASUS

If you are interested in joining us for the duration of the GSoC program, please apply via e-mail to the mentors of the idea you want to help implement. Please include the following information in your application:

  • Your name.
  • Your timezone. Most of our mentors live and work in Central Europe which means that timezones outside the range (UTC – 1) to (UTC + 3) require a bit more coordination.
  • If already known, your available time slots for video conferences / other methods of direct communication. We are currently planning weekly 1-on-1 video conferences with our students. Additionally, you are invited to join our regular weekly developer meetings.
  • Your programming skills related to the idea.
  • Your available computing equipment: Tower PC / Laptop, available CPUs & GPUs, etc. This is not a selection criterion but information for us so we can arrange for remote access to HPC systems if required for the task.
  • Anything else you would want us to know!

When sending us your application please include a [GSoC] prefix in your e-mail subject.

Project Ideas

Texture / Image Support in alpaka

The alpaka library is a header-only C++14 abstraction library for accelerator development. Its aim is to provide performance portability across accelerators through the abstraction of the underlying levels of parallelism and acceleration technologies (backends).

Alpaka currently supports buffers across all of its backends. These buffers are typed n-dimensional arrays that provide general data storage. They are accessed via linear indices, which retrieve the values stored at the corresponding locations.

Some backends (CUDA, HIP and SYCL) also provide access to texture hardware, which supports buffers with additional imaging-related functionality. Such buffers are called images. In addition to the index-based access offered by normal buffers, images also allow interpolated (sampled) access.

We would like to add support for images in alpaka, either implemented using the backends' native APIs or, for backends without image support, emulated by a handcrafted image implementation. The emulation primarily concerns CPU backends such as OpenMP.

The following tasks will need to be addressed in this work:

  • A generic concept of the functionality an alpaka image needs to provide. This should be the intersection of what CUDA, HIP and SYCL offer, described as a set of API functions that operate on such images. The concept API should be as similar to the backend APIs as possible and fit nicely into the existing design of alpaka’s buffers.
  • An implementation of the image API for each backend with native image/texture support, delegating to the API of the backend acceleration technology.
  • A fallback implementation of the image API for backends without dedicated image/texture support. Such an implementation should be built on top of alpaka buffers. Interpolated/sampled texture access should aim to be as fast/efficient as possible, making the fallback implementation feasible to use and not just compile.
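To make the fallback concrete, here is a minimal sketch of how interpolated access can be emulated on top of a plain buffer via bilinear sampling. It is written in plain Python rather than alpaka's C++, and every name in it is hypothetical; the real implementation would, among other things, have to support the backend's addressing and filtering modes.

```python
# Sketch of interpolated (sampled) access emulated on top of a plain
# row-major 2D buffer, as a CPU fallback would have to do. All names
# are hypothetical; alpaka's real API is C++ and will differ.

def sample_bilinear(buffer, width, height, u, v):
    """Sample a row-major 2D buffer at the continuous coordinate (u, v)."""
    # Clamp to the valid sampling range (one common addressing mode).
    x = min(max(u, 0.0), width - 1.0)
    y = min(max(v, 0.0), height - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0

    def at(ix, iy):
        # Fetch a texel via the linear index a normal buffer would use.
        return buffer[iy * width + ix]

    top = (1.0 - fx) * at(x0, y0) + fx * at(x1, y0)
    bottom = (1.0 - fx) * at(x0, y1) + fx * at(x1, y1)
    return (1.0 - fy) * top + fy * bottom

buf = [0.0, 1.0,
       2.0, 3.0]  # a 2x2 buffer
print(sample_bilinear(buf, 2, 2, 0.5, 0.5))  # midpoint of all four texels -> 1.5
```

The interesting performance question for the fallback is exactly this inner loop: the texel fetches and the weighting must be fast enough that the emulated path is feasible to use, not just to compile.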

There is currently an effort to abstract access to alpaka buffers via accessors. Accessors improve access to higher-dimensional buffers because they handle offset computation and pitched allocations; they can also hide address space qualification for the SYCL backend. This work is likely to finish before the start of the GSoC project. Integrating images with alpaka accessors would be great as well: it should allow swapping a buffer for an image under an accessor without changing the code written against the accessor. Dedicated image accessors will of course offer richer interfaces than plain buffer accessors.

Empirical IO Configuration in openPMD

Heading towards the Exascale era, state-of-the-art scientific simulations will generally produce data more rapidly than IO systems can process it. IO efficiency is a highly volatile quantity that depends on many workflow specifics, including

  1. the hardware being used,
  2. the software being used,
  3. the mapping of tasks to hardware,
  4. parallel scaling and
  5. the accuracy requirements (list not exhaustive).

Contemporary IO routines in simulations hence need the flexibility to adapt dynamically to current requirements. The particle-in-cell code PIConGPU uses the openPMD API to meet these concerns: the openPMD API provides a generic high-level description of simulation data, while at the same time allowing for an adaptable choice of implementation.

This increased configuration space raises new challenges: the efficiency of a configuration can be hard to predict in theory, and empirical measurements are generally necessary for exact knowledge of a setup’s properties. A configuration that brings a speedup on one system might incur a slowdown on another. As an important example, one approach to using the compute capacity that sits idle while waiting for the IO system to catch up is on-the-fly compression of data. Besides better utilization of available resources, this can also benefit IO efficiency by reducing the amount of data. But compression gives rise to two competing effects: the increased compute demands must not outweigh the anticipated increase in IO efficiency, and the properties of lossy compression depend heavily on the kind of data being written.

Manually configuring the IO system to be as efficient as possible under given constraints is hence no longer feasible. This project will create a new Python library, based on the openPMD API, that processes a sample simulation dump (spanning several single datasets) under a given set of possible configurations. The project will evaluate the configurations on a per-dataset level, based on the criteria of compression throughput, IO efficiency and precision loss under compression. The user should be presented with a clear visual overview of these results. An easily approachable frontend to the library (preferably Jupyter widgets) may be implemented on top of it.
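As a rough illustration of such a per-dataset evaluation loop, the following sketch times standard-library compressors as stand-ins for the codecs an openPMD backend would offer. The actual library would of course read real datasets through the openPMD API and add IO and precision-loss measurements; everything here is a simplified assumption.

```python
# Sketch: run a sample dataset through candidate configurations and
# record compression ratio and throughput. Stdlib compressors stand in
# for the codecs a real openPMD backend would expose.
import lzma
import time
import zlib

def evaluate(dataset_bytes, configs):
    """Benchmark each named configuration on one dataset."""
    results = {}
    for name, compress in configs.items():
        start = time.perf_counter()
        compressed = compress(dataset_bytes)
        elapsed = time.perf_counter() - start
        results[name] = {
            "ratio": len(dataset_bytes) / len(compressed),
            "throughput_MiB_s": len(dataset_bytes) / (2**20) / max(elapsed, 1e-9),
        }
    return results

sample = bytes(range(256)) * 4096  # stand-in for one simulation dataset
configs = {
    "zlib-1": lambda d: zlib.compress(d, 1),  # fast, lower ratio
    "zlib-9": lambda d: zlib.compress(d, 9),  # slow, higher ratio
    "lzma": lzma.compress,
}
for name, metrics in evaluate(sample, configs).items():
    print(name, metrics)
```

The per-configuration dictionaries are exactly the kind of raw material a Jupyter-widget frontend could aggregate into the visual overview described above.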

API Redesign of the Performance-Portable Primitive API Vikunja

Nowadays computer systems are accelerated by co-processors such as GPUs or FPGAs. Offloading an application to a co-processor requires a library provided by the vendor. However, the individual library interfaces are not standardized and can differ greatly from each other. To avoid rewriting large parts of applications for new target systems, we have developed alpaka, a header-only C++14 abstraction library for accelerator development. Its aim is to provide performance portability across accelerators through the abstraction of the underlying levels of parallelism and acceleration technologies (backends). To accelerate application development and porting, we developed the library vikunja on top of alpaka, which provides primitives such as map and reduce.

The current API design of vikunja is similar to the algorithms of the C++ standard library, which allows for easy porting of existing applications. However, the API currently does not provide enough possibilities to write performance-portable code. For example, on GPUs the explicit use of shared memory is very important for good performance, but the current API does not expose it. To keep the advantage of the simple C++ standard library-like API, we decided to keep the current API and additionally develop another, better optimizable API. This allows us to provide a workflow in which an application is first ported easily using the current API, and its primitives are then optimized using the new API.

Your tasks will be:

  • Designing the new API together with us. We provide a lot of knowledge, prototypes and different implementations in existing applications to support the development. Your task is to review the existing ideas, develop and explore new ones, and ultimately create a clean, user-friendly API.
  • Implementing the API.
  • Improving the current API with the experience of the new API.
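As a rough, plain-Python analogy for why a lower-level API matters: a GPU reduce is typically staged so that each block first combines its chunk of the input (in fast shared memory) before a final pass merges the per-block partial results. The function below only mimics that two-level structure; it is not vikunja's API, which is C++.

```python
# Plain-Python analogy of a staged GPU-style reduction: stage 1 reduces
# per "block" (as threads would in shared memory), stage 2 combines the
# per-block partials. Names are illustrative, not vikunja's API.
from functools import reduce

def block_reduce(data, op, block_size):
    # Stage 1: each block reduces its own chunk without touching the
    # (slow) global result until it is done.
    partials = [
        reduce(op, data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]
    # Stage 2: a final pass combines the per-block partial results.
    return reduce(op, partials)

data = list(range(1, 1025))
print(block_reduce(data, lambda a, b: a + b, 128))  # 524800, same as sum(data)
```

An API that exposes the block stage explicitly lets users place stage-1 scratch data in shared memory, which is exactly the kind of control the current standard-library-like interface hides.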

Performance-Portable Linear Algebra for the alpaka Ecosystem

The alpaka library is a header-only C++14 abstraction library for accelerator development. Its aim is to provide performance portability across accelerators through the abstraction of the underlying levels of parallelism. The same user code written against alpaka’s API can be compiled for different parallel computing architectures, CPUs and GPUs.

The alpaka ecosystem itself does not provide a high-level linear algebra API, though many scientific software projects would benefit from having one. At the same time, very few libraries provide performance-portable linear algebra solutions.

We would like to add a clean high-level C++ API for linear algebra built on top of alpaka. This API would be implemented in terms of lazy-evaluated expression templates. The algorithms themselves need not be written from scratch, though: the library should wrap existing linear algebra libraries, selected at compile time for the appropriate architecture, and provide a unified, architecture-independent user-facing API.

Your tasks will be:
  • Design and implement a unified user-facing C++ API.
  • Implement the wrappers for C or C++ linear algebra libraries of your choice.
  • Write an expression template engine for lazy-evaluated vector/matrix operations, or adapt an existing one to suit alpaka’s API.
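The lazy-evaluation idea behind such an engine can be sketched in a few lines (Python for brevity; the actual engine would be C++ expression templates resolved at compile time): operators build an expression tree instead of computing immediately, and a single fused loop evaluates the whole tree without materializing temporaries.

```python
# Sketch of lazy evaluation: x + y * z builds an expression tree, and
# evaluate() runs one fused element-wise loop over it. In C++ expression
# templates, the tree type is built at compile time with zero overhead.
class Expr:
    def __add__(self, other): return BinOp(self, other, lambda a, b: a + b)
    def __mul__(self, other): return BinOp(self, other, lambda a, b: a * b)

class Vec(Expr):
    def __init__(self, data): self.data = data
    def __getitem__(self, i): return self.data[i]
    def __len__(self): return len(self.data)

class BinOp(Expr):
    def __init__(self, lhs, rhs, op):
        self.lhs, self.rhs, self.op = lhs, rhs, op
    def __getitem__(self, i):
        # Element access recurses through the tree: nothing is stored.
        return self.op(self.lhs[i], self.rhs[i])
    def __len__(self): return len(self.lhs)

def evaluate(expr):
    # The single fused loop a backend kernel would execute.
    return Vec([expr[i] for i in range(len(expr))])

x, y, z = Vec([1, 2, 3]), Vec([4, 5, 6]), Vec([7, 8, 9])
r = evaluate(x + y * z)  # no intermediate vector for y * z is materialized
print(r.data)            # [29, 42, 57]
```

In the real library, `evaluate` would be the point at which a wrapped backend routine (BLAS, cuBLAS, etc.) is selected at compile time for the target architecture.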

Large-Scale Physics-Informed Neural Networks

Solving partial differential equations (PDEs) is an indispensable part of many branches of the natural sciences, as many processes can be modelled in terms of PDEs. However, current numerical solvers require manual discretization of the underlying equation as well as sophisticated, tailored code for distributed computing. Scanning the parameters of the underlying model significantly increases the runtime, as the simulations have to be cold-started for each parameter configuration. Machine-learning-based surrogate models are a promising way of learning the complex relationships among inputs, parameters and solutions. However, recent generative neural networks require lots of training data, i.e. full simulation runs, making them costly. We tackle these challenges with our Neural Solvers library, which provides continuous, mesh-free neural solvers for partial differential equations. These equations are solved by physics-informed neural networks (PINNs) that require only initial/boundary values and validation coordinates for training, but no simulation data.

A major challenge of PINNs used to be that the number of parameters of a physics-informed neural network increases exponentially with the size of the computational domain. This leads to a large memory footprint which can quickly exceed the capacity of a single GPU. Fortunately, our Neural Solvers library tackles this challenge by introducing a mixture-of-experts approach into the PINN architecture, called Gated-PINN: one big physics-informed neural network is decomposed into several smaller subnetworks (experts), each responsible for a certain subdomain of the whole computational domain. Our library currently scales well on large cluster systems by leveraging implicit data parallelism using Horovod. Unfortunately, we still need to store the parameters of all experts on all GPUs. You will help us tackle this issue in order to form very large Gated-PINNs amounting to billions of parameters in total. For this purpose, you will introduce model parallelism mechanisms into the Neural Solvers library, along with dispatching mechanisms for adaptive model and data distribution among all workers.
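A toy sketch of the mixture-of-experts routing may help: a gating function assigns each input coordinate to the expert responsible for its subdomain, so only that expert's parameters are needed for the sample. Plain functions stand in for the expert networks here, and none of these names come from the Neural Solvers library.

```python
# Toy sketch of hard mixture-of-experts gating over a 1D domain: each
# expert covers one subdomain, and the gate routes a coordinate to it.
# The "experts" are trivial functions standing in for small PINNs.

def make_gated_model(experts, domain_length):
    """Build a gated model over [0, domain_length) split evenly among experts."""
    width = domain_length / len(experts)

    def predict(x):
        # Hard gating: pick the expert whose subdomain contains x.
        index = min(int(x // width), len(experts) - 1)
        return experts[index](x)

    return predict

# Two hypothetical experts, each meaningful only on its own subdomain.
experts = [lambda x: 2.0 * x,    # covers [0, 5)
           lambda x: 10.0 - x]   # covers [5, 10)
model = make_gated_model(experts, domain_length=10.0)
print(model(2.0))  # routed to expert 0 -> 4.0
print(model(7.0))  # routed to expert 1 -> 3.0
```

Model parallelism then amounts to placing different experts on different GPUs and dispatching each batch of coordinates to the worker that owns the corresponding subdomain, instead of replicating every expert everywhere.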

These tasks will be addressed in this project:

  • Design and implementation of adaptive data sampling mechanisms.
  • Integration of model parallelism into the Neural Solvers library using Horovod and PyTorch.
  • Comprehensive scale-up and speedup analysis on very large cluster systems (>60 GPUs).

Reinforced deepFibreTracking

Diffusion-weighted MRI (DWI) is a novel imaging technique based on measuring water diffusion in tissue. In contrast to free diffusion, the diffusion of water molecules in tissue can be seen as Brownian motion whose mobility depends on, among other factors, tissue structure and perfusion. One of the main but most challenging applications of DWI is the reconstruction of the brain’s nerve tracts (tractography), promising novel insights into brain connectivity as well as psychiatric disorders.

deepFibreTracking is an open-source library integrating all the tools needed for reproducible research on data-driven tractography. The library offers numerical methods for fibre tracking, such as Diffusion Tensor Imaging and Constrained Spherical Deconvolution, as well as neural-network-based approaches such as feedforward networks and reinforcement learning agents and environments. It provides code for a full tractography workflow, ranging from DWI data loading and preprocessing over fibre tracking to validation based on curated reference datasets (e.g. ISMRM2015). This project aims at several orthogonal improvements to our library, ranging from improving code quality & documentation and parallelising preprocessing code on GPUs to ML-driven tracking mechanisms based on invertible neural networks and reinforcement learning.

Any of these tasks can be addressed in this project:

  1. Parallelisation of data preprocessing and trilinear interpolation codes on GPU using PyTorch.
  2. Routines for automatic curation of training data and enhanced validation based on neuroanatomical atlases.
  3. Extension of RL environment and agents by neuroanatomically justified reward function.
  4. Implementation of invertible neural networks for fibre tracking.
  5. (online) visualisation using
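Task 1 centres on the trilinear interpolation performed whenever a tracking step samples the DWI volume at a continuous position. A self-contained Python sketch of that inner loop follows; the real implementation would run batched on the GPU, e.g. as PyTorch tensor operations, and all names here are illustrative.

```python
# Sketch of the trilinear interpolation at the heart of task 1: fibre
# tracking repeatedly samples a 3D volume at continuous positions, so
# this inner loop is a prime target for GPU parallelisation.

def trilinear(volume, shape, x, y, z):
    """Sample a flattened row-major 3D volume of shape (nz, ny, nx)."""
    nz, ny, nx = shape
    x0, y0, z0 = int(x), int(y), int(z)
    x1 = min(x0 + 1, nx - 1)
    y1 = min(y0 + 1, ny - 1)
    z1 = min(z0 + 1, nz - 1)
    fx, fy, fz = x - x0, y - y0, z - z0

    def at(i, j, k):  # linear index into the flattened volume
        return volume[k * ny * nx + j * nx + i]

    def lerp(a, b, t):
        return (1.0 - t) * a + t * b

    # Interpolate along x on both z-slices, then along y, then along z.
    c00 = lerp(at(x0, y0, z0), at(x1, y0, z0), fx)
    c10 = lerp(at(x0, y1, z0), at(x1, y1, z0), fx)
    c01 = lerp(at(x0, y0, z1), at(x1, y0, z1), fx)
    c11 = lerp(at(x0, y1, z1), at(x1, y1, z1), fx)
    return lerp(lerp(c00, c10, fy), lerp(c01, c11, fy), fz)

vol = [float(v) for v in range(8)]  # a 2x2x2 volume with values 0..7
print(trilinear(vol, (2, 2, 2), 0.5, 0.5, 0.5))  # centre of the cube -> 3.5
```

Porting this to PyTorch mostly means replacing the eight scalar fetches and blends with batched tensor gathers, so that thousands of streamline positions are interpolated in one kernel launch.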