CASUS Institute Seminar, Daniel Coquelin, Karlsruhe Institute of Technology (KIT), Steinbuch Centre for Computing (SCC)

Daniel is an AI Consultant in the Helmholtz AI Local Energy Consulting Team at KIT’s SCC, led by Markus Götz. He works on optimizing computing and communication operations for AI algorithms on high-performance computers. His main focus is on data-parallel and model-parallel neural networks.

Abstract of the talk // As datasets and neural networks grow, training them becomes more and more difficult. Current state-of-the-art networks are trained on tera- or petabytes of data using hundreds of GPUs. To achieve this, Daniel and his colleagues use a multitude of methods. In this talk he will discuss the most common form of neural network parallelism: batch parallelism. This method replicates the network on each accelerator and splits the data between them. After each batch, the replicas are synchronized using the results that each of them computed from its share of the data. While this is trivial at small scales, the process becomes much more challenging when many replicas are attempting to aggregate what they have found. After this background he will introduce the DASO method, a hierarchical method of merging these networks.
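For readers unfamiliar with batch parallelism, the replicate, split, and synchronize pattern described in the abstract can be sketched in a few lines of plain Python. This is only an illustrative toy under stated assumptions, not the DASO method and not the speaker's code: the "network" is a single weight `w` fitted to `y = w * x`, the workers are simulated in one process, and the names `local_gradient` and `data_parallel_step` are hypothetical. Synchronization is assumed to be a plain average of the per-worker gradients.

```python
def local_gradient(w, shard):
    # Gradient of the mean squared error for y ≈ w * x on one worker's shard.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def data_parallel_step(w, batch, n_workers, lr):
    # Every simulated worker holds an identical replica of w;
    # the batch is split evenly between the workers.
    shards = [batch[i::n_workers] for i in range(n_workers)]
    grads = [local_gradient(w, shard) for shard in shards]
    # Synchronization (assumed here): average the per-worker gradients,
    # which is what an all-reduce would compute across real devices.
    avg_grad = sum(grads) / n_workers
    # All replicas apply the same update, so they stay identical.
    return w - lr * avg_grad


# Toy data generated from y = 2 * x.
batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, batch, n_workers=2, lr=0.01)
```

With equally sized shards, the averaged gradient equals the gradient over the whole batch, so two workers take the same step as one. At scale, the averaging step is where the communication cost arises.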

The event is organized in a hybrid format that includes a videoconferencing tool by Zoom Inc. If you want to join the talk remotely, please ask for the login details via