CASUS Institute Seminar, Matthew Leinhauser, University of Delaware

Following the recent announcements of new exascale supercomputers built on AMD GPUs, many scientific application developers are working to make their applications compatible with AMD (CPU-GPU) architectures, which means moving away from the traditional CPU and NVIDIA-GPU systems. However, given the current limitations of profiling tools for AMD GPUs, this shift leaves a gap in how application performance on AMD GPUs can be measured. In this talk, we will go over the metrics and profiling tools needed to create an instruction roofline model for AMD GPUs. Specifically, we use AMD’s ROCProfiler and the HIP implementation of a benchmarking tool, BabelStream, to measure an application’s performance in instructions and memory transactions on new AMD hardware. As a case study, we create instruction roofline models for PIConGPU, an open-source particle-in-cell (PIC) simulation application used for plasma and laser-plasma physics, on the NVIDIA V100, AMD Radeon Instinct MI60, and AMD Instinct MI100 GPUs. Looking at the performance of multiple kernels of interest in PIConGPU, we find that although the AMD MI100 GPU achieves a similar or better execution time than the other GPUs for the kernels measured, the NVIDIA V100 GPU achieves more billions of instructions per second (GIPS) than the AMD MI100. In terms of execution time, GIPS, and instruction intensity, the AMD MI60 performs worst of the three GPUs used in this work.
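
To make the two quantities named above concrete, the following is a minimal sketch of how GIPS and instruction intensity might be computed from profiler counts, and how a roofline ceiling bounds them. It assumes the common definitions (instructions per second divided by 1e9, and instructions per memory transaction); the function names, the form of the ceiling, and all numbers are hypothetical illustrations, not values or formulas taken from the talk.

```python
# Sketch of instruction roofline quantities under assumed definitions.
# All names and inputs here are hypothetical, not ROCProfiler output.

def gips(instructions: float, runtime_s: float) -> float:
    """Billions of instructions executed per second."""
    return instructions / runtime_s / 1e9


def instruction_intensity(instructions: float, memory_transactions: float) -> float:
    """Instructions executed per memory transaction."""
    return instructions / memory_transactions


def attainable_gips(intensity: float, peak_gips: float, peak_gtxn_per_s: float) -> float:
    """Roofline bound: the lower of the compute ceiling and the memory ceiling.

    peak_gtxn_per_s is the memory ceiling in billions of transactions per
    second (assumed to be derived from a measured bandwidth, e.g. a stream
    benchmark, divided by an assumed transaction size).
    """
    return min(peak_gips, intensity * peak_gtxn_per_s)


if __name__ == "__main__":
    # Made-up kernel: 4e9 instructions, 1e9 memory transactions, 0.5 s runtime.
    kernel_gips = gips(4e9, 0.5)
    kernel_intensity = instruction_intensity(4e9, 1e9)
    bound = attainable_gips(kernel_intensity, peak_gips=100.0, peak_gtxn_per_s=20.0)
    print(f"GIPS={kernel_gips}, intensity={kernel_intensity}, bound={bound}")
```

A kernel plotted at (intensity, GIPS) that sits well below the bound has headroom; one sitting on the sloped part of the bound is limited by memory transactions rather than by instruction throughput.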