CASUS Hands-On Software Seminar, Bernhard Manfred Gruber, CERN

N-body simulations approximate the interaction of N point-shaped bodies over time by computing all pairwise interactions between the bodies in a time-stepped fashion. Such simulations are used in a variety of domains to describe galaxy systems, protein folding, fluid flows and even illumination in video games. N-body simulations are computationally expensive: their runtime scales quadratically with the number of bodies, which makes them an interesting target for optimization. This talk will demonstrate how memory layouts such as Array of Structs (AoS), Struct of Arrays (SoA) and Array of Struct of Arrays (AoSoA) impact the runtime of such simulations without changing the algorithm itself. We will look into the memory hierarchies of CPUs and GPUs and their specific requirements, and investigate the generated assembly code. Memory layouts provided by the LLAMA library are contrasted with hand-written versions. For the CPU versions, we will inspect both automatic and manual vectorization. For the GPU, we will show how using shared memory as a cache impacts the memory system. Finally, with all these tricks depending on the target hardware, can we have a single code base that handles all these aspects? Probably 😉, but that’s a topic for another seminar! This talk will not discuss tree and particle-mesh methods for N-body simulations, which speed up computation by approximating groups of bodies. We will only look at the all-pairs version.
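To make the three layouts and the all-pairs algorithm concrete, here is a minimal hand-written C++ sketch. It is not the talk's code and not LLAMA's API. All names (`Body`, `BodiesSoA`, `BodyBlock`, `stepAoS`, `stepSoA`, `toSoA`, `aosoaX`) are hypothetical. It also assumes a gravitational constant of 1, a softening term `kEps2 = 0.01`, a block size of `kLanes = 8` for AoSoA, and a simple Euler-style update. The point it illustrates is that the O(N²) loop stays the same and only the data access expressions change between layouts.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Array of Structs (AoS): all fields of one body are adjacent in memory.
struct Body {
    float x, y, z;     // position
    float vx, vy, vz;  // velocity
    float mass;
};
using BodiesAoS = std::vector<Body>;

// Struct of Arrays (SoA): one contiguous array per field, so a loop over j
// streams through consecutive floats of the same field.
struct BodiesSoA {
    std::vector<float> x, y, z, vx, vy, vz, mass;
};

// Array of Struct of Arrays (AoSoA): blocks of kLanes bodies, SoA inside each block.
constexpr std::size_t kLanes = 8;  // assumed, e.g. 8 floats per 256-bit SIMD register
struct BodyBlock {
    std::array<float, kLanes> x, y, z, vx, vy, vz, mass;
};
using BodiesAoSoA = std::vector<BodyBlock>;

// Body i's x coordinate in AoSoA: block i / kLanes, lane i % kLanes.
inline float& aosoaX(BodiesAoSoA& blocks, std::size_t i) {
    return blocks[i / kLanes].x[i % kLanes];
}

// Assumed softening term; it also makes the i == j term contribute zero instead of 0/0.
constexpr float kEps2 = 0.01f;

// Adds the acceleration exerted by a body of mass mj at offset (dx, dy, dz).
inline void accumulate(float dx, float dy, float dz, float mj,
                       float& ax, float& ay, float& az) {
    const float r2 = dx * dx + dy * dy + dz * dz + kEps2;
    const float invR = 1.0f / std::sqrt(r2);
    const float s = mj * invR * invR * invR;  // m_j / |r|^3
    ax += dx * s;
    ay += dy * s;
    az += dz * s;
}

// One all-pairs time step on AoS: N * N interactions, then a position update.
void stepAoS(BodiesAoS& b, float dt) {
    const std::size_t n = b.size();
    for (std::size_t i = 0; i < n; ++i) {
        float ax = 0, ay = 0, az = 0;
        for (std::size_t j = 0; j < n; ++j)
            accumulate(b[j].x - b[i].x, b[j].y - b[i].y, b[j].z - b[i].z,
                       b[j].mass, ax, ay, az);
        b[i].vx += ax * dt; b[i].vy += ay * dt; b[i].vz += az * dt;
    }
    for (Body& p : b) {
        p.x += p.vx * dt; p.y += p.vy * dt; p.z += p.vz * dt;
    }
}

// The same algorithm on SoA: only the indexing expressions differ.
void stepSoA(BodiesSoA& b, float dt) {
    const std::size_t n = b.x.size();
    assert(b.y.size() == n && b.z.size() == n && b.mass.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        float ax = 0, ay = 0, az = 0;
        for (std::size_t j = 0; j < n; ++j)
            accumulate(b.x[j] - b.x[i], b.y[j] - b.y[i], b.z[j] - b.z[i],
                       b.mass[j], ax, ay, az);
        b.vx[i] += ax * dt; b.vy[i] += ay * dt; b.vz[i] += az * dt;
    }
    for (std::size_t i = 0; i < n; ++i) {
        b.x[i] += b.vx[i] * dt; b.y[i] += b.vy[i] * dt; b.z[i] += b.vz[i] * dt;
    }
}

// Copies AoS data into SoA: same bodies, different memory layout.
BodiesSoA toSoA(const BodiesAoS& a) {
    BodiesSoA s;
    for (const Body& p : a) {
        s.x.push_back(p.x); s.y.push_back(p.y); s.z.push_back(p.z);
        s.vx.push_back(p.vx); s.vy.push_back(p.vy); s.vz.push_back(p.vz);
        s.mass.push_back(p.mass);
    }
    return s;
}
```

Stepping an AoS set and its `toSoA` copy once with the same `dt` should yield the same positions up to floating-point rounding. The layout changes and the algorithm does not. The SoA inner loop reads each field from a contiguous array, which is the property that tends to help vectorization. LLAMA addresses the same separation of algorithm and layout, but its actual interface is not reproduced here.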