CASUS Institute Seminar by Dr. Jakob Blomer, European Laboratory for Particle Physics (CERN): Blomer is a staff member in the scientific software group at CERN in Geneva. He received a PhD in computer science from the Technical University of Munich in 2012. Afterwards he was a Marie-Curie fellow at CERN and a visiting scholar at the RAMCloud research group at Stanford University. Jakob works on distributed systems and storage software. He created the CernVM File System, which he has continued to develop ever since. In the ROOT team, Jakob leads the R&D project on a new columnar data format for high-energy physics collision data.

Researchers in High Energy Physics (HEP), at CERN and worldwide, need to analyze petabytes of data efficiently. Data sets from particle colliders are analyzed with statistical methods that compare theoretical expectations for certain, often very rare, physics processes with data recorded by particle detectors. Because the number of particles produced in each collision is not known a priori, HEP data is not naturally tabular; instead, it is modeled as more complex, hierarchical collections. This presentation introduces the HEP computing model and its main analysis toolkit, ROOT, which supports interactive development of analysis algorithms, serialization of virtually any C++ object, fast statistical and mathematical algorithms, and publication-quality plotting. A key aspect is efficient data storage that enables short turnaround times for searching, filtering, and calculating derived physics quantities. To this end, the presentation discusses ROOT’s storage techniques and the considerations involved in finding a sweet spot among often conflicting requirements: low storage consumption, high data access rates, and ergonomic data-handling interfaces for researchers.
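To illustrate why collision data does not fit a fixed-width table, and how a columnar layout can still store it, here is a minimal sketch in Python. It assumes a hypothetical event model in which each event holds only a variable-length list of particle transverse momenta (pT). The offsets-plus-values encoding is one common way to flatten such nested collections into contiguous columns. It is not ROOT's API or on-disk format, and the function names are illustrative only.

```python
from typing import List, Tuple

# Hypothetical event model: each event is a variable-length list of particle
# transverse momenta (pT, in GeV). Events have different particle counts,
# so the data is "jagged" rather than tabular.
Event = List[float]


def to_columnar(events: List[Event]) -> Tuple[List[int], List[float]]:
    """Flatten jagged events into an offsets column and a values column.

    offsets[i] is the exclusive end index of event i in `values`, so event i
    occupies values[offsets[i - 1]:offsets[i]] (start is 0 for the first event).
    """
    offsets: List[int] = []
    values: List[float] = []
    for event in events:
        values.extend(event)
        offsets.append(len(values))
    return offsets, values


def event_slice(offsets: List[int], values: List[float], i: int) -> List[float]:
    """Reconstruct event i from the two columns."""
    start = offsets[i - 1] if i > 0 else 0
    return values[start:offsets[i]]


def count_per_event(offsets: List[int]) -> List[int]:
    """Particle multiplicity per event, derived from the offsets column alone."""
    counts: List[int] = []
    prev = 0
    for end in offsets:
        counts.append(end - prev)
        prev = end
    return counts


def events_with_pt_above(offsets: List[int], values: List[float], threshold: float) -> List[int]:
    """Indices of events containing at least one particle with pT > threshold."""
    selected: List[int] = []
    start = 0
    for i, end in enumerate(offsets):
        if any(pt > threshold for pt in values[start:end]):
            selected.append(i)
        start = end
    return selected


if __name__ == "__main__":
    # Three made-up events with 2, 0, and 3 particles.
    events = [[45.2, 30.1], [], [12.0, 88.5, 7.3]]
    offsets, values = to_columnar(events)
    print(offsets)                                         # [2, 2, 5]
    print(count_per_event(offsets))                        # [2, 0, 3]
    print(events_with_pt_above(offsets, values, 40.0))     # [0, 2]
```

Because each column is a contiguous array, a selection that needs only particle momenta can scan `offsets` and `values` without reading any other event attributes. Columnar formats aim to make exactly this kind of access pattern fast.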