Title | Authors | Type | Date | Abstract | Tags | Relevant for |
---|---|---|---|---|---|---|
Simulating the Carbon Cost of Grid Sites | David Britton | Presentation | | We present first results from a new simulation of the WLCG Glasgow Tier-2 site, designed to investigate the potential for reducing our carbon footprint by reducing the CPU clock frequency across the site in response to a higher-than-normal fossil-fuel component in the local power supply. The simulation uses real (but historical) data for the UK power-mix, together with measurements of power consumption made at Glasgow on a variety of machines, and is designed to provide a tool to inform future procurements and the operation of sites. The output of the simulation, combined with considerations of embedded carbon, can also be used to inform and optimise the policy for replacing older hardware with more energy efficient devices. The rate of transition to more energy efficient hardware must be balanced against the embedded carbon in the manufacture of new machines, and frequency modulation must be balanced against both the loss of site throughput and the accounting of embedded carbon. Frequency modulation can also be used to reduce power requirements to address short-term supply issues, irrespective of the carbon content. (An illustrative frequency-throttling sketch follows this table.) | D3.1 | |
ARMing HEP for the future Energy Efficiency of WLCG sites (ARM vs. x86) | Emanuele Simili, Gordon Stewart, Samuel Skipsey, Dwayne Spiteri, Albert Borbely, David Britton | Conference Paper | | We present a case for ARM chips as an alternative to standard x86 at WLCG sites to help reduce power consumption. New measurements are presented on the performance and energy consumption of two machines (one ARM and one x86), that were otherwise similar in specification and cost. The comparison was extended to a dual socket x86 node, representative of our site. These new results include the energy-efficiency and speed of single- and multi-threaded jobs; the effect of hyper-threading; and an initial look at clock throttling as a way of shaping power-load. We observe significantly lower power consumption and often slightly better performance on the ARM machine and, noting the increased availability of ARM software builds from all LHC experiments and beyond, we plan to install a 2k-core ARM cluster at our WLCG Tier-2 site at Glasgow in the summer of 2023. This will enable testing, physics validation, and eventually an ARM production environment that will inform and influence other WLCG sites in the UK and worldwide. | D3.1 | |
A holistic study of the WLCG energy needs for the LHC scientific program | David Britton, Simone Campana, Bernd Panzer-Steindel | Conference Paper | | The WLCG infrastructure provides the compute power and storage capacity needed by the Large Hadron Collider (LHC) experiments at CERN. The infrastructure is distributed across over 170 data centres in more than 40 countries. The amount of energy consumed by the WLCG to support the scientific program of the LHC experiments, and its evolution, depends on different factors: the luminosity of the LHC and its operating conditions; the data volume and the data complexity; the evolving computing models and the offline software of the experiments; the ongoing R&D program in preparation for the next LHC phase (HL-LHC); the evolution of computing hardware technology towards better energy efficiency; and the modernization of the facilities hosting the data centres to improve Power Usage Effectiveness. This contribution presents a study of the WLCG energy needs and their potential evolution during the future LHC program based on the factors mentioned above. Some of the information is obtained from the CERN experience but then extrapolated to the whole of WLCG. The study provides, therefore, a holistic view for the infrastructure rather than a detailed prediction at the level of the individual facilities. It presents a clear view of the trends and offers a model for more refined studies. (A schematic form of such an energy estimate is sketched after this table.) | D3.1 | |
Explorations of the viability of ARM and Xeon Phi for physics processing | David Abdurachmanov, Kapil Arya, Josh Bendavid, Tommaso Boccali, Gene Cooperman, Andrea Dotti, Peter Elmer, Giulio Eulisse, Francesco Giacomini, Christopher D. Jones, Matteo Manzali, Shahzad Muzaffar | Publication | | We report on our investigations into the viability of the ARM processor and the Intel Xeon Phi co-processor for scientific computing. We describe our experience porting software to these processors and running benchmarks using real physics applications to explore the potential of these processors for production physics processing. | N/A | |
Computing models in high energy physics | Tommaso Boccali | Publication | | High Energy Physics experiments (HEP experiments in the following) have been, at least in the last three decades, at the forefront of technology in aspects like detector design and construction, the number of collaborators, and the complexity of data analyses. Unlike in previous particle physics experiments, the computing and data handling aspects have not been marginal in their design and operations; the cost of the IT-related components, from software development to storage systems and to complex distributed e-Infrastructures, has risen to a level which needs proper understanding and planning from the first moments in the lifetime of an experiment. In the following sections we first explore the computing and software solutions developed and operated in the most relevant past and present experiments, with a focus on the technologies deployed; a technology-tracking section is presented in order to pave the way to possible solutions for next-decade experiments, and beyond. While the focus of this review is on the offline computing model, the distinction is a blurred one, and some experiments have already experienced cross-contamination between trigger selection and offline workflows; it is anticipated that this trend will continue in the future. | N/A | |
Financial Case Study: Use of Cloud Resources in HEP Computing | Christopher Hollowell, Jerome Lauret, Shigeki Misawa, Tejas Rao, Alexandr Zaytsev | Presentation | | N/A | N/A | |
HEP/HPC Strategy Meeting - USA | Ian Fisk, Maria Girone, Oliver Gutsche, Paolo Calafiura | Meeting Summary | | N/A | N/A | |
HEP/HPC Strategy Meeting - Europe | Andrej Filipcic, Maria Girone, Tommaso Boccali | Meeting Summary | | Meeting to discuss the current status and challenges of HPC integration in Europe (combining conclusions with other regions). | N/A | |
HEP/HPC Strategy Meeting - All Regions | Maria Girone, Tommaso Boccali | Meeting Summary | | Meeting to discuss the current status and challenges of HPC integration in all regions. [...] | N/A | |
Dark-matter And Neutrino Computation Explored (DANCE) Community Input to Snowmass | Amy Roberts, Christopher Tunnell, Belina von Krosigk, et al. | Meeting Summary | | This paper summarizes the needs of the dark matter and neutrino communities as they relate to computation. The scope includes data acquisition, triggers, data management and processing, data preservation, simulation, machine learning, data analysis, software engineering, career development, and equity and inclusion. Beyond identifying our community needs, we propose actions that can be taken to strengthen this community and to work together to overcome common challenges. | JENA_WG2 | |
Resource-aware Research on Universe and Matter: Call-to-Action in Digital Transformation | Ben Bruers, Marilyn Cruces, Markus Demleitner, Guenter Duckeck, Michael Düren, Niclas Eich, Torsten Enßlin, Johannes Erdmann, Martin Erdmann, Peter Fackeldey, Christian Felder, Benjamin Fischer, Stefan Fröse, Stefan Funk, Martin Gasthuber, Andrew Grimshaw, Daniela Hadasch, Moritz Hannemann, Alexander Kappes, Raphael Kleinemühl, Oleksiy M. Kozlov, Thomas Kuhr, Michael Lupberger, Simon Neuhaus, Pardis Niknejadi, Judith Reindl, Daniel Schindler, Astrid Schneidewind, Frank Schreiber, Markus Schumacher, Kilian Schwarz, Achim Streit, R. Florian von Cube, Rod Walker, Cyrus Walther, Sebastian Wozniewski, Kai Zhou | Meeting Summary | | Given the urgency to reduce fossil fuel energy production to make climate tipping points less likely, we call for resource-aware knowledge gain in the research areas on Universe and Matter with emphasis on the digital transformation. A portfolio of measures is described in detail and then summarized according to the timescales required for their implementation. The measures will both contribute to sustainable research and accelerate scientific progress through increased awareness of resource usage. This work is based on a three-day workshop on sustainability in digital transformation held in May 2023. | JENA_WG2 | |
Software and Computing for Small HEP Experiments | Dave Casper, Maria Elena Monzani, Benjamin Nachman | Meeting Summary | | This white paper briefly summarized key conclusions of the recent US Community Study on the Future of Particle Physics (Snowmass 2021) workshop on Software and Computing for Small High Energy Physics Experiments. | JENA_WG2 | |
NuPECC TWG9 : Open Science and Data | TWG9 Open Science and Data | Publication | | […] This chapter discusses the benefits and application of Open Science within the community, and explores the current and future perspectives for the community. This is divided into the several “pillars” of Open Science, namely: Open Science developments, Open Access publications, Open Data and lifecycle, Open Software and workflows, Infrastructures for Open Science, and Nuclear data evaluation. | JENA_WG2 | |
GANIL Data Policy | GANIL | Technical Report | | The present data management policy pertains to the ownership of, the curation of and access to experimental data and metadata collected and/or stored at GANIL. […] | JENA_WG2 | |
Instructions for uploading and linking research data/software at GSI Helmholtzzentrum für Schwerionenforschung GmbH | GSI/FAIR Collaboration | Publication | | This document details how to publish data and software to an external repository, make records in the JOIN2 GSI publications repository (https://repository.gsi.de/), and subsequently link them together with the publication record. | JENA_WG2 | |
Open source software licences at GSI/FAIR - Guidelines | GSI/FAIR Collaboration | Publication | | With the advancing digitization of research and teaching, the number of scientific software solutions is constantly increasing. In general, scientific software should be released as freely as possible on a trusted infrastructure, if there is no exploitation option. Adequate embargo periods may apply under certain conditions, such as termination of theses and publications, as well as maintaining competitive advantage. These guidelines are based on the recommendations formulated by the task force on Open Source licensing at CERN. | JENA_WG2 | |
DUNE Offline Computing Conceptual Design Report | The DUNE Collaboration | Technical Report | | This document describes the conceptual design for the Offline Software and Computing for the Deep Underground Neutrino Experiment (DUNE). The goals of the experiment include 1) studying neutrino oscillations using a beam of neutrinos sent from Fermilab in Illinois to the Sanford Underground Research Facility (SURF) in Lead, South Dakota, 2) studying astrophysical neutrino sources and rare processes and 3) understanding the physics of neutrino interactions in matter. We describe the development of the computing infrastructure needed to achieve the physics goals of the experiment by storing, cataloging, reconstructing, simulating, and analyzing […] | JENA_WG2 | |
The O2 software framework and GPU usage in ALICE online and offline reconstruction in Run 3 | David Rohr, Giulio Eulisse | Presentation | | […] The talk will present the experience from running the O2 framework in production during the 2022 ALICE data taking, with particular regard to the GPU usage, an overview of the current state and the plans for the asynchronous reconstruction, and the current performance of synchronous and asynchronous reconstruction with GPUs for pp and Pb-Pb data. | JENA_WG2 | |
Physics Briefing Book : Input for the European Strategy for Particle Physics Update 2020 | European Strategy for Particle Physics | Strategy Report | | The European Particle Physics Strategy Update (EPPSU) process takes a bottom-up approach, whereby the community is first invited to submit proposals (also called inputs) for projects that it would like to see realised in the near-term, mid-term and longer-term future. National inputs as well as inputs from National Laboratories are also an important element of the process. All these inputs are then reviewed by the Physics Preparatory Group (PPG), whose role is to organize a Symposium around the submitted ideas and to prepare a community discussion on the importance and merits of the various proposals. The results of these discussions are then concisely summarised in this Briefing Book, prepared by the Conveners, assisted by Scientific Secretaries, and with further contributions provided by the Contributors listed on the title page. This constitutes the basis for the considerations of the European Strategy Group (ESG), consisting of scientific delegates from CERN Member States, Associate Member States, directors of major European laboratories, representatives of various European organizations as well as invitees from outside the European Community. The ESG has the mission to formulate the European Strategy Update for the consideration and approval of the CERN Council. | JENA_WG2 | |
HL-LHC Computing Review: Common Tools and Community Software | HEP Software Foundation | Technical Report | | Common and community software packages, such as ROOT, Geant4 and event generators have been a key part of the LHC’s success so far and continued development and optimisation will be critical in the future. The challenges are driven by an ambitious physics programme, notably the LHC accelerator upgrade to high-luminosity, HL-LHC, and the corresponding detector upgrades of ATLAS and CMS. The General Purpose Detectors describe their specific challenges elsewhere; in this document we address the issues for software that is used in multiple experiments (usually even more widely than ATLAS and CMS) and maintained by teams of developers who are either not linked to a particular experiment or who contribute to common software within the context of their experiment activity. We also give space to general considerations for future software and projects that tackle upcoming challenges, no matter who writes it, which is an area where community convergence on best practice is extremely useful. […] | JENA_WG2 | |
The Future of High Energy Physics Software and Computing: Report of the 2021 US Community Study on the Future of Particle Physics | V. Daniel Elvira, Steven Gottlieb, Oliver Gutsche, Benjamin Nachman | Publication | | Software and Computing (S&C) are essential to all High Energy Physics (HEP) experiments and many theoretical studies. The size and complexity of S&C are now commensurate with that of experimental instruments, playing a critical role in experimental design, data acquisition/instrumental control, reconstruction, and analysis. Furthermore, S&C often plays a leading role in driving the precision of theoretical calculations and simulations. Within this central role in HEP, S&C has been immensely successful over the last decade. This report looks forward to the next decade and beyond, in the context of the 2021 Particle Physics Community Planning Exercise ("Snowmass") organized by the Division of Particles and Fields (DPF) of the American Physical Society. | WG5, JENA_WG2 | |
CMS Phase-2 Computing Model: Update | CMS Offline Software and Computing | Publication | | The Phase-2 upgrade of CMS, coupled with the projected performance of the HL-LHC, shows great promise in terms of discovery potential. However, the increased granularity of the CMS detector and the higher complexity of the collision events generated by the accelerator pose challenges in the areas of data acquisition, processing, simulation, and analysis. These challenges cannot be solved solely by increments in the computing resources available to CMS, but must be accompanied by major improvements of the computing model and computing software tools, as well as data processing software and common software tools. In this document we present aspects of our roadmap for those improvements, focusing on the plans to reduce storage and CPU needs as well as take advantage of heterogeneous platforms, such as the ones equipped with GPUs, and High Performance Computing Centers. We describe the most prominent research and development activities being carried out in the experiment, demonstrating their potential effectiveness in either mitigating risks or quantitatively reducing computing resource needs on the road to the HL-LHC. | WG1, JENA_WG2 | |
ATLAS Software and Computing HL-LHC Roadmap | ATLAS Collaboration | Publication | | […] ATLAS produced a Conceptual Design Report (CDR) for HL-LHC Computing during the spring of 2020 for an initial review by the LHCC. The CDR laid out the issues discussed above, and the general approaches that will be taken to address them. This new document serves as a software-focused update to the first CDR, providing more concrete information on development work that will be undertaken in the coming years, listing specific milestones and target dates. Additionally, the document describes how ATLAS collaborates with external activities and projects, and how such collaboration will impact the overall development for HL-LHC. | JENA_WG2 | |
Evaluating Portable Parallelization Strategies for Heterogeneous Architectures in High Energy Physics | Mohammad Atif, Meghna Battacharya, Paolo Calafiura, et al. | Publication | | […] The Portable Parallelization Strategies team of the HEP Center for Computational Excellence is investigating the use of Kokkos, SYCL, OpenMP, std::execution::parallel and alpaka as potential portability solutions that promise to execute on multiple architectures from the same source code, using representative use cases from major HEP experiments, including the DUNE experiment of the Long Baseline Neutrino Facility, and the ATLAS and CMS experiments of the Large Hadron Collider. This cross-cutting evaluation of portability solutions using real applications will help inform and guide the HEP community when choosing their software and hardware suites for the next generation of experimental frameworks. We present the outcomes of our studies, including performance metrics, porting challenges, API evaluations, and build system integration. | JENA_WG2 | |
Snowmass 2021 Cross Frontier Report: Dark Matter Complementarity (Extended Version) | Antonio Boveia, Mohamed Berkat, Thomas Y. Chen, Aman Desai, Caterina Doglioni, et al. | Publication | | The fundamental nature of Dark Matter is a central theme of the Snowmass 2021 process, extending across all frontiers. In the last decade, advances in detector technology, analysis techniques and theoretical modeling have enabled a new generation of experiments and searches while broadening the types of candidates we can pursue. Over the next decade, there is great potential for discoveries that would transform our understanding of dark matter. In the following, we outline a road map for discovery developed in collaboration among the frontiers. A strong portfolio of experiments that delves deep, searches wide, and harnesses the complementarity between techniques is key to tackling this complicated problem, requiring expertise, results, and planning from all Frontiers of the Snowmass 2021 process. | JENA_WG2 | |
Snowmass2021 Cosmic Frontier: Modeling, statistics, simulations, and computing needs for direct dark matter detection | Yonatan Kahn, Maria Elena Monzani, Kimberly J. Palladino et al. | Publication | | This paper summarizes the modeling, statistics, simulation, and computing needs of direct dark matter detection experiments in the next decade. | JENA_WG2 | |
The IceProd Framework: Distributed Data Processing for the IceCube Neutrino Observatory | M. G. Aartsen, R. Abbasi, M. Ackermann, et al. | Publication | | IceCube is a one-gigaton instrument located at the geographic South Pole, designed to detect cosmic neutrinos, identify the particle nature of dark matter, and study high-energy neutrinos themselves. Simulation of the IceCube detector and processing of data require a significant amount of computational resources. This paper presents the first detailed description of IceProd, a lightweight distributed management system designed to meet these requirements. It is driven by a central database in order to manage mass production of simulations and analysis of data produced by the IceCube detector. IceProd runs as a separate layer on top of other middleware and can take advantage of a variety of computing resources, including grids and batch systems such as CREAM, HTCondor, and PBS. This is accomplished by a set of dedicated daemons that process job submission in a coordinated fashion through the use of middleware plugins that serve to abstract the details of job submission and job management from the framework. | JENA_WG2 | |
KM3NeT Report on Documentation Strategy, Environment, and Software | T. Gal, J. Hofestädt, U. Katz, J. Schnabel | Deliverable | | The KM3NeT Research Infrastructure will, over a period of at least a decade, produce a large amount of unique scientific data that are to be made available to the scientific communities concerned and to the broader general public. This requires the set-up of tools, procedures, documentation and rules to provide this service. For all aspects of the open data access system, including data processing methods, data structure, access and usage examples, sufficient documentation for the effective use of the open data must be provided. In this document, the documentation strategy for the different components is described. | JENA_WG2 | |
Computing Challenges for the Einstein Telescope Project | Stefano Bagnasco, Antonella Bozzi, Tassos Fragos, Alba Gonzalvez, Steffen Hahn, Gary Hemming, Lia Lavezzi, Paul Laycock, Gonzalo Merino, Silvio Pardi, Steven Schramm, Achim Stahl, Andres Tanasijczuk, Nadia Tonello, Sara Vallero, John Veitch, Patrice Verdier | Conference Paper | | The discovery of gravitational waves, first observed in September 2015 following the merger of a binary black hole system, has already revolutionised our understanding of the Universe. This was further enhanced in August 2017, when the coalescence of a binary neutron star system was observed both with gravitational waves and a variety of electromagnetic counterparts; this joint observation marked the beginning of gravitational multimessenger astronomy. The Einstein Telescope, a proposed next-generation ground-based gravitational-wave observatory, will dramatically increase the sensitivity to sources: the number of observations of gravitational waves is expected to increase from roughly 100 per year to roughly 100,000 per year, and signals may be visible for hours at a time, given the low frequency cutoff of the planned instrument. This increase in the number of observed events, and the duration with which they are observed, is hugely beneficial to the scientific goals of the community but poses a number of significant computing challenges. Moreover, the currently used computing algorithms do not scale to this new environment, both in terms of the amount of resources required and the speed with which each signal must be characterised. This contribution will discuss the Einstein Telescope's computing challenges, and the activities that are underway to prepare for them. Available computing resources and technologies will greatly evolve in the years ahead, and those working to develop the Einstein Telescope data analysis algorithms will need to take this into account. It will also be important to factor into the initial development of the experiment's computing model the availability of huge parallel HPC systems and ubiquitous Cloud computing; the design of the model will also, for the first time, include the environmental impact as one of the optimisation metrics. | WG1, JENA_WG2 | |
Gravitational-Wave Data Analysis: Computing Challenges in the 3G Era | Peter Couvares, Ian Bird, Ed Porter, Stefano Bagnasco, Michele Punturo, David Reitze, Stavros Katsanevas, Takaaki Kajita, Vicky Kalogera, Harald Lueck, David McClelland, Sheila Rowan, Gary Sanders, B.S. Sathyaprakash, David Shoemaker, Jo van den Brand | Technical Report | | Cyber infrastructure will be a critical consideration in the development of next generation gravitational-wave detectors. The demand for data analysis computing in the 3G era will be driven by the high number of detections as well as the expanded search parameter space for compact astrophysical objects and the subsequent parameter estimation follow-up required to extract the nature of the sources. Additionally, there will be an increased need to develop appropriate and scalable computing cyberinfrastructure, including data access and transfer protocols, and storage and management of software tools, that have sustainable development, support, and management processes. This report identifies the major challenges and opportunities facing 3G gravitational-wave observatories and presents recommendations for addressing them. | JENA_WG2 | |
Environmental sustainability in basic research: a perspective from HECAP+ | Shankha Banerjee, Thomas Y. Chen, Claire David, Michael Düren, Harold Erbin, Jacopo Ghiglieri, Mandeep S. S. Gill, L. Glaser, Christian Gütschow, Jack Joseph Hall, Johannes Hampp, Patrick Koppenburg, Matthias Koschnitzke, Kristin Lohwasser, Rakhi Mahbubani, Viraf Mehta, Peter Millington, Ayan Paul, Frauke Poblotzki, Karolos Potamianos, Nikolina Šarčević, Rajeev Singh, Hannah Wakeling, Rodney Walker, Matthijs van der Wild, Pia Zurita | Technical Report | | The climate crisis and the degradation of the world's ecosystems require humanity to take immediate action. The international scientific community has a responsibility to limit the negative environmental impacts of basic research. The HECAP+ communities (High Energy Physics, Cosmology, Astroparticle Physics, and Hadron and Nuclear Physics) make use of common and similar experimental infrastructure, such as accelerators and observatories, and rely similarly on the processing of big data. Our communities therefore face similar challenges to improving the sustainability of our research. This document aims to reflect on the environmental impacts of our work practices and research infrastructure, to highlight best practice, to make recommendations for positive changes, and to identify the opportunities and challenges that such changes present for wider aspects of social responsibility. | JENA_WG2 | |
Interactive Analysis Notebooks on DESY Batch Resources: Bringing Jupyter to HTCondor and Maxwell at DESY | J. Reppin, C. Beyer, T. Hartmann, F. Schluenzen, M. Flemming, S. Sternberger, Y. Kemp | Publication | | Batch scheduling systems are usually designed to maximise fair resource utilisation and efficiency, but are less well suited to demanding interactive processing, which requires fast access to resources; low start-up latency is only of secondary importance for high-throughput and high-performance computing scheduling systems. The computing clusters at DESY are intended as batch systems for end users to run massive analysis and simulation jobs, but they must also enable fast turnaround, in particular when processing is expected to feed back into the operation of instruments in near real time. The continuously increasing popularity of Jupyter Notebooks for interactive and online processing made an integration of this technology into the DESY batch systems indispensable. We present here our approach to utilise the HTCondor and SLURM backends to integrate Jupyter Notebook servers and the techniques involved to provide fast access. The chosen approach offers a smooth user experience allowing users to customize resource allocation tailored to their computational requirements. In addition, we outline the differences between the HPC and the HTC implementations and give an overview of the experience of running Jupyter Notebook services. | JENA_WG1 | |
Beyond HEP: Photon and accelerator science computing infrastructure at DESY | Christoph Beyer, Stefan Bujack, Stefan Dietrich, Thomas Finnern, Martin Flemming, Patrick Fuhrmann, Martin Gasthuber, Andreas Gellrich, Volker Guelzow, Thomas Hartmann, Johannes Reppin, Yves Kemp, Birgit Lewendel, Frank Schluenzen, Michael Schuh, Sven Sternberger, Christian Voss, Markus Wengert | Conference Paper | | [...] We will present DESY compute cloud and container orchestration plans as a basis for infrastructure and platform services. We will show examples of Jupyter notebooks for small scale interactive analysis, as well as its integration into large scale resources such as batch systems or Spark clusters. To overcome the fragmentation of the various resources for all scientific communities at DESY, we explore how to integrate them into a seamless user experience in an Interdisciplinary Data Analysis Facility. | JENA_WG1 | |
Effective Dynamic Integration and Utilization of Heterogenous Compute Resources | Max Fischer, Manuel Giffels, Andreas Heiss, Eileen Kuehn, Matthias Schnepf, Ralf Florian von Cube, Andreas Petzold, Günter Quast | Conference Paper | | Increased operational effectiveness and the dynamic integration of only temporarily available compute resources (opportunistic resources) become increasingly important in the next decade, due to the scarcity of resources for future high energy physics experiments as well as the desired integration of cloud and high performance computing resources. This results in a more heterogeneous compute environment, which gives rise to huge challenges for the computing operation teams of the experiments. At the Karlsruhe Institute of Technology (KIT) we design solutions to tackle these challenges. In order to ensure an efficient utilization of opportunistic resources and unified access to the entire infrastructure, we developed the Transparent Adaptive Resource Dynamic Integration System (TARDIS), a scalable multi-agent resource manager providing interfaces to provision, as well as dynamically and transparently integrate, resources of various providers into one common overlay batch system. Operational effectiveness is guaranteed by relying on COBalD – the Opportunistic Balancing Daemon – and its simple approach of taking into account the utilization and allocation of the different resource types, in order to run the individual workflows on the best-suited resources. In this contribution we will present the current status of integrating various HPC centers and cloud providers into the compute infrastructure at the Karlsruhe Institute of Technology as well as our experiences gained in a production environment. | JENA_WG1 | |
Lightweight dynamic integration of opportunistic resources | Max Fischer, Eileen Kuehn, Manuel Giffels, Matthias Jochen Schnepf, Andreas Petzold, Andreas Heiss | Conference Paper | | To satisfy future computing demands of the Worldwide LHC Computing Grid (WLCG), opportunistic usage of third-party resources is a promising approach. While the means to make such resources compatible with WLCG requirements are largely satisfied by virtual machine and container technologies, strategies to acquire and disband many resources from many providers are still a focus of current research. Existing meta-schedulers that manage resources in the WLCG are hitting the limits of their design when tasked to manage heterogeneous resources from many diverse resource providers. To provide opportunistic resources to the WLCG as part of a regular WLCG site, we propose a new meta-scheduling approach suitable for opportunistic, heterogeneous resource provisioning. Instead of anticipating future resource requirements, our approach observes resource usage and promotes well-used resources. Following this approach, we have developed an inherently robust meta-scheduler, COBalD, for managing diverse, heterogeneous resources given unpredictable resource requirements. This paper explains the key concepts of our approach, and discusses the benefits and limitations of our new approach to dynamic resource provisioning compared to previous approaches. (A toy sketch of this utilisation-driven feedback appears after this table.) | JENA_WG1 | |
Transparent Integration of Opportunistic Resources into the WLCG Compute Infrastructure | Michael Böhler, René Caspart, Max Fischer, Oliver Freyermuth, Manuel Giffels, Stefan Kroboth, Eileen Kuehn, Matthias Schnepf, Florian von Cube, Peter Wienemann | Conference Paper | | The inclusion of opportunistic resources, for example from High Performance Computing (HPC) centers or cloud providers, is an important contribution to bridging the gap between existing resources and future needs by the LHC collaborations, especially for the HL-LHC era. However, the integration of these resources poses new challenges and often needs to happen in a highly dynamic manner. To enable an effective and lightweight integration of these resources, the tools COBalD and TARDIS are developed at KIT. In this contribution we report on the infrastructure we use to dynamically offer opportunistic resources to collaborations in the Worldwide LHC Computing Grid (WLCG). The core components are COBalD/TARDIS, HTCondor, CVMFS and modern virtualization technology. The challenging task of managing the opportunistic resources is performed by COBalD/TARDIS. We showcase the challenges, employed solutions and experiences gained with the provisioning of opportunistic resources from several resource providers like university clusters, HPC centers and cloud setups in a multi-VO environment. This work can serve as a blueprint for approaching the provisioning of resources from other resource providers. | JENA_WG1 | |
Extending the distributed computing infrastructure of the CMS experiment with HPC resources | J. Adelman-McCarthy, T. Boccali, R. Caspart, A. Delgado Peris, M. Fischer, J. Flix Molina et al. | Conference Paper | | Particle accelerators are an important tool to study the fundamental properties of elementary particles. Currently the highest energy accelerator is the LHC at CERN, in Geneva, Switzerland. Each of its four major detectors, such as the CMS detector, produces dozens of Petabytes of data per year to be analyzed by a large international collaboration. The processing is carried out on the Worldwide LHC Computing Grid, which spans more than 170 compute centers around the world and is used by a number of particle physics experiments. Recently the LHC experiments were encouraged to make increasing use of HPC resources. While Grid resources are homogeneous with respect to the used Grid middleware, HPC installations can be very different in their setup. In order to integrate HPC resources into the highly automated processing setups of the CMS experiment, a number of challenges need to be addressed. For processing, access to primary data and metadata as well as access to the software is required. At Grid sites all this is achieved via a number of services that are provided by each center. However, at HPC sites many of these capabilities cannot be easily provided and have to be enabled in the user space or enabled by other means. At HPC centers there are often restrictions regarding network access to remote services, which is again a severe limitation. The paper discusses a number of solutions and recent experiences by the CMS experiment to include HPC resources in processing campaigns. | JENA_WG1 | |
The ALICE Collaboration: Evolution of the O2 system | The ALICE Collaboration | Technical Report | | This document describes the evolution of the ALICE Online-Offline computing system since the TDR, published in June 2015, and gives an account of its implementation schedule in the next two years. After the LS2 upgrade, ALICE will operate at a peak Pb–Pb collision rate of 50 kHz. All events will be read out, reconstructed, compressed and written to permanent storage without any selective trigger. […] | WG1 | |
SKA1 Scientific Use Cases | J. Wagg et al. | Technical document | | Here, we present a series of sample use cases that highlight some of the scientific objectives that could be enabled by phase 1 of the SKA (SKA1). This set consists of examples that include a broad range of scientific applications requiring the frequency coverage of SKA1-LOW and SKA1-MID ([AD3]), as well as the extended high frequency coverage that could be enabled by the advanced instrumentation programme band 6 receivers (beyond 15 GHz; [RD6]). These are intended to serve as examples, and should not be regarded as a substitute for the system level 1 requirements document. […] | WG2, WG5 | |
Future Trends in Nuclear Physics Computing | Markus Diefenthaler, Torre Wenaus | Editorial | | […] The workshop focused on identifying the unique aspects of software and computing in NP, and discussing how the NP community could strengthen common efforts and chart a path forward for the next decade, sure to be an exciting one with rich ongoing scientific programs at Brookhaven National Laboratory (BNL), Jefferson Lab (JLab), and other NP facilities, and culminating in data-taking at the Electron-Ion Collider (EIC) in the early 2030s. Without claiming to present a collective view from the workshop and discussions since—fortunately this is not expected of us in this opinion editorial—we offer here our reflections on the topic, informed by the workshop and the summary we authored with our colleagues, as well as discussions and developments in the eventful time since. […] | WG5 | |
Summary of the cross-experiment HPC workshop | Tommaso Boccali, Concezio Bozzi, James Catmore, Davide Costanzo, Markus Klute, Andrea Valassi | Workshop Summary | | We held a workshop on HPCs on 10 May [2019] across the experiments, with people from ATLAS, CMS, and LHCb, and some participation from ALICE. All experiments report usage of HPC resources, with varying levels of technical difficulties. Integrating HEP experiment workloads on HPC systems poses technical issues and challenges in two distinct areas, namely the management and submission of jobs on HPCs, and the development of the software applications executed within each job. […] | WG5 | |
Common challenges for HPC integration into LHC computing | Maria Girone | Technical Report | | The experiments have compiled HPC-related documents, including the summary of a joint meeting on this subject. This document intends to extract the commonalities between experiments with the aim of developing a joint roadmap and strategy for enabling the exploitation of HPC resources. To develop common approaches between experiments and HPC sites, a foundation and understanding of the problems is needed. This is built on a summary of technical challenges, described in section 2. They are broken into two main categories: computing resource challenges and software and architecture challenges. Computing resource challenges describe issues related to operations, facility access, provisioning, and monitoring; while software and architecture challenges are related to adapting HEP applications to make effective use of alternative architectures often found on HPC. In order to explore potential solutions, a number of pilot demonstrators are proposed in section 3 below. | WG5 | |
Integration of the Barcelona Supercomputing Center for CMS computing: Towards large scale production | C. Acosta, A. Delgado, J. Flix, J.M. Hernández, A. Pérez-Calero, E. Pineda, I. Villalonga | Presentation | | N/A | WG5 | |
US ATLAS and US CMS HPC and Cloud Blueprint | Fernando Barreiro Megino, Lincoln Bryant, Dirk Hufnagel, Kenyi Hurtado Anampa | Technical Report | | The Large Hadron Collider (LHC) at CERN houses two general-purpose detectors - ATLAS and CMS - which conduct physics programs over multi-year runs to generate increasingly precise and extensive datasets. The efforts of the CMS and ATLAS collaborations led to the discovery of the Higgs boson, a fundamental particle that gives mass to other particles, representing a monumental achievement in the field of particle physics that was recognized with the awarding of the Nobel Prize in Physics in 2013 to François Englert and Peter Higgs. These collaborations continue to analyze data from the LHC and are preparing for the high-luminosity data-taking phase at the end of the decade. The computing models of these detectors rely on a distributed processing grid hosted by more than 150 associated universities and laboratories worldwide. However, such new data will require a significant expansion of the existing computing infrastructure. To address this, both collaborations have been working for years on integrating High Performance Computers (HPC) and commercial cloud resources into their infrastructure and continue to assess the potential role of such resources in order to cope with the demands of the new high-luminosity era. US ATLAS and US CMS computing management have charged the authors to provide a blueprint document looking at current and possibly future use of HPC and Cloud resources, outlining integration models, possibilities, challenges and costs. The document will address key questions such as the optimal use of resources for the experiments and funding agencies, the main obstacles that need to be overcome for resource adoption, and areas that require more attention. | WG1, WG5 | |
Nuclear Physics Tools – Machine Learning, Artificial Intelligence, and Quantum Computing | Valerio Bertone, Jana N. Günther, Hervé Moutarde, Eugenio Nappi | Report | | […] The purpose of this chapter is to provide a broad and as comprehensive as possible overview of the current status of how these techniques are being employed in nuclear physics, to coordinate this effort at a European level. […] | WG5 | |
EuroHPC Summit 2024: Interconnecting EuroHPC Supercomputers for Scientific and Industrial Advancement | EuroHPC | Presentation | | N/A | WG5 | |
SKA Science Regional Centres Community Input Questionnaire | SKA Regional Centre Steering Committee (SRCSC) WG6, Task Package 1 | Survey | | A key objective of the SKA Regional Centre Steering Committee (SRCSC) WG6, Task Package 1 group is to collect feedback from the future SKA user communities regarding their needs, requirements and expectations of services delivered by the future SRC Network. The outputs of this process will be fed directly into the requirement definition process for the SRC Network via the various SRCSC working groups. This will be done both at an early stage, and via periodic reviews, to ensure that the SRC Network is developed as optimally as possible to serve the communities' needs and to maximise the evolving science impacts and outputs of the SKA. The SRCSC's WG6 (TP1) is initiating a number of engagement and requirement-capturing actions with the future SKA user community. These actions will take multiple forms. Initially we will be surveying different parts of the SKA user community via targeted questionnaires and engagement forums. The outputs of these actions will then be used to help define the detailed (level 1 & 2) requirements of the SRC Network. | WG1, WG4 | |
Utilizing Distributed Heterogeneous Computing with PanDA in ATLAS | Tadashi Maeno, Aleksandr Alekseev, Fernando Harald Barreiro Megino, Kaushik De, Wen Guan, Edward Karavakis, Alexei Klimentov, Tatiana Korchuganova, FaHui Lin, Paul Nilsson, Torre Wenaus, Zhaoyu Yang, Xin Zhao | Conference Paper | | In recent years, advanced and complex analysis workflows have gained increasing importance in the ATLAS experiment at CERN, one of the large scientific experiments at LHC. Support for such workflows has allowed users to exploit remote computing resources and service providers distributed worldwide, overcoming limitations on local resources and services. The spectrum of computing options keeps increasing across the Worldwide LHC Computing Grid (WLCG), volunteer computing, high-performance computing, commercial clouds, and emerging service levels like Platform-as-a-Service (PaaS), Container-as-a-Service (CaaS) and Function-as-a-Service (FaaS), each one providing new advantages and constraints. Users can significantly benefit from these providers, but at the same time, it is cumbersome to deal with multiple providers, even in a single analysis workflow with fine-grained requirements coming from their applications’ nature and characteristics. In this paper, we will first highlight issues in geographically-distributed heterogeneous computing, such as the insulation of users from the complexities of dealing with remote providers, smart workload routing, complex resource provisioning, seamless execution of advanced workflows, workflow description, pseudo-interactive analysis, and integration of PaaS, CaaS, and FaaS providers. We will also outline solutions developed in ATLAS with the Production and Distributed Analysis (PanDA) system and future challenges for LHC Run4. | WG2 | |
CMS experience of running glideinWMS in High Availability mode | I. Sfiligoi, J. Letts, S. Belforte, A. McCrea, K. Larson, M. Zvada, B. Holzman, P. Mhashilkar, D. C. Bradley, M. D. Saiz Santos, F. Fanzago, O. Gutsche, T. Martin, F. Würthwein | Conference Paper | | The CMS experiment at the Large Hadron Collider is relying on the HTCondor-based glideinWMS batch system to handle most of its distributed computing needs. In order to minimize the risk of disruptions due to software and hardware problems, and also to simplify the maintenance procedures, CMS has set up its glideinWMS instance to use most of the attainable High Availability (HA) features. The setup involves running services distributed over multiple nodes, which in turn are located in several physical locations, including Geneva (Switzerland), Chicago (Illinois, USA) and San Diego (California, USA). This paper describes the setup used by CMS, the HA limits of this setup, as well as a description of the actual operational experience spanning many months. | WG2 | |
CMS strategy for HPC resource exploitation | Antonio Pérez-Calero Yzquierdo | Conference Paper | | High Energy Physics (HEP) experiments will enter a new era with the start of the HL-LHC program, with computing needs surpassing the current capacities by large factors. Anticipating such a scenario, funding agencies from participating countries are encouraging the experimental collaborations to consider the rapidly developing High Performance Computing (HPC) international infrastructures to satisfy at least a fraction of the foreseen HEP processing demands. These HPC systems are highly non-standard facilities, custom-built for use cases largely different from HEP demands, namely the processing of particle collisions (real or simulated) which can be analyzed individually without correlation. The access and utilization of these systems by HEP experiments will not be trivial, given the diversity of configuration and requirements for access among HPC centers, increasing the level of complexity from the HEP experiment integration and operations perspectives. Additionally, while HEP data resides on a distributed, highly interconnected storage infrastructure, HPC systems are in general not meant for accessing large data volumes residing outside the facility. Finally, the allocation policies to these resources are generally different from the current usage of pledged resources deployed at supporting Grid sites. This report covers the CMS strategy developed to make effective use of HPC resources, involving a closer collaboration between CMS and HPC centers in order to further understand and subsequently overcome the present obstacles. Progress in the necessary technical and operational adaptations being made in CMS computing is described. | WG2 | |
Status of DiracGrid projects | Federico Stagni | Presentation | | N/A | WG2 | |
Supercomputers, Clouds and Grids powered by BigPanDA for Brain studies | A. Beche, K. De, F. Delalondre, F. Schuermann, A. Klimentov and R. Mashinistov | Conference Paper | | The PanDA WMS - Production and Distributed Analysis Workload Management System - has been developed and used by the ATLAS experiment at the LHC (Large Hadron Collider) for all data processing and analysis challenges. BigPanDA is an extension of the PanDA WMS to run ATLAS and non-ATLAS applications on Leadership Class Facilities and supercomputers, as well as traditional grid and cloud resources. The success of the BigPanDA project has drawn attention from other compute-intensive sciences such as biology. In 2017, a pilot project was started between BigPanDA and the Blue Brain Project (BBP) of the École Polytechnique Fédérale de Lausanne (EPFL) located in Lausanne, Switzerland. This proof-of-concept project is aimed at demonstrating the efficient application of the BigPanDA system to support the complex scientific workflow of the BBP, which relies on using a mix of desktop, cluster and supercomputers to reconstruct and simulate accurate models of brain tissue. | WG2 | |
SKA Regional Centres Network (SRCNet) Software Architecture Document | J. Salgado, M. van Haarlem, L. Ball et al. | Design Description Document | | This document is a technical deliverable describing the SKA Regional Centres (SRCs) Network (SRCNet) software architecture. This description covers the use cases to be implemented on the SRCNet, the common modules needed for the SRC blueprint implementation, the interfaces between these modules and other SRCNet nodes and the constraints on the implementation. [...] | WG1 | |
SKAO Science Data Products: A Summary | Shari Breen, Rosie Bolton, Antonio Chrysostomou | Data Product Summary | | This document provides a summary of the data products that SKA users can expect, as well as the processes through which they will be delivered. The information presented here is largely derived from the “Observatory Establishment and Delivery Plan” [AD1] but with some additional details or emphasis appropriate for the provision of this brief reference document for the SKA Regional Centre Steering Committee and Science Working Groups. | WG1 | |
SKA1 Design Baseline Description | P. Dewdney et al. | Design Baseline Description | | This document describes the overall System Design of the SKA Observatory and its telescopes. It is a design perspective rather than a management, funding or organisational one. It serves to integrate all the system and sub-system designs to provide a coherent, authoritative description of the SKA1 Design Baseline. It also shows how the observatory, telescopes, and operations-design function together as an interconnected whole. In addition to summarising the entire design, it references primary sources of information as recorded in references interspersed throughout, which are mainly detailed documents drawn from the sub-system design work. | WG1 | |
LOFAR2.0 compared to LOFAR: a short summary | LOFAR | Technical Summary | | LOFAR2.0, see also the LOFAR2.0 White Paper, is a major upgrade to the LOw-Frequency ARray (LOFAR), offering simultaneous low- and high-band observing, increased field-of-view, and various other improvements to the sensitivity and operation of the telescope. A set of staged LOFAR2.0 test stations are helping to commission the new hardware and software, with a full system roll-out expected in 2024 − 2025, followed by early shared-risk observations and full operations thereafter. LOFAR2.0 will continue to be unique and world-leading, with an angular resolution > 10× higher than that of the planned Square Kilometre Array low-frequency component (SKA-Low), and also accessing the largely unexplored spectral window below 50 MHz. | WG1 | |
LOFAR2.0 Data Management Capabilities | Roberto Pizzo, John D. Swinbank, Irene Bonati | Note | | This document, which accompanies the call for LOFAR2.0 Large Programme proposals, summarizes the services that will be allocated by the ILT Foundation — and, in future, LOFAR ERIC — to process, archive, and distribute LOFAR2.0 Data Products. These services, deriving from development effort, operational activities, and infrastructure capacity contributed by various partners, will be provided to end users under the management of the ASTRON Science Data Centre (SDC). It is a practical application, for the LOFAR2.0 Large Programme Call, of the LOFAR ERIC Data Policy, and is intended to act as a reference for those teams submitting proposals; it is expected that those proposals will refer directly to the services described here, and proposal teams are asked to indicate which of these capabilities they expect to build upon or otherwise use. | WG1 | |
LOFAR ERIC Access Policy to Scientific User Services | Interim LOFAR ERIC Council | Policy Document | | The statutory principal task of LOFAR ERIC is: “to assure coordinated exploitation of the LOFAR infrastructure, to produce world-class scientific research and to pursue further development, with the aim to maximise productivity and impact for the Members and the international scientific community, positioning LOFAR ERIC as a world-leading research infrastructure with a long-term perspective”. The Access Policy contributes to the success of the principal task by regulating user access to the suite of scientific research services provided by LOFAR ERIC, given that the infrastructure operated by LOFAR ERIC has a finite capacity (for observing, data analysis, etc.), and that the LOFAR ERIC operations budget to carry out activities is finite also. | WG1 | |
Science Data Policy of LOFAR ERIC | LOFAR ERIC Council | Policy Document | | This document details the principles of the science data policy of LOFAR ERIC, in line with Article 32 of the LOFAR ERIC Statutes. The policy outlines the ownership and access to scientific data acquired, managed, or created through research by or involving LOFAR ERIC. It applies to users contributing to LOFAR ERIC or using data supplied through LOFAR ERIC. The aim of this policy is to ensure open and easy access to the stored LOFAR ERIC science data and to conserve these data so as to maximise their overall long-term science yield within reasonable technical and budgetary means. | WG1 | |
The LIGO-Virgo-KAGRA Computing Infrastructure for Gravitational-wave Research | Stefano Bagnasco for the Virgo Collaboration and the LIGO Scientific Collaboration | Publication | | The LIGO, Virgo and KAGRA Gravitational-wave (GW) observatories are getting ready for their fourth observational period, O4, scheduled to begin in March 2023, with improved sensitivities and thus higher event rates. GW-related computing has both large commonalities with HEP computing, particularly in the domain of offline data processing and analysis, and important differences, for example in the fact that the amount of raw data doesn’t grow much with the instrument sensitivity, or the need to timely generate and distribute “event candidate alerts” to EM and neutrino observatories, thus making gravitational multi-messenger astronomy possible. Data from the interferometers are exchanged between collaborations both for low-latency and offline processing; in recent years, the three collaborations designed and built a common distributed computing infrastructure to prepare for a growing computing demand, and to reduce the maintenance burden of legacy custom-made tools, by increasingly adopting tools and architectures originally developed in the context of HEP computing. So, for example, HTCondor is used for workflow management, Rucio for many data management needs, CVMFS for code and data distribution, and more. We will present GW computing use cases and report about the architecture of the computing infrastructure as will be used during O4, as well as some planned upgrades for the subsequent observing run O5. | WG1 | |
Total cost of ownership and evaluation of Google cloud resources for the ATLAS experiment at the LHC | The ATLAS Collaboration | Publication | | The ATLAS Google Project was established as part of an ongoing evaluation of the use of commercial clouds by the ATLAS Collaboration, in anticipation of the potential future adoption of such resources by WLCG grid sites to fulfil or complement their computing pledges. Seamless integration of Google cloud resources into the worldwide ATLAS distributed computing infrastructure was achieved at large scale and for an extended period of time, and hence cloud resources are shown to be an effective mechanism to provide additional, flexible computing capacity to ATLAS. For the first time a total cost of ownership analysis has been performed, to identify the dominant cost drivers and explore effective mechanisms for cost control. Network usage significantly impacts the costs of certain ATLAS workflows, underscoring the importance of implementing such mechanisms. Resource bursting has been successfully demonstrated, whilst exposing the true cost of this type of activity. A follow-up to the project is underway to investigate methods for improving the integration of cloud resources in data-intensive distributed computing environments and reducing costs related to network connectivity, which represents the primary expense when extensively utilising cloud resources. | WG1 | |
Preparatory Phase for the Einstein Telescope Gravitational Wave Observatory: Computing and Data Requirements | Paul Laycock, Stefano Bagnasco, Nadia Tonello, Loïc Rolland, Patrice Verdier, Andres Tanasijczuk | Deliverable | | The purpose of this document is to define the Computing and Data requirements that a future computing model will need to fulfil to deliver the Einstein Telescope (ET) science program. The computing requirements to operate the ET detector can be reliably extrapolated based on the operational needs of existing gravitational wave (GW) detectors. However, the algorithms and simulations used to perform analysis of experimental data are an active topic of research. A first attempt at quantifying their computing needs by extrapolating from second generation GW detectors is presented as a baseline, with a discussion of potential future improvements. […] | WG1 | |
HPC Resources Integration at CMS | CMS Offline Software and Computing | Publication | | This document identifies a minimal set of requirements on HPC-based resources in order to run CMS workflows: it is meant to sketch the strategy for an effective exploitation of such machines. It describes possible ways to use HPC machines for any type of workflows, including data reconstruction and Monte Carlo digitisation and reconstruction, besides Monte Carlo generation and simulation, which would represent little value overall if considered alone. This needs the identification of the requirements in order to try to use HPC like any other traditional owned site, and thus dealing with large data input/output, stressing both storage and network. | WG1 | |
Enabling HPC Systems for HEP: The INFN-CINECA Experience | Tommaso Boccali, Stefano Dal Pra, Stefano Zani, Lucia Morganti, Daniele Cesini, Vladimir Sapunenko, Daniele Spiga, Diego Ciangottini, Francesco Noferini, Concezio Bozzi, Stefano Perazzini, Andrea Valassi, Federico Stagni, Alessandro De Salvo, Alessandra Doria, Luca dell’Agnello, Gaetano Maron | Publication | | In this report we want to describe a successful integration exercise between CINECA (PRACE Tier-0) Marconi KNL system and LHC processing. A production-level system has been deployed using a 30 Mhours grant from the 18th Call for PRACE Project Access; thanks to CINECA, more than 3x the granted hours were eventually made available. Modifications at multiple levels were needed: on experiments' WMS layers, on site level access policies and routing, on virtualization. The success of the integration process paves the way to integration with additional local systems, and in general shows how the requirements of a HPC center can coexist with the needs from data intensive, complex distributed workflows. | WG1 | |
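The entry “Simulating the Carbon Cost of Grid Sites” above describes throttling CPU clock frequency when the fossil-fuel component of the local power mix rises. The sketch below is only a minimal illustration of that idea: the operating points, power figures, threshold, and frequency steps are assumptions invented for this example and are not taken from the Glasgow simulation or its measurements.

```python
"""Illustrative sketch (not the Glasgow simulation): pick a CPU clock
frequency for a site as a function of the carbon intensity of the local
power mix, and estimate the resulting carbon cost per unit of work."""

# Hypothetical operating points: frequency (GHz) -> (node power in W,
# relative throughput).  Real values would come from site measurements.
OPERATING_POINTS = {
    3.0: (350.0, 1.00),
    2.4: (260.0, 0.85),
    1.8: (190.0, 0.65),
}

def choose_frequency(carbon_intensity_g_per_kwh: float,
                     threshold_g_per_kwh: float = 250.0) -> float:
    """Run at full speed when the grid is 'clean'; throttle progressively
    as the fossil-fuel component of the power mix rises."""
    if carbon_intensity_g_per_kwh <= threshold_g_per_kwh:
        return 3.0
    if carbon_intensity_g_per_kwh <= 1.5 * threshold_g_per_kwh:
        return 2.4
    return 1.8

def carbon_per_unit_work(freq_ghz: float,
                         carbon_intensity_g_per_kwh: float) -> float:
    """Grams of CO2 emitted per throughput-weighted hour at this frequency."""
    power_w, rel_throughput = OPERATING_POINTS[freq_ghz]
    energy_kwh_per_hour = power_w / 1000.0
    return energy_kwh_per_hour * carbon_intensity_g_per_kwh / rel_throughput

if __name__ == "__main__":
    for ci in (120.0, 280.0, 420.0):  # example power-mix intensities, gCO2/kWh
        f = choose_frequency(ci)
        print(f"intensity={ci:5.0f} gCO2/kWh -> {f} GHz, "
              f"{carbon_per_unit_work(f, ci):.1f} g CO2 per unit work-hour")
```

As the paper notes, any such policy would additionally have to weigh embedded carbon and the loss of site throughput, which this toy decision rule deliberately ignores.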
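The entry “A holistic study of the WLCG energy needs for the LHC scientific program” estimates infrastructure-wide energy consumption from installed capacity, hardware efficiency, and facility overheads. A schematic form of such an estimate is given below; the symbols and the split into CPU, disk, and tape are an illustrative assumption for orientation only, not the model used in that study.

```latex
% Schematic annual energy estimate (illustrative decomposition, not the
% authors' model): per-resource installed capacity times power per unit of
% capacity, summed, scaled by facility overhead (PUE) and hours per year.
\begin{equation*}
E_{\mathrm{WLCG}}(y) \;\approx\; \mathrm{PUE}(y)\,
  \Big[\, C_{\mathrm{CPU}}(y)\,\varepsilon_{\mathrm{CPU}}(y)
        + C_{\mathrm{disk}}(y)\,\varepsilon_{\mathrm{disk}}(y)
        + C_{\mathrm{tape}}(y)\,\varepsilon_{\mathrm{tape}}(y) \,\Big]
  \times 8766\,\mathrm{h}
\end{equation*}
```

Here C denotes installed capacity in year y (e.g. HS06 for CPU, PB for storage), ε the average power drawn per unit of installed capacity (W/HS06, W/PB), and PUE the power usage effectiveness of the hosting facilities; 8766 h is the average number of hours per year. The trends discussed in the abstract enter through the year dependence: capacity grows with the LHC programme while ε falls as hardware efficiency improves.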
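The entry “Lightweight dynamic integration of opportunistic resources” describes COBalD's published principle: instead of anticipating demand, observe how well currently provided opportunistic resources are used and promote the well-used ones. The toy feedback loop below illustrates that principle only; the data structure, thresholds, and function names are assumptions made for this sketch and are not the actual COBalD/TARDIS API.

```python
"""Toy utilisation-driven feedback in the spirit of COBalD's published idea:
grow the supply of opportunistic resources when they are well used, shrink it
when they sit idle.  Thresholds and step size are illustrative assumptions."""

from dataclasses import dataclass

@dataclass
class Pool:
    provided: int        # opportunistic nodes currently provided
    allocated: float     # fraction of provided nodes holding at least one job
    utilised: float      # average CPU utilisation of the allocated nodes

def adjust_supply(pool: Pool, step: int = 10,
                  low: float = 0.5, high: float = 0.9) -> int:
    """Return the new number of nodes to provide: promote well-used
    resources, disband poorly used ones, otherwise hold steady."""
    if pool.allocated > high and pool.utilised > high:
        return pool.provided + step
    if pool.allocated < low or pool.utilised < low:
        return max(0, pool.provided - step)
    return pool.provided

if __name__ == "__main__":
    print(adjust_supply(Pool(provided=100, allocated=0.95, utilised=0.92)))  # 110
    print(adjust_supply(Pool(provided=100, allocated=0.40, utilised=0.80)))  # 90
```

The appeal of this style of control, as the abstract argues, is that it needs no prediction of future resource requirements: the observed usage of what is already provided drives the provisioning decision.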