PERMAVOST 2024

Workshop Abstract

Modern software engineering is becoming increasingly complex. In the HPC field especially, we are dealing with cutting-edge infrastructure and novel problems of unprecedented scale. The ability to monitor and analyze the performance of such applications and infrastructure is imperative for their future improvement, design, and maintenance. Writing and maintaining these applications is no longer the job solely of computer scientists; it now involves a wide variety of experts in mathematics, science, and other engineering disciplines. Many developers from these disciplines have not received a formal education in computer science and rely increasingly on tools created by computer scientists to analyze and optimize their code, which shows the need for a forum where both groups can work together.

Workshop Overview

The goal of the Workshop on Performance EngineeRing, Modelling, Analysis, and VisualizatiOn STrategy (PERMAVOST) is to bridge the gap between tool developers and end users of performance analysis tools. It is a half/full day workshop with a keynote, held in conjunction with HPDC 2024. We hope that the stakeholders (application developers, domain scientists, analysts, and tool developers) can collaborate and fill in the gaps in various topics such as:

Key metrics, patterns, and performance pitfalls, including strategies to recognize and use performance information to improve applications.
Challenges arising from new computing architectures, programming paradigms, novel scientific problems, and the various scales of data that need to be processed effectively.
Research on, and the need for, modern usability design principles integrated into performance analysis tools to better aid their users.
Analyses and methodologies that can be used and understood by users across the spectrum of HPC knowledge.

Topics of Interest

Topics of interest for the workshop include, but are not limited to:

Performance analysis and modelling of real-world applications
Data visualization in high-level performance analysis
Usability studies of HPC tools
Inefficiencies in programming patterns or computing architecture
Patterns, anomaly detection, and performance characterization in HPC applications
Performance engineering strategies and use cases
Human-Computer Interfaces for exploring performance data
Energy management in performance analysis and engineering
Performance analysis in emerging HPC topics: Artificial Intelligence, Machine Learning, Quantum Computing, Containers, and Cloud

Call for Papers

All submitted papers should be formatted using the ACM Master Template with sigconf format (please be sure to use the current version). The necessary document can be found here.

General Instructions

Full papers of 5-8 pages (including all text, figures, and references)
Submissions must be in English and in PDF format
Only web-based submissions are allowed. Papers must be submitted via the PERMAVOST 2024 HotCRP site: https://permavost2024.hotcrp.com
We use a single-blind review process, so submissions may include authors' names, publications, etc.
Each paper will receive at least three reviews from the committee members
Submitted papers must be original work that has not been previously published and is not under consideration for publication in any other conference or journal
Accepted papers will be published in the workshop proceedings as part of the ACM Digital Library

Program

Schedule:

08:30 - 08:35 Welcome/Opening
08:35 - 09:35 Keynote by Dhabaleswar K. (DK) Panda
09:35 - 10:30 Invited Talk by Mathis Bode
10:30 - 11:00 Coffee break
11:10 - 11:40 Paper 1: "Cost-Efficient Construction of Performance Models" - L. Schmid, T. Sağlam, M. Selzer, A. Koziolek
11:40 - 12:10 Paper 2: "Benchmarking Machine Learning Applications on Heterogeneous Architecture using Reframe" - C. Rae, J. Lee, J. Richings, M. Weiland
12:10 - 12:40 Paper 3: "Using Benchmarking and Regression Models for Predicting CNN Training Time on a GPU" - P. Bryzgalov, T. Maeda
12:40 - 13:00 Ending Remarks

Keynote Speech:

Dhabaleswar K. (DK) Panda

Ohio State University

"Designing High-Performance, Scalable, and Converged Middleware for HPC, AI, Big Data, and Data Sciences"

This talk will focus on challenges and opportunities in designing converged middleware for HPC, AI (Deep/Machine Learning), Big Data, and Data Science. We will start with the challenges in designing runtime environments for MPI+X programming models by considering support for multi-core systems, high-performance networks (InfiniBand, RoCE, Slingshot), GPUs (NVIDIA, AMD, and Intel), and emerging BlueField-3 DPUs. Features and sample performance numbers of using the MVAPICH2 libraries over a range of benchmarks will be presented. Performance engineering of HPC applications with TAU and INAM profiling software will be highlighted. For the Deep/Machine Learning domain, we will focus on MPI-driven solutions (MPI4DL) and Mix-and-match Communication Runtime (MCR-DL) to extract performance and scalability for popular Deep Learning frameworks (TensorFlow and PyTorch), large out-of-core models, BlueField-3 DPUs, and parallel inferencing. Performance engineering of Deep Learning Training applications with a novel Deep Introspection framework will be highlighted. Finally, we will focus on MPI-driven solutions to accelerate Big Data applications (MPI4Spark) and data science applications (MPI4Dask), and appropriate benchmark results will be presented.

DK Panda is a Professor and University Distinguished Scholar of Computer Science and Engineering at the Ohio State University. He is serving as the Director of the ICICLE NSF-AI Institute (https://icicle.ai). He has published over 500 papers. The MVAPICH2 MPI libraries, designed and developed by his research group (http://mvapich.cse.ohio-state.edu), are currently being used by more than 3,300 organizations worldwide (in 90 countries). More than 1.77 million downloads of this software have taken place from the project's site. This software empowers many clusters on the TOP500 list. High-performance and scalable solutions for Deep Learning frameworks and Machine Learning applications from his group are available at https://hidl.cse.ohio-state.edu. Similarly, scalable and high-performance solutions for Big Data and Data Science frameworks are available from https://hibd.cse.ohio-state.edu. Prof. Panda is an IEEE Fellow and recipient of the 2022 IEEE Charles Babbage Award. More details about Prof. Panda are available at http://www.cse.ohio-state.edu/~panda.

Invited Speech:

Mathis Bode

Jülich Supercomputing Centre

"JUPITER Research and Early Access Program (JUREAP) – Seeding Exascale in Europe!"

The JUPITER Research and Early Access Program (JUREAP) will support early applications on JUPITER to utilize the machine most efficiently. We are building up and extending on experiences with previous systems at JSC to implement a holistic program. JUREAP will target collaboration with about 20 applications on JUPITER that are balanced in terms of methods (including AI) and scientific domains. JSC experts and application scientists will support optimization of use-cases to enable highly efficient execution and push the boundaries of high-performance computing (HPC) beyond the state of the art. JUREAP will not only enable full-scale runs, but also support their execution, analysis, and joint publications. This talk summarizes the current developments and plans in the context of JUREAP. It will also provide a general overview of JUPITER.

Mathis is a researcher at the Jülich Supercomputing Centre (JSC) at Forschungszentrum Jülich (FZJ) and head of JUREAP. He has a background in large-scale multi-physics simulations and artificial intelligence, particularly with application to turbulence, combustion and multiphase flows, and general HPC research. In particular, he is currently working on exascale simulations and workflows. His research has won several awards, and he has been PI in various European projects.

Paper Presentations:

Cost-Efficient Construction of Performance Models - L. Schmid, T. Sağlam, M. Selzer, A. Koziolek
Benchmarking Machine Learning Applications on Heterogeneous Architecture using Reframe - C. Rae, J. Lee, J. Richings, M. Weiland
Using Benchmarking and Regression Models for Predicting CNN Training Time on a GPU - P. Bryzgalov, T. Maeda

*Each paper presentation will be followed by a Q&A session

Workshop Chairs

Radita Liem - RWTH Aachen University
Ayesha Afzal - NHR@FAU Erlangen-Nürnberg
Pouya Kousha - The Ohio State University
Zhaobin Zhu - Goethe University Frankfurt
Joseph Lee - EPCC

Program Committee

Chen Wang - Lawrence Livermore National Lab
Chih-Kai Huang - INRIA Rennes
Christian Terboven - RWTH Aachen University
Connor Scully-Allison - University of Utah
DK Panda - The Ohio State University
Eleanor Broadway - EPCC
Georg Hager - NHR@FAU Erlangen-Nürnberg
Hariharan Devarajan - Lawrence Livermore National Lab
Hari Subramoni - The Ohio State University
Jay Lofstead - Sandia National Lab
Lenny Guo - Pacific Northwest National Lab
Manpreet Singh Jattana - Goethe University Frankfurt
Nathan Tallent - Pacific Northwest National Lab
Sarah Neuwirth - Johannes Gutenberg-Universität Mainz
Thomas Ilsche - ZIH TU Dresden
Tyler Allen - UNC Charlotte
Wolfgang Frings - Jülich Supercomputing Centre

Contact

If you have any problems or questions, please contact us via e-mail at: liem@itc.rwth-aachen.de and ayesha.afzal@fau.de

Past Workshops

2023 2022 2021