License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/DagRep.13.4.98
URN: urn:nbn:de:0030-drops-192403
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2023/19240/


Brandt, Jim ; Ciorba, Florina ; Gentile, Ann ; Ott, Michael ; Wilde, Torsten
Additional contributors (editors, etc.): Jim Brandt, Florina Ciorba, Ann Gentile, Michael Ott, and Torsten Wilde

Driving HPC Operations With Holistic Monitoring and Operational Data Analytics (Dagstuhl Seminar 23171)



Abstract

Advances in analytic approaches have brought the vision of efficient High Performance Computing (HPC) operations enabled by dynamic analysis driving automated feedback and adaptation within reach. Many HPC centers have started the development and deployment of frameworks to enable continuous and holistic monitoring, archiving, and analysis of performance data from their production machines and related infrastructures. The impact of such frameworks rests upon the ability to effectively analyze such data and to take action based on analysis results. Analytic techniques have been successfully developed and applied in other domains, but their features may not apply directly to HPC operations data and situations. Response options are limited in HPC implementations. Leveraging, adapting, and extending analysis techniques and response options would open up new avenues for research and development of actionable analytics that can drive more intelligent operations through both manual and automated response to conditions of interest.
Dagstuhl Seminar 23171 brought together practitioners and researchers in the areas of HPC system management and monitoring, analytics, and computer science to work collaboratively on developing community solutions for revolutionizing HPC system operations. The topics discussed in this seminar spanned use cases, the data and analytic approaches required to address those use cases, the use of analysis results to improve performance and operations, and research in the development and use of autonomous feedback loops.

BibTeX - Entry

@Article{brandt_et_al:DagRep.13.4.98,
  author =	{Brandt, Jim and Ciorba, Florina and Gentile, Ann and Ott, Michael and Wilde, Torsten},
  title =	{{Driving HPC Operations With Holistic Monitoring and Operational Data Analytics (Dagstuhl Seminar 23171)}},
  pages =	{98--120},
  journal =	{Dagstuhl Reports},
  ISSN =	{2192-5283},
  year =	{2023},
  volume =	{13},
  number =	{4},
  editor =	{Brandt, Jim and Ciorba, Florina and Gentile, Ann and Ott, Michael and Wilde, Torsten},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2023/19240},
  URN =		{urn:nbn:de:0030-drops-192403},
  doi =		{10.4230/DagRep.13.4.98},
  annote =	{Keywords: Monitoring, Operational Data Analytics, Dagstuhl Seminar, WAFVR}
}

Keywords: Monitoring, Operational Data Analytics, Dagstuhl Seminar, WAFVR
Collection: DagRep, Volume 13, Issue 4
Issue Date: 2023
Date of publication: 02.11.2023

