
    Towards Continually Learning Application Performance Models

    Machine learning-based performance models are increasingly being used to make critical job scheduling and application optimization decisions. Traditionally, these models assume that the data distribution does not change as more samples are collected over time. However, owing to the complexity and heterogeneity of production HPC systems, they are susceptible to hardware degradation, replacement, and/or software patches, which can cause drift in the data distribution and adversely affect the performance models. To this end, we develop continually learning performance models that account for distribution drift, alleviate catastrophic forgetting, and improve generalizability. Our best model retained accuracy despite having to learn the new data distribution induced by system changes, while achieving a 2x improvement in prediction accuracy over the whole data sequence compared to the naive approach.
    Comment: Presented at the Workshop on Machine Learning for Systems at the 36th Conference on Neural Information Processing Systems (NeurIPS 2022).
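    A minimal sketch of the general idea, assuming an experience-replay approach to mitigating catastrophic forgetting; this is not the paper's actual method, and the class name, the MLP regressor, and the buffer size are illustrative assumptions.

        # Sketch only: retrain a runtime-prediction model on each new batch of
        # samples mixed with a bounded, randomly retained sample of older data,
        # so that adapting to a drifted distribution does not erase earlier ones.
        import random
        from sklearn.neural_network import MLPRegressor

        class ReplayPerformanceModel:
            def __init__(self, buffer_size=1000):
                self.model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)
                self.buffer = []        # (features, runtime) pairs from past batches
                self.buffer_size = buffer_size
                self.seen = 0           # total samples observed so far

            def update(self, X_new, y_new):
                # Fit on the new batch together with the replayed old samples.
                X = list(X_new) + [x for x, _ in self.buffer]
                y = list(y_new) + [t for _, t in self.buffer]
                self.model.fit(X, y)
                # Reservoir sampling keeps a uniform sample of everything seen so far.
                for pair in zip(X_new, y_new):
                    self.seen += 1
                    if len(self.buffer) < self.buffer_size:
                        self.buffer.append(pair)
                    else:
                        j = random.randrange(self.seen)
                        if j < self.buffer_size:
                            self.buffer[j] = pair

            def predict(self, X):
                return self.model.predict(X)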

    Verifying File System Properties with Type Inference

    The storage stack is not trustworthy due to errors that arise from a variety of sources: unreliable hardware, malicious errors, and file system bugs. Today, software errors play a dominant role, owing to the inherent complexity of the software. In the first part of our project, we look at verifying a specific file system property: correct manipulation of on-disk pointers. We use CQUAL, a framework for adding type qualifiers with type inference support, and apply our analysis to the Linux ext2 file system. We find that adding qualifiers serves the valuable purpose of ensuring that on-disk pointers are accessed and manipulated correctly by the file system; thus, we believe the qualifiers we introduce would decrease the probability of file system programmers introducing bugs. We also describe our experience in using CQUAL and discuss its limitations. Based on that experience, we develop a second analysis, a buffer management verifier, that better fits the power of CQUAL by being simpler, yet more widely applicable to different file systems than the first analysis.
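    The analysis itself runs on C source through CQUAL's type inference; purely as a toy illustration, the sketch below shows the flavor of propagating qualifier constraints to a fixpoint and flagging a misuse of an on-disk pointer. The qualifier names, variables, and constraints are made-up assumptions, not taken from the paper.

        # Toy qualifier propagation (not CQUAL): each constraint (src, dst)
        # models an assignment dst = src, which lets src's qualifier flow to dst.
        declared = {"inode.block_ptr": "ondisk"}   # seed qualifier from an annotation
        constraints = [
            ("inode.block_ptr", "p"),              # p = inode->block_ptr
            ("p", "q"),                            # q = p
        ]
        deref_sites = {"q"}                        # dereferenced as an in-memory pointer

        def infer(constraints, declared):
            quals = dict(declared)
            changed = True
            while changed:                         # iterate to a fixpoint
                changed = False
                for src, dst in constraints:
                    if src in quals and dst not in quals:
                        quals[dst] = quals[src]
                        changed = True
            return quals

        quals = infer(constraints, declared)
        for site in deref_sites:
            if quals.get(site) == "ondisk":
                print(f"warning: {site} carries an on-disk pointer but is dereferenced as in-memory")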

    Impact of Limpware on HDFS: A Probabilistic Estimation

    With the advent of cloud computing, thousands of machines are connected and managed collectively. This era is confronted with a new challenge: performance variability, primarily caused by large-scale management issues such as hardware failures, software bugs, and configuration mistakes. In our previous work [2], we highlighted one overlooked cause: limping hardware, i.e., hardware whose performance degrades significantly compared to its specification. We showed that limping hardware can cause many limping scenarios in current scale-out systems. In this report, we quantify how often these scenarios occur in the Hadoop Distributed File System (HDFS).
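    A back-of-the-envelope sketch of the kind of estimate involved, assuming independently limping datanodes and HDFS's default three-replica write pipeline; the per-node limping probabilities below are illustrative assumptions, not figures from the report.

        # If each datanode limps independently with probability p, a write
        # pipeline of r replicas contains at least one limping node with
        # probability 1 - (1 - p)**r.
        def p_limping_pipeline(p_node: float, replicas: int = 3) -> float:
            return 1.0 - (1.0 - p_node) ** replicas

        for p in (0.001, 0.01, 0.05):
            print(f"p(node limps)={p:.3f} -> p(pipeline affected)={p_limping_pipeline(p):.4f}")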