A Holistic Approach to Log Data Analysis in High-Performance Computing Systems: The Case of IBM Blue Gene/Q
The complexity and cost of managing high-performance computing
infrastructures are on the rise. Automating management and repair through
predictive models to minimize human interventions is an attempt to increase
system availability and contain these costs. Building predictive models that
are accurate enough to be useful in automatic management cannot be based on
restricted log data from subsystems but requires a holistic approach to data
analysis from disparate sources. Here we provide a detailed multi-scale
characterization study based on four datasets reporting power consumption,
temperature, workload, and hardware/software events for an IBM Blue Gene/Q
installation. We show that the system runs a rich parallel workload, with low
correlation among its components in terms of temperature and power, but higher
correlation in terms of events. As expected, power and temperature correlate
strongly, while events display negative correlations with load and power. Power
and workload show moderate correlations, and only at the scale of components.
The aim of the study is a systematic, integrated characterization of the
computing infrastructure and discovery of correlation sources and levels to
serve as a basis for future predictive modeling efforts.
Comment: 12 pages, 7 figures
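As a rough illustration of the kind of pairwise correlation analysis this abstract describes, the sketch below computes a correlation coefficient between metric series. The abstract does not say which correlation measure the study uses, so the Pearson coefficient is an assumption here, and the `power_kw`, `temp_c` and `events` readings are invented for illustration, not taken from the Blue Gene/Q datasets.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-interval readings for one component (illustrative only)
power_kw = [50.0, 55.0, 61.0, 58.0, 66.0, 70.0]
temp_c = [30.1, 31.0, 32.4, 31.8, 33.5, 34.2]
events = [9, 7, 4, 6, 3, 1]

r_power_temp = pearson(power_kw, temp_c)    # positive in this toy data
r_power_events = pearson(power_kw, events)  # negative in this toy data
```

Repeating the same computation on series aggregated at different levels (for example per component versus whole system) is one simple way to compare correlations across scales.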
Complex decision making as a source of infotainment
In many policy processes nowadays a variety of actors is involved, which results in
complex decision-making processes, since these actors have different
perspectives on the problem and on the matching solutions. Such complex processes are
difficult to capture in short reports in newspapers or on television, especially since
journalists face increasing time pressure and demands to make news
items more entertaining. This leads to biases in how the media construct the policy processes.
In this study we examine whether the biases of fragmentization, dramatization,
personalization, the authority-disorder bias and the negativity bias can be found in
media reporting on complex decision making processes in the Netherlands.
We conducted a quantitative content analysis of media reports on five complex water
management projects in the Netherlands. We found that in these media reports stories
are often fragmentized, dramatized and unfavourable towards the project, and
frequently an authority is blamed for not taking appropriate measures. Certain actors
take advantage of these biases more than other actors: media attention for oppositional
politicians and interest groups in particular relates significantly to the media biases.
Classification in sparse, high dimensional environments applied to distributed systems failure prediction
Network failures are still one of the main causes of distributed systems’ lack of reliability. To address this problem, we present an improvement over a failure prediction system based on Elastic Net Logistic Regression and rare-event prediction techniques, which is able to work with sparse, high-dimensional datasets. Specifically, we prove its stability, fine-tune its hyperparameter, and improve its industrial utility by showing that, with a slight change in dataset creation, it can also predict the location of a failure, a key asset when taking a proactive approach to failure management.
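To make the technique named in this abstract concrete, here is a minimal sketch of elastic net logistic regression fitted by proximal gradient descent on sparse binary features. It is not the authors' system: the class weighting used for the rare positive class, the function `fit_elastic_net_logreg`, the settings `lam`, `alpha`, `lr` and `epochs`, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t * |v|; this is what sets weights exactly to zero."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def predict_proba(w, b, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_elastic_net_logreg(X, y, lam=0.05, alpha=0.5, lr=0.3, epochs=300):
    """Minimise class-weighted log-loss + lam*(alpha*L1 + (1-alpha)/2*L2^2)."""
    n, d = len(X), len(X[0])
    n_pos = sum(y)
    # Assumed rare-event handling: weight classes so both contribute equally
    w_pos = n / (2.0 * n_pos)
    w_neg = n / (2.0 * (n - n_pos))
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = (w_pos if yi == 1 else w_neg) * (predict_proba(w, b, xi) - yi) / n
            grad_b += err
            for j, xj in enumerate(xi):
                if xj:  # sparse features: zeros contribute nothing
                    grad_w[j] += err * xj
        b -= lr * grad_b  # the intercept is not penalised
        for j in range(d):
            # Gradient step on the smooth part (loss + L2), then the L1 prox
            step = w[j] - lr * (grad_w[j] + lam * (1.0 - alpha) * w[j])
            w[j] = soft_threshold(step, lr * lam * alpha)
    return w, b


# Synthetic data: 20 sparse binary features; a "failure" occurs iff feature 0 fires
rng = random.Random(0)
X = [[1 if rng.random() < 0.1 else 0 for _ in range(20)] for _ in range(300)]
y = [row[0] for row in X]

w, b = fit_elastic_net_logreg(X, y)
nonzero = [j for j, wj in enumerate(w) if wj != 0.0]
```

The L1 part of the penalty drives uninformative weights to exactly zero, which is what makes this family of models attractive for sparse, high-dimensional inputs, while the L2 part keeps the fit stable when features are correlated.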
Explainable Deep Learning for Fault Prognostics in Complex Systems: A Particle Accelerator Use-Case
Sophisticated infrastructures often exhibit misbehaviour and failures resulting from complex interactions of their constituent subsystems. Such infrastructures record alarm, event and fault information to help operations experts diagnose and repair failure conditions. This data can be analysed using explainable artificial intelligence to attempt to reveal precursors and eventual root causes. The proposed method is first applied to synthetic data in order to prove functionality. With synthetic data, the framework makes extremely precise predictions and root causes can be identified correctly. Subsequently, the method is applied to real data from a complex particle accelerator system. In the real data setting, deep learning models produce accurate predictions from fewer than ten error examples when precursors are captured. The approach described herein is a potentially valuable tool for operations experts to identify precursors in complex infrastructures.