16 research outputs found

    A Holistic Approach to Log Data Analysis in High-Performance Computing Systems: The Case of IBM Blue Gene/Q

    The complexity and cost of managing high-performance computing infrastructures are on the rise. Automating management and repair through predictive models, so as to minimize human intervention, is one attempt to increase system availability and contain these costs. Predictive models accurate enough to be useful for automatic management cannot be built from the restricted log data of individual subsystems; they require a holistic approach to analysing data from disparate sources. Here we provide a detailed multi-scale characterization study based on four datasets reporting power consumption, temperature, workload, and hardware/software events for an IBM Blue Gene/Q installation. We show that the system runs a rich parallel workload, with low correlation among its components in terms of temperature and power, but higher correlation in terms of events. As expected, power and temperature correlate strongly, while events display negative correlations with load and power. Power and workload show moderate correlations, and only at the scale of individual components. The aim of the study is a systematic, integrated characterization of the computing infrastructure and the discovery of correlation sources and levels, to serve as a basis for future predictive modeling efforts. (Comment: 12 pages, 7 figures)
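
    The correlation analysis described above can be illustrated with a small sketch. The snippet below is not the paper's pipeline; it assumes four time-indexed pandas DataFrames (power, temperature, workload, events) with one identically named column per system component, and shows how a per-component correlation between two of the signals could be computed at several time scales.

    import pandas as pd

    def multiscale_correlation(a: pd.DataFrame, b: pd.DataFrame,
                               scales=("1min", "10min", "1h")) -> pd.Series:
        """Mean per-component Pearson correlation of two signals at several time scales."""
        out = {}
        for scale in scales:
            # Resample both signals to the same scale and align them on timestamps.
            ra = a.resample(scale).mean()
            rb = b.resample(scale).mean()
            joined = ra.join(rb, lsuffix="_a", rsuffix="_b", how="inner").dropna()
            # Correlate component by component, then average across components.
            cols_a = [c for c in joined.columns if c.endswith("_a")]
            corrs = [joined[c].corr(joined[c[:-2] + "_b"]) for c in cols_a]
            out[scale] = sum(corrs) / len(corrs)
        return pd.Series(out, name="mean_component_correlation")

    # Hypothetical usage: power and temperature are DataFrames indexed by timestamp,
    # with one column per component.
    # print(multiscale_correlation(power, temperature))

    With the event log represented as counts per interval, the same helper could be reused to examine event-load or event-power correlations at each scale.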

    Complex decision making as a source of infotainment

    In many policy processes a variety of actors is now involved, which results in complex decision-making processes, since these actors have different perspectives on the problem and on the matching solutions. Such complex processes are difficult to capture in short reports in newspapers or on television, especially since journalists have to deal with increasing time pressure and demands to make news items more entertaining. This leads to biases in how the policy processes are portrayed. In this study we examine whether the biases of fragmentization, dramatization, personalization, the authority-disorder bias, and the negativity bias can be found in media reporting on complex decision-making processes in the Netherlands. We conducted a quantitative content analysis of media reports on five complex water management projects in the Netherlands. We found that in these media reports stories are often fragmentized, dramatized, and unfavourable towards the project, and that frequently an authority is blamed for not taking appropriate measures. Certain actors take advantage of these biases more than others: media attention for oppositional politicians and interest groups in particular relates significantly to the media biases.

    Classification in sparse, high dimensional environments applied to distributed systems failure prediction

    Network failures are still one of the main causes of distributed systems' lack of reliability. To address this problem we present an improvement over a failure prediction system, based on Elastic Net Logistic Regression and the application of rare-event prediction techniques, able to work with sparse, high-dimensional datasets. Specifically, we prove its stability, fine-tune its hyperparameter, and improve its industrial utility by showing that, with a slight change in dataset creation, it can also predict the location of a failure, a key asset when taking a proactive approach to failure management.
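
    As a rough illustration of the kind of model the abstract describes, the sketch below fits an elastic-net logistic regression on a synthetic sparse, high-dimensional dataset with a heavily imbalanced failure class; class weighting is used here as a simple stand-in for the rare-event techniques mentioned above, and nothing in the snippet is taken from the authors' implementation.

    from scipy.sparse import random as sparse_random
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Synthetic sparse, high-dimensional data; failures (roughly 4% of samples) depend on a few features.
    X = sparse_random(5000, 2000, density=0.01, format="csr", random_state=0)
    y = (X[:, :5].toarray().sum(axis=1) > 0.15).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    clf = LogisticRegression(
        penalty="elasticnet", solver="saga",  # saga is the scikit-learn solver that supports elastic net
        l1_ratio=0.5, C=1.0,                  # L1/L2 mix and strength, tuned by cross-validation in practice
        class_weight="balanced",              # up-weight the rare failure class
        max_iter=2000,
    )
    clf.fit(X_tr, y_tr)
    print("predicted failure probabilities:", clf.predict_proba(X_te)[:3, 1])

    Predicting the location of a failure, as the abstract describes, would amount to recasting the same setup as a multi-class problem over candidate locations by relabelling the training examples.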

    Unveiling clusters of events for alert and incident management in large-scale enterprise IT


    Explainable Deep Learning for Fault Prognostics in Complex Systems: A Particle Accelerator Use-Case

    Sophisticated infrastructures often exhibit misbehaviour and failures that result from complex interactions of their constituent subsystems. Such infrastructures record alarm, event, and fault information to help operations experts diagnose and repair failure conditions. This data can be analysed using explainable artificial intelligence to attempt to reveal precursors and eventual root causes. The proposed method is first applied to synthetic data in order to prove its functionality. With synthetic data the framework makes extremely precise predictions, and root causes can be identified correctly. Subsequently, the method is applied to real data from a complex particle accelerator system. In the real-data setting, deep learning produces accurate predictive models from fewer than ten error examples when precursors are captured. The approach described here is a potentially valuable tool for operations experts to identify precursors in complex infrastructures.
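
    To make the idea of precursor discovery concrete, here is a minimal sketch of one way explainability can be applied to a fault classifier: a small network is trained on synthetic windows of event counts, and input-gradient saliency is then used to rank which event types act as precursors. The network, the saliency method, and the data are illustrative assumptions, not the framework proposed in the paper.

    import torch
    import torch.nn as nn

    n_windows, n_event_types = 200, 32
    X = torch.rand(n_windows, n_event_types)        # per-window event counts (synthetic)
    y = (X[:, 7] + X[:, 19] > 1.2).float()          # faults driven by two "precursor" event types

    model = nn.Sequential(nn.Linear(n_event_types, 16), nn.ReLU(), nn.Linear(16, 1))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.BCEWithLogitsLoss()

    for _ in range(300):                             # short training loop on the synthetic windows
        opt.zero_grad()
        loss = loss_fn(model(X).squeeze(1), y)
        loss.backward()
        opt.step()

    # Saliency: gradient of the fault score with respect to the events of one window.
    x = X[:1].clone().requires_grad_(True)
    model(x).sum().backward()
    ranking = x.grad.abs().squeeze(0).argsort(descending=True)
    print("event types ranked as likely precursors:", ranking[:5].tolist())

    In practice one would typically average attributions over many fault windows, or use a more robust attribution method, before treating an event type as a precursor.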