Executing Online Anomaly Detection in Complex Dynamic Systems

Author: Zoppi Tommaso
Publication venue: Department of Measurement and Information Systems, Budapest University of Technology and Economics
Publication date: 01/01/2017

System failure prediction through rare-events elastic-net logistic regression

Author: Dueñas López Juan Carlos
Navarro González José Manuel
Parada Gélvez Hugo Alexer
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014

Predicting failures in a distributed system from previous events through logistic regression is a standard approach in the literature. This technique is not reliable, though, in two situations: in the prediction of rare events, which do not appear in enough proportion for the algorithm to capture, and in environments with too many variables, since logistic regression tends to overfit in these situations, while manually selecting a subset of variables to create the model is error-prone. In this paper, we solve an industrial research case that presented this situation with a combination of elastic net logistic regression, a method that allows us to automatically select useful variables, a process of cross-validation on top of it, and the application of a rare events prediction technique to reduce computation time. This process provides two layers of cross-validation that automatically obtain the optimal model complexity and the optimal model parameter values, while ensuring that even rare events will be correctly predicted with a small number of training instances. We tested this method against real industrial data, obtaining a total of 60 out of 80 possible models with a 90% average model accuracy.
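
The variable-selection effect of the elastic net penalty mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the proximal-gradient solver, and the hyperparameters `lam`, `alpha`, `lr` and `epochs` are assumptions chosen for illustration, and the sketch covers neither the cross-validation layers nor the rare-events technique.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of the L1 penalty: shrink v toward zero by t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_elastic_net_logistic(X, y, lam=0.1, alpha=0.5, lr=0.1, epochs=500):
    """Minimise mean log-loss + lam * (alpha * |w|_1 + (1 - alpha) / 2 * |w|_2^2).

    Hypothetical helper: plain proximal gradient descent, intercept unpenalised.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        for j in range(d):
            # Gradient step on the smooth part (log-loss + ridge term),
            # followed by the L1 proximal step that can zero a weight out.
            step = w[j] - lr * (grad_w[j] + lam * (1 - alpha) * w[j])
            w[j] = soft_threshold(step, lr * lam * alpha)
    return w, b


# Toy data (made up): feature 0 determines the label, feature 1 is noise.
X = [[-2, 1], [-2, -1], [-1, 1], [-1, -1], [1, 1], [1, -1], [2, 1], [2, -1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
weights, intercept = fit_elastic_net_logistic(X, y)
print(weights, intercept)
```

On this toy data the L1 part of the penalty keeps the weight of the uninformative feature at zero while the informative feature receives a positive weight, which is the behaviour that makes the penalty usable for automatic variable selection.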

Archivo Digital UPM

Classification in sparse, high dimensional environments applied to distributed systems failure prediction

Author: A.S. Tanenbaum
B. Schroeder
F. Salfner
G. King
H. Zou
M. Gallet
N. Trendafilov
W. Ahmed
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Network failures are still one of the main causes of distributed systems’ lack of reliability. To overcome this problem, we present an improvement over a failure prediction system, based on Elastic Net Logistic Regression and the application of rare events prediction techniques, able to work with sparse, high dimensional datasets. Specifically, we prove its stability, fine-tune its hyperparameter, and improve its industrial utility by showing that, with a slight change in dataset creation, it can also predict the location of a failure, a key asset when trying to take a proactive approach to failure management.

Crossref

Archivo Digital UPM

States Prediction of Web Services Using Hidden Markov Model

Author: Fernando S
Prasanga RKM
Weerasuriya GT
Wijesiriwardana C
Publication date: 11/03/2017

Over the last few decades, service-oriented architectures, in particular web services, have grown in popularity in the context of enterprise-level application integration. As a result, most enterprise-level software systems have tended to be developed with a flavor of web service components. However, like all other distributed software technologies, web services also fail. Therefore, proper mechanisms and tools to handle system failures are vital to avoid such exceptional behaviors. To address that problem, this paper investigates a state prediction mechanism for web services using a Hidden Markov Model (HMM). This approach is capable of predicting the future exceptional behaviors of a web service by analyzing and identifying the error patterns generated by long-running web services. This research can be further extended with an automated system input to determine the system state.
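
A minimal sketch of how a Hidden Markov Model can turn an observed log into a state prediction follows. It is not taken from the paper: the two hidden states, the two observation symbols, and every probability below are invented for illustration, and the standard forward (filtering) recursion is used as a stand-in for whatever inference the authors apply.

```python
def forward_filter(observations, states, start_p, trans_p, emit_p):
    """Return P(current hidden state | observations so far) via the forward algorithm."""
    belief = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    total = sum(belief.values())
    belief = {s: p / total for s, p in belief.items()}
    for obs in observations[1:]:
        # Propagate the previous belief through the transition model,
        # then weight each state by how well it explains the new observation.
        belief = {
            s: emit_p[s][obs] * sum(belief[prev] * trans_p[prev][s] for prev in states)
            for s in states
        }
        total = sum(belief.values())
        belief = {s: p / total for s, p in belief.items()}
    return belief


def predict_next_state(belief, states, trans_p):
    """One-step-ahead distribution over hidden states."""
    return {s: sum(belief[prev] * trans_p[prev][s] for prev in states) for s in states}


# Hypothetical model of a web service; all numbers are assumptions.
STATES = ("healthy", "degraded")
START_P = {"healthy": 0.9, "degraded": 0.1}
TRANS_P = {
    "healthy": {"healthy": 0.9, "degraded": 0.1},
    "degraded": {"healthy": 0.2, "degraded": 0.8},
}
EMIT_P = {
    "healthy": {"ok": 0.95, "error": 0.05},
    "degraded": {"ok": 0.4, "error": 0.6},
}

log = ["ok", "error", "error", "error"]
belief = forward_filter(log, STATES, START_P, TRANS_P, EMIT_P)
print(predict_next_state(belief, STATES, TRANS_P))
```

With these made-up parameters, a run of error responses shifts most of the predicted probability mass to the degraded state, which is the kind of early signal a state prediction mechanism would act on.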

Digital Repository, University of Moratuwa

A Novel System Anomaly Prediction System Based on Belief Markov Model and Ensemble Classification

Author: Shanping Li
Xiaozhen Zhou
Zhen Ye
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2013

Computer systems are becoming extremely complex, while system anomalies dramatically influence the availability and usability of systems. Online anomaly prediction is an important approach to managing imminent anomalies, and its accuracy relies on precise system monitoring data. However, precise monitoring data is not easily achievable because of widespread noise. In this paper, we present a method which integrates an improved Evidential Markov model and ensemble classification to predict anomalies for systems with noise. Traditional Markov models use explicit state boundaries to build the Markov chain and then make predictions for different measurement metrics. A problem arises when data comes with noise, because even a slight oscillation around the true value will lead to very different predictions. The Evidential Markov chain method is able to deal with noisy data but is not suitable for complex data stream scenarios. The Belief Markov chain that we propose extends the Evidential Markov chain and can cope with noisy data streams. This study further applies ensemble classification to identify system anomalies based on the predicted metrics. Extensive experiments on anomaly data collected from 66 metrics in PlanetLab have confirmed that our approach can achieve high prediction accuracy and time efficiency.

Crossref

Directory of Open Access Journals

Seer: a lightweight online failure prediction approach

Author: Özçelik Burcu
Yılmaz Cemal
Publication venue: IEEE (Institute of Electrical and Electronics Engineers)
Publication date: 31/05/2015

Online failure prediction aims to predict the manifestation of failures at runtime before the failures actually occur. Existing online failure prediction approaches typically operate on data which is either directly reported by the system under test or directly observable from outside system executions. These approaches generally refrain from collecting internal execution data that can further improve the prediction quality. One reason behind this general trend is the runtime overhead incurred by the measurement instruments that are required to collect the data. In this work we conjecture that large cost reductions in collecting internal execution data for online failure prediction can derive from reducing the cost of the measurement instruments, while still supporting acceptable levels of prediction quality. To evaluate this conjecture, we present a lightweight online failure prediction approach, called Seer. Seer uses fast hardware performance counters to perform most of the data collection work. The data is augmented with further data collected by a minimal amount of software instrumentation that is added to the systems software. We refer to the data collected in this manner as hybrid spectra. We applied the proposed approach to three widely used open source subject applications and evaluated it by comparing and contrasting three types of hybrid spectra and two types of traditional software spectra. At the lowest level of runtime overheads attained in the experiments, the hybrid spectra predicted the failures about halfway through the executions with an F-measure of 0.77 and a runtime overhead of 1.98%, on average. Comparing hybrid spectra to software spectra, we observed that, for comparable runtime overhead levels, the hybrid spectra provided significantly better prediction accuracies and earlier warnings for failures than the software spectra. Alternatively, for comparable accuracy levels, the hybrid spectra incurred significantly less runtime overhead and provided earlier warnings.
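
For readers unfamiliar with the F-measure quoted in this abstract, the sketch below shows how it is commonly computed from prediction counts. It assumes the balanced F1 variant (the abstract does not state which weighting was used), and the function name and the counts in the usage line are hypothetical, not data from the paper.

```python
def f_measure(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) computed from confusion-matrix counts."""
    predicted = true_pos + false_pos   # executions flagged as failing
    actual = true_pos + false_neg      # executions that really failed
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    # F1 is the harmonic mean of precision and recall;
    # 2*TP / (2*TP + FP + FN) is the same quantity written in counts.
    denom = 2 * true_pos + false_pos + false_neg
    f1 = 2 * true_pos / denom if denom else 0.0
    return precision, recall, f1


# Made-up counts: 8 failures predicted correctly, 2 false alarms, 2 misses.
print(f_measure(8, 2, 2))
```

Because the score is a harmonic mean, it stays low unless both false alarms and missed failures are kept small, which is why it is often preferred over plain accuracy when failures are rare.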

Crossref

Sabanci University Research Database

Data Driven Device Failure Prediction

Author: Jordan Paul L.
Publication venue: AFIT Scholar
Publication date: 15/09/2016

As society becomes more dependent upon computer systems to perform increasingly critical tasks, ensuring those systems do not fail also becomes more important. Many organizations depend heavily on desktop computers for day-to-day operations. Unfortunately, the software that runs on these computers is still written by humans and, as such, is still subject to human error and consequent failure. A natural solution is to use statistical machine learning to predict failure. However, since failure is still a relatively rare event, obtaining labeled training data to train these models is not trivial. This work presents new simulated fault loads with an automated framework to predict failure in the Microsoft enterprise authentication service and Apache web server in an effort to increase up-time and improve mission effectiveness. These new fault loads were successful in creating realistic failure conditions that are accurately identified by statistical learning models.

AFIT Scholar (Air Force Institute of Technology)