System log pre-processing to improve failure prediction

Author: Al Geist
Byung H. Park
Zhiling Lan
Ziming Zheng
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009

Log preprocessing, a process applied on the raw log before applying a predictive method, is of paramount importance to failure prediction and diagnosis. While existing filtering methods have demonstrated good compression rate, they fail to preserve important failure patterns that are crucial for failure analysis. To address the problem, in this paper we present a log preprocessing method. It consists of three integrated steps: (1) event categorization to uniformly classify system events and identify fatal events; (2) event filtering to remove temporal and spatial redundant records, while also preserving necessary failure patterns for failure analysis; (3) causality-related filtering to combine correlated events for filtering through apriori association rule mining. We demonstrate the effectiveness of our preprocessing method by using real failure logs collected from the Cray XT4 at ORNL and the Blue Gene/L system at SDSC. Experiments show that our method can preserve more failure patterns for failure analysis, thereby improving failure prediction by up to 174%.
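
The temporal and spatial filtering in step (2) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Event` fields, the `filter_redundant` helper, and the 300-second window are assumptions made for illustration, and the sketch makes no attempt to preserve failure patterns as the paper's method does.

```python
from typing import Callable, Hashable, Iterable, List, NamedTuple


class Event(NamedTuple):
    time: float      # seconds since the start of the log (assumed unit)
    location: str    # node or component identifier (hypothetical field)
    category: str    # label assigned during event categorization


def filter_redundant(
    events: Iterable[Event],
    key: Callable[[Event], Hashable],
    window: float = 300.0,
) -> List[Event]:
    """Drop an event if another event with the same key occurred
    no more than `window` seconds before it."""
    last_seen = {}  # key -> time of the most recent occurrence, kept or dropped
    kept = []
    for ev in sorted(events, key=lambda e: e.time):
        k = key(ev)
        prev = last_seen.get(k)
        if prev is None or ev.time - prev > window:
            kept.append(ev)
        last_seen[k] = ev.time  # sliding window: a burst collapses to its first event
    return kept


log = [
    Event(0, "n1", "ECC"),
    Event(10, "n1", "ECC"),    # repeat at the same location
    Event(20, "n2", "ECC"),    # same category reported from another location
    Event(30, "n1", "LINK"),
    Event(1000, "n1", "ECC"),  # far enough apart to count as a new occurrence
]

# Temporal pass: same category at the same location.
temporal = filter_redundant(log, key=lambda e: (e.location, e.category))
# Spatial pass: same category at any location.
spatial = filter_redundant(temporal, key=lambda e: e.category)
```

Running the temporal pass before the spatial pass keeps the two notions of redundancy separate, so each window could be tuned independently.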

CiteSeerX

Crossref

Predicting Scheduling Failures in the Cloud

Author: Khomh Foutse
Soualhia Mbarka
Tahar Sofiene
Publication date: 01/01/2015

Cloud computing has emerged as a key technology to deliver and manage computing, platform, and software services over the Internet. Task scheduling algorithms play an important role in the efficiency of cloud computing services as they aim to reduce the turnaround time of tasks and improve resource utilization. Several task scheduling algorithms have been proposed in the literature for cloud computing systems, the majority relying on the computational complexity of tasks and the distribution of resources. However, several tasks scheduled following these algorithms still fail because of unforeseen changes in the cloud environments. In this paper, using task execution and resource utilization data extracted from the execution traces of real-world applications at Google, we explore the possibility of predicting the scheduling outcome of a task using statistical models. If we can successfully predict task failures, we may be able to reduce the execution time of jobs by rescheduling failed tasks earlier (i.e., before their actual failing time). Our results show that statistical models can predict task failures with a precision of up to 97.4% and a recall of up to 96.2%. We simulate the potential benefits of such predictions using the toolkit GloudSim and find that they can improve the number of finished tasks by up to 40%. We also perform a case study using the Hadoop framework of Amazon Elastic MapReduce (EMR) and the jobs of a gene expression correlations analysis study from breast cancer research. We find that when extending the scheduler of Hadoop with our predictive models, the percentage of failed jobs can be reduced by up to 45%, with an overhead of less than 5 minutes.
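
For readers unfamiliar with the two metrics reported above, the following minimal sketch shows how precision and recall are computed for a binary failure predictor. The `precision_recall` helper and the sample labels are illustrative assumptions, not data from the paper.

```python
from typing import Sequence, Tuple


def precision_recall(actual: Sequence[bool], predicted: Sequence[bool]) -> Tuple[float, float]:
    """Return (precision, recall), treating True as 'task failed'."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)        # failures correctly predicted
    fp = sum(1 for a, p in pairs if not a and p)    # false alarms
    fn = sum(1 for a, p in pairs if a and not p)    # missed failures
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels for five tasks: True means the task failed.
actual = [True, True, True, False, False]
predicted = [True, True, False, True, False]
precision, recall = precision_recall(actual, predicted)
```

Precision is the fraction of predicted failures that really failed, and recall is the fraction of real failures that were predicted. A scheduler that reschedules on every predicted failure pays for low precision with unnecessary rescheduling and for low recall with missed failures.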

arXiv.org e-Print Archive

PolyPublie

Clustering-Based Predictive Process Monitoring

Author: Di Francescomarino Chiara
Dumas Marlon
Maggi Fabrizio Maria
Teinemaa Irene
Publication date: 03/06/2015

Business process enactment is generally supported by information systems that record data about process executions, which can be extracted as event logs. Predictive process monitoring is concerned with exploiting such event logs to predict how running (uncompleted) cases will unfold up to their completion. In this paper, we propose a predictive process monitoring framework for estimating the probability that a given predicate will be fulfilled upon completion of a running case. The predicate can be, for example, a temporal logic constraint or a time constraint, or any predicate that can be evaluated over a completed trace. The framework takes into account both the sequence of events observed in the current trace and the data attributes associated with these events. The prediction problem is approached in two phases. First, prefixes of previous traces are clustered according to control flow information. Second, a classifier is built for each cluster using event data to discriminate between fulfillments and violations. At runtime, a prediction is made on a running case by mapping it to a cluster and applying the corresponding classifier. The framework has been implemented in the ProM toolset and validated on a log pertaining to the treatment of cancer patients in a large hospital.
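
The two-phase structure described above (cluster the prefixes, then build one classifier per cluster) can be sketched as follows. This is a toy stand-in, not the framework implemented in ProM: grouping prefixes by their set of activities replaces a real clustering algorithm, a per-attribute frequency table replaces a real classifier, and all names and sample data are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Optional, Sequence, Tuple

Case = Tuple[Sequence[str], Hashable, bool]  # (prefix, data attribute, predicate fulfilled?)


def cluster_key(prefix: Sequence[str]) -> frozenset:
    # Assumed stand-in for control-flow clustering: the set of observed activities.
    return frozenset(prefix)


def train(cases: Iterable[Case]):
    """Phase 1 groups prefixes into clusters; phase 2 builds, per cluster,
    a table mapping attribute value -> [fulfilled count, total count]."""
    model = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for prefix, attr, fulfilled in cases:
        cell = model[cluster_key(prefix)][attr]
        cell[0] += int(fulfilled)
        cell[1] += 1
    return model


def predict(model, prefix: Sequence[str], attr: Hashable) -> Optional[float]:
    """Estimate the probability of fulfillment for a running case,
    or return None if the prefix maps to no known cluster."""
    cluster = model.get(cluster_key(prefix))
    if cluster is None:
        return None
    if attr in cluster:
        fulfilled, total = cluster[attr]
    else:
        # Unseen attribute value: fall back to the cluster-wide rate.
        fulfilled = sum(c[0] for c in cluster.values())
        total = sum(c[1] for c in cluster.values())
    return fulfilled / total


history = [
    (("A", "B"), "high", True),
    (("B", "A"), "high", True),
    (("A", "B"), "low", False),
    (("A", "C"), "high", False),
]
model = train(history)
probability = predict(model, ("A", "B"), "high")
```

Splitting the work this way lets each per-cluster classifier specialize on cases that share similar control flow, which is the motivation the abstract gives for clustering first.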

arXiv.org e-Print Archive

Archivio della ricerca - Fondazione Bruno Kessler