TSE-IDS: A Two-Stage Classifier Ensemble for Intelligent Anomaly-based Intrusion Detection System
Intrusion detection systems (IDS) play a pivotal role in computer security by discovering and repelling malicious activities in computer networks. Anomaly-based IDS, in particular, rely on classification models trained on historical data to discover such malicious activities. In this paper, an improved IDS based on hybrid feature selection and a two-level classifier ensemble is proposed. A hybrid feature selection technique comprising three methods, i.e., particle swarm optimization, the ant colony algorithm, and the genetic algorithm, is used to reduce the feature size of the training datasets (NSL-KDD and UNSW-NB15 are considered in this paper). Features are selected based on the classification performance of a reduced error pruning tree (REPT) classifier. Then, a two-level classifier ensemble based on two meta-learners, i.e., rotation forest and bagging, is proposed. On the NSL-KDD dataset, the proposed classifier achieves 85.8% accuracy, 86.8% sensitivity, and 88.0% detection rate, remarkably outperforming other classification techniques recently proposed in the literature. Results on the UNSW-NB15 dataset also improve on those achieved by several state-of-the-art techniques. Finally, to verify the results, a two-step statistical significance test is conducted. Such a test has rarely been considered in IDS research thus far and, therefore, adds value to the experimental results achieved by the proposed classifier.
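To make the two-level ensemble idea concrete, the following is a minimal sketch, assuming scikit-learn (>= 1.2) and synthetic data. A pruned DecisionTreeClassifier stands in for the REPT base learner, and a second bagging layer approximates the rotation forest meta-learner, which scikit-learn does not provide; this is an illustration of the nesting, not the paper's exact configuration.

```python
# Sketch of a two-level classifier ensemble for anomaly-based intrusion detection.
# Assumptions: scikit-learn >= 1.2; synthetic data replaces NSL-KDD / UNSW-NB15 features;
# a pruned decision tree stands in for REPT, a second bagging layer for rotation forest.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Placeholder data; in practice, load the pre-processed, feature-selected dataset.
X, y = make_classification(n_samples=2000, n_features=40, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Level 1: bagging of pruned decision trees (REPT stand-in).
level_one = BaggingClassifier(
    estimator=DecisionTreeClassifier(ccp_alpha=0.001), n_estimators=10, random_state=0
)
# Level 2: a second meta-learner wrapping the level-1 ensemble.
level_two = BaggingClassifier(estimator=level_one, n_estimators=5, random_state=0)

level_two.fit(X_tr, y_tr)
y_pred = level_two.predict(X_te)
print("accuracy:", accuracy_score(y_te, y_pred))
print("sensitivity (recall):", recall_score(y_te, y_pred))
```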
A utility-based model to define the optimal data quality level in IT service offerings
In the information age, enterprises base or enrich their core business activities with the provision of informative services. For this reason, organizations are becoming increasingly aware of data quality issues, which concern evaluating the ability of a data collection to meet users' needs. Data quality is a multidimensional and subjective issue, since it is defined by a variety of criteria whose definition and evaluation strictly depend on the context and users involved. Thus, when considering data quality, the users' perspective should always be considered fundamental. Authors in the data quality literature agree that providers should adapt, and consequently improve, their service offerings in order to fully satisfy users' demands. However, we argue that, in service provisioning, providers are subject to restrictions stemming, for instance, from cost and benefit assessments. Therefore, we identify the need to reconcile providers' and users' quality targets when defining the optimal data quality level of an informative service. Defining such an equilibrium is a complex issue, since each type of user accessing the service may define different utilities for the provided information. Considering this scenario, the paper presents a utility-based model of the providers' and customers' interests developed on the basis of multi-class offerings. The model is exploited to analyze the optimal service offerings that allow the efficient allocation of quality improvement activities by the provider.
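The trade-off described above can be illustrated with a toy optimization, a sketch under assumptions that are not the paper's model: hypothetical concave user utilities per offering class and a hypothetical convex provider cost, with the optimal quality level maximizing net utility.

```python
# Illustrative sketch only: choose the data quality level of a multi-class offering
# that balances aggregate user utility against the provider's improvement cost.
# The utility and cost functions below are hypothetical, not the paper's model.
import numpy as np

quality_levels = np.linspace(0.5, 1.0, 11)                  # candidate quality levels
user_classes = {"basic": (50, 1.0), "premium": (20, 3.0)}   # (number of users, utility weight)

def aggregate_utility(q):
    # Hypothetical concave utility: quality has diminishing returns for each class.
    return sum(n * w * np.log1p(10 * q) for n, w in user_classes.values())

def provider_cost(q):
    # Hypothetical convex cost: improving quality gets increasingly expensive.
    return 500 * q ** 3

net = [aggregate_utility(q) - provider_cost(q) for q in quality_levels]
best = quality_levels[int(np.argmax(net))]
print(f"optimal quality level under these assumptions: {best:.2f}")
```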
Towards a Maturity Model of Process Mining as an Analytic Capability
Process mining applications offer a range of capabilities to analyze processes and improve organizational performance. Evaluating process mining capabilities is essential to demonstrate the business value created by process mining. Currently, there is a paucity of studies evaluating the maturity of process mining as an analytic capability. This paper aims to close this gap. We created the first version of a maturity model of process mining as an analytical capability, integrating the maturity models available for business process management, data analytics, and Artificial Intelligence (AI) organizational capabilities. Then, we evaluated the model through qualitative interviews with process mining experts. The interview feedback was used to design an improved version of the proposed maturity model, which we aim to deploy in real-world case studies in the future.
Online anomaly detection using statistical leverage for streaming business process events
While several techniques for detecting trace-level anomalies in event logs in offline settings have appeared recently in the literature, such techniques are currently lacking for online settings. Event log anomaly detection in online settings can be crucial for discovering anomalies in process execution as soon as they occur and, consequently, for promptly taking early corrective actions. This paper describes a novel approach to event log anomaly detection on event streams that uses statistical leverage. Leverage has been used extensively in statistics to develop measures to identify outliers, and it has been adapted in this paper to the specific scenario of event stream data. The proposed approach has been evaluated on both artificial and real event streams. (12 pages, 4 figures; Proceedings of the 1st International Workshop on Streaming Analytics for Process Mining (SA4PM 2020), in conjunction with the International Conference on Process Mining; accepted for publication, Sep 2020.)
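The statistical notion of leverage underlying this approach can be sketched as follows, assuming traces encoded as numeric feature vectors (e.g., activity counts); this illustrates the classical leverage score and a rule-of-thumb threshold, not the paper's streaming adaptation.

```python
# Minimal sketch of leverage-based anomaly scoring for traces encoded as feature
# vectors. Leverage scores are the diagonal of the hat matrix H = X (X^T X)^{-1} X^T.
import numpy as np

rng = np.random.default_rng(0)
X = rng.poisson(3, size=(200, 8)).astype(float)   # 200 traces, 8 activity-count features
X[195:] += rng.poisson(15, size=(5, 8))           # inject a few unusual traces

# Diagonal of the hat matrix without materializing the full n x n matrix.
H_diag = np.einsum("ij,ji->i", X, np.linalg.pinv(X.T @ X) @ X.T)

# A common rule of thumb flags observations whose leverage exceeds 2p/n.
n, p = X.shape
anomalies = np.where(H_diag > 2 * p / n)[0]
print("flagged trace indices:", anomalies)
```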
Stability Metrics for Enhancing the Evaluation of Outcome-Based Business Process Predictive Monitoring
Outcome-based predictive process monitoring deals with predicting the outcome of running cases in a business process using feature vectors extracted from completed traces in an event log. Traditionally, in outcome-based predictive monitoring, a different model is developed for each bucket containing a different type of feature vector. This setting allows us to extend the traditional evaluation of the quality of process outcome prediction models beyond simply measuring overall performance, developing a quality assessment framework based on three metrics: one considering the overall performance on all feature vectors; one considering the different levels of performance achieved on feature vectors belonging to individual buckets, i.e., the stability of the performance across buckets; and one considering the stability of the individual predictions obtained, accounting for how close the predicted probabilities are to the cutoff thresholds used to determine the predicted labels. The proposed metrics allow evaluating, given a set of alternative designs, i.e., combinations of classifier and bucketing method, the quality of the predictions of each alternative. For this evaluation, we suggest using either the concept of Pareto-optimality or a scenario-based scoring method. We discuss an evaluation of the proposed framework conducted with real-life event logs.
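A rough sketch of the three-metric idea follows, assuming scikit-learn and per-bucket predictions already computed; the concrete formulas below (AUC, standard deviation across buckets, distance from the cutoff) are simplified stand-ins for the metrics defined in the paper.

```python
# Illustrative sketch of three quality metrics for a bucketed outcome prediction setup:
# overall performance, performance spread across buckets, and closeness of predicted
# probabilities to the decision cutoff. Simplified stand-ins, not the paper's definitions.
import numpy as np
from sklearn.metrics import roc_auc_score

def quality_metrics(bucket_results, cutoff=0.5, margin=0.1):
    """bucket_results: list of (y_true, y_prob) arrays, one pair per bucket."""
    all_true = np.concatenate([y for y, _ in bucket_results])
    all_prob = np.concatenate([p for _, p in bucket_results])

    overall = roc_auc_score(all_true, all_prob)                          # metric 1
    per_bucket = [roc_auc_score(y, p) for y, p in bucket_results]
    bucket_stability = 1.0 - np.std(per_bucket)                          # metric 2
    prediction_stability = np.mean(np.abs(all_prob - cutoff) > margin)   # metric 3
    return overall, bucket_stability, prediction_stability

rng = np.random.default_rng(1)
buckets = [(rng.integers(0, 2, 100), rng.random(100)) for _ in range(3)]
print(quality_metrics(buckets))
```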
Measuring the Stability of Process Outcome Predictions in Online Settings
Predictive Process Monitoring aims to forecast the future progress of process instances using historical event data. As predictive process monitoring is increasingly applied in online settings to enable timely interventions, evaluating the performance of the underlying models becomes crucial for ensuring their consistency and reliability over time. This is especially important in high-risk business scenarios where incorrect predictions may have severe consequences. However, predictive models are currently usually evaluated using a single aggregated value or a time-series visualization, which makes it challenging to assess their performance and, specifically, their stability over time. This paper proposes an evaluation framework for assessing the stability of models for online predictive process monitoring. The framework introduces four performance meta-measures: the frequency of significant performance drops, the magnitude of such drops, the recovery rate, and the volatility of performance. To validate this framework, we applied it to two artificial and two real-world event logs. The results demonstrate that these meta-measures facilitate the comparison and selection of predictive models for different risk-taking scenarios. Such insights are of particular value for enhancing decision-making in dynamic business environments. (8 pages, 3 figures; Proceedings of the 5th International Conference on Process Mining (ICPM 2023).)
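The four meta-measures can be sketched over a time series of per-window performance values; the thresholds and exact computations below are assumptions for illustration, not the paper's formal definitions.

```python
# Illustrative sketch of four stability meta-measures over a series of performance
# values (e.g., AUC per evaluation window). Thresholds and formulas are assumptions.
import numpy as np

def stability_meta_measures(perf, drop_threshold=0.05):
    perf = np.asarray(perf, dtype=float)
    deltas = np.diff(perf)
    drop_idx = np.where(deltas < -drop_threshold)[0]       # significant drops only

    drop_frequency = len(drop_idx) / max(len(deltas), 1)   # how often performance drops
    drop_magnitude = -deltas[drop_idx].mean() if len(drop_idx) else 0.0
    # Recovery rate: share of significant drops followed by an improvement.
    recovered = sum(1 for i in drop_idx if i + 1 < len(deltas) and deltas[i + 1] > 0)
    recovery_rate = recovered / len(drop_idx) if len(drop_idx) else 1.0
    volatility = perf.std()                                 # spread of performance over time
    return drop_frequency, drop_magnitude, recovery_rate, volatility

auc_over_time = [0.82, 0.81, 0.72, 0.80, 0.79, 0.70, 0.71, 0.78]
print(stability_meta_measures(auc_over_time))
```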
Leveraging a Heterogeneous Ensemble Learning for Outcome-Based Predictive Monitoring Using Business Process Event Logs
Outcome-based predictive process monitoring concerns predicting the outcome of a running process case using historical events stored as so-called process event logs. This prediction problem has been approached using different learning models in the literature. Ensemble learners have been shown to be particularly effective in outcome-based business process predictive monitoring, even when compared with learners exploiting complex deep learning architectures. However, the ensemble learners used in the literature rely on weak base learners, such as decision trees. In this article, an advanced stacking ensemble technique for outcome-based predictive monitoring is introduced. The proposed stacking ensemble employs strong learners as base classifiers, i.e., other ensembles. More specifically, we consider stacking random forests, extreme gradient boosting machines, and gradient boosting machines to train a process outcome prediction model. We evaluate the proposed approach using publicly available event logs. The results show that the proposed model is a promising approach for the outcome-based prediction task. We extensively compare the performance differences between the proposed methods and the base strong learners, also using statistical tests to assess the generalizability of the results obtained.
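A minimal sketch of such a heterogeneous stack follows, assuming scikit-learn and the xgboost package; the meta-learner choice and synthetic data are placeholders, not the paper's exact setup, since encoding event-log traces into feature vectors is out of scope here.

```python
# Sketch of a stacking ensemble whose base classifiers are themselves strong ensembles.
# Assumes scikit-learn and xgboost; synthetic data replaces encoded event-log traces.
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=30, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("xgb", XGBClassifier(n_estimators=100)),
        ("gbm", GradientBoostingClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # meta-learner on stacked outputs
    cv=5,
)
print("mean CV AUC:", cross_val_score(stack, X, y, scoring="roc_auc", cv=3).mean())
```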
Exploring the Suitability of Rule-Based Classification to Provide Interpretability in Outcome-Based Process Predictive Monitoring
The development of models for process outcome prediction using event logs has evolved in the literature with a clear focus on performance improvement. In this paper, we take a different perspective, focusing on obtaining interpretable predictive models for outcome prediction. We propose to use association rule-based classification, which results in inherently interpretable classification models. Although association rule mining has been used with event logs for process model approximation and anomaly detection in the past, its application to an outcome-based predictive model is novel. Moreover, we propose two ways of visualising the rules obtained to increase the interpretability of the model. First, the rules composing a model can be visualised globally. Second, given a running case on which a prediction is made, the rules influencing the prediction for that particular case can be visualised locally. The experimental results on real-world event logs show that in most cases the performance of the rule-based classifier (RIPPER) is close to that of traditional machine learning approaches. We also show the application of the global and local visualisation methods to real-world event logs.
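A minimal sketch of training a RIPPER rule-based classifier on trace-level features follows, assuming the third-party `wittgenstein` package (a Python RIPPER implementation); the feature table is hypothetical and the printed rule set corresponds to the "global" view of the model.

```python
# Sketch of an interpretable RIPPER classifier on hypothetical per-trace features.
# Assumes the third-party `wittgenstein` package; data and column names are made up.
import pandas as pd
import wittgenstein as lw

# Hypothetical per-trace features with a binary outcome label.
df = pd.DataFrame({
    "num_events":    [5, 12, 7, 20, 6, 15, 9, 18],
    "has_rework":    [0, 1, 0, 1, 0, 1, 0, 1],
    "duration_days": [2, 14, 3, 21, 2, 12, 4, 19],
    "outcome": ["ok", "deviant", "ok", "deviant", "ok", "deviant", "ok", "deviant"],
})

ripper = lw.RIPPER()
ripper.fit(df, class_feat="outcome", pos_class="deviant")
ripper.out_model()                                  # prints the learned rule set (global view)
print(ripper.predict(df.drop(columns="outcome")))   # per-case predictions
```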