37,109 research outputs found
Incremental Predictive Process Monitoring: How to Deal with the Variability of Real Environments
A characteristic of existing predictive process monitoring techniques is to
first construct a predictive model based on past process executions, and then
use it to predict the future of new ongoing cases, without the possibility of
updating it with new cases when they complete their execution. This can make
predictive process monitoring too rigid to deal with the variability of
processes working in real environments that continuously evolve and/or exhibit
new variant behaviors over time. As a solution to this problem, we propose the
use of algorithms that allow the incremental construction of the predictive
model. These incremental learning algorithms update the model whenever new
cases become available so that the predictive model evolves over time to fit
the current circumstances. The algorithms have been implemented using different
case encoding strategies and evaluated on a number of real and synthetic
datasets. The results provide a first evidence of the potential of incremental
learning strategies for predicting process monitoring in real environments, and
of the impact of different case encoding strategies in this setting
Dynamic feature selection for clustering high dimensional data streams
open access articleChange in a data stream can occur at the concept level and at the feature level. Change at the feature level can occur if new, additional features appear in the stream or if the importance and relevance of a feature changes as the stream progresses. This type of change has not received as much attention as concept-level change. Furthermore, a lot of the methods proposed for clustering streams (density-based, graph-based, and grid-based) rely on some form of distance as a similarity metric and this is problematic in high-dimensional data where the curse of dimensionality renders distance measurements and any concept of “density” difficult. To address these two challenges we propose combining them and framing the problem as a feature selection problem, specifically a dynamic feature selection problem. We propose a dynamic feature mask for clustering high dimensional data streams. Redundant features are masked and clustering is performed along unmasked, relevant features. If a feature's perceived importance changes, the mask is updated accordingly; previously unimportant features are unmasked and features which lose relevance become masked. The proposed method is algorithm-independent and can be used with any of the existing density-based clustering algorithms which typically do not have a mechanism for dealing with feature drift and struggle with high-dimensional data. We evaluate the proposed method on four density-based clustering algorithms across four high-dimensional streams; two text streams and two image streams. In each case, the proposed dynamic feature mask improves clustering performance and reduces the processing time required by the underlying algorithm. Furthermore, change at the feature level can be observed and tracked
SOTXTSTREAM: Density-based self-organizing clustering of text streams
A streaming data clustering algorithm is presented building upon the density-based selforganizing stream clustering algorithm SOSTREAM. Many density-based clustering algorithms are limited by their inability to identify clusters with heterogeneous density. SOSTREAM addresses this limitation through the use of local (nearest neighbor-based) density determinations. Additionally, many stream clustering algorithms use a two-phase clustering approach. In the first phase, a micro-clustering solution is maintained online, while in the second phase, the micro-clustering solution is clustered offline to produce a macro solution. By performing self-organization techniques on micro-clusters in the online phase, SOSTREAM is able to maintain a macro clustering solution in a single phase. Leveraging concepts from SOSTREAM, a new density-based self-organizing text stream clustering algorithm, SOTXTSTREAM, is presented that addresses several shortcomings of SOSTREAM. Gains in clustering performance of this new algorithm are demonstrated on several real-world text stream datasets
The GC3 framework : grid density based clustering for classification of streaming data with concept drift.
Data mining is the process of discovering patterns in large sets of data. In recent years there has been a paradigm shift in how the data is viewed. Instead of considering the data as static and available in databases, data is now regarded as a stream as it continuously flows into the system. One of the challenges posed by the stream is its dynamic nature, which leads to a phenomenon known as Concept Drift. This causes a need for stream mining algorithms which are adaptive incremental learners capable of evolving and adjusting to the changes in the stream. Several models have been developed to deal with Concept Drift. These systems are discussed in this thesis and a new system, the GC3 framework is proposed. The GC3 framework leverages the advantages of the Gris Density based Clustering and the Ensemble based classifiers for streaming data, to be able to detect the cause of the drift and deal with it accordingly. In order to demonstrate the functionality and performance of the framework a synthetic data stream called the TJSS stream is developed, which embodies a variety of drift scenarios, and the model’s behavior is analyzed over time. Experimental evaluation with the synthetic stream and two real world datasets demonstrated high prediction capability of the proposed system with a small ensemble size and labeling ratio. Comparison of the methodology with a traditional static model with no drifts detection capability and with existing ensemble techniques for stream classification, showed promising results. Also, the analysis of data structures maintained by the framework provided interpretability into the dynamics of the drift over time. The experimentation analysis of the GC3 framework shows it to be promising for use in dynamic drifting environments where concepts can be incrementally learned in the presence of only partially labeled data
Dynamic instability transitions in 1D driven diffusive flow with nonlocal hopping
One-dimensional directed driven stochastic flow with competing nonlocal and
local hopping events has an instability threshold from a populated phase into
an empty-road (ER) phase. We implement this in the context of the asymmetric
exclusion process. The nonlocal skids promote strong clustering in the
stationary populated phase. Such clusters drive the dynamic phase transition
and determine its scaling properties. We numerically establish that the
instability transition into the ER phase is second order in the regime where
the entry point reservoir controls the current and first order in the regime
where the bulk is in control. The first order transition originates from a
turn-about of the cluster drift velocity. At the critical line, the current
remains analytic, the road density vanishes linearly, and fluctuations scale as
uncorrelated noise. A self-consistent cluster dynamics analysis explains why
these scaling properties remain that simple.Comment: 11 pages, 14 figures (25 eps files); revised as the publised versio
- …