1,484 research outputs found
Combining similarity in time and space for training set formation under concept drift
Concept drift is a challenge in supervised learning for sequential data. It describes the phenomenon in which data distributions change over time. In such cases, classifier accuracy benefits from selective sampling of the training data. We develop a method for training set selection that is particularly relevant when the expected drift is gradual. Training set selection at each time step is based on the distance to the target instance. The distance function combines similarity in space and in time. The method determines an optimal training set size online at every time step using cross validation. It is a wrapper approach, so different base classifiers can be plugged in. The proposed method shows the best accuracy in its peer group on real and artificial drifting data. The complexity of the method is reasonable for field applications.
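The selection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the additive distance form, the `time_weight` parameter, the leave-one-out scoring, and the toy majority-vote base classifier are all assumptions made here for illustration, and every function name is hypothetical.

```python
import math


def combined_distance(x, t, x_target, t_target, time_weight=1.0):
    """Assumed form: Euclidean distance in feature space plus a weighted gap in time."""
    spatial = math.dist(x, x_target)
    temporal = abs(t_target - t)
    return spatial + time_weight * temporal


def select_training_set(history, x_target, t_target, size, time_weight=1.0):
    """Return the `size` past records closest to the target instance.

    `history` is a list of (t, x, y) tuples: timestamp, feature tuple, label.
    """
    ranked = sorted(
        history,
        key=lambda rec: combined_distance(rec[1], rec[0], x_target, t_target, time_weight),
    )
    return ranked[:size]


def majority_label(records):
    """Toy base classifier (an assumption): predict the most frequent label."""
    counts = {}
    for _, _, y in records:
        counts[y] = counts.get(y, 0) + 1
    # Sorting the labels first makes tie-breaking deterministic.
    return max(sorted(counts), key=counts.get)


def choose_size(history, x_target, t_target, candidate_sizes, time_weight=1.0):
    """Pick the candidate size with the lowest leave-one-out error on the selected records."""
    best_size, best_err = None, None
    for size in candidate_sizes:
        selected = select_training_set(history, x_target, t_target, size, time_weight)
        if not selected:
            continue
        errors = 0
        for i, held_out in enumerate(selected):
            rest = selected[:i] + selected[i + 1:]
            # A held-out record with nothing left to train on counts as an error.
            if not rest or majority_label(rest) != held_out[2]:
                errors += 1
        err = errors / len(selected)
        if best_err is None or err < best_err:
            best_size, best_err = size, err
    return best_size


# Toy stream in which the label drifts from 'a' to 'b' at t = 3.
history = [(t, (0.0,), "a") for t in range(3)] + [(t, (0.0,), "b") for t in range(3, 6)]
size = choose_size(history, (0.0,), 6, candidate_sizes=[3, 6])
training_set = select_training_set(history, (0.0,), 6, size)
print(size, majority_label(training_set))
```

Because the base classifier is only called through `majority_label`, any other classifier could be swapped in at that point, which is the sense in which the abstract calls the method a wrapper approach.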
A Survey on Concept Drift Adaptation
Concept drift primarily refers to an online supervised learning scenario when the relation between the input data and the target variable changes over time. Assuming a general knowledge of supervised learning, in this paper we characterize the adaptive learning process, categorize existing strategies for handling concept drift, discuss the most representative, distinct and popular techniques and algorithms, discuss evaluation methodology of adaptive algorithms, and present a set of illustrative applications. This introduction to concept drift adaptation presents the state-of-the-art techniques and a collection of benchmarks for researchers, industry analysts and practitioners. The survey aims at covering the different facets of concept drift in an integrated way to reflect on the existing scattered state of the art.
A survey on feature drift adaptation: Definition, benchmark, challenges and future directions
Data stream mining is a fast-growing research topic due to the ubiquity of data in several real-world problems. Given their ephemeral nature, data stream sources are expected to undergo changes in data distribution, a phenomenon called concept drift. This paper focuses on one specific type of drift that has not yet been thoroughly studied, namely feature drift. Feature drift occurs whenever a subset of features becomes, or ceases to be, relevant to the learning task; thus, learners must detect and adapt to these changes accordingly. We survey existing work on feature drift adaptation with both explicit and implicit approaches. Additionally, we benchmark several algorithms and a naive feature drift detection approach using synthetic and real-world datasets. The results from our experiments indicate the need for future research in this area, as even naive approaches produced gains in accuracy while reducing resource usage. Finally, we state current research topics, challenges and future directions for feature drift adaptation.
Learning from Ontology Streams with Semantic Concept Drift
Data stream learning has been largely studied for extracting knowledge
structures from continuous and rapid data records. In the semantic Web, data is
interpreted in ontologies and its ordered sequence is represented as an
ontology stream. Our work exploits the semantics of such streams to tackle the
problem of concept drift, i.e., unexpected changes in data distribution, causing
most models to become less accurate as time passes. To this end we revisited (i)
semantic inference in the context of supervised stream learning, and (ii)
models with semantic embeddings. The experiments show accurate prediction with
data from Dublin and Beijing.
CONDA-PM -- A Systematic Review and Framework for Concept Drift Analysis in Process Mining
Business processes evolve over time to adapt to changing business
environments. This requires continuous monitoring of business processes to gain
insights into whether they conform to the intended design or deviate from it.
The situation when a business process changes while being analysed is denoted
as Concept Drift. Its analysis is concerned with studying how a business
process changes, in terms of detecting and localising changes and studying the
effects of the latter. Concept drift analysis is crucial to enable early
detection and management of changes, that is, whether to promote a change to
become part of an improved process, or to reject the change and make decisions
to mitigate its effects. Despite its importance, there exists no comprehensive
framework for analysing concept drift types, affected process perspectives, and
granularity levels of a business process. This article proposes the CONcept
Drift Analysis in Process Mining (CONDA-PM) framework describing phases and
requirements of a concept drift analysis approach. CONDA-PM was derived from a
Systematic Literature Review (SLR) of current approaches analysing concept
drift. We apply the CONDA-PM framework to current approaches to concept drift
analysis and evaluate their maturity. Applying the CONDA-PM framework highlights
areas where research is needed to complement existing efforts.
Comment: 45 pages, 11 tables, 13 figures