6 research outputs found

Blending aggregation and selection: Adapting parallel coordinates for the visualization of large datasets

Authors: Andrienko G., Andrienko N.
Publication venue: 'Maney Publishing'
Publication date: 01/01/2005

Many traditional data visualization techniques, which have proved supportive for exploratory analysis of datasets of moderate size, fail to fulfil their function when applied to large datasets. There are two approaches to coping with large amounts of data: data selection, where only a portion of the data is displayed, and data aggregation, i.e. grouping data items and considering the groups instead of the original data. Neither of these approaches alone suits the needs of exploratory data analysis, which requires consideration of data on all levels: overall (considering a dataset as a whole), intermediate (viewing and comparing collective characteristics of arbitrary data subsets, or classes), and elementary (accessing individual data items). Therefore, it is necessary to combine these approaches, i.e. to build a tool that shows the whole set and arbitrarily defined subsets (object classes) in an aggregated way and superimposes on this a representation of arbitrarily selected individual data items. We have achieved such a combination of approaches by modifying the parallel coordinate plot technique. These modifications are described and analysed in the paper.

City Research Online

Learning Concept Drift Using Adaptive Training Set Formation Strategy

Author: Kohail Sarah Nabeel Jameel
Publication venue: The Islamic University College Journal
Publication date: 01/01/2011

We live in a dynamic world, where change is a part of everyday life. When there is a shift in data, classification or prediction models need to adapt to the changes. In data mining, the phenomenon of change in data distribution over time is known as concept drift. In this research, we propose an adaptive supervised learning methodology with delayed labeling. As part of this methodology, we introduce an adaptive training set formation algorithm called SFDL, which is based on selective training set formation. Our proposed solution is considered the first systematic training set formation approach that takes the delayed labeling problem into account. It can be used with any base classifier without the need to change the implementation or settings of that classifier. We test our algorithm implementation using synthetic and real datasets from various domains, which may exhibit different drift types (sudden, gradual, incremental, recurring) with different speeds of change. The experimental results confirm an improvement in classification accuracy compared to an ordinary classifier for all drift types. Our approach increases classification accuracy by 20% on average and by 56% in the best cases of our experiments, and it was never worse than the ordinary classifiers. Finally, a comparison study is performed with four other related methods that deal with changes in user interest over time and handle recurring drift. The results indicate the effectiveness of the proposed method over the other methods in terms of classification accuracy.

Institutional Repository of the Islamic University of Gaza

One or Two Things We know about Concept Drift -- A Survey on Monitoring Evolving Environments

Authors: Hammer Barbara, Hinder Fabian, Vaquet Valerie
Publication date: 24/10/2023

The world surrounding us is subject to constant change. These changes, frequently described as concept drift, influence many industrial and technical processes. As they can lead to malfunctions and other anomalous behavior, which may be safety-critical in many scenarios, detecting and analyzing concept drift is crucial. In this paper, we provide a literature review focusing on concept drift in unsupervised data streams. While many surveys focus on supervised data streams, so far there is no work reviewing the unsupervised setting. However, this setting is of particular relevance for monitoring and anomaly detection, which are directly applicable to many tasks and challenges in engineering. This survey provides a taxonomy of existing work on drift detection. It also covers the current state of research on drift localization in a systematic way. In addition to providing a systematic literature review, this work provides precise mathematical definitions of the considered problems and contains standardized experiments on parametric artificial datasets, allowing for a direct comparison of different strategies for detection and localization. Thereby, the suitability of different schemes can be analyzed systematically, and guidelines for their usage in real-world scenarios can be provided. Finally, there is a section on the emerging topic of explaining concept drift.

arXiv.org e-Print Archive

Visualizing concept drift

Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2003

Crossref