15 research outputs found

    Sensitivity-Based Optimization of Unsupervised Drift Detection for Categorical Data Streams

    Real-world data streams are rarely characterized by stationary data distributions. Instead, the phenomenon commonly termed concept drift threatens the performance of estimators conducting inference on such data. Our contribution builds on the unsupervised concept drift detector CDCStream, which is specialized in processing categorical data directly. We propose a cooldown mechanism that reduces its excessive sensitivity in order to curb false-alarm detections. Using practical classification and regression problems, we evaluate the impact of the mechanism on estimation performance and highlight its transferability to other detection methods. Additionally, we provide an intuitive means of tuning the sensitivity of drift detectors. While our mechanism only marginally improves on the unaltered detector on publicly available benchmark data, it does so consistently in almost all configurations. In contrast, in another real-world scenario, almost none of the tested drift-detection-based approaches could outperform a baseline approach. However, potentially false-alarm detections are reduced drastically in all scenarios. Since this results in fewer signals for refitting estimators while maintaining performance better than or at least comparable to vanilla CDCStream, compute infrastructure utilization can be economized further.
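    The cooldown idea described above can be sketched generically: wrap any drift detector and suppress further alarms for a fixed number of instances after each signal. The class and parameter names below are illustrative assumptions, not taken from the paper.

```python
class CooldownDetector:
    """Wrap a drift detector, suppressing alarms for `cooldown` instances
    after each signalled drift in order to curb false alarms.
    Hypothetical sketch; the wrapped detector only needs an `update`
    method returning True when it signals drift."""

    def __init__(self, base_detector, cooldown=100):
        self.base = base_detector
        self.cooldown = cooldown
        self._remaining = 0  # instances left in the current cooldown window

    def update(self, value):
        """Feed one observation; return True only for non-suppressed drifts."""
        drift = self.base.update(value)
        if self._remaining > 0:       # still cooling down: swallow the alarm
            self._remaining -= 1
            return False
        if drift:
            self._remaining = self.cooldown
            return True
        return False
```

    Larger `cooldown` values trade detection latency for fewer refit signals, which matches the paper's intuition for tuning sensitivity.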

    Incremental Market Behavior Classification in Presence of Recurring Concepts

    In recent years, the problem of concept drift has gained importance in the financial domain. The succession of manias, panics and crashes has stressed the non-stationary nature and the likelihood of drastic structural or concept changes in the markets. Traditional systems are unable or slow to adapt to these changes. Ensemble-based systems are widely known for their good results predicting both cyclic and non-stationary data such as stock prices. In this work, we propose RCARF (Recurring Concepts Adaptive Random Forests), an ensemble tree-based online classifier that handles recurring concepts explicitly. The algorithm extends the capabilities of a version of Random Forest for evolving data streams, adding on top a mechanism to store and handle a shared collection of inactive trees, called the concept history, which holds memories of the way market operators reacted in similar circumstances. This works in conjunction with a decision strategy that reacts to drift by replacing active trees with the best available alternative: either a previously stored tree from the concept history or a newly trained background tree. Both mechanisms are designed to provide fast reaction times and are thus applicable to high-frequency data. The experimental validation of the algorithm is based on the prediction of price movement directions one second ahead in the SPDR (Standard & Poor's Depositary Receipts) S&P 500 Exchange-Traded Fund. RCARF is benchmarked against other popular methods from the incremental online machine learning literature and is able to achieve competitive results. This research was funded by the Spanish Ministry of Economy and Competitiveness under grant number ENE2014-56126-C2-2-R.
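    The drift-reaction strategy described above, choosing between a remembered tree and a fresh background tree, can be sketched as follows. The tree representation and the `score` callback are placeholder assumptions for illustration, not RCARF's actual implementation.

```python
def react_to_drift(active_tree, background_tree, concept_history,
                   recent_window, score):
    """On a drift signal, replace the active tree with the best available
    alternative: a stored tree from the concept history or the newly
    trained background tree. `score(tree, window)` evaluates a candidate
    on recent data (higher is better). Hypothetical sketch."""
    candidates = list(concept_history) + [background_tree]
    best = max(candidates, key=lambda t: score(t, recent_window))
    if best is not background_tree:
        concept_history.remove(best)       # reactivate a remembered concept
    concept_history.append(active_tree)    # retire the drifted tree for reuse
    return best
```

    Keeping retired trees in a shared history is what lets the ensemble react quickly when a previously seen market regime recurs.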

    Optimization and Prediction Techniques for Self-Healing and Self-Learning Applications in a Trustworthy Cloud Continuum

    The current IT market is more and more dominated by the “cloud continuum”. In the “traditional” cloud, computing resources are typically homogeneous in order to facilitate economies of scale. In contrast, in edge computing, computational resources are widely diverse, commonly with scarce capacities, and must be managed very efficiently due to battery constraints or other limitations. A combination of resources and services at the edge (edge computing), in the core (cloud computing), and along the data path (fog computing) is needed through a trusted cloud continuum. This requires novel solutions for the creation, optimization, management, and automatic operation of such infrastructure through new approaches such as infrastructure as code (IaC). In this paper, we analyze how artificial intelligence (AI)-based techniques and tools can enhance the operation of complex applications to support the broad and multi-stage heterogeneity of the infrastructural layer in the “computing continuum” through the enhancement of IaC optimization, IaC self-learning, and IaC self-healing. To this end, the presented work proposes a set of tools, methods, and techniques for applications’ operators to seamlessly select, combine, configure, and adapt computation resources all along the data path and support the complete service lifecycle covering: (1) optimized distributed application deployment over heterogeneous computing resources; (2) monitoring of execution platforms in real time, including continuous control and trust of the infrastructural services; (3) application deployment and adaptation while optimizing the execution; and (4) application self-recovery to avoid compromising situations that may lead to an unexpected failure. This research was funded by the European project PIACERE (Horizon 2020 research and innovation programme, under grant agreement no 101000162).

    On ensemble techniques for data stream regression

    An ensemble of learners tends to exceed the predictive performance of individual learners. This approach has been explored for both batch and online learning. Ensemble methods applied to data stream classification have been thoroughly investigated over the years, while their regression counterparts have received less attention in comparison. In this work, we discuss and analyze several techniques for generating, aggregating, and updating ensembles of regressors for evolving data streams. We investigate the impact of different strategies for inducing diversity into the ensemble by randomizing the input data (resampling, random subspaces, and random patches). On top of that, we devote particular attention to techniques that adapt the ensemble model in response to concept drifts, including adaptive window approaches, fixed periodic resets, and randomly determined windows. Extensive empirical experiments show that simple techniques can obtain similar predictive performance to sophisticated algorithms that rely on reactive adaptation (i.e., concept drift detection and recovery).
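    Two of the randomization strategies mentioned, online resampling and random feature subspaces, can be sketched as below (random patches combine both). Helper names are illustrative assumptions, not from the paper; the Poisson weighting follows the standard online-bagging idea.

```python
import math
import random

def poisson(lam=1.0, rng=random):
    """Knuth's Poisson sampler: online bagging weights each arriving
    instance by a Poisson(lam) draw instead of bootstrap resampling."""
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return k - 1

def random_subspace(x, frac, rng):
    """Project an instance (dict of feature -> value) onto a random
    fraction of its features; each base regressor gets its own subspace."""
    keys = sorted(x)
    m = max(1, int(frac * len(keys)))
    return {k: x[k] for k in rng.sample(keys, m)}
```

    Each base regressor would then train on `random_subspace(x, frac, its_rng)` repeated `poisson()` times per instance, which injects diversity without ever buffering the stream.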

    Towards time-evolving analytics: Online learning for time-dependent evolving data streams

    Traditional historical data analytics is at risk in a world where volatility, uncertainty, complexity, and ambiguity are the new normal. While Streaming Machine Learning (SML) and Time-series Analytics (TSA) attack some aspects of the problem, we still need a comprehensive solution. SML trains models using fewer data and in a continuous/adaptive way, relaxing the assumption that data points are identically distributed. TSA considers temporal dependence among data points, but it assumes identical distribution. Every data scientist fights this battle with ad-hoc solutions. In this paper, we claim that, due to the temporal dependence in the data, the existing solutions do not represent robust solutions to efficiently and automatically keep models relevant even when changes occur and real-time processing is a must. We propose a novel and solid scientific foundation for Time-Evolving Analytics from this perspective. Such a framework aims to develop the logical, methodological, and algorithmic foundations for fast, scalable, and resilient analytics.

    A survey on machine learning for recurring concept drifting data streams

    The problem of concept drift has gained a lot of attention in recent years. This aspect is key in many domains exhibiting non-stationarity as well as cyclic patterns and structural breaks affecting their generative processes. In this survey, we review the relevant literature on dealing with regime changes in the behaviour of continuous data streams. The study starts with a general introduction to the field of data stream learning, describing recent works on passive or active mechanisms to adapt to or detect concept drifts, frequent challenges in this area, and related performance metrics. Then, different supervised and unsupervised approaches such as online ensembles, meta-learning, and model-based clustering that can be used to deal with seasonalities in a data stream are covered. The aim is to point out new research trends and give future research directions on the usage of machine learning techniques for data streams, which can help in the event of shifts and recurrences in continuous learning scenarios in near real time.

    A Statistical Drift Detection Method

    Machine learning models assume that data is drawn from a stationary distribution. However, in practice, challenges are imposed on models that need to make sense of fast-evolving data streams, where the content of data is changing and evolving dynamically over time. This change between the underlying distributions of the training and test datasets is called concept drift. The presence of concept drift may compromise the accuracy and reliability of prospective computational predictions. Therefore, handling concept drift is of great importance in the direction of diminishing its negative effects on a model's performance. In order to handle concept drift, one has to detect it first. Concept drift detectors have been used to accomplish this: reactive concept drift detectors try to detect drift as soon as it occurs by monitoring the performance of the underlying machine learning model. However, the importance of interpretability in machine learning indicates that it may prove useful not only to detect that drift is occurring in the data, but also to identify and analyze its causes. In this thesis, the importance of interpretability in drift detection is highlighted and the Statistical Drift Detection Method (SDDM) is presented, which detects drifts in fast-evolving data streams with fewer false positives and false negatives than the state of the art, and has the ability to interpret the cause of the concept drift. The effectiveness of the method is demonstrated by applying it to both synthetic and real-world datasets.
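    SDDM itself is not reproduced here, but the kind of interpretable, statistics-based detection the abstract describes can be illustrated with a generic two-window, per-feature divergence check: it flags drift and also points at the responsible feature. Function and parameter names are assumptions for this sketch.

```python
import math
from collections import Counter

def kl_per_feature(reference, current, eps=1e-9):
    """Compare two windows of categorical instances feature by feature.
    Returns {feature: KL(reference || current)}, so a drift alarm can be
    traced back to the feature whose distribution changed. Illustrative
    sketch in the spirit of interpretable drift detection, not SDDM."""
    features = reference[0].keys()
    scores = {}
    for f in features:
        ref_counts = Counter(x[f] for x in reference)
        cur_counts = Counter(x[f] for x in current)
        values = set(ref_counts) | set(cur_counts)
        n_ref, n_cur = len(reference), len(current)
        kl = 0.0
        for v in values:
            p = ref_counts[v] / n_ref + eps  # smoothed reference probability
            q = cur_counts[v] / n_cur + eps  # smoothed current probability
            kl += p * math.log(p / q)
        scores[f] = kl
    return scores
```

    Thresholding the maximum per-feature score yields a drift signal, while the argmax feature provides the interpretation of its cause.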

    Data stream mining: methods and challenges for handling concept drift.

    Mining and analysing streaming data is crucial for many applications, and this area of research has gained extensive attention over the past decade. However, there are several inherent problems that continue to challenge the hardware and the state-of-the-art algorithmic solutions. Examples of such problems include the unbounded size, varying speed, and unknown data characteristics of arriving instances from a data stream. The aim of this research is to portray key challenges faced by algorithmic solutions for stream mining, particularly focusing on the prevalent issue of concept drift. A comprehensive discussion of concept drift and its inherent data challenges in the context of stream mining is presented, as is a critical, in-depth review of relevant literature. Current issues with the evaluation procedure for concept drift detectors are also explored, highlighting problems such as a lack of established base datasets and the impact of temporal dependence on concept drift detection. By exposing gaps in the current literature, this study suggests recommendations for future research which should aid in the progression of stream mining and concept drift detection algorithms.

    Process-Oriented Stream Classification Pipeline: A Literature Review

    Featured Application: Nowadays, many applications and disciplines work on the basis of stream data. Common examples are the IoT sector (e.g., sensor data analysis), or video, image, and text analysis applications (e.g., in social media analytics or astronomy). With our work, we gather different approaches and terminology, and give a broad overview of the topic. Our main target groups are practitioners and newcomers to the field of data stream classification. Due to the rise of continuous data-generating applications, analyzing data streams has gained increasing attention over the past decades. A core research area in stream data is stream classification, which categorizes or detects data points within an evolving stream of observations. Areas of stream classification are diverse, ranging, e.g., from monitoring sensor data to analyzing a wide range of (social) media applications. Research in stream classification is related to developing methods that adapt to the changing and potentially volatile data stream. It focuses on individual aspects of the stream classification pipeline, e.g., designing suitable algorithm architectures, an efficient train-and-test procedure, or detecting so-called concept drifts. As a result of the many different research questions and strands, the field is challenging to grasp, especially for beginners. This survey explores, summarizes, and categorizes work within the domain of stream classification and identifies core research threads over the past few years. It is structured based on the stream classification process to facilitate coordination within this complex topic, including common application scenarios and benchmarking data sets. Thus, both newcomers to the field and experts who want to widen their scope can gain (additional) insight into this research area and find starting points and pointers to more in-depth literature on specific issues and research directions in the field.