Search CORE

67 research outputs found

Handling Concept Drifts in Regression Problems -- the Error Intersection Approach

Author: Baier Lucas
Hofmann Marcel
Kühl Niklas
Mohr Marisa
Satzger Gerhard
Publication venue: 'GITO mbH Verlag'
Publication date: 15/01/2020

Machine learning models are omnipresent for predictions on big data. One challenge for deployed models is that the data change over time, a phenomenon called concept drift. If not handled correctly, concept drift can lead to significant mispredictions. We explore a novel approach for concept drift handling, which describes a strategy to switch between simple and complex machine learning models for regression tasks. We assume that the approach leverages the individual strengths of each model, switching to the simpler model when a drift occurs and switching back to the complex model in typical situations. We instantiate the approach on a real-world data set of taxi demand in New York City, which is prone to multiple drifts, e.g. blizzards, which cause a sudden decrease in taxi demand. We show that our approach significantly outperforms all baselines considered.
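A minimal Python sketch of such a switching strategy follows. The abstract does not specify the switching rule, so the rule here is an assumption: after each labeled observation, the regressor serves whichever model had the lower mean absolute error over the last `window` observations, so the active model changes where the two recent-error curves intersect. The class name `ErrorSwitchingRegressor`, the `window` parameter, and the toy persistence and seasonal models are hypothetical.

```python
from collections import deque


class ErrorSwitchingRegressor:
    """Hypothetical switching rule (not the published one): serve whichever
    model had the lower mean absolute error over the last `window` labels."""

    def __init__(self, simple_model, complex_model, window=3):
        self.models = {"simple": simple_model, "complex": complex_model}
        self.errors = {name: deque(maxlen=window) for name in self.models}
        self.active = "complex"  # default to the complex model in typical situations

    def predict(self, x):
        return self.models[self.active](x)

    def update(self, x, y_true):
        # Record each model's absolute error on the newly labeled observation.
        for name, model in self.models.items():
            self.errors[name].append(abs(model(x) - y_true))
        mean = {name: sum(e) / len(e) for name, e in self.errors.items()}
        # The active model switches when the recent-error curves cross.
        self.active = "simple" if mean["simple"] < mean["complex"] else "complex"


# Toy taxi-demand stream: a persistence model (last observed demand) versus a
# seasonal forecast; the drift steps mimic a blizzard-like sudden drop in demand.
persistence = lambda x: x["last_demand"]
seasonal = lambda x: x["seasonal_forecast"]
normal = ({"last_demand": 100, "seasonal_forecast": 120}, 120)
drift = ({"last_demand": 40, "seasonal_forecast": 120}, 40)

regressor = ErrorSwitchingRegressor(persistence, seasonal)
trace = []
for x, y in [normal, normal, drift, drift, normal, normal, normal]:
    regressor.update(x, y)
    trace.append(regressor.active)
print(trace)  # → ['complex', 'complex', 'simple', 'simple', 'simple', 'simple', 'complex']
```

The short error window makes the switch react within one or two observations; a longer window would switch more cautiously.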

arXiv.org e-Print Archive

Crossref

KITopen

How to Cope with Change? - Preserving Validity of Predictive Services over Time

Author: Baier Lucas
Kühl Niklas
Satzger Gerhard
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/01/2019

Companies increasingly rely on predictive services that constantly monitor and analyze the available data streams to improve their service offerings. However, sudden or incremental changes in those streams challenge the validity and proper functioning of a predictive service over time. We develop a framework that allows predictive services to be characterized and differentiated with regard to their ongoing validity. Furthermore, this work proposes a research agenda of worthwhile topics for improving the long-term validity of predictive services. In particular, we focus on different scenarios of true-label availability for predictive services and on the integration of expert knowledge. With these insights, we lay an important foundation for future research on valid predictive services.

KITopen

ScholarSpace at University of Hawai'i at Manoa

AIS Electronic Library (AISeL)

Anomaly detection from time-changing environmental sensor data streams

Author: Mamun Abdullah-Al-
Publication venue: Memorial University of Newfoundland
Publication date: 01/01/2016

This thesis stems from a project with the real-time environmental monitoring company EMSAT Corporation, which was looking for methods to automatically flag spikes and other anomalies in its environmental sensor data streams. The problem presents several challenges: near-real-time anomaly detection, the absence of labeled data, and time-changing data streams. We address this problem using both a statistical parametric approach and a non-parametric approach, Kernel Density Estimation (KDE). The main contribution of this thesis is extending KDE to work more effectively on evolving data streams, particularly in the presence of concept drift. To this end, we have developed a framework that integrates the Adaptive Windowing (ADWIN) change detection algorithm with KDE. We have tested this approach on several real-world data sets and received positive feedback from our industry collaborator. Some results in this thesis were presented at the ECML PKDD 2015 Doctoral Consortium.
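A minimal Python sketch of combining ADWIN with KDE follows, under stated assumptions: it uses the naive ADWIN0 formulation of Bifet and Gavaldà, which checks every split of the window (the practical ADWIN variant uses exponential histograms instead), assumes sensor values are already scaled to [0, 1], and scores each incoming value with a Gaussian KDE over the current window. The bandwidth of 0.05, the density threshold of 0.1, and the function names are hypothetical choices, not the thesis's settings.

```python
import math


def adwin0_update(window, x, delta=0.002):
    """Naive ADWIN0-style update for values scaled to [0, 1].

    Appends x, then repeatedly drops the oldest element while some split of
    the window into older/newer parts shows a mean difference of at least the
    Hoeffding-based threshold eps_cut. Returns True if the window shrank."""
    window.append(x)
    shrunk = False
    while len(window) > 1:
        n = len(window)
        delta_prime = delta / n
        total, head = sum(window), 0.0
        cut = False
        for i in range(1, n):
            head += window[i - 1]
            n0, n1 = i, n - i
            m = 1.0 / (1.0 / n0 + 1.0 / n1)  # m = 1/(1/n0 + 1/n1) as in ADWIN's bound
            eps_cut = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
            if abs(head / n0 - (total - head) / n1) >= eps_cut:
                cut = True
                break
        if not cut:
            break
        window.pop(0)  # forget the oldest observation and re-check
        shrunk = True
    return shrunk


def kde_density(window, x, bandwidth=0.05):
    """Gaussian kernel density estimate of x given the current window."""
    norm = 1.0 / (len(window) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in window)


# Toy stream: a level shift from 0.2 to 0.8 at t = 100 (values already in [0, 1]).
stream = [0.2] * 100 + [0.8] * 50
window, flags, changes = [], [], []
for t, x in enumerate(stream):
    # Score the point against the current window before it is added.
    if len(window) >= 10 and kde_density(window, x) < 0.1:  # hypothetical threshold
        flags.append(t)
    if adwin0_update(window, x):
        changes.append(t)
print(flags, changes[:1], len(window))
```

In this toy run only the first values after the level shift are flagged, and ADWIN later drops the stale 0.2 values from the window, so the density estimate reflects the new regime instead of the old one.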

Memorial University Research Repository

Efficient Algorithms for Discovering Concept Drift in Business Processes.

Author: Martjušev Jevgeni
Publication venue: Tartu Ülikool
Publication date: 01/01/2013

Process mining is a relatively new research area, but it is already used in practice. Every company and organization runs various business processes that are supported by information systems and leave event logs while being executed. By analyzing those logs, one can build a process model that reflects how the process actually operates. Existing algorithms assume that the analyzed process is in a steady state; however, it can be altered by seasonality, a new law, or some external event such as a financial crisis. In this case, we have to deal with concept drift. Concept drifts can be sudden, when the change is abrupt, or gradual, when one concept fades out while another takes over. In this work, we propose five novel approaches for detecting concept drift in process mining, each of which improves or extends the algorithm proposed by Bose et al. [1]. Step size improvement speeds up the algorithm by leaving out some intermediate steps. The automatic change point detection algorithm extracts the concept drift points without the need to analyze the plot manually. The adaptive windowing algorithm (ADWIN) relaxes the original algorithm's dependency on a fixed population size, thus reducing the number of false positives and false negatives. The algorithm with non-continuous populations makes it possible to deal with gradual drifts. Finally, defining population sizes in terms of time periods instead of numbers of traces makes it possible to detect micro-level and macro-level drifts in logs with multi-order dynamics, where the process can change at multiple levels of granularity. The algorithms were implemented in the Concept Drift plug-in of the ProM framework. To assess the quality of the algorithms, we propose a way to generate logs with different concept drift characteristics using CPN Tools, together with a quality evaluation framework, similar to the one used in information retrieval, that counts true positives, false positives, and false negatives and calculates derived metrics. The algorithms were successfully tested on both simulated and real-life data.

[1] Bose, R.P.J.C., van der Aalst, W.M.P., Žliobaitė, I., Pechenizkiy, M.: Handling Concept Drift in Process Mining. In: CAiSE. LNCS, vol. 6741, pp. 391–405. Springer, Berlin (2011)
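In the algorithm these approaches extend, feature values extracted from two adjacent populations of traces are compared with a statistical test while the populations slide along the log. The Python sketch below is an assumption-laden illustration, not the ProM plug-in: each trace is reduced to one hypothetical numeric feature (for instance, how often one activity directly follows another in that trace), the two-sample Kolmogorov-Smirnov statistic with a fixed threshold stands in for a significance test, and the hypothetical parameters `pop` and `step` play the roles of the population size and step size.

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the
    empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance past every value <= x in both samples, then compare the CDFs.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def detect_drift_points(feature, pop=20, step=5, threshold=0.8):
    """Compare the `pop` traces before position t with the `pop` traces after
    it, for every `step`-th t; report positions whose statistic reaches the
    (hypothetical) threshold."""
    points = []
    for t in range(pop, len(feature) - pop + 1, step):
        d = ks_statistic(feature[t - pop:t], feature[t:t + pop])
        if d >= threshold:
            points.append((t, d))
    return points


# Toy log: a per-trace feature that jumps from 1 to 3 after trace 100.
feature = [1] * 100 + [3] * 100
print(detect_drift_points(feature))  # → [(100, 1.0)]
```

A larger `step` evaluates fewer positions and therefore runs faster, at the cost of locating a change point only to within `step` traces; this appears to be the trade-off behind the step-size improvement described above.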

DSpace at Tartu University Library

A Survey on Concept Drift Adaptation

Author: Bifet A.
Bouchachia Abdelhamid
Gama J.
Pechenizkiy M.
Zliobaite Indre
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

Concept drift primarily refers to an online supervised learning scenario in which the relation between the input data and the target variable changes over time. Assuming general knowledge of supervised learning, in this paper we characterize the adaptive learning process, categorize existing strategies for handling concept drift, discuss the most representative, distinct, and popular techniques and algorithms, discuss the evaluation methodology of adaptive algorithms, and present a set of illustrative applications. This introduction to concept drift adaptation presents state-of-the-art techniques and a collection of benchmarks for researchers, industry analysts, and practitioners. The survey aims to cover the different facets of concept drift in an integrated way, reflecting on the existing, scattered state of the art.
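Adaptive algorithms on streams are often evaluated prequentially (test-then-train): each example is first used to test the model and then to update it. A minimal Python sketch of a prequential error with a fading factor `alpha`, one way to weight recent losses more heavily than old ones, follows; the function name, the default `alpha`, and the toy losses are illustrative assumptions.

```python
def fading_prequential_error(losses, alpha=0.99):
    """Prequential error with fading factor alpha (0 < alpha <= 1):
    E_t = S_t / B_t, where S_t = loss_t + alpha * S_{t-1} and
    B_t = 1 + alpha * B_{t-1}. alpha = 1 gives the plain running mean."""
    s = b = 0.0
    errors = []
    for loss in losses:
        s = loss + alpha * s
        b = 1.0 + alpha * b
        errors.append(s / b)
    return errors


# Losses obtained test-then-train: a model that was accurate, then drifted.
losses = [0.0] * 5 + [1.0] * 5
print(fading_prequential_error(losses, alpha=1.0)[-1])  # → 0.5 (running mean)
print(fading_prequential_error(losses, alpha=0.5)[-1])  # dominated by recent losses
```

With a smaller `alpha`, the error estimate forgets the accurate period faster and so reacts more quickly to a drift, at the price of a noisier estimate.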

Repository TU/e

Crossref

Pure OAI Repository

Bournemouth University Research Online

Thermochemical Conversion Processes for Solid Fuels and Renewable Energies

Publication venue: 'MDPI AG'
Publication date: 11/01/2022

It is widely believed that a large proportion of greenhouse gas emissions originates anthropogenically from the use of fossil fuels, with additional contributions from manufactured materials, deforestation, soil erosion, and agriculture (including livestock). The global society actively supports measures to create a flexible, low-carbon energy economy to mitigate climate change and its devastating environmental consequences. This Special Issue presents recent advances in next-generation thermochemical conversion processes for solid fuels and renewable energies (e.g., the operational flexibility of co-combustion of biomass and lignite, integrated solar combined cycle power plants, and advanced gasification systems such as sorption-enhanced gasification and chemical looping gasification).

Directory of Open Access Books (DOAB)

AMANDA : density-based adaptive model for nonstationary data under extreme verification latency scenarios

Author: Ferreira Raul Sena
Publication venue: 'Programa de Pos-graduacao em Ciencias Contabeis da UFRJ'
Publication date: 01/06/2018

Gradual concept drift refers to a smooth, gradual change over time in the relation between input and output data in the underlying distribution. This problem makes the model obsolete and consequently degrades the quality of its predictions. In addition, the stream poses another challenge: extreme verification latency (EVL) in obtaining the true labels. For batch scenarios, state-of-the-art methods propose adapting a supervised model with an unconstrained least-squares importance fitting (uLSIF) algorithm, or a semi-supervised approach combined with a core support extraction (CSE) method. However, these methods do not properly tackle the mentioned problems, because of their high computational time on large data volumes, their failure to select the samples that represent the drift, or the many parameters they require tuning. Therefore, we propose a density-based adaptive model for nonstationary data (AMANDA), which uses a semi-supervised classifier together with a density-based CSE method. AMANDA has two variations: AMANDA with a fixed cutting percentage (AMANDA-FCP) and AMANDA with a dynamic cutting percentage (AMANDA-DCP). Our results indicate that both variations of AMANDA outperform the state-of-the-art methods on almost all synthetic and real datasets, with an improvement of up to 27.98% in average error. We found that AMANDA-FCP improved the results for gradual concept drift even with a small amount of initial labeled data. Moreover, our results indicate that semi-supervised learning (SSL) classifiers improve when combined with our static or dynamic CSE methods. Therefore, we emphasize the importance of research directions based on this approach.
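A minimal Python sketch of one AMANDA-FCP-style step follows, under assumptions that go beyond the abstract: densities are unnormalized Gaussian KDE scores computed within each class, the fixed cutting percentage `cut` discards the lowest-density fraction of each class, and a 1-nearest-neighbor rule stands in for the semi-supervised classifier. All function names, parameter values, and the toy data are hypothetical.

```python
import math


def kde_scores(points, bandwidth=1.0):
    """Unnormalized Gaussian KDE score of each point within its own set."""
    return [sum(math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / (2 * bandwidth ** 2))
                for q in points) for p in points]


def core_support_extraction(labeled, cut=0.5, bandwidth=1.0):
    """Keep the densest (1 - cut) fraction of samples in each class."""
    by_class = {}
    for x, y in labeled:
        by_class.setdefault(y, []).append(x)
    cores = []
    for y, xs in by_class.items():
        scores = kde_scores(xs, bandwidth)
        keep = max(1, round(len(xs) * (1 - cut)))
        ranked = sorted(zip(scores, xs), key=lambda s: s[0], reverse=True)
        cores.extend((x, y) for _, x in ranked[:keep])
    return cores


def nearest_neighbor_label(cores, x):
    """Hypothetical stand-in for the semi-supervised classifier: 1-NN."""
    return min(cores, key=lambda c: sum((a - b) ** 2 for a, b in zip(c[0], x)))[1]


def amanda_fcp_step(cores, batch, cut=0.5, bandwidth=1.0):
    """Label an unlabeled batch with the current core supports, then keep
    only the densest samples of the newly labeled batch as the next cores."""
    labeled = [(x, nearest_neighbor_label(cores, x)) for x in batch]
    return labeled, core_support_extraction(labeled, cut, bandwidth)


initial = [((0.0, 0.0), 0), ((0.1, 0.0), 0), ((0.0, 0.1), 0),
           ((5.0, 5.0), 1), ((5.1, 5.0), 1), ((5.0, 5.1), 1)]
cores = core_support_extraction(initial)
# The next unlabeled batch has drifted slightly toward larger coordinates.
batch = [(0.5, 0.5), (0.6, 0.5), (5.5, 5.5), (5.6, 5.5)]
labeled, cores = amanda_fcp_step(cores, batch)
print([y for _, y in labeled])  # → [0, 0, 1, 1]
```

Because only the newly labeled batch feeds the next core supports, the class regions can follow a gradual drift without any further true labels, which is the setting of extreme verification latency.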

Pantheon

Heating Strategies in a Renewable Energy Transition

Author: Lund Rasmus Søgaard
Publication venue: Aalborg Universitetsforlag
Publication date: 01/01/2017

VBN