A survey on online active learning
Online active learning is a paradigm in machine learning that aims to select
the most informative data points to label from a data stream. The problem of
minimizing the cost associated with collecting labeled observations has gained
a lot of attention in recent years, particularly in real-world applications
where data is only available in an unlabeled form. Annotating each observation
can be time-consuming and costly, making it difficult to obtain large amounts
of labeled data. To overcome this issue, many active learning strategies have
been proposed over recent decades, aiming to select the most informative
observations for labeling in order to improve the performance of machine
learning models. These approaches can be broadly divided into two categories:
static pool-based and stream-based active learning. Pool-based active learning
involves selecting a subset of observations from a closed pool of unlabeled
data, and it has been the focus of many surveys and literature reviews.
However, the growing availability of data streams has led to an increase in the
number of approaches that focus on online active learning, which involves
continuously selecting and labeling observations as they arrive in a stream.
This work aims to provide an overview of the most recently proposed approaches
for selecting the most informative observations from data streams in the
context of online active learning. We review the various techniques that have
been proposed and discuss their strengths and limitations, as well as the
challenges and opportunities that exist in this area of research. Our review
aims to provide a comprehensive and up-to-date overview of the field and to
highlight directions for future work.
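The stream-based selection loop described above can be sketched in a few lines: the current model scores each arriving observation, and a label is requested only while budget remains and the model is uncertain. The confidence threshold, function names, and fixed budget below are illustrative assumptions, not a method from the survey.

```python
def online_active_learner(stream, predict_confidence, budget, threshold=0.65):
    """Query a label only when the current model is uncertain.

    `predict_confidence(x)` is assumed to return the model's confidence
    in its top prediction for x; both this interface and the fixed
    threshold are illustrative choices, not prescribed by the survey.
    """
    queried = []
    for x in stream:
        if budget <= 0:
            break  # labeling budget exhausted
        if predict_confidence(x) < threshold:
            queried.append(x)  # uncertain: request this label
            budget -= 1
    return queried
```

For example, with a toy confidence function that is unsure on even inputs, `online_active_learner(range(10), lambda x: 0.5 if x % 2 == 0 else 0.9, budget=3)` queries only the first three uncertain points.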
Minimum Description Length Model Selection - Problems and Extensions
The thesis treats a number of open problems in Minimum Description Length model selection, especially prediction problems. It is shown how techniques from the "Prediction with Expert Advice" literature can be used to improve model selection performance, which is particularly useful in nonparametric settings.
A Survey on Concept Drift Adaptation
Concept drift primarily refers to an online supervised learning scenario in which the relation between the input data and the target variable changes over time. Assuming a general knowledge of supervised learning, in this paper we characterize the adaptive learning process, categorize existing strategies for handling concept drift, discuss the most representative, distinct and popular techniques and algorithms, discuss the evaluation methodology of adaptive algorithms, and present a set of illustrative applications. This introduction to concept drift adaptation presents the state-of-the-art techniques and a collection of benchmarks for researchers, industry analysts and practitioners. The survey aims at covering the different facets of concept drift in an integrated way to reflect on the existing scattered state of the art.
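One of the drift-handling strategies such surveys cover is error-rate monitoring in the spirit of the Drift Detection Method (DDM) of Gama et al.: track the model's running error rate and flag drift when it climbs three standard deviations above its historical minimum. The sketch below is a simplified illustration of that idea, not reference code from the survey.

```python
import math

class DriftDetector:
    """Error-rate monitor in the spirit of DDM: track the running error
    rate p and its binomial standard deviation s, remember the minimum
    of p + s, and flag drift when p + s exceeds p_min + 3 * s_min."""

    def __init__(self):
        self.n = 0
        self.p = 0.0               # running error rate
        self.p_min = float("inf")  # error rate at the best point so far
        self.s_min = float("inf")  # std dev at the best point so far

    def update(self, error):
        """Feed 1 for a misclassification, 0 for a correct prediction;
        returns True when drift is signaled."""
        self.n += 1
        self.p += (error - self.p) / self.n
        s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        return self.p + s > self.p_min + 3.0 * self.s_min
```

Feeding a stream whose error rate jumps from roughly 20% to 80% makes the detector fire shortly after the shift.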
Minimum description length revisited
This is an up-to-date introduction to and overview of the Minimum Description Length (MDL) Principle, a theory of inductive inference that can be applied to general problems in statistics, machine learning and pattern recognition. While MDL was originally based on data compression ideas, this introduction can be read without any knowledge thereof. It takes into account all major developments since 2007, the last time an extensive overview was written. These include new methods for model selection and averaging and hypothesis testing, as well as the first completely general definition of MDL estimators. Incorporating these developments, MDL can be seen as a powerful extension of both penalized likelihood and Bayesian approaches, in which penalization functions and prior distributions are replaced by more general luckiness functions, average-case methodology is replaced by a more robust worst-case approach, and in which methods classically viewed as highly distinct, such as AIC versus BIC and cross-validation versus Bayes can, to a large extent, be viewed from a unified perspective.
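The penalized-likelihood connection mentioned above can be illustrated with crude two-part MDL on a toy model-selection problem: the total code length is the cost of encoding the parameters plus the cost of encoding the data given the model, and the model with the shortest total description wins. The Gaussian-residual approximation below (the same form as BIC) is a sketch under assumed normality, not the refined NML-based codes the overview develops.

```python
import math

def rss_constant(ys):
    """Residual sum of squares of a constant (mean-only) model."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def rss_linear(xs, ys):
    """Residual sum of squares of a least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def two_part_codelength(rss, n, k):
    """Crude two-part code length (in nats) for a k-parameter model with
    Gaussian residuals: (n/2) log(RSS/n) for the data given the model,
    plus (k/2) log n for the parameters."""
    return 0.5 * n * math.log(rss / n) + 0.5 * k * math.log(n)
```

On data with a clear linear trend, the straight line attains the shorter total code length despite paying for an extra parameter.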
AMANDA : density-based adaptive model for nonstationary data under extreme verification latency scenarios
Gradual concept drift refers to a smooth and gradual change over time in the relation between input and output data in the underlying distribution. The problem causes model obsolescence and, consequently, a drop in prediction quality. In addition, there is a challenging task during the stream: extreme verification latency (EVL) in verifying the labels. For batch scenarios, state-of-the-art methods propose adapting a supervised model by using an unconstrained least-squares importance fitting (uLSIF) algorithm, or a semi-supervised approach along with a core support extraction (CSE) method. However, these methods do not properly tackle the mentioned problems, due to their high computational time on large data volumes, their failure to select the right samples representing the drift, or the many parameters they require tuning. Therefore, we propose a density-based adaptive model for nonstationary data (AMANDA), which uses a semi-supervised classifier along with a density-based CSE method. AMANDA has two variations: AMANDA with a fixed cutting percentage (AMANDA-FCP), and AMANDA with a dynamic cutting percentage (AMANDA-DCP). Our results indicate that the two variations of AMANDA outperform the state-of-the-art methods on almost all synthetic and real datasets, with an improvement of up to 27.98% in average error. We found that AMANDA-FCP improved the results for gradual concept drift even with a small amount of initial labeled data. Moreover, our results indicate that semi-supervised classifiers are improved when they work along with our static or dynamic CSE methods. Therefore, we emphasize the importance of research directions based on this approach.
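The core support extraction step can be illustrated with a toy one-dimensional sketch: rank samples by a density score and keep a fixed fraction of the densest ones, mirroring the fixed cutting percentage of AMANDA-FCP. The mean-distance density score and function names are illustrative assumptions, not the paper's actual estimator.

```python
def core_support_extraction(points, keep_fraction=0.5):
    """Keep the densest fraction of 1-D samples, scoring density by the
    mean distance to the other points (a toy stand-in for a density
    estimate; the fixed cut mirrors AMANDA-FCP's cutting percentage)."""
    def mean_dist(p):
        return sum(abs(p - q) for q in points if q is not p) / (len(points) - 1)
    ranked = sorted(points, key=mean_dist)           # densest first
    keep = max(1, int(len(points) * keep_fraction))  # fixed cutting percentage
    return ranked[:keep]
```

With `points = [0.0, 0.1, 0.2, 5.0]` and a 50% cut, the distant outlier 5.0 is dropped and the dense pair around 0.1 and 0.2 is kept as core support.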
Adaptive learning algorithms for Bayesian network classifiers
This thesis mainly addresses the development of adaptive learning algorithms
for Bayesian network classifiers (BNCs) in an on-line learning scenario. In this
scenario data arrives at the learning system sequentially. The actual predictive
model must first make a prediction and then update the current model with new
data. This scenario corresponds to the Dawid’s prequential approach for
statistical validation of models. An efficient adaptive algorithm in a prequential
learning framework must be able, above all, to improve its predictive accuracy
over time while reducing the cost of adaptation. However, in many real-world
situations it may be difficult to improve and adapt to existing changing
environments, a problem known as concept drift. In changing environments,
learning algorithms should be provided with some control and adaptive
mechanisms that strive to adjust quickly to these changes.
We have integrated all the adaptive algorithms into an adaptive prequential
framework for supervised learning called AdPreqFr4SL, which attempts to
handle the cost-performance trade-off and also to cope with concept drift.
The cost-quality trade-off is approached through bias management and
adaptation control. The rationale is as follows. Instead of selecting a particular
class of BNCs and using it during all the learning process, we use the class of
k-Dependence Bayesian classifiers and start with the simple Naïve Bayes (by
setting the maximum number of allowable attribute dependencies, k, to 0). We can
then improve the performance of Naïve Bayes over time if we trade off the bias
reduction achieved by adding new attribute dependencies against the variance
reduction achieved by accurately estimating the parameters. However, as the
learning process advances we should place more focus on bias management.
We reduce the bias resulting from the independence assumption by gradually
adding dependencies between the attributes over time. To this end, we
gradually increase k so that at each learning step we can use a class-model of
k-DBCs that better suits the available data. Thus, we can avoid the problems
caused by either too much bias (underfitting) or too much variance (overfitting).
On the other hand, updating the structure of BNCs with new data is a very
costly task. Hence some adaptation control is desirable to decide whether it is
necessary to adapt the structure. We reduce the cost of updating by using new
data to primarily adapt the parameters. Only when it is detected that the use of
the current structure no longer guarantees the desirable improvement in the
performance, do we adapt the structure. To handle concept drift, our
framework includes a method based on Statistical Quality Control, which has
been demonstrated to be effective at detecting concept changes.
We experimentally evaluated AdPreqFr4SL on artificial domains and benchmark
problems and showed its advantages in comparison with its non-adaptive
versions.
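The prequential protocol at the heart of this framework, predict first and only then update, can be sketched as a loop over the stream. The `predict`/`update` interface and the majority-class model below are illustrative assumptions, not the AdPreqFr4SL implementation.

```python
def prequential_run(stream, model):
    """Predict first, then update: each example is tested by the current
    model before the model sees its label, so the error count is an
    honest out-of-sample measure of predictive performance."""
    errors = 0
    for x, y in stream:
        if model.predict(x) != y:  # test on the example first...
            errors += 1
        model.update(x, y)         # ...then learn from it
    return errors

class MajorityClass:
    """Toy model: always predict the most frequent label seen so far."""
    def __init__(self):
        self.counts = {}
    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else None
    def update(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1
```

On a stream of five identical labels, only the very first prediction (made before any label has been seen) is wrong, so the prequential error count is 1.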
A tutorial introduction to the minimum description length principle
This tutorial provides an overview of and introduction to Rissanen's Minimum
Description Length (MDL) Principle. The first chapter provides a conceptual,
entirely non-technical introduction to the subject. It serves as a basis for
the technical introduction given in the second chapter, in which all the ideas
of the first chapter are made mathematically precise. The main ideas are
discussed in great conceptual and technical detail. This tutorial is an
extended version of the first two chapters of the collection "Advances in
Minimum Description Length: Theory and Application" (edited by P.Grunwald, I.J.
Myung and M. Pitt, to be published by the MIT Press, Spring 2005).