
    Bandit-Aided Boosting

    In this paper we apply multi-armed bandits (MABs) to accelerate ADABOOST. ADABOOST constructs a strong classifier in a stepwise fashion by selecting simple base classifiers and using their weighted "vote" to determine the final classification. We model this stepwise base classifier selection as a sequential decision problem and optimize it with MABs. Each arm represents a subset of the base classifier set. The MAB gradually learns the "utility" of the subsets and selects one of them in each iteration. ADABOOST then searches only this subset instead of optimizing the base classifier over the whole space. The reward is defined as a function of the accuracy of the base classifier. We investigate how MAB algorithms (UCB, UCT) can be applied in the case of boosted stumps, trees, and products of base classifiers. On benchmark datasets, our bandit-based approach achieves only slightly worse test errors than the standard boosted learners, at a computational cost that is an order of magnitude smaller than that of standard ADABOOST.
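
    The subset-selection loop can be pictured with a standard UCB1 bandit. The sketch below is illustrative only: the partitioning of base classifiers into subsets, the reward (the chosen classifier's edge), and the search_subset helper are assumptions, not the paper's exact recipe.

        import math

        class UCB1:
            """UCB1 bandit; each arm stands for a subset of the base classifiers."""
            def __init__(self, n_arms):
                self.counts = [0] * n_arms    # plays per arm
                self.values = [0.0] * n_arms  # running mean reward per arm
                self.t = 0                    # total plays

            def select(self):
                self.t += 1
                for arm, count in enumerate(self.counts):
                    if count == 0:            # play every arm once first
                        return arm
                return max(range(len(self.counts)),
                           key=lambda a: self.values[a]
                           + math.sqrt(2.0 * math.log(self.t) / self.counts[a]))

            def update(self, arm, reward):
                self.counts[arm] += 1
                self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

        def boosting_round(bandit, subsets, search_subset):
            """One boosting iteration: search only the bandit-chosen subset.
            search_subset(subset) -> (base_classifier, weighted_accuracy)
            stands in for the weak-learner search (hypothetical helper)."""
            arm = bandit.select()
            h, acc = search_subset(subsets[arm])
            bandit.update(arm, abs(2.0 * acc - 1.0))  # reward: the edge, one plausible choice
            return h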

    Extracting human protein information from MEDLINE using a full-sentence parser

    Today, a fair number of systems are available for processing biological data. The development of effective systems is of great importance, since they can support both the research and the everyday work of biologists. Biological databases are large in both size and number, so data processing technologies are required for the fast and effective management of the contents stored in databases like MEDLINE. A possible solution for content management is the application of natural language processing methods to make this task easier. With our approach we would like to learn more about the interactions of human genes using full-sentence parsing. Given a sentence, the syntactic parser assigns to it a syntactic structure, which consists of a set of labelled links connecting pairs of words. The parser also produces a constituent representation of the sentence (showing noun phrases, verb phrases, and so on). Here we show experimentally that, using the syntactic information of each abstract, the biological interactions of genes can be predicted. Hence, it is worth developing information extraction (IE) systems that can retrieve information about gene interactions using only the syntactic information contained in these texts. Our IE system can handle certain types of gene interactions with the help of machine learning (ML) methods (Hidden Markov Models, Artificial Neural Networks, Decision Trees, Support Vector Machines). The experiments and practical usage clearly show that our system can provide a useful, intuitive guide for biological researchers in their investigations and in the design of their experiments.
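
    As one concrete picture of the final stage of such a pipeline, the sketch below turns the parser's labelled links into bag-of-link-label features for a candidate gene pair and trains one of the listed learners (an SVM here). The link format, feature set, and toy data are assumptions for illustration, not the system's actual design.

        from sklearn.feature_extraction import DictVectorizer
        from sklearn.svm import SVC

        def link_features(links, gene_a, gene_b):
            """links: labelled (word, label, word) triples from the parser;
            keep the labels of links touching either gene mention."""
            feats = {}
            for left, label, right in links:
                if left in (gene_a, gene_b) or right in (gene_a, gene_b):
                    feats["link=" + label] = 1
            return feats

        # Toy examples (hypothetical): an interacting pair and a bare co-mention.
        examples = [
            ([("BRCA1", "Subj", "activates"), ("activates", "Obj", "TP53")],
             "BRCA1", "TP53", 1),
            ([("BRCA1", "Conj", "TP53")], "BRCA1", "TP53", 0),
        ]
        X = DictVectorizer().fit_transform(
            [link_features(links, a, b) for links, a, b, _ in examples])
        y = [label for *_, label in examples]
        clf = SVC(kernel="linear").fit(X, y)  # any of the listed ML methods fits here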

    Fast classification using sparse decision DAGs

    In this paper we propose an algorithm that builds sparse decision DAGs (directed acyclic graphs) out of a list of base classifiers provided by an external learning method such as AdaBoost. The basic idea is to cast the DAG design task as a Markov decision process. Each instance can decide to use or to skip each base classifier, based on the current state of the classifier being built. The result is a sparse decision DAG where the base classifiers are selected in a data-dependent way. The method has a single hyperparameter with a clear semantics: it controls the accuracy/speed trade-off. The algorithm is competitive with state-of-the-art cascade detectors on three object-detection benchmarks, and it clearly outperforms them when the number of base classifiers is low. Unlike cascades, it is also readily applicable to multi-class classification. Using the multi-class setup, we show on a benchmark web page ranking data set that we can significantly improve the decision speed without harming the performance of the ranker.
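
    A minimal sketch of how such a DAG classifies a test instance, assuming a simplified state of (position, partial score) and a three-action policy; the actual method learns the policy within the Markov decision process, and the state encoding here is an assumption.

        def classify(x, base_classifiers, alphas, policy):
            """Walk the ordered base-classifier list; the policy decides, per
            instance, whether to evaluate or skip each one, or to stop early.
            policy(position, partial_score) -> "eval" | "skip" | "quit"."""
            score, cost = 0.0, 0
            for j, (h, alpha) in enumerate(zip(base_classifiers, alphas)):
                action = policy(j, score)    # simplified MDP state (assumption)
                if action == "quit":
                    break
                if action == "eval":
                    score += alpha * h(x)    # h(x) in {-1, +1}
                    cost += 1                # one paid evaluation
            return (1 if score >= 0.0 else -1), cost

    Instances the policy finds easy quit early, which is where the speed-up over evaluating every base classifier comes from.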

    MDDAG: learning deep decision DAGs in a Markov decision process setup

    In this paper we propose an algorithm that builds sparse decision DAGs (directed acyclic graphs) out of a list of features or base classifiers. The basic idea is to cast the DAG design task as a Markov decision process. Each instance can decide to use or to skip each base classifier, based on the current state of the classifier being built. The result is a sparse decision DAG where the base classifiers are selected in a data-dependent way. The development of the algorithm was directly motivated by improving the traditional cascade design in applications where the computational requirements of classifying a test instance are as important as the performance of the classifier itself. Besides outperforming classical cascade designs on benchmark data sets, the algorithm also produces interesting deep structures where similar input data follow the same path in the DAG, and subpaths of increasing length represent features of increasing complexity.
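
    The objective the learned policy optimizes can be summarized in a short, hedged sketch: a reward for a correct prediction and a charge per evaluated base classifier, with a single trade-off hyperparameter (called beta below; the exact functional form used by MDDAG is an assumption here).

        def episode_reward(prediction, label, n_evaluated, beta):
            """Illustrative per-instance return: +1 for a correct prediction,
            minus beta per evaluated base classifier (assumption on the form)."""
            return (1.0 if prediction == label else 0.0) - beta * n_evaluated

        def average_return(dataset, run_episode, beta):
            """Value of a policy on a dataset; run_episode(x) -> (prediction,
            n_evaluated) runs the DAG on one instance (hypothetical helper).
            A larger beta pushes the policy toward shorter, cheaper paths."""
            total = 0.0
            for x, y in dataset:
                prediction, n_evaluated = run_episode(x)
                total += episode_reward(prediction, y, n_evaluated, beta)
            return total / len(dataset)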

    An apple-to-apple comparison of Learning-to-rank algorithms in terms of Normalized Discounted Cumulative Gain

    The Normalized Discounted Cumulative Gain (NDCG) is a widely used evaluation metric for learning-to-rank (LTR) systems. NDCG is designed for ranking tasks with more than one relevance level. There are many freely available, open-source tools for computing the NDCG score of a ranked result list. Even though the definition of NDCG is unambiguous, the various tools can produce different scores for ranked lists with certain properties, undermining the empirical tests in many published papers and making results published in different studies difficult to compare. In this study, we first identify the major differences between the various publicly available NDCG evaluation tools. Second, based on a set of comparative experiments using a common benchmark dataset in LTR research and six different LTR algorithms, we demonstrate how these differences affect the overall performance of different algorithms and the final scores that are used to compare different systems.
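
    Two of the choices on which implementations commonly diverge are the gain function (exponential versus linear) and the score assigned to a query with no relevant documents. The sketch below shows how either choice changes the number reported for the same ranked list; which specific tool makes which choice is not claimed here.

        import math

        def dcg(rels, k, exp_gain=True):
            """rels: relevance labels of the documents in ranked order."""
            gain = (lambda r: 2 ** r - 1) if exp_gain else (lambda r: r)
            return sum(gain(r) / math.log2(i + 2) for i, r in enumerate(rels[:k]))

        def ndcg(rels, k, exp_gain=True, empty_score=0.0):
            ideal = dcg(sorted(rels, reverse=True), k, exp_gain)
            if ideal == 0:             # query with no relevant documents:
                return empty_score     # tools variously report 0.0 or 1.0
            return dcg(rels, k, exp_gain) / ideal

        ranked = [0, 2, 1, 0]                    # labels in ranked order
        print(ndcg(ranked, 4, exp_gain=True))    # ~0.659
        print(ndcg(ranked, 4, exp_gain=False))   # ~0.670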

    Online ranking combination


    Tune and mix: learning to rank using ensembles of calibrated multi-class classifiers

    In subset ranking, the goal is to learn a ranking function that approximates a gold-standard partial ordering of a set of objects (in our case, a set of documents retrieved for the same query). The partial ordering is given by relevance labels representing the relevance of documents with respect to the query on an absolute scale. Our approach consists of three simple steps. First, we train standard multi-class classifiers (AdaBoost.MH and multi-class SVM) to discriminate between the relevance labels. Second, the posteriors of the multi-class classifiers are calibrated using probabilistic and regression losses in order to estimate the Bayes-scoring function, which optimizes the Normalized Discounted Cumulative Gain (NDCG). In the third step, instead of selecting the best multi-class hyperparameters and the best calibration, we mix all the learned models in a simple ensemble scheme. Our extensive experimental study is itself a substantial contribution: we compare most of the existing learning-to-rank techniques on all of the available large-scale benchmark data sets using a standardized implementation of the NDCG score. We show that our approach is competitive with conceptually more complex listwise and pairwise methods, and clearly outperforms them as the data size grows. As a technical contribution, we clarify some of the confusing results related to the ambiguities of the evaluation tools, and propose guidelines for future studies.
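
    Steps two and three can be condensed into a short sketch, assuming calibrated posteriors are already available as arrays. The expected-gain scoring rule and the plain score averaging are illustrative choices consistent with the description above, not the paper's full procedure.

        import numpy as np

        def expected_gain_score(posteriors):
            """posteriors: (n_docs, n_labels) calibrated P(label | document).
            Score each document by its expected gain, sum_l P(l|x) * (2**l - 1),
            one concrete form of a Bayes-scoring function for NDCG."""
            gains = 2.0 ** np.arange(posteriors.shape[1]) - 1.0
            return posteriors @ gains

        def ensemble_rank(posterior_list):
            """Mix all learned, calibrated models by averaging their scores,
            then rank the documents (best first)."""
            scores = np.mean([expected_gain_score(p) for p in posterior_list],
                             axis=0)
            return np.argsort(-scores)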