Search CORE

1,233 research outputs found

COMET: A Recipe for Learning and Using Large Ensembles on Massive Data

Author: Basilico Justin D.
Dixon Kevin R.
Kegelmeyer W. Philip
Kolda Tamara G.
Munson M. Arthur
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2011
Field of study

COMET is a single-pass MapReduce algorithm for learning on large-scale data. It builds multiple random forest ensembles on distributed blocks of data and merges them into a mega-ensemble. This approach is appropriate when learning from massive-scale data that is too large to fit on a single machine. To get the best accuracy, IVoting should be used instead of bagging to generate the training subset for each decision tree in the random forest. Experiments with two large datasets (5GB and 50GB compressed) show that COMET compares favorably (in both accuracy and training time) to learning on a subsample of data using a serial algorithm. Finally, we propose a new Gaussian approach for lazy ensemble evaluation which dynamically decides how many ensemble members to evaluate per data point; this can reduce evaluation cost by 100X or more

arXiv.org e-Print Archive

CiteSeerX

A Diversity-Accuracy Measure for Homogenous Ensemble Selection

Author: Adla A
Zouggar Taleb
Publication venue: 'Universidad Internacional de La Rioja'
Publication date: 24/02/2022
Field of study

Several selection methods in the literature are essentially based on an evaluation function that determines whether a model M contributes positively to boost the performances of the whole ensemble. In this paper, we propose a method called DIversity and ACcuracy for Ensemble Selection (DIACES) using an evaluation function based on both diversity and accuracy. The method is applied on homogenous ensembles composed of C4.5 decision trees and based on a hill climbing strategy. This allows selecting ensembles with the best compromise between maximum diversity and minimum error rate. Comparative studies show that in most cases the proposed method generates reduced size ensembles with better performances than usual ensemble simplification methods

Re-UNIR

Efficient & Effective Selective Query Rewriting with Efficiency Predictions

Author: Macdonald Craig
Ounis Iadh
Tonellotto Nicola
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017
Field of study

To enhance effectiveness, a user's query can be rewritten internally by the search engine in many ways, for example by applying proximity, or by expanding the query with related terms. However, approaches that benefit effectiveness often have a negative impact on efficiency, which has impacts upon the user satisfaction, if the query is excessively slow. In this paper, we propose a novel framework for using the predicted execution time of various query rewritings to select between alternatives on a per-query basis, in a manner that ensures both effectiveness and efficiency. In particular, we propose the prediction of the execution time of ephemeral (e.g., proximity) posting lists generated from uni-gram inverted index posting lists, which are used in establishing the permissible query rewriting alternatives that may execute in the allowed time. Experiments examining both the effectiveness and efficiency of the proposed approach demonstrate that a 49% decrease in mean response time (and 62% decrease in 95th-percentile response time) can be attained without significantly hindering the effectiveness of the search engine

Crossref

Archivio della Ricerca - Università di Pisa

Enlighten

An analysis of ensemble pruning techniques based on ordered aggregation

Author: Hernández-Lobato Daniel
Martínez-Muñoz Gonzalo
Suárez Alberto
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009
Field of study

Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. G. Martínez-Muñoz, D. Hernández-Lobato and A. Suárez, "An analysis of ensemble pruning techniques based on ordered aggregation", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 245-249, February 2009Several pruning strategies that can be used to reduce the size and increase the accuracy of bagging ensembles are analyzed. These heuristics select subsets of complementary classifiers that, when combined, can perform better than the whole ensemble. The pruning methods investigated are based on modifying the order of aggregation of classifiers in the ensemble. In the original bagging algorithm, the order of aggregation is left unspecified. When this order is random, the generalization error typically decreases as the number of classifiers in the ensemble increases. If an appropriate ordering for the aggregation process is devised, the generalization error reaches a minimum at intermediate numbers of classifiers. This minimum lies below the asymptotic error of bagging. Pruned ensembles are obtained by retaining a fraction of the classifiers in the ordered ensemble. The performance of these pruned ensembles is evaluated in several benchmark classification tasks under different training conditions. The results of this empirical investigation show that ordered aggregation can be used for the efficient generation of pruned ensembles that are competitive, in terms of performance and robustness of classification, with computationally more costly methods that directly select optimal or near-optimal subensembles.The authors acknowledge support form the Spanish Ministerio de Educación y Ciencia under Project TIN2007-66862-C02-0

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Biblos-e Archivo

An urn model for majority voting in classification ensembles

Author: Martínez-Muñoz Gonzalo
Soto Víctor
Suárez Alberto
Publication venue: Neural Information Processing Systems Foundation
Publication date: 01/01/2016
Field of study

In this work we analyze the class prediction of parallel randomized ensembles by majority voting as an urn model. For a given test instance, the ensemble can be viewed as an urn of marbles of different colors. A marble represents an individual classifier. Its color represents the class label prediction of the corresponding classifier. The sequential querying of classifiers in the ensemble can be seen as draws without replacement from the urn. An analysis of this classical urn model based on the hypergeometric distribution makes it possible to estimate the confidence on the outcome of majority voting when only a fraction of the individual predictions is known. These estimates can be used to speed up the prediction by the ensemble. Specifically, the aggregation of votes can be halted when the confidence in the final prediction is sufficiently high. If one assumes a uniform prior for the distribution of possible votes the analysis is shown to be equivalent to a previous one based on Dirichlet distributions. The advantage of the current approach is that prior knowledge on the possible vote outcomes can be readily incorporated in a Bayesian framework. We show how incorporating this type of problem-specific knowledge into the statistical analysis of majority voting leads to faster classification by the ensemble and allows us to estimate the expected average speed-up beforehandThe authors acknowledge financial support from the Comunidad de Madrid (project CASI-CAMCM S2013/ICE-2845), and from the Spanish Ministerio de Economía y Competitividad (projects TIN2013-42351-P and TIN2015-70308-REDT

Biblos-e Archivo

Reduction of ensemble of classifiers with a rule sets analysis

Author: Szpunar-Huk Ewa
Publication venue: 'Uniwersytetu Marii Curie-Sklodowskiej w Lublinie'
Publication date: 01/01/2006
Field of study

The article shortly discusses the aim of classification task and its application to different domains of life. The idea of ensemble of classifiers is presented and some aspects of grouping methods are discussed. The paper points to the need of ensemble classifier pruning and presents a new approach for ensemble reduction. The proposed method is dedicated to committees of decision trees and bases on transformation of a tree set into a rule set and the new, suited to the pruning method, the weighted voting algorithm is also presented. There are also described experiments showing properties and effectiveness of the proposed method. Finally, directions of further research are mentioned

Biblioteka Nauki - repozytorium artykuÅÃ³w

University of Maria Curie-Skłodowska (UMCS): Scientific e-Journals / Uniwersytet Marii Curie-Skłodowskiej: e-czasopisma naukowe

Online Tool Condition Monitoring Based on Parsimonious Ensemble+

Author: Dimla Eric
Lughofer Edwin
Pedrycz Witold
Pratama Mahardhika
Tjahjowidowo Tegoeh
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 07/12/2019
Field of study

Accurate diagnosis of tool wear in metal turning process remains an open challenge for both scientists and industrial practitioners because of inhomogeneities in workpiece material, nonstationary machining settings to suit production requirements, and nonlinear relations between measured variables and tool wear. Common methodologies for tool condition monitoring still rely on batch approaches which cannot cope with a fast sampling rate of metal cutting process. Furthermore they require a retraining process to be completed from scratch when dealing with a new set of machining parameters. This paper presents an online tool condition monitoring approach based on Parsimonious Ensemble+, pENsemble+. The unique feature of pENsemble+ lies in its highly flexible principle where both ensemble structure and base-classifier structure can automatically grow and shrink on the fly based on the characteristics of data streams. Moreover, the online feature selection scenario is integrated to actively sample relevant input attributes. The paper presents advancement of a newly developed ensemble learning algorithm, pENsemble+, where online active learning scenario is incorporated to reduce operator labelling effort. The ensemble merging scenario is proposed which allows reduction of ensemble complexity while retaining its diversity. Experimental studies utilising real-world manufacturing data streams and comparisons with well known algorithms were carried out. Furthermore, the efficacy of pENsemble was examined using benchmark concept drift data streams. It has been found that pENsemble+ incurs low structural complexity and results in a significant reduction of operator labelling effort.Comment: this paper has been published by IEEE Transactions on Cybernetic

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)