
    Second Order PAC-Bayesian Bounds for the Weighted Majority Vote

    We present a novel analysis of the expected risk of the weighted majority vote in multiclass classification. The analysis takes the correlation of predictions by ensemble members into account and provides a bound that is amenable to efficient minimization, which yields improved weighting for the majority vote. We also provide a specialized version of our bound for binary classification, which allows exploiting additional unlabeled data for tighter risk estimation. In experiments, we apply the bound to improve the weighting of trees in random forests and show that, in contrast to the commonly used first-order bound, minimization of the new bound typically does not degrade the test error of the ensemble.
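    For context, the first-order and second-order (tandem) oracle bounds that this line of work builds on can be sketched as follows; the notation is ours and abbreviates the abstract's claims rather than reproducing the paper's exact statements:

```latex
% First-order oracle bound: the majority-vote risk is at most twice the
% expected (Gibbs) risk of a single member drawn from the weighting \rho.
L(\mathrm{MV}_\rho) \le 2\,\mathbb{E}_{\rho}\!\left[L(h)\right]

% Second-order oracle bound: at most four times the expected *tandem* loss,
% the probability that two independently drawn members err on the same
% example, which is what captures the correlation of predictions.
L(\mathrm{MV}_\rho) \le 4\,\mathbb{E}_{\rho^2}\!\left[L(h, h')\right],
\qquad L(h, h') = \mathbb{P}\big(h(X) \ne Y \,\wedge\, h'(X) \ne Y\big)
```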

    Comparing Binary and Standard Probability Trees in Credal Networks Inference

    This paper proposes the use of binary probability trees in the propagation of credal networks. Standard and binary probability trees are suitable data structures for representing potentials because they allow the accuracy of inference algorithms to be controlled by means of a threshold parameter; the choice of this threshold is a trade-off between accuracy and computing time. Binary trees can represent finer-grained independences than standard probability trees, which leads to more efficient algorithms for credal networks whose variables have more than two states. The paper presents experiments comparing binary and standard probability trees to assess their relative performance.
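    To make the threshold idea concrete, here is a minimal, hypothetical sketch (class and function names are ours, not the paper's) of a binary probability tree with threshold-driven pruning; lower thresholds keep more structure for more accurate but slower inference, higher thresholds collapse more subtrees:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BinaryTreeNode:
    """Node of a binary probability tree: each internal node splits one
    variable's states into two groups, so a variable with many states is
    decomposed over several binary splits (hypothetical layout)."""
    variable: Optional[str] = None            # None marks a leaf
    left: Optional["BinaryTreeNode"] = None   # states routed left
    right: Optional["BinaryTreeNode"] = None  # states routed right
    value: float = 0.0                        # potential value at a leaf

def prune(node: BinaryTreeNode, threshold: float) -> BinaryTreeNode:
    """Collapse subtrees whose leaf values differ by less than `threshold`,
    trading accuracy for speed as the abstract describes. The unweighted
    average of the two leaves is a simplification for illustration."""
    if node.variable is None:
        return node
    node.left = prune(node.left, threshold)
    node.right = prune(node.right, threshold)
    if node.left.variable is None and node.right.variable is None:
        if abs(node.left.value - node.right.value) < threshold:
            return BinaryTreeNode(value=(node.left.value + node.right.value) / 2)
    return node
```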

    Discretization of expression quantitative trait loci in association analysis between genotypes and expression data

    Expression quantitative trait loci are used as a tool to identify genetic causes of natural variation in gene expression. Only in a few cases is the expression of a gene controlled by a variant at a single genetic marker. There is a plethora of complexity levels of interaction effects within markers, within genes, and between markers and genes. This complexity challenges biostatisticians and bioinformaticians every day and makes findings difficult to obtain. As a way to simplify the analysis and better control confounders, we tried a new approach for association analysis between genotypes and expression data. We sought to understand whether discretization of expression data can be useful in genome-transcriptome association analyses. By discretizing the dependent variable, algorithms for learning classifiers from data, as well as for performing block selection, were used to help understand the relationship between the expression of a gene and genetic markers. We present the results of using this approach to detect new possible causes of expression variation of DRB5, a gene playing an important role in the immune system. Together with expression of DRB5 obtained from classical microarray technology, we also measured DRB5 expression using the more recent next-generation sequencing technology. A supplementary website, including a link to software implementing the method, can be found at http://bios.ugr.es/DRB5
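    As an illustration of the discretization step only (not the authors' pipeline; the scikit-learn estimators and the synthetic data are our assumptions), one might bin expression values into quantile levels and then learn a classifier from genotypes to those levels:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.naive_bayes import CategoricalNB

# Hypothetical data: one gene's continuous expression and genotype
# markers coded as 0/1/2 allele counts.
rng = np.random.default_rng(0)
markers = rng.integers(0, 3, size=(200, 50))
expression = rng.normal(size=200)

# Discretize the dependent variable (expression) into three quantile
# levels (low / medium / high) so classifier-learning algorithms apply.
disc = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="quantile")
y = disc.fit_transform(expression.reshape(-1, 1)).ravel().astype(int)

# Learn a classifier from genotypes to the discretized expression level;
# inspecting which markers the model relies on can then hint at
# candidate associations.
clf = CategoricalNB().fit(markers, y)
print(clf.score(markers, y))
```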

    An ensemble method using credal decision trees

    Supervised classification learning can be considered an important tool for decision support. In this paper, we present a method for supervised classification learning that ensembles decision trees obtained via convex sets of probability distributions (also called credal sets) and uncertainty measures. Our method forces the use of different decision trees and has the following main characteristics: it obtains a good percentage of correct classifications and an improvement in processing time compared with known classification methods; it does not need the number of decision trees to be fixed in advance; and it can be parallelized for application to very large data sets.
    Keywords: imprecise probabilities; credal sets; Imprecise Dirichlet Model; uncertainty measures; supervised classification; decision trees
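    A minimal sketch of the Imprecise Dirichlet Model ingredient behind credal decision trees, assuming the standard interval estimates and an approximate maximum-entropy computation (the function names and the incremental mass-assignment scheme are ours):

```python
import numpy as np

def idm_intervals(counts: np.ndarray, s: float = 1.0):
    """Imprecise Dirichlet Model: lower/upper probability bounds for
    each class given observed class counts and hyperparameter s."""
    n = counts.sum()
    return counts / (n + s), (counts + s) / (n + s)

def max_entropy(counts: np.ndarray, s: float = 1.0) -> float:
    """Approximate upper entropy over the IDM credal set: repeatedly
    give the extra mass s, in small increments, to the currently least
    probable class, which pushes the distribution toward uniformity.
    Such an upper entropy can serve as a split criterion."""
    p = counts.astype(float).copy()
    remaining, step = s, s / 1000.0
    while remaining > 1e-12:
        p[np.argmin(p)] += step
        remaining -= step
    p /= p.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())
```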

    Diversity and Generalization in Neural Network Ensembles

    Ensembles are widely used in machine learning and usually provide state-of-the-art performance in many prediction tasks. From the very beginning, the diversity of an ensemble has been identified as a key factor in the superior performance of these models. But the exact role that diversity plays in ensemble models is poorly understood, especially in the context of neural networks. In this work, we combine and expand previously published results in a theoretically sound framework that describes the relationship between diversity and ensemble performance for a wide range of ensemble methods. More precisely, we provide sound answers to the following questions: how to measure diversity, how diversity relates to the generalization error of an ensemble, and how diversity is promoted by neural network ensemble algorithms. This analysis covers three widely used loss functions, namely the squared loss, the cross-entropy loss, and the 0-1 loss, and two widely used model combination strategies, namely model averaging and weighted majority vote. We empirically validate this theoretical analysis with neural network ensembles.
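    One classical, easily verified instance of the diversity/performance relationship, for the squared loss with model averaging, is the Krogh-Vedelsby ambiguity decomposition; the snippet below checks it numerically on synthetic data (our example, not the paper's full framework):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(size=100)                           # targets
preds = y + rng.normal(scale=0.5, size=(5, 100))   # 5 noisy members

ens = preds.mean(axis=0)                       # model averaging
avg_err = ((preds - y) ** 2).mean(axis=0)      # average member error
ambiguity = ((preds - ens) ** 2).mean(axis=0)  # diversity (ambiguity) term
ens_err = (ens - y) ** 2

# Ambiguity decomposition: ensemble error = average error - diversity,
# so more diverse members (at fixed individual error) help the ensemble.
assert np.allclose(ens_err, avg_err - ambiguity)
```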