Multilabel Consensus Classification
In the era of big data, a large amount of noisy and incomplete data can be
collected from multiple sources for prediction tasks. Combining multiple models
or data sources helps to counteract the effects of low data quality and the
bias of any single model or data source, and thus can improve the robustness
and the performance of predictive models. For reasons of privacy, storage, and
bandwidth, in certain circumstances one has to combine the predictions
from multiple models or data sources to obtain the final predictions without
accessing the raw data. Consensus-based prediction combination algorithms are
effective for such situations. However, current research on prediction
combination focuses on the single label setting, where an instance can have one
and only one label. Nonetheless, data nowadays are usually multilabeled, so
that more than one label has to be predicted at the same time. Direct
application of existing prediction combination methods to multilabel settings
can lead to degraded performance. In this paper, we address the challenges
of combining predictions from multiple multilabel classifiers and propose two
novel algorithms, MLCM-r (MultiLabel Consensus Maximization for ranking) and
MLCM-a (MLCM for microAUC). These algorithms can capture label correlations
that are common in multilabel classification, and optimize the corresponding
performance metrics. Experimental results on popular multilabel classification
tasks verify the theoretical analysis and the effectiveness of the proposed
methods.
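To make the setting concrete, the following is a minimal sketch of consensus combination in general, not of MLCM-r or MLCM-a themselves: given only the per-label score matrices of several multilabel classifiers (no raw data), a naive consensus averages them and can be scored with microAUC. The helper name and the uniform averaging rule are illustrative assumptions; the paper's algorithms instead exploit label correlations.

```python
# Minimal sketch of multilabel prediction combination (not MLCM-r/MLCM-a):
# average per-label score matrices from several classifiers, without
# accessing the raw data, then evaluate the consensus with microAUC.
import numpy as np
from sklearn.metrics import roc_auc_score

def combine_scores(score_matrices):
    """Average a list of (n_instances, n_labels) score matrices."""
    return np.mean(np.stack(score_matrices), axis=0)

# Toy example: two base classifiers, four instances, three labels.
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]])
scores_a = np.random.rand(4, 3)  # per-label scores from model/source A
scores_b = np.random.rand(4, 3)  # per-label scores from model/source B

consensus = combine_scores([scores_a, scores_b])
print("microAUC:", roc_auc_score(y_true, consensus, average="micro"))
```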
Improved Multi-Class Cost-Sensitive Boosting via Estimation of the Minimum-Risk Class
We present a simple unified framework for multi-class cost-sensitive boosting.
The minimum-risk class is estimated directly, rather than via an approximation
of the posterior distribution. Our method jointly optimizes binary weak learners
and their corresponding output vectors, requiring classes to share features at each
iteration. By training in a cost-sensitive manner, weak learners are invested in separating
classes whose discrimination is important, at the expense of less relevant
classification boundaries. Additional contributions are a family of loss functions
along with proof that our algorithm is Boostable in the theoretical sense, as well
as an efficient procedure for growing decision trees for use as weak learners. We
evaluate our method on a variety of datasets: a collection of synthetic planar data,
common UCI datasets, MNIST digits, SUN scenes, and CUB-200 birds. Results
show state-of-the-art performance across all datasets against several strong baselines,
including non-boosting multi-class approaches.
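For context, the decision-theoretic quantity this approach targets is the minimum-risk class, argmin over k of the expected cost Σ_j C(j, k) p(j | x). The sketch below evaluates that classical rule from explicit posteriors and an illustrative cost matrix; the paper's contribution is precisely to estimate this class directly, without approximating the posterior, and it is not reproduced here.

```python
# The classical minimum-risk decision rule, for reference; the paper's
# boosting method estimates the minimum-risk class directly rather than
# going through an approximated posterior. The cost matrix is illustrative.
import numpy as np

def min_risk_class(posteriors, cost):
    """posteriors: (n_samples, K) with p(j | x) in column j.
    cost: (K, K) with cost[j, k] = cost of predicting k when truth is j.
    Returns, per sample, the class k minimizing sum_j p(j|x) * cost[j, k]."""
    expected_cost = posteriors @ cost  # (n_samples, K)
    return np.argmin(expected_cost, axis=1)

# Example: predicting class 2 when the truth is class 0 is very costly,
# so the rule picks class 0 even under a fairly flat posterior.
cost = np.array([[0.0, 1.0, 5.0],
                 [1.0, 0.0, 1.0],
                 [1.0, 1.0, 0.0]])
p = np.array([[0.5, 0.3, 0.2]])
print(min_risk_class(p, cost))  # -> [0]
```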
Tune and mix: learning to rank using ensembles of calibrated multi-class classifiers
ANR-2010-COSI-002
In subset ranking, the goal is to learn a ranking function that approximates a gold standard partial ordering of a set of objects (in our case, a set of documents retrieved for the same query). The partial ordering is given by relevance labels representing the relevance of documents with respect to the query on an absolute scale. Our approach consists of three simple steps. First, we train standard multi-class classifiers (AdaBoost.MH and multi-class SVM) to discriminate between the relevance labels. Second, the posteriors of the multi-class classifiers are calibrated using probabilistic and regression losses in order to estimate the Bayes-scoring function which optimizes the Normalized Discounted Cumulative Gain (NDCG). In the third step, instead of selecting the best multi-class hyperparameters and the best calibration, we mix all the learned models in a simple ensemble scheme. Our extensive experimental study is itself a substantial contribution. We compare most of the existing learning-to-rank techniques on all of the available large-scale benchmark data sets using a standardized implementation of the NDCG score. We show that our approach is competitive with conceptually more complex listwise and pairwise methods, and clearly outperforms them as the data size grows. As a technical contribution, we clarify some of the confusing results related to the ambiguities of the evaluation tools, and propose guidelines for future studies.
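Because the abstract stresses that ambiguities in NDCG evaluation tools produce confusing results, it helps to pin down one convention explicitly. The sketch below uses the common exponential-gain form (gain 2^rel − 1 with a log2 discount); it is a reference implementation under that assumption only, and may differ from the standardized implementation the paper uses.

```python
# NDCG under one explicit convention: gain 2^rel - 1, discount log2(rank+1).
# Other tools truncate or handle ties differently, which is exactly the
# ambiguity the paper warns about.
import numpy as np

def dcg(relevances, k=None):
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = np.log2(np.arange(2, rel.size + 2))  # ranks 1..n -> log2(2..n+1)
    return np.sum((2.0 ** rel - 1.0) / discounts)

def ndcg(relevances, k=None):
    """relevances: labels of the documents in model-ranked order."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

print(ndcg([3, 2, 3, 0, 1, 2], k=5))
```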
Multi-class Classification with Machine Learning and Fusion
Work carried out at TELECOM ParisTech and EADS France.
Multi-class classification is the core issue of many pattern recognition tasks. Several applications
require high-end machine learning solutions to provide satisfying results in operational contexts. However,
the most efficient ones, like SVM or Boosting, are generally binary, which introduces the problem of
translating a global multi-class problem into several binary problems, while still being able to provide,
in the end, an answer to the original multi-class issue.
The present work aims to provide a solution to this multi-class problem by introducing a complete
framework with a strong probabilistic and structured basis. It includes the study of error-correcting output
codes, coupled with the definition of an optimal subdivision of the multi-class problem into several binary
problems, in a completely automatic way. Machine learning algorithms are studied and benchmarked to
facilitate and justify the final selection. Coupling of automatically calibrated classifier outputs is obtained by
applying iterative constrained regularisations, and logical temporal fusion is applied to temporally redundant
data (such as tracked vehicles) to enhance performance. Finally, ranking scores are computed to optimize
precision and recall in ranking-based systems.
Each step of the system described above has been analysed from both a theoretical and an empirical
point of view, and new contributions are introduced so as to obtain a complete, mathematically coherent
framework which is both generic and easy to use, as the learning procedure is almost completely automatic.
In addition, quantitative evaluations on two completely different datasets have confirmed both the validity
of the previous assertions and the improvements achieved over previous methods.
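The error-correcting-output-codes ingredient can be illustrated generically: encode each class as a binary codeword, train one binary learner per code bit, and decode predictions by Hamming distance. The sketch below shows that standard ECOC scheme with a hand-written code matrix; it does not reproduce the thesis's automatic optimal subdivision or its calibration and fusion steps.

```python
# Generic ECOC sketch: one binary learner per code bit, Hamming decoding.
# The 3-class code matrix is hand-picked for illustration; the thesis
# instead derives the subdivision automatically.
import numpy as np
from sklearn.linear_model import LogisticRegression

def ecoc_fit(X, y, code):
    """code: (n_classes, n_bits) matrix of -1/+1 codewords; y holds class indices."""
    learners = []
    for bit in range(code.shape[1]):
        clf = LogisticRegression().fit(X, code[y, bit])  # relabel by this bit
        learners.append(clf)
    return learners

def ecoc_predict(X, learners, code):
    bits = np.column_stack([clf.predict(X) for clf in learners])
    dists = np.array([(bits != cw).sum(axis=1) for cw in code])  # Hamming distance
    return np.argmin(dists, axis=0)

# Tiny demo on synthetic 2-D data with three classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in ([0, 0], [2, 0], [0, 2])])
y = np.repeat([0, 1, 2], 30)
code = np.array([[ 1,  1, -1],
                 [-1,  1,  1],
                 [ 1, -1,  1]])
learners = ecoc_fit(X, y, code)
print("train accuracy:", (ecoc_predict(X, learners, code) == y).mean())
```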
Probabilistic multiple kernel learning
The integration of multiple and possibly heterogeneous information sources for an overall decision-making process has been an open and unresolved research direction in computing science since its very beginning. This thesis attempts to address parts of that direction by proposing probabilistic data integration algorithms for multiclass decisions, where an observation of interest is assigned to one of many categories based on a plurality of information channels.
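The usual formal starting point for multiple kernel learning, assumed here since the abstract does not spell out the model, is a weighted combination of per-channel Gram matrices, K = Σ_m β_m K_m with β_m ≥ 0. The helper below sketches only that combination step; the thesis's probabilistic inference over the weights is not reproduced, and the fixed weights are illustrative.

```python
# Convex combination of per-channel kernels, the standard MKL building
# block; the weights beta would normally be learned (probabilistically,
# in the thesis) but are fixed by hand here for illustration.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def combine_kernels(kernels, beta):
    """kernels: list of (n, n) Gram matrices, one per information channel.
    beta: nonnegative weights summing to one."""
    beta = np.asarray(beta, dtype=float)
    assert np.all(beta >= 0) and np.isclose(beta.sum(), 1.0)
    return sum(b * K for b, K in zip(beta, kernels))

X = np.random.rand(10, 5)
K = combine_kernels([rbf_kernel(X, gamma=g) for g in (0.1, 1.0, 10.0)],
                    beta=[0.2, 0.5, 0.3])
print(K.shape)  # (10, 10)
```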
Parallelizing support vector machines for scalable image annotation
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.
Machine learning techniques have facilitated image retrieval by automatically classifying and annotating images with keywords. Among them, Support Vector Machines (SVMs) are used extensively due to their generalization properties. However, SVM training is notably a computationally intensive process, especially when the training dataset is large.
In this thesis, distributed computing paradigms have been investigated to speed up SVM training by partitioning a large training dataset into small data chunks and processing each chunk in parallel, utilizing the resources of a cluster of computers. A resource-aware parallel SVM algorithm is introduced for large-scale image annotation on a cluster of computers. A load-balancing scheme based on a genetic algorithm is designed to optimize the performance of the algorithm in heterogeneous computing environments.
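A minimal version of this chunk-and-parallelize idea can be sketched with Python multiprocessing and scikit-learn SVMs. The fixed chunk and worker counts below stand in for the thesis's resource-aware, genetic-algorithm-balanced scheduling, and the data are assumed shuffled so every chunk contains all classes.

```python
# Chunked parallel SVM training sketch (fixed chunk/worker counts; the
# thesis balances these with a genetic algorithm on heterogeneous nodes).
# Assumes the data are shuffled so each chunk contains every class.
import numpy as np
from multiprocessing import Pool
from sklearn.svm import SVC

def fit_chunk(chunk):
    X_chunk, y_chunk = chunk
    return SVC(kernel="rbf").fit(X_chunk, y_chunk)

def parallel_svm(X, y, n_chunks=4, n_workers=4):
    """Fit one SVC per data chunk in parallel; the fitted models can
    then be combined, e.g. by voting over their predictions."""
    chunks = list(zip(np.array_split(X, n_chunks), np.array_split(y, n_chunks)))
    with Pool(n_workers) as pool:  # call from under `if __name__ == "__main__":`
        return pool.map(fit_chunk, chunks)
```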
SVM was initially designed for binary classification. However, most classification problems arising in domains such as image annotation involve more than two classes. A resource-aware parallel multiclass SVM algorithm for large-scale image annotation on a cluster of computers is therefore introduced.
Combining classifiers leads to a substantial reduction of classification error in a wide range of applications. Among such schemes, SVM ensembles with bagging have been shown to outperform a single SVM in terms of classification accuracy. However, training an SVM ensemble is notably computationally intensive, especially when the number of replicated samples generated by bootstrapping is large. A distributed SVM ensemble algorithm for image annotation is introduced which re-samples the training data by bootstrapping and trains an SVM on each sample in parallel using a cluster of computers.
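The bagging component itself is straightforward to sketch: bootstrap-resample the training set, fit one SVM per replicate (serially below, whereas the thesis distributes these fits across a cluster), and predict by majority vote. The helper names and the integer-class-label assumption are illustrative.

```python
# Bagged SVM ensemble sketch: bootstrap replicates, one SVC each, then
# majority vote. The per-replicate fits are the part the thesis runs in
# parallel on a cluster; here they run serially. Assumes integer labels.
import numpy as np
from sklearn.svm import SVC

def bagged_svms(X, y, n_estimators=10, seed=0):
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap sample
        models.append(SVC().fit(X[idx], y[idx]))
    return models

def vote(models, X):
    preds = np.stack([m.predict(X) for m in models])  # (n_models, n_samples)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, preds)
```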
The above algorithms are evaluated in both experimental and simulation environments, showing that the distributed SVM algorithm, the distributed multiclass SVM algorithm, and the distributed SVM ensemble algorithm reduce the training time significantly while maintaining a high level of classification accuracy.