604 research outputs found

    Multilabel Consensus Classification

    In the era of big data, large amounts of noisy and incomplete data can be collected from multiple sources for prediction tasks. Combining multiple models or data sources helps to counteract the effects of low data quality and the bias of any single model or data source, and can thus improve the robustness and performance of predictive models. For reasons of privacy, storage, and bandwidth, in certain circumstances one has to combine the predictions from multiple models or data sources to obtain the final predictions without accessing the raw data. Consensus-based prediction combination algorithms are effective in such situations. However, current research on prediction combination focuses on the single-label setting, where an instance has one and only one label. Data nowadays are often multilabeled, so that more than one label has to be predicted at the same time. Directly applying existing prediction combination methods in multilabel settings can degrade performance. In this paper, we address the challenges of combining predictions from multiple multilabel classifiers and propose two novel algorithms, MLCM-r (MultiLabel Consensus Maximization for ranking) and MLCM-a (MLCM for microAUC). These algorithms capture the label correlations that are common in multilabel classification and optimize the corresponding performance metrics. Experimental results on popular multilabel classification tasks verify the theoretical analysis and the effectiveness of the proposed methods.
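As context for the combination problem the abstract describes, here is a minimal baseline sketch: combining the label-score matrices of several multilabel classifiers by weighted averaging. This is deliberately the naive consensus the paper improves upon, not the MLCM-r or MLCM-a algorithms themselves; the function and example data are illustrative.

```python
import numpy as np

def consensus_combine(score_matrices, weights=None):
    """Combine per-classifier multilabel score matrices (n_samples x n_labels)
    by (weighted) averaging -- a simple baseline consensus that ignores the
    label correlations MLCM-r/MLCM-a are designed to exploit."""
    scores = np.stack(score_matrices)          # (n_models, n_samples, n_labels)
    if weights is None:
        weights = np.full(len(score_matrices), 1.0 / len(score_matrices))
    return np.tensordot(weights, scores, axes=1)

# Three classifiers' label scores for 2 samples and 3 labels
m1 = np.array([[0.9, 0.2, 0.1], [0.1, 0.8, 0.7]])
m2 = np.array([[0.8, 0.3, 0.2], [0.2, 0.9, 0.6]])
m3 = np.array([[0.7, 0.1, 0.3], [0.3, 0.7, 0.8]])
print(consensus_combine([m1, m2, m3]))
```

Only the combined scores cross the model boundary here, which matches the privacy/bandwidth setting of the abstract: the raw training data never needs to be shared.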

    Improved Multi-Class Cost-Sensitive Boosting via Estimation of the Minimum-Risk Class

    We present a simple unified framework for multi-class cost-sensitive boosting. The minimum-risk class is estimated directly, rather than via an approximation of the posterior distribution. Our method jointly optimizes binary weak learners and their corresponding output vectors, requiring classes to share features at each iteration. By training in a cost-sensitive manner, weak learners are invested in separating classes whose discrimination is important, at the expense of less relevant classification boundaries. Additional contributions are a family of loss functions, along with a proof that our algorithm is Boostable in the theoretical sense, as well as an efficient procedure for growing decision trees for use as weak learners. We evaluate our method on a variety of datasets: a collection of synthetic planar data, common UCI datasets, MNIST digits, SUN scenes, and CUB-200 birds. Results show state-of-the-art performance across all datasets against several strong baselines, including non-boosting multi-class approaches.
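The minimum-risk decision rule that this framework targets can be stated compactly: given a posterior P(j|x) and a cost matrix C where C[j, k] is the cost of predicting class k when the true class is j, choose argmin over k of the expected cost. A short sketch of that rule (the paper's contribution is estimating this class directly, without the posterior; the matrices below are made up for illustration):

```python
import numpy as np

def min_risk_class(posterior, cost):
    """Bayes minimum-risk decision: argmin_k sum_j P(j|x) * C[j, k]."""
    risk = posterior @ cost          # expected cost of predicting each class
    return int(np.argmin(risk))

# Misclassifying true class 1 as class 0 is very expensive (cost 5)
cost = np.array([[0.0, 1.0, 1.0],
                 [5.0, 0.0, 1.0],
                 [1.0, 1.0, 0.0]])
posterior = np.array([0.5, 0.4, 0.1])
print(min_risk_class(posterior, cost))   # class 1, although class 0 is most probable
```

The example shows why cost-sensitivity matters: the argmax of the posterior (class 0) is not the minimum-risk choice once asymmetric costs are taken into account.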

    Tune and mix: learning to rank using ensembles of calibrated multi-class classifiers

    ANR-2010-COSI-002
    In subset ranking, the goal is to learn a ranking function that approximates a gold standard partial ordering of a set of objects (in our case, a set of documents retrieved for the same query). The partial ordering is given by relevance labels representing the relevance of documents with respect to the query on an absolute scale. Our approach consists of three simple steps. First, we train standard multi-class classifiers (AdaBoost.MH and multi-class SVM) to discriminate between the relevance labels. Second, the posteriors of the multi-class classifiers are calibrated using probabilistic and regression losses in order to estimate the Bayes-scoring function, which optimizes the Normalized Discounted Cumulative Gain (NDCG). In the third step, instead of selecting the best multi-class hyperparameters and the best calibration, we mix all the learned models in a simple ensemble scheme. Our extensive experimental study is itself a substantial contribution. We compare most of the existing learning-to-rank techniques on all of the available large-scale benchmark data sets using a standardized implementation of the NDCG score. We show that our approach is competitive with conceptually more complex listwise and pairwise methods, and clearly outperforms them as the data size grows. As a technical contribution, we clarify some of the confusing results related to the ambiguities of the evaluation tools, and propose guidelines for future studies.
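Since the abstract stresses that NDCG implementations differ enough to confuse comparisons, a concrete sketch of one common variant (exponential gain, log2 discount) may help; this is a generic formulation, not the standardized implementation the paper proposes:

```python
import numpy as np

def ndcg(relevance_in_predicted_order, k=None):
    """NDCG for one query: the input lists the true relevance labels in the
    order the model ranked the documents. Uses gain 2^rel - 1 and discount
    1/log2(rank + 1); other variants exist, which is one source of the
    evaluation ambiguities the paper discusses."""
    rel = np.asarray(relevance_in_predicted_order, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = np.sum((2.0 ** rel - 1.0) * discounts)
    ideal = np.sort(np.asarray(relevance_in_predicted_order, dtype=float))[::-1][:k]
    idcg = np.sum((2.0 ** ideal - 1.0) * discounts[:ideal.size])
    return dcg / idcg if idcg > 0 else 0.0

print(ndcg([3, 2, 0, 1]))   # slightly below 1: two least-relevant docs swapped
print(ndcg([3, 2, 1, 0]))   # ideal ordering
```

Because NDCG is normalized by the ideal ordering's DCG, a perfect ranking always scores exactly 1 regardless of the gain/discount variant chosen.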

    Multi-class Classification with Machine Learning and Fusion

    Work carried out at TELECOM ParisTech and EADS France.
    Multi-class classification is the core issue of many pattern recognition tasks. Several applications require high-end machine learning solutions to provide satisfying results in operational contexts. However, the most efficient ones, like SVM or Boosting, are generally binary, which introduces the problem of translating a global multi-class problem into several binary problems while still being able to provide, in the end, an answer to the original multi-class issue. The present work aims at providing a solution to this multi-class problem by introducing a complete framework with a strong probabilistic and structured basis. It includes the study of error-correcting output codes together with the definition of an optimal subdivision of the multi-class issue into several binary problems, in a completely automatic way. Machine learning algorithms are studied and benchmarked to facilitate and justify the final selection. Coupling of automatically calibrated classifier outputs is obtained by applying iterative constrained regularisations, and a logical temporal fusion is applied to temporally redundant data (like tracked vehicles) to enhance performance. Finally, ranking scores are computed to optimize precision and recall in ranking-based systems. Each step of the described system has been analysed from both a theoretical and an empirical point of view, and new contributions are introduced, so as to obtain a complete, mathematically coherent framework which is both generic and easy to use, as the learning procedure is almost completely automatic. On top of that, quantitative evaluations on two completely different datasets have confirmed both the validity of the previous assertions and the improvements achieved compared to previous methods.
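The error-correcting output codes mentioned above reduce a multi-class problem to binary ones: each class gets a codeword, each codeword bit defines one binary classifier, and prediction decodes to the nearest codeword. A generic ECOC decoding sketch (the code matrix and scores are invented for illustration; the work studies how to choose such subdivisions optimally):

```python
import numpy as np

def ecoc_decode(binary_outputs, code_matrix):
    """ECOC decoding: each row of code_matrix is a class codeword over
    {-1, +1}; each column defines one binary dichotomy. Predict the class
    whose codeword is nearest in Hamming distance to the signs of the
    binary classifiers' outputs."""
    signs = np.sign(binary_outputs)
    dist = np.sum(code_matrix != signs, axis=1)   # Hamming distance per class
    return int(np.argmin(dist))

# 4 classes encoded with 5 binary dichotomies
codes = np.array([[+1, +1, +1, -1, -1],
                  [+1, -1, -1, +1, -1],
                  [-1, +1, -1, +1, +1],
                  [-1, -1, +1, -1, +1]])
outputs = np.array([0.9, -0.7, -0.2, 0.4, -0.8])  # noisy binary scores
print(ecoc_decode(outputs, codes))                # decodes to class 1
```

Using more dichotomies than strictly needed gives the code error-correcting capacity: a few binary classifiers can be wrong and the nearest-codeword decoding still recovers the right class.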

    Probabilistic multiple kernel learning

    The integration of multiple, possibly heterogeneous information sources into an overall decision-making process has been an open and unresolved research direction in computing science since its very beginning. This thesis attempts to address parts of that direction by proposing probabilistic data-integration algorithms for multiclass decisions, where an observation of interest is assigned to one of many categories based on a plurality of information channels.
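In kernel methods, the multi-channel integration described here is often expressed as a convex combination of base kernel matrices, one per information source. A minimal sketch of that combination with fixed weights (the thesis's probabilistic algorithms learn such weights; the kernels and weights below are illustrative):

```python
import numpy as np

def combine_kernels(kernels, weights):
    """Simplest form of multiple kernel learning: K = sum_m beta_m * K_m,
    with beta_m >= 0 summing to 1, so the result stays a valid kernel.
    Here the weights are fixed; MKL algorithms learn them from data."""
    weights = np.asarray(weights, dtype=float)
    assert np.all(weights >= 0) and abs(weights.sum() - 1.0) < 1e-9
    return sum(w * K for w, K in zip(weights, kernels))

X = np.array([[0.0], [1.0], [2.0]])
K_lin = X @ X.T                          # linear kernel on channel 1
K_rbf = np.exp(-np.square(X - X.T))      # RBF kernel (gamma=1) on channel 2
print(combine_kernels([K_lin, K_rbf], [0.7, 0.3]))
```

The non-negativity and sum-to-one constraints matter: any convex combination of positive semidefinite kernel matrices is itself positive semidefinite, so the combined matrix can be fed directly to a kernel classifier.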