
    Multi-Label Super Learner: Multi-Label Classification and Improving Its Performance Using Heterogenous Ensemble Methods

    Classification is the task of predicting the label(s) of future instances by learning and inferring from the patterns of instances with known labels. Traditional classification methods focus on single-label classification; however, many real-life problems require multi-label classification, which assigns each instance to multiple categories. For example, in sentiment analysis, a person may feel multiple emotions at the same time; in bioinformatics, a gene or protein may have a number of functional expressions; in text categorization, an email, medical record, or social media posting can be identified by various tags simultaneously. As a result of such a wide range of applications, multi-label classification has become an emerging research area in recent years. There are two general approaches to multi-label classification: problem transformation and algorithm adaptation. The problem transformation methodology, at its core, converts a multi-label dataset into several single-label datasets, thereby allowing the transformed datasets to be modeled using existing binary or multi-class classification methods. The algorithm adaptation methodology, on the other hand, modifies single-label classification algorithms so that they can be applied to the original multi-label datasets. This thesis proposes a new method, called Multi-Label Super Learner (MLSL), which is a stacking-based heterogeneous ensemble method. An improved multi-label classification algorithm following the problem transformation approach, MLSL combines the predictive power of several multi-label classification methods through an ensemble algorithm, the super learner. The performance of this new method is compared to existing problem transformation algorithms, and our numerical results show that MLSL outperforms existing algorithms on almost all of the performance metrics.
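The problem-transformation approach the abstract describes can be sketched in a few lines via binary relevance, its simplest instance: the multi-label dataset is split into one single-label dataset per label, and any binary learner is trained on each column. The 1-nearest-neighbour "learner" below is a toy stand-in, not MLSL or the super learner itself.

```python
# Minimal binary-relevance sketch of problem transformation.
# An n x q 0/1 label matrix Y is split into q single-label problems.

def nearest_neighbour_fit(X, y):
    return list(zip(X, y))  # toy learner: memorize training pairs

def nearest_neighbour_predict(model, x):
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(model, key=lambda pair: dist(pair[0], x))[1]

def binary_relevance_fit(X, Y):
    # one independent model per label column
    q = len(Y[0])
    return [nearest_neighbour_fit(X, [row[j] for row in Y]) for j in range(q)]

def binary_relevance_predict(models, x):
    return [nearest_neighbour_predict(m, x) for m in models]

X = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
Y = [[1, 0], [0, 1], [1, 1]]   # each instance may carry several labels
models = binary_relevance_fit(X, Y)
print(binary_relevance_predict(models, (0.9, 0.9)))  # → [0, 1]
```

Any stronger base classifier can be dropped in for the nearest-neighbour stand-in; the transformation itself is unchanged.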

    Information gain feature selection for multi-label classification.

    In many important application domains, such as text categorization, biomolecular analysis, scene or video classification, and medical diagnosis, instances are naturally associated with more than one class label, giving rise to multi-label classification problems. This fact has led, in recent years, to a substantial amount of research in multi-label classification, and, more specifically, many feature selection methods have been developed to allow the identification of relevant and informative features for multi-label classification. However, most methods proposed for this task rely on transforming the multi-label data set into a single-label one. In this work we have chosen one of the most well-known measures for feature selection, Information Gain, and we have evaluated it along with common transformation techniques for multi-label classification. We have also adapted the information gain feature selection technique to handle multi-label data directly. Our goal is to perform a thorough investigation of the performance of multi-label feature selection techniques based on the information gain concept and to report how it varies when coupled with different multi-label classifiers and data sets from different domains.
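A transformation-based use of information gain like the one evaluated above can be sketched as follows: score a feature against each label column separately, then aggregate. The mean aggregation here is an illustrative choice, not necessarily the paper's exact criterion.

```python
# Information gain IG(label; feature) = H(label) - H(label | feature),
# applied per label column of a multi-label dataset.
import math
from collections import Counter

def entropy(values):
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def information_gain(feature, label):
    n = len(label)
    cond = 0.0
    for v in set(feature):
        subset = [l for f, l in zip(feature, label) if f == v]
        cond += len(subset) / n * entropy(subset)
    return entropy(label) - cond

feature = [0, 0, 1, 1]
labels = [[1, 0], [1, 0], [0, 1], [0, 1]]  # two label columns
scores = [information_gain(feature, [row[j] for row in labels])
          for j in range(len(labels[0]))]
print(sum(scores) / len(scores))  # → 1.0 (feature perfectly predicts both labels)
```

Features are then ranked by the aggregated score and the top-k retained before training any multi-label classifier.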

    LAIM discretization for multi-label data

    Multi-label learning is a challenging task in data mining that has attracted growing attention in recent years. Although many multi-label datasets have continuous features, no general algorithm specifically designed to transform the continuous attribute values of multi-label datasets into a finite number of intervals has been proposed to date. Many classification algorithms require discrete values as input, and studies have shown that supervised discretization may improve classification performance. This paper presents a Label-Attribute Interdependence Maximization (LAIM) discretization method for multi-label data. LAIM is inspired by the discretization heuristic of CAIM for single-label classification. Maximizing the label-attribute interdependence is expected to improve label prediction in data separated into disjoint intervals. The main aim of this paper is to present a discretization method specifically designed for multi-label data and to analyze whether it can improve the performance of multi-label learning methods. To this end, the experimental analysis evaluates the performance of 12 multi-label learning algorithms (transformation-, adaptation-, and ensemble-based) on a series of 16 multi-label datasets with and without supervised and unsupervised discretization, showing that LAIM discretization improves the performance for many algorithms and measures.
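To make the interval transformation concrete, the sketch below shows the kind of unsupervised discretization LAIM is compared against: each continuous attribute value is mapped to one of k disjoint equal-width intervals. LAIM's own supervised heuristic (choosing cut points that maximize label-attribute interdependence) is not implemented here.

```python
# Unsupervised equal-width discretization: continuous values -> interval ids.
def equal_width_discretize(values, k):
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0          # guard against constant columns
    # clamp the maximum value into the last interval [lo + (k-1)*width, hi]
    return [min(int((v - lo) / width), k - 1) for v in values]

attr = [0.1, 0.4, 0.35, 0.8, 0.95]
print(equal_width_discretize(attr, 3))  # → [0, 1, 0, 2, 2]
```

A supervised method such as LAIM instead evaluates candidate cut points against the label columns, so interval boundaries can align with label changes rather than with the value range alone.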

    Affine Registration of label maps in Label Space

    Two key aspects of coupled multi-object shape analysis and atlas generation are the choice of representation and the subsequent registration methods used to align the sample set. For example, a typical brain image can be labeled into three structures: grey matter, white matter, and cerebrospinal fluid. Many manipulations, such as interpolation, transformation, smoothing, or registration, need to be performed on these images before they can be used in further analysis. Current techniques for such analysis tend to trade off performance between the two tasks, performing well for one task but developing problems when used for the other.

    This article proposes a representation that is flexible and well suited for both tasks. We propose to map object labels to the vertices of a regular simplex, e.g. the unit interval for two labels, a triangle for three labels, a tetrahedron for four labels, etc. This representation, which is routinely used in fuzzy classification, is ideally suited for representing and registering multiple shapes. On closer examination, this representation reveals several desirable properties: algebraic operations may be done directly, label uncertainty is expressed as a weighted mixture of labels (a probabilistic interpretation), interpolation is unbiased toward any label or the background, and registration may be performed directly.

    We demonstrate these properties by using label space in a gradient-descent-based registration scheme to obtain a probabilistic atlas. While straightforward, this iterative method is very slow, can get stuck in local minima, and depends heavily on the initial conditions. To address these issues, two fast methods are proposed that serve as coarse registration schemes, after which the iterative descent method can be used to refine the results. Further, we derive an analytical formulation for the direct computation of the "group mean" from the parameters of the pairwise registrations of all the images in the sample set. We show results on richly labeled 2D and 3D data sets.
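The label-space representation described above can be sketched directly: each of q labels maps to a simplex vertex, and label uncertainty is a convex combination of vertices. The standard-basis embedding used here is one common choice of vertex set, not necessarily the paper's exact regular simplex.

```python
# Label-space sketch: labels as simplex vertices, uncertainty as mixtures.
def label_to_vertex(label, q):
    # embed label j as the j-th standard basis vector of R^q
    return [1.0 if j == label else 0.0 for j in range(q)]

def mix(weights, q):
    # a weighted mixture of labels is a point inside the simplex,
    # giving the probabilistic interpretation mentioned in the abstract
    point = [0.0] * q
    for label, w in weights.items():
        vertex = label_to_vertex(label, q)
        point = [p + w * v for p, v in zip(point, vertex)]
    return point

# e.g. a brain voxel: 60% grey matter (0), 30% white matter (1), 10% CSF (2)
print(mix({0: 0.6, 1: 0.3, 2: 0.1}, 3))  # → [0.6, 0.3, 0.1]
```

Because mixtures are linear, interpolating between two such points stays inside the simplex, which is why interpolation is unbiased toward any particular label or the background.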

    Multilabel Classification with R Package mlr

    We implemented several multilabel classification algorithms in the machine learning package mlr. The implemented methods are binary relevance, classifier chains, nested stacking, dependent binary relevance, and stacking, which can be used with any base learner that is accessible in mlr. Moreover, there is access to the multilabel classification versions of randomForestSRC and rFerns. All these methods can easily be compared using the multilabel performance measures and resampling methods implemented in the standardized mlr framework. In a benchmark experiment with several multilabel datasets, the performance of the different methods is evaluated. (18 pages, 2 figures; to be published in the R Journal.)
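The classifier-chain method listed above can be sketched in Python rather than mlr's R: label j's classifier sees the original features plus the labels for positions 1..j-1 (true labels at training time, predictions at test time), so label dependencies can be exploited. The threshold "learner" is a toy stand-in for any base learner.

```python
# Classifier-chain sketch with a toy threshold base learner.
def fit_threshold(X, y):
    # predict 1 iff the feature sum exceeds the midpoint of the class means
    pos = [sum(x) for x, t in zip(X, y) if t == 1]
    neg = [sum(x) for x, t in zip(X, y) if t == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def chain_fit(X, Y):
    q = len(Y[0])
    models, Xaug = [], [list(x) for x in X]
    for j in range(q):
        yj = [row[j] for row in Y]
        models.append(fit_threshold(Xaug, yj))
        # append the true label as an extra feature for the next link
        Xaug = [x + [t] for x, t in zip(Xaug, yj)]
    return models

def chain_predict(models, x):
    x, preds = list(x), []
    for mid in models:
        p = 1 if sum(x) > mid else 0
        preds.append(p)
        x = x + [p]  # feed the prediction down the chain
    return preds

X = [(0.0,), (1.0,), (2.0,), (3.0,)]
Y = [[0, 0], [0, 0], [1, 1], [1, 1]]
models = chain_fit(X, Y)
print(chain_predict(models, (2.5,)))  # → [1, 1]
```

Binary relevance is the special case where the extra label features are dropped, which is why the two methods share the same transformation machinery in frameworks like mlr.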