
    Making decision trees feasible in ultrahigh feature and label dimensions

    ©2017 Weiwei Liu and Ivor W. Tsang. Owing to their non-linear yet highly interpretable representations, decision tree (DT) models have attracted significant attention from researchers. However, DT models are difficult to understand and interpret in ultrahigh dimensions, and they typically suffer from the curse of dimensionality, with degraded performance when many features are noisy. To address these issues, this paper first presents a novel data-dependent generalization error bound for the perceptron decision tree (PDT), which provides the theoretical justification for learning a sparse linear hyperplane in each decision node and for pruning the tree. Following this analysis, we introduce the notion of a budget-aware classifier (BAC), which places a budget constraint on the weight coefficients, and propose a supervised budgeted tree (SBT) algorithm to achieve non-linear prediction performance. To avoid generating an unstable and overly complicated decision tree and to improve the generalization of SBT, we present a pruning strategy that learns classifiers to minimize cross-validation error at each BAC. To deal with ultrahigh label dimensions, building on three important phenomena observed in real-world data sets from a variety of application domains, we develop a sparse coding tree framework for multi-label annotation problems and provide a theoretical analysis. Extensive empirical studies verify that 1) SBT is easy to understand and interpret in ultrahigh dimensions and is more resilient to noisy features, and 2) compared with state-of-the-art algorithms, the proposed sparse coding tree framework is more efficient, yet accurate, in ultrahigh label and feature dimensions.
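    To make the node-splitting idea concrete, here is a minimal sketch of a decision tree that learns a sparse linear hyperplane at each node. It uses scikit-learn's L1-penalized logistic regression as a stand-in for the paper's budget-aware classifier, and the class name, `max_depth`, and `budget_C` parameters are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch, assuming binary integer labels {0, 1}. The L1-penalized
# logistic regression is a stand-in for the paper's budget-aware classifier
# (BAC); `max_depth` and `budget_C` are hypothetical parameters for
# illustration, not the authors' API.
import numpy as np
from sklearn.linear_model import LogisticRegression

class SparseHyperplaneNode:
    def __init__(self, depth=0, max_depth=3, budget_C=0.1):
        self.depth, self.max_depth, self.budget_C = depth, max_depth, budget_C
        self.clf = self.left = self.right = self.label = None

    def fit(self, X, y):
        # Stop at a pure node or at the depth limit.
        if self.depth >= self.max_depth or len(np.unique(y)) == 1:
            self.label = np.bincount(y).argmax()
            return self
        # Sparse linear hyperplane: a small C means a strong L1 penalty,
        # so only a few feature weights stay non-zero (the "budget").
        self.clf = LogisticRegression(penalty="l1", solver="liblinear",
                                      C=self.budget_C)
        self.clf.fit(X, y)
        side = self.clf.decision_function(X) >= 0
        if side.all() or (~side).all():  # degenerate split: make a leaf
            self.label = np.bincount(y).argmax()
            return self
        self.left = SparseHyperplaneNode(self.depth + 1, self.max_depth,
                                         self.budget_C).fit(X[~side], y[~side])
        self.right = SparseHyperplaneNode(self.depth + 1, self.max_depth,
                                          self.budget_C).fit(X[side], y[side])
        return self

    def predict_one(self, x):
        if self.label is not None:
            return self.label
        go_right = self.clf.decision_function(x.reshape(1, -1))[0] >= 0
        return (self.right if go_right else self.left).predict_one(x)

# Usage (binary labels in {0, 1}):
# tree = SparseHyperplaneNode().fit(X_train, y_train)
# pred = tree.predict_one(X_test[0])
```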

    $\mathcal{G}$-softmax: Improving Intra-class Compactness and Inter-class Separability of Features

    Intra-class compactness and inter-class separability are crucial indicators of a model's ability to produce discriminative features: intra-class compactness measures how close features with the same label are to each other, while inter-class separability measures how far apart features with different labels are. In this work, we investigate the intra-class compactness and inter-class separability of features learned by convolutional networks and propose a Gaussian-based softmax ($\mathcal{G}$-softmax) function that effectively improves both. The proposed function is simple to implement and can easily replace the softmax function. We evaluate the proposed $\mathcal{G}$-softmax function on classification datasets (i.e., CIFAR-10, CIFAR-100, and Tiny ImageNet) and on multi-label classification datasets (i.e., MS COCO and NUS-WIDE). The experimental results show that the proposed $\mathcal{G}$-softmax function improves state-of-the-art models across all evaluated datasets. In addition, analysis of intra-class compactness and inter-class separability demonstrates the advantages of the proposed function over the softmax function, consistent with the performance improvement. More importantly, we observe that intra-class compactness and inter-class separability are linearly correlated with average precision on MS COCO and NUS-WIDE, which implies that improving intra-class compactness and inter-class separability should improve average precision. Comment: 15 pages, published in TNNLS
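    As a rough illustration of how a Gaussian-based softmax could act as a drop-in replacement for softmax, the PyTorch sketch below passes each class logit through a Gaussian CDF with a learnable per-class mean and standard deviation before normalizing. The exact parameterization (the `GSoftmax` module, `mu`, `log_sigma`) is an assumption for illustration, not the paper's reference implementation.

```python
# A rough PyTorch sketch under stated assumptions: each class logit is passed
# through a Gaussian CDF with learnable per-class mean and standard deviation
# before normalization. `GSoftmax`, `mu`, and `log_sigma` are illustrative
# names; this is not the authors' reference implementation.
import torch
import torch.nn as nn

class GSoftmax(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(num_classes))         # per-class mean
        self.log_sigma = nn.Parameter(torch.zeros(num_classes))  # per-class log-std

    def forward(self, logits):
        sigma = self.log_sigma.exp()
        # Gaussian CDF of each logit under its class's distribution.
        z = (logits - self.mu) / (sigma * 2.0 ** 0.5)
        cdf = 0.5 * (1.0 + torch.erf(z))
        return torch.softmax(cdf, dim=-1)

# Drop-in replacement for the final softmax of a classifier head:
scores = torch.randn(4, 10)               # a batch of raw class scores
probs = GSoftmax(num_classes=10)(scores)  # rows sum to 1, like softmax
```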

    The Backbone Method for Ultra-High Dimensional Sparse Machine Learning

    We present the backbone method, a generic framework that enables sparse and interpretable supervised machine learning methods to scale to ultra-high dimensional problems. We solve sparse regression problems with $10^7$ features in minutes and $10^8$ features in hours, as well as decision tree problems with $10^5$ features in minutes. The proposed method operates in two phases: we first determine the backbone set, consisting of potentially relevant features, by solving a number of tractable subproblems; then, we solve a reduced problem considering only the backbone features. For the sparse regression problem, our theoretical analysis shows that, under certain assumptions and with high probability, the backbone set consists of the truly relevant features. Numerical experiments on both synthetic and real-world datasets demonstrate that our method outperforms or competes with state-of-the-art methods in ultra-high dimensional problems, and competes with optimal solutions in problems where exact methods scale, both in recovering the truly relevant features and in out-of-sample predictive performance. Comment: First submission to Machine Learning: 06/2020. Revised: 10/202
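    The two-phase structure is easy to sketch for sparse regression: screen features by fitting cheap sparse models on random feature subsets, take the union of the selected features as the backbone, then refit on the backbone alone. The function below uses scikit-learn's Lasso with illustrative parameters (`n_subproblems`, `subset_size`, `alpha`); the paper's actual subproblem construction and solvers differ.

```python
# A minimal sketch of the two-phase backbone idea for sparse regression,
# assuming Lasso subproblems on random feature subsets; `n_subproblems`,
# `subset_size`, and `alpha` are illustrative, not the paper's configuration.
import numpy as np
from sklearn.linear_model import Lasso

def backbone_lasso(X, y, n_subproblems=20, subset_size=200, alpha=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    backbone = set()
    # Phase 1: screen features with tractable subproblems and keep every
    # feature that any subproblem selects (non-zero coefficient).
    for _ in range(n_subproblems):
        cols = rng.choice(n_features, size=min(subset_size, n_features),
                          replace=False)
        sub = Lasso(alpha=alpha).fit(X[:, cols], y)
        backbone.update(cols[np.flatnonzero(sub.coef_)].tolist())
    backbone = np.sort(np.fromiter(backbone, dtype=int))
    # Phase 2: solve the reduced problem on the backbone features only.
    final = Lasso(alpha=alpha).fit(X[:, backbone], y)
    return backbone, final
```

    Each subproblem touches only `subset_size` columns of X, which is what keeps the screening phase tractable even when the full feature count is very large.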

    The Emerging Trends of Multi-Label Learning

    Exabytes of data are generated daily by humans, creating a growing need to address the grand challenges that big data brings to multi-label learning. For example, extreme multi-label classification is an active and rapidly growing research area that deals with classification tasks involving an extremely large number of classes or labels, and exploiting massive data with limited supervision to build multi-label classification models is becoming valuable in practical applications. Beyond these, tremendous effort has gone into harnessing the strong learning capability of deep learning to better capture label dependencies in multi-label learning, which is the key for deep learning to address real-world classification tasks. However, there has been a lack of systematic studies focusing explicitly on the emerging trends and new challenges of multi-label learning in the era of big data. It is therefore imperative to provide a comprehensive survey that fulfills this mission and delineates future research directions and new applications. Comment: Accepted to TPAMI 2021