Making decision trees feasible in ultrahigh feature and label dimensions
©2017 Weiwei Liu and Ivor W. Tsang. Owing to their non-linear yet highly interpretable representations, decision tree (DT) models have attracted considerable attention from researchers. However, DT models are difficult to understand and interpret in ultrahigh dimensions; they usually suffer from the curse of dimensionality and achieve degenerate performance when there are many noisy features. To address these issues, this paper first presents a novel data-dependent generalization error bound for the perceptron decision tree (PDT), which provides the theoretical justification for learning a sparse linear hyperplane in each decision node and for pruning the tree. Following our analysis, we introduce the notion of a budget-aware classifier (BAC) with a budget constraint on the weight coefficients, and propose a supervised budgeted tree (SBT) algorithm to achieve non-linear prediction performance. To avoid generating an unstable and complicated decision tree and to improve the generalization of the SBT, we present a pruning strategy that learns classifiers to minimize cross-validation errors on each BAC. To deal with ultrahigh label dimensions, based on three important phenomena of real-world data sets from a variety of application domains, we develop a sparse coding tree framework for multi-label annotation problems and provide the theoretical analysis. Extensive empirical studies verify that 1) SBT is easy to understand and interpret in ultrahigh dimensions and is more resilient to noisy features; and 2) compared with state-of-the-art algorithms, our proposed sparse coding tree framework is more efficient, yet accurate, in ultrahigh label and feature dimensions.
G-softmax: Improving Intra-class Compactness and Inter-class Separability of Features
Intra-class compactness and inter-class separability are crucial indicators
to measure the effectiveness of a model to produce discriminative features,
where intra-class compactness indicates how close the features with the same
label are to each other and inter-class separability indicates how far away the
features with different labels are. In this work, we investigate intra-class
compactness and inter-class separability of features learned by convolutional
networks and propose a Gaussian-based softmax (G-softmax) function
that can effectively improve intra-class compactness and inter-class
separability. The proposed function is simple to implement and can easily
replace the softmax function. We evaluate the proposed G-softmax
function on classification datasets (i.e., CIFAR-10, CIFAR-100, and Tiny
ImageNet) and on multi-label classification datasets (i.e., MS COCO and
NUS-WIDE). The experimental results show that the proposed
G-softmax function improves the state-of-the-art models across all
evaluated datasets. In addition, analysis of the intra-class compactness and
inter-class separability demonstrates the advantages of the proposed function
over the softmax function, which is consistent with the performance
improvement. More importantly, we observe that high intra-class compactness and
inter-class separability are linearly correlated to average precision on MS
COCO and NUS-WIDE. This implies that improvement of intra-class compactness and
inter-class separability would lead to improvement of average precision. Comment: 15 pages, published in TNNL
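The two quantities described in this abstract can be illustrated with a small sketch. This is not the measure used in the paper, which is not given here; it assumes a simple centroid-based proxy on single-label data, and the function names `intra_class_compactness` and `inter_class_separability` are hypothetical.

```python
import numpy as np


def intra_class_compactness(features, labels):
    """Mean Euclidean distance from each feature to its own class centroid.

    Assumed proxy: a lower value means same-label features sit closer together.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    dists = []
    for c in np.unique(labels):
        cls = features[labels == c]
        centroid = cls.mean(axis=0)
        dists.append(np.linalg.norm(cls - centroid, axis=1))
    return float(np.concatenate(dists).mean())


def inter_class_separability(features, labels):
    """Mean pairwise Euclidean distance between class centroids.

    Assumed proxy: a higher value means different-label features lie
    farther apart. Requires at least two classes.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    diffs = centroids[:, None, :] - centroids[None, :, :]
    pairwise = np.linalg.norm(diffs, axis=-1)
    upper = np.triu_indices(len(classes), k=1)  # each unordered pair once
    return float(pairwise[upper].mean())
```

Under this proxy, a feature extractor that lowers the first value while raising the second would be producing more discriminative features in the sense the abstract describes.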
The Backbone Method for Ultra-High Dimensional Sparse Machine Learning
We present the backbone method, a generic framework that enables sparse and
interpretable supervised machine learning methods to scale to ultra-high
dimensional problems. We solve sparse regression problems with features
in minutes and features in hours, as well as decision tree problems with
features in minutes. The proposed method operates in two phases: we first
determine the backbone set, consisting of potentially relevant features, by
solving a number of tractable subproblems; then, we solve a reduced problem,
considering only the backbone features. For the sparse regression problem, our
theoretical analysis shows that, under certain assumptions and with high
probability, the backbone set consists of the truly relevant features.
Numerical experiments on both synthetic and real-world datasets demonstrate
that our method outperforms or competes with state-of-the-art methods in
ultra-high dimensional problems, and competes with optimal solutions in
problems where exact methods scale, both in terms of recovering the truly
relevant features and in its out-of-sample predictive performance. Comment: First submission to Machine Learning: 06/2020. Revised: 10/202
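The two-phase structure described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the subproblems here are ordinary least-squares fits on random feature chunks, the reduced problem is a least-squares fit followed by top-k selection, and the function name `backbone_sparse_regression` and its parameters are hypothetical.

```python
import numpy as np


def backbone_sparse_regression(X, y, k, subset_size=50, num_rounds=1, seed=0):
    """Two-phase sketch: build a backbone set, then solve a reduced problem.

    Assumption: each subproblem keeps the k features with the largest
    absolute least-squares coefficients within its chunk.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_chunks = int(np.ceil(p / subset_size))
    backbone = set()

    # Phase 1: solve tractable subproblems on small chunks of features and
    # collect the features each subproblem flags as potentially relevant.
    for _ in range(num_rounds):
        perm = rng.permutation(p)
        for chunk in np.array_split(perm, n_chunks):
            coef, *_ = np.linalg.lstsq(X[:, chunk], y, rcond=None)
            top = chunk[np.argsort(-np.abs(coef))[:k]]
            backbone.update(top.tolist())
    backbone = np.array(sorted(backbone))

    # Phase 2: solve the reduced problem using only the backbone features,
    # keep the k strongest, and refit on that final support.
    coef, *_ = np.linalg.lstsq(X[:, backbone], y, rcond=None)
    support = np.sort(backbone[np.argsort(-np.abs(coef))[:k]])
    final_coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
    return support, final_coef
```

The point of the design is that every fit in either phase involves far fewer columns than the full feature matrix, which is what lets the approach scale; a real sparse solver would replace the least-squares steps used here.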
The Emerging Trends of Multi-Label Learning
Exabytes of data are generated daily by humans, leading to the growing need
for new efforts in dealing with the grand challenges for multi-label learning
brought by big data. For example, extreme multi-label classification is an
active and rapidly growing research area that deals with classification tasks
with an extremely large number of classes or labels; utilizing massive data
with limited supervision to build a multi-label classification model becomes
valuable for practical applications, etc. Besides these, there are tremendous
efforts on how to harvest the strong learning capability of deep learning to
better capture the label dependencies in multi-label learning, which is the key
for deep learning to address real-world classification tasks. However, it is
noted that there has been a lack of systematic studies that focus explicitly on
analyzing the emerging trends and new challenges of multi-label learning in the
era of big data. It is imperative to call for a comprehensive survey to fulfill
this mission and delineate future research directions and new applications. Comment: Accepted to TPAMI 202