Making decision trees feasible in ultrahigh feature and label dimensions
Authors
W Liu
IW Tsang
Publication date
1 July 2017
Abstract
©2017 Weiwei Liu and Ivor W. Tsang. Because their representations are non-linear yet highly interpretable, decision tree (DT) models have attracted considerable attention from researchers. However, DT models are difficult to understand and interpret in ultrahigh dimensions; they usually suffer from the curse of dimensionality, and their performance degrades when there are many noisy features. To address these issues, this paper first presents a novel data-dependent generalization error bound for the perceptron decision tree (PDT), which provides the theoretical justification for learning a sparse linear hyperplane in each decision node and for pruning the tree. Following this analysis, we introduce the notion of a budget-aware classifier (BAC) with a budget constraint on the weight coefficients, and propose a supervised budgeted tree (SBT) algorithm to achieve non-linear prediction performance. To avoid generating an unstable and complicated decision tree and to improve the generalization of the SBT, we present a pruning strategy that learns classifiers to minimize cross-validation errors on each BAC. To deal with ultrahigh label dimensions, we develop a sparse coding tree framework for multi-label annotation problems, based on three important phenomena observed in real-world data sets from a variety of application domains, and provide a theoretical analysis. Extensive empirical studies verify that 1) SBT is easy to understand and interpret in ultrahigh dimensions and is more resilient to noisy features, and 2) compared with state-of-the-art algorithms, our proposed sparse coding tree framework is more efficient, yet accurate, in ultrahigh label and feature dimensions.
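To make the idea of a sparse linear hyperplane in each decision node more concrete, the following is a minimal sketch and not the authors' SBT or BAC implementation. It assumes that the budget means "at most `budget` non-zero weights" and enforces it by hard-thresholding after each perceptron update; the abstract only states that the budget constrains the weight coefficients, so this is a stand-in technique. The names `train_budgeted_perceptron`, `Node`, and `predict` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


def train_budgeted_perceptron(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    budget: int,
    epochs: int = 20,
) -> Tuple[List[float], float]:
    """Perceptron with hard-thresholding to `budget` non-zero weights.

    Assumption: this projection step is an illustrative substitute for the
    paper's budget-aware classifier, whose solver is not given in the abstract.
    Labels in `y` must be +1 or -1.
    """
    d = len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score <= 0:  # misclassified (or on the boundary)
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
                # Keep only the `budget` largest-magnitude weights.
                keep = set(sorted(range(d), key=lambda j: -abs(w[j]))[:budget])
                w = [wj if j in keep else 0.0 for j, wj in enumerate(w)]
    return w, b


@dataclass
class Node:
    """A perceptron decision tree node: a hyperplane test or a leaf label."""
    w: Optional[List[float]] = None  # sparse hyperplane (internal nodes only)
    b: float = 0.0
    left: Optional["Node"] = None    # followed when w.x + b <= 0
    right: Optional["Node"] = None   # followed when w.x + b > 0
    label: Optional[int] = None      # set on leaves only


def predict(node: Node, x: Sequence[float]) -> int:
    """Route `x` down the tree until a leaf is reached."""
    while node.label is None:
        score = sum(wj * xj for wj, xj in zip(node.w, x)) + node.b
        node = node.right if score > 0 else node.left
    return node.label


# Toy data: feature 0 carries the signal, features 1 and 2 are noise.
X = [[2.0, 0.1, -0.1], [1.5, -0.2, 0.1], [-2.0, 0.1, 0.2], [-1.5, -0.1, -0.2]]
y = [1, 1, -1, -1]

w, b = train_budgeted_perceptron(X, y, budget=1)
root = Node(w=w, b=b, left=Node(label=-1), right=Node(label=1))
predictions = [predict(root, x) for x in X]
```

With a budget of one, the node can test only a single feature, which is what keeps each split readable even when the input has many noisy dimensions. A deeper tree would attach further `Node` instances in place of the leaves.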
Available Versions
OPUS - University of Technology Sydney
oai:opus.lib.uts.edu.au:10453/...
Last time updated on 18/10/2019