
    Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations

    Consider the following heuristic for building a decision tree for a function $f : \{0,1\}^n \to \{\pm 1\}$: place the most influential variable $x_i$ of $f$ at the root, and recurse on the subfunctions $f_{x_i=0}$ and $f_{x_i=1}$ in the left and right subtrees respectively; terminate once the tree is an $\varepsilon$-approximation of $f$. We analyze the quality of this heuristic, obtaining near-matching upper and lower bounds:

    ∘ Upper bound: For every $f$ with decision tree size $s$ and every $\varepsilon \in (0,\frac{1}{2})$, this heuristic builds a decision tree of size at most $s^{O(\log(s/\varepsilon)\log(1/\varepsilon))}$.

    ∘ Lower bound: For every $\varepsilon \in (0,\frac{1}{2})$ and $s \le 2^{\tilde{O}(\sqrt{n})}$, there is an $f$ with decision tree size $s$ such that this heuristic builds a decision tree of size $s^{\tilde{\Omega}(\log s)}$.

    We also obtain upper and lower bounds for monotone functions: $s^{O(\sqrt{\log s}/\varepsilon)}$ and $s^{\tilde{\Omega}(\sqrt[4]{\log s})}$ respectively. The lower bound disproves conjectures of Fiat and Pechyony (2004) and Lee (2009). Our upper bounds yield new algorithms for properly learning decision trees under the uniform distribution. We show that these algorithms, which are motivated by widely employed and empirically successful top-down decision tree learning heuristics such as ID3, C4.5, and CART, achieve provable guarantees that compare favorably with those of the current fastest algorithm (Ehrenfeucht and Haussler, 1989). Our lower bounds shed new light on the limitations of these heuristics. Finally, we revisit the classic work of Ehrenfeucht and Haussler, extending it to give the first uniform-distribution proper learning algorithm that achieves polynomial sample and memory complexity while matching its state-of-the-art quasipolynomial runtime.
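
    Since the heuristic is described procedurally, a runnable sketch may make it concrete. The following Python sketch is our illustration, not the authors' code: it computes variable influences exactly by enumerating all inputs (feasible only for small $n$), and it stops splitting once a restricted subfunction is nearly constant, a simple local proxy for the paper's global $\varepsilon$-approximation stopping rule; all function names are ours.

    ```python
    from itertools import product

    def subfn_inputs(n, fixed):
        """Enumerate full n-bit assignments consistent with the
        restriction `fixed` (a dict: variable index -> fixed bit)."""
        free = [j for j in range(n) if j not in fixed]
        for bits in product([0, 1], repeat=len(free)):
            x = dict(fixed)
            x.update(zip(free, bits))
            yield tuple(x[j] for j in range(n))

    def influence(f, n, i, fixed):
        """Influence of x_i on the restricted subfunction: the fraction
        of inputs on which flipping x_i flips the value of f."""
        flips = total = 0
        for x in subfn_inputs(n, {**fixed, i: 0}):
            y = list(x)
            y[i] = 1
            total += 1
            flips += f(x) != f(tuple(y))
        return flips / total

    def build_tree(f, n, fixed=None, eps=0.1):
        """Top-down heuristic: split on the most influential free
        variable and recurse on both restrictions; stop once the
        subfunction is within eps of a constant (a local stand-in
        for the paper's global eps-approximation criterion)."""
        fixed = fixed or {}
        vals = [f(x) for x in subfn_inputs(n, fixed)]
        ones = sum(v == 1 for v in vals)
        if max(ones, len(vals) - ones) >= (1 - eps) * len(vals) or len(fixed) == n:
            return ('leaf', 1 if 2 * ones >= len(vals) else -1)
        i = max((j for j in range(n) if j not in fixed),
                key=lambda j: influence(f, n, j, fixed))
        return ('node', i,
                build_tree(f, n, {**fixed, i: 0}, eps),
                build_tree(f, n, {**fixed, i: 1}, eps))

    # Example: 3-bit majority, with outputs in {+1, -1}
    maj3 = lambda x: 1 if sum(x) >= 2 else -1
    print(build_tree(maj3, 3))
    ```

    On the majority example, all three variables have influence 1/2, so the sketch splits on any one of them first and recovers a small exact tree.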

    ISBDD model for classification of hyperspectral remote sensing imagery

    The diverse density (DD) algorithm was proposed to address low classification accuracy when training samples contain interference such as mixed pixels. The DD algorithm learns a feature vector from training bags, which comprise instances (pixels). However, the feature vector learned by the DD algorithm cannot always effectively represent one type of ground cover. To address this problem, this paper proposes an instance space-based diverse density (ISBDD) model that employs a novel training strategy. In the ISBDD model, DD values are computed for each pixel instead of learning a single feature vector, and each pixel is classified according to its DD values. Airborne hyperspectral data collected by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor and the Push-broom Hyperspectral Imager (PHI) are used to evaluate the performance of the proposed model. Results show that the overall classification accuracy of the ISBDD model on the AVIRIS and PHI images reaches 97.65% and 89.02%, respectively, with kappa coefficients of 0.97 and 0.88.
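
    The abstract states the key idea (compute DD values per pixel and classify by them) without giving the measure itself. As a rough illustration, here is a minimal Python sketch of the standard noisy-or diverse density measure from multiple-instance learning (Maron and Lozano-Pérez style), applied per pixel in a one-vs-rest fashion; the function names, the Gaussian instance model, the bag arrangement, and the toy spectra are our assumptions, not the ISBDD training strategy itself.

    ```python
    import numpy as np

    def dd_value(t, pos_bags, neg_bags, scale=1.0):
        """Noisy-or diverse density of a candidate point t: high when t
        is close to some instance in every positive bag and far from
        all instances in every negative bag."""
        def p_near(bag):
            d2 = np.sum((bag - t) ** 2, axis=1)
            return 1.0 - np.prod(1.0 - np.exp(-scale * d2))
        dd = 1.0
        for bag in pos_bags:
            dd *= p_near(bag)        # t should match each positive bag...
        for bag in neg_bags:
            dd *= 1.0 - p_near(bag)  # ...and no negative bag
        return dd

    def classify_pixel(pixel, bags_by_class):
        """Assign the pixel to the class whose training bags give it the
        highest DD value, treating all other classes' bags as negative."""
        scores = {}
        for c, pos in bags_by_class.items():
            neg = [b for k, bs in bags_by_class.items() if k != c for b in bs]
            scores[c] = dd_value(pixel, pos, neg)
        return max(scores, key=scores.get)

    # Toy example: two classes, bags of 5-band spectra (rows = instances)
    rng = np.random.default_rng(0)
    bags = {
        "water":      [rng.normal(0.2, 0.02, (4, 5)) for _ in range(3)],
        "vegetation": [rng.normal(0.6, 0.02, (4, 5)) for _ in range(3)],
    }
    print(classify_pixel(rng.normal(0.6, 0.02, 5), bags))  # -> "vegetation"
    ```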

    Value Added of Teachers in High-Poverty Schools and Lower-Poverty Schools

    This paper examines whether teachers in schools serving students from high-poverty backgrounds are as effective as teachers in schools with more advantaged students. The question is important: teachers are recognized as the most important school factor affecting student achievement, and the achievement gap between disadvantaged students and their better-off peers is large and persistent. Using student-level microdata from 2000-2001 through 2004-2005 from Florida and North Carolina, the authors compare the effectiveness of teachers in high-poverty elementary schools (more than 70% of students eligible for free or reduced-price lunch, FRL) with that of teachers in lower-poverty elementary schools.