Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations
Consider the following heuristic for building a decision tree for a function
$f : \{0,1\}^n \to \{\pm 1\}$. Place the most influential variable $x_i$ of $f$
at the root, and recurse on the subfunctions $f_{x_i=0}$ and $f_{x_i=1}$ on the
left and right subtrees respectively; terminate once the tree is an
$\varepsilon$-approximation of $f$. We analyze the quality of this heuristic,
obtaining near-matching upper and lower bounds:
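Upper bound: For every $f$ with decision tree size $s$ and every
$\varepsilon \in (0, \frac{1}{2})$, this heuristic builds a decision tree of size
at most $s^{O(\log(s/\varepsilon)\log(1/\varepsilon))}$.
Lower bound: For every $\varepsilon \in (0, \frac{1}{2})$ and $s \le 2^{\tilde{O}(\sqrt{n})}$, there is an $f$ with decision tree size $s$ such that
this heuristic builds a decision tree of size $s^{\tilde{\Omega}(\log s)}$.
We also obtain upper and lower bounds for monotone functions:
$s^{O(\sqrt{\log s}/\varepsilon)}$ and $s^{\tilde{\Omega}(\sqrt[4]{\log s})}$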
respectively. The lower bound disproves conjectures of Fiat and Pechyony (2004)
and Lee (2009).
Our upper bounds yield new algorithms for properly learning decision trees
under the uniform distribution. We show that these algorithms, which are
motivated by widely employed and empirically successful top-down decision tree
learning heuristics such as ID3, C4.5, and CART, achieve provable guarantees
that compare favorably with those of the current fastest algorithm (Ehrenfeucht
and Haussler, 1989). Our lower bounds shed new light on the limitations of
these heuristics.
Finally, we revisit the classic work of Ehrenfeucht and Haussler. We extend
it to give the first uniform-distribution proper learning algorithm that
achieves polynomial sample and memory complexity, while matching its
state-of-the-art quasipolynomial runtime.
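The top-down heuristic described in this abstract can be illustrated with a small brute-force sketch. It is only an illustration under stated assumptions, not the authors' implementation: the function names (`completions`, `influence`, `leaf_stats`, `build_top_down`) are hypothetical, the target function is assumed to map 0/1 tuples to +1 or -1, influences are computed exactly by enumerating all inputs (feasible only for tiny n), and the rule for choosing which leaf to split next (largest influence weighted by the leaf's probability mass) is one assumed way to make "terminate once the tree is an approximation" concrete.

```python
from itertools import product

def completions(n, restriction):
    """Yield every input in {0,1}^n consistent with a partial assignment."""
    free = [i for i in range(n) if i not in restriction]
    for bits in product((0, 1), repeat=len(free)):
        x = [0] * n
        for i, v in restriction.items():
            x[i] = v
        for i, b in zip(free, bits):
            x[i] = b
        yield tuple(x)

def influence(f, n, restriction, i):
    """Fraction of inputs in the subcube where flipping x_i changes f."""
    points = list(completions(n, restriction))
    flips = 0
    for x in points:
        y = list(x)
        y[i] ^= 1
        flips += f(x) != f(tuple(y))
    return flips / len(points)

def leaf_stats(f, n, restriction):
    """Return (majority label, error of that label) on the subcube."""
    values = [f(x) for x in completions(n, restriction)]
    ones = sum(v == 1 for v in values)
    label = 1 if 2 * ones >= len(values) else -1
    error = min(ones, len(values) - ones) / len(values)
    return label, error

def build_top_down(f, n, eps):
    """Grow a tree (as a list of leaves) until its total error is at most eps."""
    leaves = [dict()]  # each leaf is the restriction along its root-to-leaf path
    while True:
        # Error of the current tree under the uniform distribution.
        total = sum(leaf_stats(f, n, r)[1] * 2 ** -len(r) for r in leaves)
        if total <= eps:
            break
        # Assumed splitting rule: leaf/variable pair with largest weighted influence.
        best = None
        for r in leaves:
            for i in range(n):
                if i in r:
                    continue
                score = influence(f, n, r, i) * 2 ** -len(r)
                if best is None or score > best[0]:
                    best = (score, r, i)
        _, r, i = best
        leaves.remove(r)
        leaves.append({**r, i: 0})
        leaves.append({**r, i: 1})
    return [(r, leaf_stats(f, n, r)[0]) for r in leaves]

# Example target: AND of the first two of three variables, with values in {+1, -1}.
def and2(x):
    return 1 if x[0] and x[1] else -1

tree_exact = build_top_down(and2, 3, 0.0)
tree_coarse = build_top_down(and2, 3, 0.3)
```

With error tolerance 0 the sketch never queries the irrelevant third variable, because its influence is zero on every subcube; with tolerance 0.3 the constant -1 tree already suffices, so no split is made.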
Cancer Niche as a Garbage Disposal Machine: Implications of TCM-Mediated Balance of Body-Disease for Treatment of Cancer.
The cancer epidemic has led to a worldwide search for a new "game changer" concept to govern cancer research and cancer treatment. Western medicine-based cancer research has remained at an impasse, with no resolution in sight for improving the survival of patients with solid malignant tumors over the last four decades, owing to heterogeneity in cancer tissues. Such a deadlock charts a course to learn lessons from the developing countries, directly or indirectly, to complement the exhausted Western medicine. We propose a new concept of "Cancer niche as a garbage disposal machine" with implications of traditional Chinese medicine-mediated restoration of normal balance between body and disease to bring the fight against cancer under control.
Worldwide genetic variation of the IGHV and TRBV immune receptor gene families in humans.
The immunoglobulin heavy variable (IGHV) and T cell beta variable (TRBV) loci are among the most complex and variable regions in the human genome. Generated through a process of gene duplication/deletion and diversification, these loci can vary extensively between individuals in copy number and contain genes that are highly similar, making their analysis technically challenging. Here, we present a comprehensive study of the functional gene segments in the IGHV and TRBV loci, quantifying their copy number and single-nucleotide variation in a globally diverse sample of 109 (IGHV) and 286 (TRBV) humans from over 100 populations. We find that the IGHV and TRBV gene families exhibit starkly different patterns of variation. In addition to providing insight into the different evolutionary paths of the IGHV and TRBV loci, our results are also important to the adaptive immune repertoire sequencing community, where the lack of frequencies of common alleles and copy number variants is hampering existing analytical pipelines.
ISBDD model for classification of hyperspectral remote sensing imagery
The diverse density (DD) algorithm was proposed to handle the problem of low classification accuracy when training samples contain interference such as mixed pixels. The DD algorithm can learn a feature vector from training bags, which comprise instances (pixels). However, the feature vector learned by the DD algorithm cannot always effectively represent one type of ground cover. To handle this problem, this paper proposes an instance space-based diverse density (ISBDD) model that employs a novel training strategy. In the ISBDD model, the DD values of each pixel are computed instead of learning a feature vector, so each pixel can be classified according to its DD values. Airborne hyperspectral data collected by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor and the Push-broom Hyperspectral Imager (PHI) are used to evaluate the performance of the proposed model. Results show that the overall classification accuracy of the ISBDD model on the AVIRIS and PHI images is up to 97.65% and 89.02%, respectively, while the kappa coefficient is up to 0.97 and 0.88, respectively.
Value Added of Teachers in High-Poverty Schools and Lower-Poverty Schools
This paper examines whether teachers in schools serving students from high-poverty backgrounds are as effective as teachers in schools with more advantaged students. The question is important. Teachers are recognized as the most important school factor affecting student achievement, and the achievement gap between disadvantaged students and their better off peers is large and persistent. Using student-level microdata from 2000-2001 to 2004-2005 from Florida and North Carolina, the authors compare the effectiveness of teachers in high-poverty elementary schools (>70% FRL students) with that of teachers in lower-poverty elementary schools
- …