Studies of Boosted Decision Trees for MiniBooNE Particle Identification
Boosted decision trees are applied to particle identification in the
MiniBooNE neutrino oscillation experiment operated at Fermi National
Accelerator Laboratory (Fermilab). Numerous attempts are made to tune the
boosted decision trees, to compare the performance of various boosting
algorithms, and to select input variables for optimal performance.
Comment: 28 pages, 22 figures, submitted to Nucl. Inst & Meth.
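For intuition about the kind of boosting being tuned here, the following is a minimal AdaBoost with decision stumps on an invented one-dimensional dataset. It is a toy sketch, not the MiniBooNE analysis code; all names, data, and hyperparameters are illustrative assumptions.

```python
import math

def stump_predict(threshold, polarity, x):
    # depth-1 decision tree: predict `polarity` iff x >= threshold
    return polarity if x >= threshold else -polarity

def train_adaboost(xs, ys, rounds):
    n = len(xs)
    w = [1.0 / n] * n                 # example weights
    ensemble = []                     # (alpha, threshold, polarity) triples
    thresholds = sorted(set(xs))
    for _ in range(rounds):
        # pick the stump with the smallest weighted training error
        best = None
        for t in thresholds:
            for pol in (1, -1):
                err = sum(wi for wi, x, y in zip(w, xs, ys)
                          if stump_predict(t, pol, x) != y)
                if best is None or err < best[0]:
                    best = (err, t, pol)
        err, t, pol = best
        err = min(max(err, 1e-12), 1.0 - 1e-12)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, t, pol))
        # reweight: misclassified examples gain weight
        w = [wi * math.exp(-alpha * y * stump_predict(t, pol, x))
             for wi, x, y in zip(w, xs, ys)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * stump_predict(t, pol, x) for a, t, pol in ensemble)
    return 1 if score >= 0 else -1

# a band of +1 labels that no single stump can separate
xs = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
ys = [-1, -1, 1, 1, -1, -1]
model = train_adaboost(xs, ys, rounds=3)
preds = [predict(model, x) for x in xs]
```

Three boosting rounds classify all six points correctly, while any single stump must misclassify at least two of them; this is the basic mechanism that the tuning experiments above (tree depth, number of rounds, input variables) vary at scale.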
Separating decision tree complexity from subcube partition complexity
The subcube partition model of computation is at least as powerful as
decision trees, but no separation between the two models was known. We show
that there exists a function whose deterministic subcube partition complexity
is asymptotically smaller than its randomized decision tree complexity,
resolving an open problem of Friedgut, Kahn, and Wigderson (2002). Our lower
bound is based on the information-theoretic techniques first introduced to
lower bound the randomized decision tree complexity of the recursive majority
function.
We also show that the public-coin partition bound, the best known lower
bound method for randomized decision tree complexity (it subsumes other
general techniques such as block sensitivity, approximate degree, randomized
certificate complexity, and the classical adversary bound), also lower bounds
randomized subcube partition complexity. Consequently, none of these
techniques can prove optimal lower bounds for randomized decision tree
complexity, which answers an open question of Jain and Klauck (2010) and of
Jain, Lee, and Vishnoi (2014).
Comment: 16 pages, 1 figure
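For small functions, deterministic decision tree complexity can be computed by brute force, which makes the quantity being lower-bounded above concrete. This is an illustrative sketch; the truth-table encoding is our own convention, not from the paper.

```python
def dt_depth(f, n):
    """Deterministic decision tree complexity D(f) of a boolean function
    on n variables, given as a truth table f of length 2**n, where f[a]
    is the value on the assignment whose i-th bit is (a >> i) & 1."""
    if len(set(f)) == 1:
        return 0                      # constant function: no queries needed
    best = n
    for i in range(n):
        # truth tables of the two restrictions x_i = 0 and x_i = 1
        f0 = tuple(f[a] for a in range(2 ** n) if not (a >> i) & 1)
        f1 = tuple(f[a] for a in range(2 ** n) if (a >> i) & 1)
        # query x_i first, then solve each branch optimally
        best = min(best, 1 + max(dt_depth(f0, n - 1), dt_depth(f1, n - 1)))
    return best

maj3 = (0, 0, 0, 1, 0, 1, 1, 1)       # majority of three bits
```

Here dt_depth(maj3, 3) returns 3: every deterministic tree for MAJ_3 must query all three bits in the worst case, and MAJ_3 is the base case of the recursive majority function mentioned above. By contrast, a dictator function such as f(x) = x_0 has complexity 1.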
Inferring ancestral sequences in taxon-rich phylogenies
Statistical consistency in phylogenetics has traditionally referred to the
accuracy of estimating phylogenetic parameters for a fixed number of species as
we increase the number of characters. However, since sequences are often of
fixed length (e.g. for a gene) while it is increasingly possible to sample
more taxa, it is useful to consider a dual type of statistical consistency in
which we increase the number of species rather than the number of characters.
This raises a basic question: what can we learn about the evolutionary
process as we increase the number of species? In particular, does having more
species allow us to infer the ancestral state of characters accurately? This
question is particularly relevant when sequence site evolution varies in a
complex way from character to character, as well as for reconstructing
ancestral sequences. In this paper, we assemble a collection of results to
analyse various approaches for inferring ancestral information with
increasing accuracy as the number of taxa increases.
Comment: 32 pages, 5 figures, 1 table
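A toy illustration of this "more taxa" regime: under a two-state symmetric model on a star tree (a deliberately simple setting chosen for this sketch; the paper treats more general processes), estimating the root state by majority vote over the leaves becomes more accurate as taxa are added. The model parameters and trial counts below are invented.

```python
import random

def simulate_star_leaves(root_state, n_leaves, p_flip, rng):
    # two-state symmetric model on a star phylogeny: each leaf
    # independently flips the root state with probability p_flip
    return [root_state ^ (rng.random() < p_flip) for _ in range(n_leaves)]

def majority_estimate(leaves):
    # infer the ancestral (root) state by majority vote over the taxa
    return int(2 * sum(leaves) > len(leaves))

rng = random.Random(0)
trials = 500
accuracy = {}
for n_taxa in (5, 101):
    hits = sum(
        majority_estimate(simulate_star_leaves(1, n_taxa, 0.3, rng)) == 1
        for _ in range(trials)
    )
    accuracy[n_taxa] = hits / trials
```

With a per-edge flip probability of 0.3, the vote over 101 taxa recovers the root state almost always, while 5 taxa leave a substantial error rate; this is the flavor of consistency-in-taxa that the paper analyses for richer models and tree shapes.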
Optimal Sparse Decision Trees
Decision tree algorithms have been among the most popular algorithms for
interpretable (transparent) machine learning since the early 1980s. The
problem that has plagued decision tree algorithms since their inception is
their lack of optimality, or lack of guarantees of closeness to optimality:
decision tree algorithms are often greedy or myopic, and sometimes produce
unquestionably suboptimal models. Hardness of decision tree optimization is
both a theoretical and practical obstacle, and even careful mathematical
programming approaches have not been able to solve these problems efficiently.
This work introduces the first practical algorithm for optimal decision trees
for binary variables. The algorithm is a co-design of analytical bounds that
reduce the search space and modern systems techniques, including data
structures and a custom bit-vector library. Our experiments highlight
advantages in scalability, speed, and proof of optimality.
Comment: 33rd Conference on Neural Information Processing Systems (NeurIPS
2019), Vancouver, Canada
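To see the greedy-versus-optimal gap that motivates this line of work, the following brute-force search finds the tree over binary features minimizing misclassifications plus a per-leaf sparsity penalty. This is a toy sketch of that style of objective, not the paper's branch-and-bound algorithm, and the function names and penalty value are ours.

```python
def leaf_errors(ys):
    # misclassifications of the best constant (majority-label) prediction
    ones = sum(ys)
    return min(ones, len(ys) - ones)

def best_tree(X, y, depth, lam):
    """Exhaustively find a tree of depth <= `depth` over binary features
    minimizing misclassifications + lam * (number of leaves)."""
    best = (leaf_errors(y) + lam, "leaf")       # option 1: stop with a leaf
    if depth == 0:
        return best
    for j in range(len(X[0])):                  # option 2: split on x_j
        left = [(x, yi) for x, yi in zip(X, y) if x[j] == 0]
        right = [(x, yi) for x, yi in zip(X, y) if x[j] == 1]
        if not left or not right:
            continue                            # degenerate split
        obj_l, t_l = best_tree([x for x, _ in left],
                               [yi for _, yi in left], depth - 1, lam)
        obj_r, t_r = best_tree([x for x, _ in right],
                               [yi for _, yi in right], depth - 1, lam)
        if obj_l + obj_r < best[0]:
            best = (obj_l + obj_r, f"(x{j}? {t_l} : {t_r})")
    return best

# XOR labels: every single greedy split leaves 2 errors, but the
# optimal depth-2 tree classifies all four points correctly
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]
objective, tree = best_tree(X, y, depth=2, lam=0.1)
```

On XOR the optimal objective is 0.4 (0 errors plus 0.1 for each of 4 leaves), whereas a greedy stump cannot reduce the error below 2; this exhaustive search is exponential, which is exactly the hardness that the analytical bounds and systems techniques above are designed to tame.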