Optimal Sparse Decision Trees
Decision tree algorithms have been among the most popular algorithms for
interpretable (transparent) machine learning since the early 1980s. The
problem that has plagued decision tree algorithms since their inception is
their lack of optimality, or lack of guarantees of closeness to optimality:
decision tree algorithms are often greedy or myopic, and sometimes produce
unquestionably suboptimal models. Hardness of decision tree optimization is
both a theoretical and practical obstacle, and even careful mathematical
programming approaches have not been able to solve these problems efficiently.
This work introduces the first practical algorithm for optimal decision trees
for binary variables. The algorithm is a co-design of analytical bounds that
reduce the search space and modern systems techniques, including data
structures and a custom bit-vector library. Our experiments highlight
advantages in scalability, speed, and proof of optimality.
Comment: 33rd Conference on Neural Information Processing Systems (NeurIPS
2019), Vancouver, Canada
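The abstract does not show the bit-vector library itself. The following is a minimal Python sketch of why bit-vectors help. It assumes binary features and binary labels and uses Python integers as bitsets. The function names are hypothetical, and this is not the authors' code. The samples that reach a leaf are the bitwise AND of the feature columns (or their complements) along the root-to-leaf path. The leaf's misclassification count under a majority-label prediction then comes from two popcounts.

```python
# Hypothetical sketch: Python ints used as bit-vectors over samples (bit i = sample i).

def to_bits(column):
    """Pack a list of 0/1 values into an int bitset."""
    bits = 0
    for i, v in enumerate(column):
        if v:
            bits |= 1 << i
    return bits


def popcount(x):
    """Number of set bits (portable across Python versions)."""
    return bin(x).count("1")


def leaf_errors(feature_bits, conditions, label_bits, n):
    """Misclassifications in the leaf reached by `conditions`.

    conditions: list of (feature_index, value) pairs on the root-to-leaf path.
    The leaf predicts the majority label of the samples it captures.
    """
    mask = (1 << n) - 1          # all n samples
    captured = mask
    for j, value in conditions:
        # Keep samples whose feature j equals `value`.
        bits = feature_bits[j] if value else (~feature_bits[j] & mask)
        captured &= bits
    positives = popcount(captured & label_bits)
    total = popcount(captured)
    return min(positives, total - positives)
```

For example, suppose the feature columns are [1, 1, 0, 1, 0] and [0, 1, 1, 1, 0] and the labels are [1, 1, 0, 0, 0]. The leaf defined by feature 0 = 1 captures samples 0, 1 and 3 and makes one error.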
Introduction to IND and recursive partitioning
This manual describes the IND package for learning tree classifiers from data. The package is an integrated C and C shell re-implementation of tree learning routines such as CART, C4, and various MDL and Bayesian variations. The package includes routines for experiment control, interactive operation, and analysis of tree building. The manual introduces the system and its many options, gives a basic review of tree learning, contains a guide to the literature and a glossary, lists the manual pages for the routines, and gives instructions on installation.
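This is a minimal sketch of the greedy split selection used by CART-style learners. It may help readers new to the tree-learning review mentioned above. It assumes the Gini impurity criterion and binary features. It is not code from IND, and the function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(rows, labels):
    """Return (feature_index, weighted_child_impurity) of the best binary split.

    rows: list of lists of 0/1 feature values.
    Greedy: scores each feature by the size-weighted impurity of its two
    children only, with no look-ahead to deeper splits.
    Returns (None, gini(labels)) if no split reduces impurity.
    """
    n = len(labels)
    best = (None, gini(labels))
    for j in range(len(rows[0])):
        left = [y for x, y in zip(rows, labels) if x[j] == 0]
        right = [y for x, y in zip(rows, labels) if x[j] == 1]
        if not left or not right:
            continue  # degenerate split: all samples on one side
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (j, score)
    return best
```

On rows [[1, 0], [1, 1], [0, 1], [1, 1], [0, 0]] with labels [1, 1, 0, 0, 0], the root impurity is 0.48. Splitting on feature 0 gives a weighted impurity of 4/15 and beats feature 1 at 7/15. The one-step look-ahead is exactly the myopia that optimal tree methods aim to avoid.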
Value-Function Approximations for Partially Observable Markov Decision Processes
Partially observable Markov decision processes (POMDPs) provide an elegant
mathematical framework for modeling complex decision and planning problems in
stochastic domains in which states of the system are observable only
indirectly, via a set of imperfect or noisy observations. The modeling
advantage of POMDPs, however, comes at a price -- exact methods for solving
them are computationally very expensive and thus applicable in practice only to
very simple problems. We focus on efficient approximation (heuristic) methods
that attempt to alleviate the computational problem and trade off accuracy for
speed. We have two objectives here. First, we survey various approximation
methods, analyze their properties and relations and provide some new insights
into their differences. Second, we present a number of new approximation
methods and novel refinements of existing techniques. The theoretical results
are supported by experiments on a problem from the agent navigation domain.
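This sketch shows one concrete value-function approximation of the kind surveyed above. It uses the well-known QMDP heuristic, and we do not claim it is among the new methods the paper presents. QMDP first solves the underlying fully observable MDP by value iteration. It then scores each action a in belief state b by sum_s b(s) Q(s, a). The sketch also includes the exact Bayesian belief update that tracks the hidden state through noisy observations. The tiny model and all names are hypothetical.

```python
# Conventions (hypothetical toy model):
#   T[a][s][s2] = P(s2 | s, a), O[a][s2][o] = P(o | s2, a), R[s][a] = reward.

def value_iteration(T, R, gamma, iters=500):
    """Q-values of the fully observable MDP underlying the POMDP."""
    n_s, n_a = len(R), len(R[0])
    V = [0.0] * n_s
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(iters):
        Q = [[R[s][a] + gamma * sum(T[a][s][s2] * V[s2] for s2 in range(n_s))
              for a in range(n_a)] for s in range(n_s)]
        V = [max(Q[s]) for s in range(n_s)]
    return Q


def qmdp_action(belief, Q):
    """QMDP: pick argmax_a sum_s b(s) Q(s, a); returns (action, scores)."""
    n_a = len(Q[0])
    scores = [sum(b * Q[s][a] for s, b in enumerate(belief)) for a in range(n_a)]
    return max(range(n_a), key=lambda a: scores[a]), scores


def belief_update(belief, a, o, T, O):
    """Exact Bayes filter: b'(s2) is proportional to O(o|s2,a) * sum_s T(s2|s,a) b(s)."""
    n_s = len(belief)
    unnorm = [O[a][s2][o] * sum(T[a][s][s2] * belief[s] for s in range(n_s))
              for s2 in range(n_s)]
    z = sum(unnorm)
    return [p / z for p in unnorm]
```

QMDP is cheap because it never plans over beliefs. It implicitly assumes that all state uncertainty disappears after one step, so it tends to undervalue purely information-gathering actions. That accuracy-for-speed trade-off is typical of the heuristic approximations discussed above.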
Reducing Nondeterministic Tree Automata by Adding Transitions
We introduce saturation of nondeterministic tree automata, a technique that
consists of adding new transitions to an automaton while preserving its
language. We implemented our algorithm on top of minotaut, a module of the tree
automata library libvata that reduces the size of automata by merging states
and removing superfluous transitions, and we show how saturation can make
subsequent merge and transition-removal operations more effective. Thus we
obtain a PTIME algorithm that reduces the size of tree automata even more than
before. Additionally, we explore how minotaut alone can play an important role
when performing hard operations like complementation, allowing us to obtain
both smaller complement automata and lower computation times. We then show how
saturation can extend this contribution even further. We tested our algorithms
on a large collection of automata from applications of libvata in shape
analysis, and on different classes of randomly generated automata.
Comment: In Proceedings MEMICS 2016, arXiv:1612.0403
Planning under time pressure
Heuristic search is a technique used pervasively in artificial intelligence and automated planning. Often an agent is given a task that it would like to solve as quickly as possible. It must allocate its time between planning the actions to achieve the task and actually executing them. We call this problem planning under time pressure. Most popular heuristic search algorithms are ill-suited for this setting, as they either search a lot to find short plans or search a little and find long plans. The thesis of this dissertation is: when under time pressure, an automated agent should explicitly attempt to minimize the sum of planning and execution times, not just one or just the other.
This dissertation makes four contributions. First, we present new algorithms that use modern multi-core CPUs to decrease planning time without increasing execution time. Second, we introduce a new model for predicting the performance of iterative-deepening search. The model is as accurate as previous offline techniques while using less training data, and it can also be used online to reduce the overhead of iterative-deepening search, resulting in faster planning. Third, we show offline planning algorithms that directly attempt to minimize the sum of planning and execution times. Fourth, we consider algorithms that plan online in parallel with execution. Both the offline and online algorithms account for a user-specified preference between search and execution, and can greatly outperform the standard utility-oblivious techniques. By addressing the problem of planning under time pressure, these contributions demonstrate that heuristic search is no longer restricted to optimizing solution cost, obviating the need to choose between slow search times and expensive solutions.
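The thesis above can be made concrete with a small sketch of the offline decision it implies. The sketch assumes an anytime planner that reports successive plans as (planning time so far, execution time of that plan) pairs. The function name, the weights w_search and w_exec, and the trace values are hypothetical toy inputs, not results from the dissertation. The sketch does not stop at the first plan, which minimizes planning time, or at the last plan, which minimizes execution time. It stops where the weighted sum of the two is smallest.

```python
def best_stopping_point(anytime_trace, w_search=1.0, w_exec=1.0):
    """Choose when to stop an anytime planner (hypothetical helper).

    anytime_trace: list of (elapsed_planning_time, execution_time_of_plan)
    in the order the planner reports improved plans.
    w_search, w_exec: user preference weights on planning vs execution time.
    Returns (index, weighted_total_time) of the best stopping point.
    """
    costs = [w_search * t + w_exec * e for t, e in anytime_trace]
    i = min(range(len(costs)), key=costs.__getitem__)
    return i, costs[i]
```

For the toy trace [(0.1, 30.0), (1.0, 20.0), (5.0, 18.0), (20.0, 17.5)] with equal weights, the sketch returns index 1 with a total of 21.0. A utility-oblivious choice of the plan with the shortest execution time would instead cost 20.0 + 17.5 = 37.5. Making search cheaper (w_search=0.1) shifts the stopping point to index 2.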
Introduction to IND and recursive partitioning, version 1.0
This manual describes the IND package for learning tree classifiers from data. The package is an integrated C and C shell re-implementation of tree learning routines such as CART, C4, and various MDL and Bayesian variations. The package includes routines for experiment control, interactive operation, and analysis of tree building. The manual introduces the system and its many options, gives a basic review of tree learning, contains a guide to the literature and a glossary, lists the manual pages for the routines, and gives instructions on installation.