Hardness and inapproximability results for minimum verification set and minimum path decision tree problems
Minimization of decision trees is a well-studied problem. In this work, we introduce two new problems related to the minimization of decision trees: the minimum verification set (MinVS) problem and the minimum path decision tree (MinPathDT) problem. Decision tree problems ask the question "What is the unknown object?". The MinVS problem, on the other hand, asks the question "Is the unknown object z?" for a given object z, so it is a verification problem rather than an identification problem. The MinPathDT problem aims to construct a decision tree in which only the cost of the root-to-leaf path corresponding to a given object is minimized, whereas decision tree problems in general minimize the overall cost of the tree over all
objects. MinVS and MinPathDT therefore appear to be easier problems.
However, in this work we prove that MinVS and MinPathDT are both NP-complete and cannot be approximated within a factor in o(lg n) unless P = NP.
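The o(lg n) threshold matches what greedy set cover achieves. One way to see the connection, assuming a verification set means a set of tests such that every object other than z disagrees with z on at least one chosen test, is to read each test as "covering" the objects it separates from z. The sketch below applies the textbook greedy set-cover heuristic under that reading. Unit test costs are assumed, and the function name `greedy_verification_set` is hypothetical, not the paper's method.

```python
def ruled_out(test, remaining, z):
    """Objects in `remaining` whose outcome on `test` differs from z's outcome."""
    return {o for o in remaining if test(o) != test(z)}


def greedy_verification_set(objects, tests, z):
    """Greedy set-cover heuristic for a verification set of z (unit-cost tests assumed).

    objects: candidate objects, z among them
    tests:   dict mapping a test name to a function object -> outcome
    Returns test names such that every object o != z disagrees with z
    on at least one chosen test.
    """
    # Objects still agreeing with z on every test chosen so far.
    remaining = {o for o in objects if o != z}
    chosen = []
    while remaining:
        # Pick the test that separates z from the most not-yet-excluded objects.
        best = max(tests, key=lambda t: len(ruled_out(tests[t], remaining, z)))
        newly = ruled_out(tests[best], remaining, z)
        if not newly:
            raise ValueError("some object is indistinguishable from z")
        chosen.append(best)
        remaining -= newly
    return chosen
```

For example, with the objects 0..7 and one test per bit, verifying z = 5 needs all three bit tests. If a single test `o == 5` is also available, greedy picks only that one.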
Approximation Algorithms for Stochastic Boolean Function Evaluation and Stochastic Submodular Set Cover
Stochastic Boolean Function Evaluation is the problem of determining the
value of a given Boolean function f on an unknown input x, when each bit x_i
of x can only be determined by paying an associated cost c_i. The assumption is
that x is drawn from a given product distribution, and the goal is to minimize
the expected cost. This problem has been studied in Operations Research, where
it is known as "sequential testing" of Boolean functions. It has also been
studied in learning theory in the context of learning with attribute costs. We
consider the general problem of developing approximation algorithms for
Stochastic Boolean Function Evaluation. We give a 3-approximation algorithm for
evaluating Boolean linear threshold formulas. We also present an approximation
algorithm for evaluating CDNF formulas (and decision trees) achieving a factor
of O(log kd), where k is the number of terms in the DNF formula, and d is the
number of clauses in the CNF formula. In addition, we present approximation
algorithms for simultaneous evaluation of linear threshold functions, and for
ranking of linear functions.
Our function evaluation algorithms are based on reductions to the Stochastic
Submodular Set Cover (SSSC) problem. This problem was introduced by Golovin and
Krause. They presented an approximation algorithm for the problem, called
Adaptive Greedy. Our main technical contribution is a new approximation
algorithm for the SSSC problem, which we call Adaptive Dual Greedy. It is an
extension of the Dual Greedy algorithm for Submodular Set Cover due to Fujito,
which is a generalization of Hochbaum's algorithm for the classical Set Cover
Problem. We also give a new bound on the approximation achieved by the Adaptive
Greedy algorithm of Golovin and Krause.
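A concrete instance of Stochastic Boolean Function Evaluation shows what "minimize the expected cost" means. This is a minimal sketch for the simplest case, the OR of n independent bits with P[x_i = 1] = p_i. It is the classic sequential-testing rule, not the paper's Adaptive Dual Greedy algorithm.

For OR, every adaptive strategy is a fixed order: only the outcome x_i = 0 leads to further tests. Swapping two adjacent tests i, j changes the cost from c_i + (1 - p_i) c_j to c_j + (1 - p_j) c_i, so i should come first exactly when c_i / p_i <= c_j / p_j. The function names are illustrative.

```python
import math
from itertools import permutations


def expected_cost(order, costs, probs):
    """Expected cost of testing the bits in `order` until OR(x) is known.

    Bits are independent with P[x_i = 1] = probs[i]; testing bit i costs costs[i].
    Testing stops at the first 1 (OR = 1) or after all bits are seen (OR = 0).
    """
    total, p_reach = 0.0, 1.0
    for i in order:
        total += p_reach * costs[i]  # bit i is tested only if all earlier bits were 0
        p_reach *= 1.0 - probs[i]
    return total


def ratio_order(costs, probs):
    """Nondecreasing c_i / p_i; bits that are never 1 (p_i = 0) go last."""
    return sorted(range(len(costs)),
                  key=lambda i: costs[i] / probs[i] if probs[i] > 0 else math.inf)


def brute_force_best(costs, probs):
    """Cheapest expected cost over all n! orders (for checking small cases only)."""
    return min(expected_cost(p, costs, probs) for p in permutations(range(len(costs))))
```

On small instances the ratio order matches the brute-force optimum. For general functions such as linear threshold formulas or CDNF formulas, no such simple rule is known to be optimal, which is where approximation algorithms come in.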
Efficient Algorithms for Battleship
We consider an algorithmic problem inspired by the Battleship game. In the
variant of the problem that we investigate, there is a unique ship of shape $S \subset \mathbb{Z}^2$ which has been translated in the lattice $\mathbb{Z}^2$. We assume that a
player has already hit the ship with a first shot and the goal is to sink the
ship using as few shots as possible, that is, by minimizing the number of
missed shots. While the player knows the shape $S$, the player does not know
which position of $S$ has been hit.
Given a shape $S$ of $n$ lattice points, the minimum number of misses that
can be achieved in the worst case by any algorithm is called the Battleship
complexity of the shape and is denoted $c(S)$. We prove three bounds on
$c(S)$, each considering a different class of shapes. First, $c(S) \leq n-1$ for arbitrary shapes, and the bound is tight for parallelogram-free shapes.
Second, we provide an algorithm showing that $c(S) = O(\log n)$ if $S$ is an
HV-convex polyomino. Third, we provide an algorithm showing that $c(S) = O(\log \log n)$ if $S$ is a digital convex set. This last result is obtained
through a novel discrete version of the Blaschke-Lebesgue inequality relating
the area and the width of any convex body.
Comment: Conference version at the 10th International Conference on Fun with Algorithms (FUN 2020).
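A simple candidate-elimination strategy shows why, for a shape S of n lattice points, n - 1 misses always suffice. The assumptions are that the first shot is at the origin and that the player learns when the ship is sunk.

There are n candidate placements, one for each cell of S that the first shot could have hit, and distinct translates of a finite set are distinct. The strategy repeatedly shoots an unshot cell of some candidate still consistent with all shots. A hit keeps only the candidates containing that cell. A miss removes at least the candidate the cell came from, and never the true placement. The function names below are hypothetical, and this is the trivial baseline, not the paper's HV-convex or digital-convex algorithms.

```python
def misses_of_elimination(shape, true_k):
    """Misses of candidate elimination when the first shot (at the origin) hit shape[true_k].

    shape: list of distinct (x, y) lattice points; true_k indexes sorted(shape).
    """
    shape = sorted(shape)
    # Candidate k: the first shot hit cell shape[k], so the ship is S translated by -shape[k].
    cands = [frozenset((x - sx, y - sy) for (x, y) in shape) for (sx, sy) in shape]
    ship = cands[true_k]
    alive = set(range(len(cands)))
    hits, misses = {(0, 0)}, 0
    while not ship <= hits:  # the ship sinks once all of its cells are hit
        # Any alive candidate still has an unshot cell: a translate of S inside
        # the hit cells (a subset of the ship) would have to be the ship itself.
        k = min(alive)
        cell = min(cands[k] - hits)
        if cell in ship:
            hits.add(cell)
            alive = {c for c in alive if cell in cands[c]}
        else:
            misses += 1  # removes candidate k (and every other candidate containing cell)
            alive = {c for c in alive if cell not in cands[c]}
    return misses


def worst_case_misses(shape):
    """Worst case of the elimination strategy over which cell of the shape was hit first."""
    return max(misses_of_elimination(shape, k) for k in range(len(shape)))
```

Each miss eliminates a wrong candidate and the true one always survives, so the worst case is at most n - 1 for every shape. The algorithms in the abstract do much better on structured shapes.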
On the Complexity of Searching in Trees: Average-case Minimization
We focus on the average-case analysis: a function $w : V \to \mathbb{Z}^+$ is given which
defines the likelihood for a node to be the one marked, and we want the
strategy that minimizes the expected number of queries. Prior to this paper,
very little was known about this natural question, and the complexity of the
problem had so far remained open.
We close this question and prove that the above tree search problem is
NP-complete even for the class of trees with diameter at most 4. This yields
a complete characterization of the complexity of the problem with respect to
the diameter, since for diameter at most 3 the problem can be
shown to be polynomially solvable using a dynamic programming approach.
In addition we prove that the problem is NP-complete even for the class of
trees of maximum degree at most 16. To the best of our knowledge, the only
known result in this direction is that the tree search problem is solvable in
O(|V| log|V|) time for trees with degree at most 2 (paths).
We match the above complexity results with a tight algorithmic analysis. We
first show that a natural greedy algorithm attains a 2-approximation.
Furthermore, for the bounded degree instances, we show that any optimal
strategy (i.e., one that minimizes the expected number of queries) performs at
most $O(\Delta(T)(\log |V| + \log w(T)))$ queries in the worst case, where $w(T)$ is
the sum of the likelihoods of the nodes of $T$ and $\Delta(T)$ is the maximum
degree of $T$. We combine this result with a non-trivial exponential-time
algorithm to provide an FPTAS for trees with bounded degree.
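For the path case, an interval recursion makes "minimize the expected number of queries" concrete. The abstract does not restate the query model, so this sketch assumes the edge-query model: querying edge (k, k+1) of the path reveals which side contains the marked node.

Every node in the current interval pays for the query that splits it. The minimum of the sum over v of w(v) times the number of queries to find v then satisfies a cubic-time recurrence like the one for optimal alphabetic trees. Dividing by w(T) gives the expected number of queries. The function name `path_search_cost` is hypothetical, and this O(|V|^3) recursion is not the O(|V| log|V|) algorithm mentioned above.

```python
from functools import lru_cache


def path_search_cost(w):
    """Minimum of sum_v w[v] * (#queries to find v) on a path, under edge queries.

    w[i] is the likelihood weight of the i-th node of the path.
    """
    n = len(w)
    prefix = [0]
    for x in w:
        prefix.append(prefix[-1] + x)

    @lru_cache(maxsize=None)
    def cost(i, j):
        if i == j:
            return 0  # a single remaining candidate needs no more queries
        weight = prefix[j + 1] - prefix[i]  # every node in [i..j] pays for this query
        # Query edge (k, k+1): the marked node is then in [i..k] or in [k+1..j].
        return weight + min(cost(i, k) + cost(k + 1, j) for k in range(i, j))

    return cost(0, n - 1)
```

With uniform weights [1, 1, 1, 1], the best strategy queries the middle edge first, so every node needs two queries and the total is 8.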
Harnessing the Power of Choices in Decision Tree Learning
We propose a simple generalization of standard and empirically successful
decision tree learning algorithms such as ID3, C4.5, and CART. These
algorithms, which have been central to machine learning for decades, are greedy
in nature: they grow a decision tree by iteratively splitting on the best
attribute. Our algorithm, Top-$k$, considers the $k$ best attributes as
possible splits instead of just the single best attribute. We demonstrate,
theoretically and empirically, the power of this simple generalization. We
first prove a greediness hierarchy theorem showing that for every $k \in \mathbb{N}$, Top-$(k+1)$ can be dramatically more powerful than Top-$k$: there
are data distributions for which the former achieves accuracy $1-\varepsilon$,
whereas the latter only achieves accuracy $\frac{1}{2}+\varepsilon$. We then
show, through extensive experiments, that Top-$k$ outperforms the two main
approaches to decision tree learning: classic greedy algorithms and more recent
"optimal decision tree" algorithms. On the one hand, Top-$k$ consistently enjoys
significant accuracy gains over greedy algorithms across a wide range of
benchmarks. On the other hand, Top-$k$ is markedly more scalable than optimal
decision tree algorithms and is able to handle dataset and feature set sizes
that remain far beyond the reach of these algorithms.
Comment: NeurIPS 202
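This is a minimal sketch of the Top-k idea. At each node, the k best attributes under a greedy criterion are tried as splits, the subtrees are grown recursively to a depth budget, and the split with the most correctly classified training examples is kept. With k = 1 this is ordinary greedy growth.

Several choices here are assumptions, not the paper's exact specification: binary features, binary labels, Gini impurity as the ranking criterion, and training accuracy for choosing among the k candidates. The names `top_k`, `gini`, and `split_impurity` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_impurity(X, y, f):
    """Size-weighted Gini impurity of the two children after splitting on binary feature f."""
    left = [lab for x, lab in zip(X, y) if x[f] == 0]
    right = [lab for x, lab in zip(X, y) if x[f] == 1]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(y)


def top_k(X, y, features, k, depth):
    """Return (number of training examples classified correctly, tree).

    A tree is ("leaf", label) or ("split", feature, subtree_if_0, subtree_if_1).
    """
    label, count = Counter(y).most_common(1)[0] if y else (0, 0)
    best = (count, ("leaf", label))
    if depth == 0 or not features or len(set(y)) <= 1:
        return best
    # Try the k best splits by the greedy criterion (sorted() is stable, so ties keep input order).
    for f in sorted(features, key=lambda g: split_impurity(X, y, g))[:k]:
        rest = [g for g in features if g != f]
        children = []
        for v in (0, 1):
            idx = [i for i, x in enumerate(X) if x[f] == v]
            children.append(top_k([X[i] for i in idx], [y[i] for i in idx], rest, k, depth - 1))
        correct = children[0][0] + children[1][0]
        if correct > best[0]:
            best = (correct, ("split", f, children[0][1], children[1][1]))
    return best
```

As a usage example, take y = x0 XOR x1 on four rows where a distractor x2 agrees with y on three of them. Greedy (k = 1) splits on x2 first and reaches 3/4 training accuracy at depth 2. With k = 2 the split on x0 is also tried, which reaches 4/4. In this sketch the top-1 candidates are always a prefix of the top-k candidates, so training accuracy cannot decrease as k grows.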
On the Huffman and Alphabetic Tree Problem with General Cost Functions
We address generalized versions of the Huffman and Alphabetic Tree Problem where the cost caused by each individual leaf i, instead of being linear, depends on its depth in the tree through an arbitrary function. The objective is to minimize either the total cost or the maximum cost among all leaves. We review and extend the known results in this direction and devise a number of new algorithms and hardness proofs. It turns out that the Dynamic Programming approach for the Alphabetic Tree Problem can be extended to arbitrary cost functions, resulting in an optimal algorithm with $O(n^4)$ running time and $O(n^3)$ space. We identify classes of cost functions where the well-known trick of reducing the runtime by a factor of n via a "monotonicity" property can be applied. For the generalized Huffman Tree Problem, we show that even the k-ary version can be solved by a generalized version of the Coin Collector Algorithm of Larmore and Hirschberg (in Proc. SODA'90, pp. 310-318, 1990) when the cost functions are nondecreasing and convex. Furthermore, we give an $O(n^2 \log n)$ algorithm for the worst-case minimization variants of both the Huffman and Alphabetic Tree Problem with nondecreasing cost functions. Investigating the limits of computational tractability, we show that the Huffman Tree Problem in its full generality is inapproximable unless P = NP, no matter whether the objective function is the sum of leaf costs or their maximum. The alphabetic version becomes NP-hard when the leaf costs are interdependent.
Journal article, Algorithmica 69(3): 582-604 (2014).
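One natural dynamic program for the alphabetic (order-preserving) version with arbitrary depth costs uses states (i, j, d): the cheapest subtree holding exactly leaves i..j with its root at depth d. There are O(n^3) states, each minimizing over O(n) split points, which gives O(n^4) time and O(n^3) memoized entries. That matches the bounds quoted above, but this is our reconstruction and may differ from the paper's DP. The function name `alphabetic_tree_cost` is hypothetical.

```python
from functools import lru_cache


def alphabetic_tree_cost(cost_fns):
    """Minimum total leaf cost over binary trees whose leaves, left to right, are 0..n-1.

    cost_fns[i](d) is the (arbitrary) cost of placing leaf i at depth d, root at depth 0.
    """
    n = len(cost_fns)

    @lru_cache(maxsize=None)
    def best(i, j, d):
        # Cheapest subtree containing exactly leaves i..j, with its root at depth d.
        if i == j:
            return cost_fns[i](d)
        # Split into a left subtree (i..m) and a right subtree (m+1..j), one level deeper.
        return min(best(i, m, d + 1) + best(m + 1, j, d + 1) for m in range(i, j))

    return best(0, n - 1, 0)
```

With linear costs f_i(d) = w_i * d this reduces to the classic alphabetic tree problem. For example, weights [4, 1, 1] give cost 8, achieved by putting the heavy leaf alone at depth 1. A non-monotone or step-shaped cost needs no change to the recursion, only to `cost_fns`.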