22 research outputs found

    Hardness and inapproximability results for minimum verification set and minimum path decision tree problems

    Get PDF
    Minimization of decision trees is a well studied problem. In this work, we introduce two new problems related to minimization of decision trees. The problems are called minimum verification set (MinVS) and minimum path decision tree (MinPathDT) problems. Decision tree problems ask the question "What is the unknown given object?". MinVS problem on the other hand asks the question "Is the unknown object z?", for a given object z. Hence it is not an identification, but rather a verification problem. MinPathDT problem aims to construct a decision tree where only the cost of the root-to-leaf path corresponding to a given object is minimized, whereas decision tree problems in general try to minimize the overall cost of decision trees considering all the objects. Therefore, MinVS and MinPathDT are seemingly easier problems. However, in this work we prove that MinVS and MinPathDT problems are both NP-complete and cannot be approximated within a factor in o(lg n) unless P = NP

    Approximation Algorithms for Stochastic Boolean Function Evaluation and Stochastic Submodular Set Cover

    Full text link
    Stochastic Boolean Function Evaluation is the problem of determining the value of a given Boolean function f on an unknown input x, when each bit of x_i of x can only be determined by paying an associated cost c_i. The assumption is that x is drawn from a given product distribution, and the goal is to minimize the expected cost. This problem has been studied in Operations Research, where it is known as "sequential testing" of Boolean functions. It has also been studied in learning theory in the context of learning with attribute costs. We consider the general problem of developing approximation algorithms for Stochastic Boolean Function Evaluation. We give a 3-approximation algorithm for evaluating Boolean linear threshold formulas. We also present an approximation algorithm for evaluating CDNF formulas (and decision trees) achieving a factor of O(log kd), where k is the number of terms in the DNF formula, and d is the number of clauses in the CNF formula. In addition, we present approximation algorithms for simultaneous evaluation of linear threshold functions, and for ranking of linear functions. Our function evaluation algorithms are based on reductions to the Stochastic Submodular Set Cover (SSSC) problem. This problem was introduced by Golovin and Krause. They presented an approximation algorithm for the problem, called Adaptive Greedy. Our main technical contribution is a new approximation algorithm for the SSSC problem, which we call Adaptive Dual Greedy. It is an extension of the Dual Greedy algorithm for Submodular Set Cover due to Fujito, which is a generalization of Hochbaum's algorithm for the classical Set Cover Problem. We also give a new bound on the approximation achieved by the Adaptive Greedy algorithm of Golovin and Krause

    Efficient Algorithms for Battleship

    Get PDF
    We consider an algorithmic problem inspired by the Battleship game. In the variant of the problem that we investigate, there is a unique ship of shape SZ2S \subset Z^2 which has been translated in the lattice Z2Z^2. We assume that a player has already hit the ship with a first shot and the goal is to sink the ship using as few shots as possible, that is, by minimizing the number of missed shots. While the player knows the shape SS, which position of SS has been hit is not known. Given a shape SS of nn lattice points, the minimum number of misses that can be achieved in the worst case by any algorithm is called the Battleship complexity of the shape SS and denoted c(S)c(S). We prove three bounds on c(S)c(S), each considering a different class of shapes. First, we have c(S)n1c(S) \leq n-1 for arbitrary shapes and the bound is tight for parallelogram-free shapes. Second, we provide an algorithm that shows that c(S)=O(logn)c(S) = O(\log n) if SS is an HV-convex polyomino. Third, we provide an algorithm that shows that c(S)=O(loglogn)c(S) = O(\log \log n) if SS is a digital convex set. This last result is obtained through a novel discrete version of the Blaschke-Lebesgue inequality relating the area and the width of any convex body.Comment: Conference version at 10th International Conference on Fun with Algorithms (FUN 2020

    On the Complexity of Searching in Trees: Average-case Minimization

    Full text link
    We focus on the average-case analysis: A function w : V -> Z+ is given which defines the likelihood for a node to be the one marked, and we want the strategy that minimizes the expected number of queries. Prior to this paper, very little was known about this natural question and the complexity of the problem had remained so far an open question. We close this question and prove that the above tree search problem is NP-complete even for the class of trees with diameter at most 4. This results in a complete characterization of the complexity of the problem with respect to the diameter size. In fact, for diameter not larger than 3 the problem can be shown to be polynomially solvable using a dynamic programming approach. In addition we prove that the problem is NP-complete even for the class of trees of maximum degree at most 16. To the best of our knowledge, the only known result in this direction is that the tree search problem is solvable in O(|V| log|V|) time for trees with degree at most 2 (paths). We match the above complexity results with a tight algorithmic analysis. We first show that a natural greedy algorithm attains a 2-approximation. Furthermore, for the bounded degree instances, we show that any optimal strategy (i.e., one that minimizes the expected number of queries) performs at most O(\Delta(T) (log |V| + log w(T))) queries in the worst case, where w(T) is the sum of the likelihoods of the nodes of T and \Delta(T) is the maximum degree of T. We combine this result with a non-trivial exponential time algorithm to provide an FPTAS for trees with bounded degree

    Harnessing the Power of Choices in Decision Tree Learning

    Full text link
    We propose a simple generalization of standard and empirically successful decision tree learning algorithms such as ID3, C4.5, and CART. These algorithms, which have been central to machine learning for decades, are greedy in nature: they grow a decision tree by iteratively splitting on the best attribute. Our algorithm, Top-kk, considers the kk best attributes as possible splits instead of just the single best attribute. We demonstrate, theoretically and empirically, the power of this simple generalization. We first prove a {\sl greediness hierarchy theorem} showing that for every kNk \in \mathbb{N}, Top-(k+1)(k+1) can be dramatically more powerful than Top-kk: there are data distributions for which the former achieves accuracy 1ε1-\varepsilon, whereas the latter only achieves accuracy 12+ε\frac1{2}+\varepsilon. We then show, through extensive experiments, that Top-kk outperforms the two main approaches to decision tree learning: classic greedy algorithms and more recent "optimal decision tree" algorithms. On one hand, Top-kk consistently enjoys significant accuracy gains over greedy algorithms across a wide range of benchmarks. On the other hand, Top-kk is markedly more scalable than optimal decision tree algorithms and is able to handle dataset and feature set sizes that remain far beyond the reach of these algorithms.Comment: NeurIPS 202

    On the Huffman and Alphabetic Tree Problem with General Cost Functions

    Get PDF
    We address generalized versions of the Huffman and Alphabetic Tree Problem where the cost caused by each individual leaf i, instead of being linear, depends on its depth in the tree by an arbitrary function. The objective is to minimize either the total cost or the maximum cost among all leaves. We review and extend the known results in this direction and devise a number of new algorithms and hardness proofs. It turns out that the Dynamic Programming approach for the Alphabetic Tree Problem can be extended to arbitrary cost functions, resulting in a time O(n (4)) optimal algorithm using space O(n (3)). We identify classes of cost functions where the well-known trick to reduce the runtime by a factor of n via a "monotonicity" property can be applied. For the generalized Huffman Tree Problem we show that even the k-ary version can be solved by a generalized version of the Coin Collector Algorithm of Larmore and Hirschberg (in Proc. SODA'90, pp. 310-318, 1990) when the cost functions are nondecreasing and convex. Furthermore, we give an O(n (2)logn) algorithm for the worst case minimization variants of both the Huffman and Alphabetic Tree Problem with nondecreasing cost functions. Investigating the limits of computational tractability, we show that the Huffman Tree Problem in its full generality is inapproximable unless P = NP, no matter if the objective function is the sum of leaf costs or their maximum. The alphabetic version becomes NP-hard when the leaf costs are interdependent.ArticleALGORITHMICA. 69(3): 582-604 (2014)journal articl
    corecore