Query Learning with Exponential Query Costs
In query learning, the goal is to identify an unknown object while minimizing
the number of "yes" or "no" questions (queries) posed about that object. A
well-studied algorithm for query learning is known as generalized binary search
(GBS). We show that GBS is a greedy algorithm to optimize the expected number
of queries needed to identify the unknown object. We also generalize GBS in two
ways. First, we consider the case where the cost of querying grows
exponentially in the number of queries and the goal is to minimize the expected
exponential cost. Then, we consider the case where the objects are partitioned
into groups, and the objective is to identify only the group to which the
object belongs. We derive algorithms to address these issues in a common,
information-theoretic framework. In particular, we present an exact formula for
the objective function in each case, involving Shannon or Rényi entropy, and
develop a greedy algorithm for minimizing it. Our algorithms are demonstrated
on two applications of query learning: active learning and emergency response.
Comment: 15 pages
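As a rough illustration of the greedy rule behind GBS, the Python sketch below assumes a finite set of objects with a prior π and a finite pool of queries whose answers are encoded as +1 ("yes") or -1 ("no"). The function names `gbs_depths`, `expected_queries`, and `expected_exponential_cost` are hypothetical. At each step the sketch picks the query that splits the remaining probability mass most evenly, then reports the expected number of queries and, for an assumed base λ > 1, the expected exponential cost Σ π_i λ^{d_i} of the resulting tree. It only evaluates the exponential cost of the ordinary GBS tree; it does not implement the paper's modified greedy rule for that objective.

```python
from typing import List, Sequence


def gbs_depths(prior: Sequence[float], queries: Sequence[Sequence[int]]) -> List[int]:
    """Number of queries GBS asks before identifying each object.

    prior[i]      -- prior probability that object i is the unknown one
    queries[j][i] -- answer of query j on object i: +1 for "yes", -1 for "no"
    """
    depths = [0] * len(prior)

    def split(active: List[int], unused: List[int], depth: int) -> None:
        if len(active) == 1:
            depths[active[0]] = depth
            return
        # Only queries that separate the remaining candidates are informative.
        useful = [j for j in unused if len({queries[j][i] for i in active}) == 2]
        if not useful:
            raise ValueError("remaining objects cannot be distinguished")
        # GBS rule: minimize |sum of prior * answer| over the candidates,
        # i.e. choose the most balanced yes/no split of the remaining mass.
        best = min(useful, key=lambda j: abs(sum(prior[i] * queries[j][i] for i in active)))
        rest = [j for j in unused if j != best]
        for answer in (+1, -1):
            split([i for i in active if queries[best][i] == answer], rest, depth + 1)

    split(list(range(len(prior))), list(range(len(queries))), 0)
    return depths


def expected_queries(prior: Sequence[float], depths: Sequence[int]) -> float:
    return sum(p * d for p, d in zip(prior, depths))


def expected_exponential_cost(prior: Sequence[float], depths: Sequence[int], lam: float) -> float:
    # Assumed cost model: identifying an object after d queries costs lam ** d.
    return sum(p * lam ** d for p, d in zip(prior, depths))


# Four objects; query j asks "is it object j?".
prior = [0.5, 0.25, 0.125, 0.125]
queries = [[+1 if i == j else -1 for i in range(4)] for j in range(4)]
depths = gbs_depths(prior, queries)
print(depths, expected_queries(prior, depths), expected_exponential_cost(prior, depths, 2.0))
# [1, 2, 3, 3] 1.75 4.0
```

For this dyadic prior, the GBS tree asks 1.75 queries on average, which coincides with the Shannon entropy of the prior.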
On the Complexity of Searching in Trees: Average-case Minimization
We focus on the average-case analysis: a function w: V → Z+ is given, which
defines the likelihood of each node being the marked one, and we want the
strategy that minimizes the expected number of queries. Prior to this paper,
very little was known about this natural question, and the complexity of the
problem had remained open.
We settle this question by proving that the above tree search problem is
NP-complete even for the class of trees with diameter at most 4. This yields
a complete characterization of the complexity of the problem with respect to
the diameter, since for diameter at most 3 the problem can be shown to be
polynomially solvable by dynamic programming.
In addition, we prove that the problem is NP-complete even for the class of
trees of maximum degree at most 16. To the best of our knowledge, the only
known result in this direction is that the tree search problem is solvable in
O(|V| log |V|) time for trees of maximum degree at most 2 (i.e., paths).
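The path case can be sketched with a short dynamic program. The Python code below assumes the usual edge-query model for tree search (asking about an edge reveals which side of it contains the marked node), which this excerpt does not spell out; the name `optimal_path_search_cost` is hypothetical. On a path v_1, ..., v_n every strategy recursively splits an interval, so the optimal weighted cost obeys C(i, j) = w(i..j) + min_k [C(i, k) + C(k+1, j)], with C(i, i) = 0. This cubic-time recurrence is simpler than, and does not reproduce, the O(|V| log |V|) algorithm cited above.

```python
from typing import List, Sequence


def optimal_path_search_cost(w: Sequence[int]) -> int:
    """Minimum of sum_v w(v) * (queries needed to find v), edge queries on a path.

    w[i] is the likelihood (positive integer) of the i-th node on the path.
    """
    n = len(w)
    if n == 0:
        raise ValueError("path must have at least one node")
    prefix = [0]
    for x in w:
        prefix.append(prefix[-1] + x)
    # cost[i][j]: optimal weighted cost when the target is known to lie in w[i..j].
    cost: List[List[int]] = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            # Querying edge (k, k+1) costs one query for every node in i..j,
            # after which the search continues on the side holding the target.
            best = min(cost[i][k] + cost[k + 1][j] for k in range(i, j))
            cost[i][j] = prefix[j + 1] - prefix[i] + best
    return cost[0][n - 1]


print(optimal_path_search_cost([1, 1, 1, 1]))  # 8, i.e. 2 queries on average
print(optimal_path_search_cost([4, 1, 1]))     # 8: the heavy node is found after 1 query
```

Dividing the returned value by w(T) gives the expected number of queries of an optimal strategy.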
We match the above complexity results with a tight algorithmic analysis. We
first show that a natural greedy algorithm attains a 2-approximation.
Furthermore, for bounded-degree instances, we show that any optimal
strategy (i.e., one that minimizes the expected number of queries) performs at
most O(Δ(T)(log |V| + log w(T))) queries in the worst case, where w(T) is
the sum of the likelihoods of the nodes of T and Δ(T) is the maximum
degree of T. We combine this result with a non-trivial exponential-time
algorithm to obtain an FPTAS for trees of bounded degree.
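For intuition about greedy edge selection, the sketch below applies one plausible balancing rule, assumed here rather than taken from the paper. Under an assumed edge-query model (querying an edge reveals which side of it contains the marked node), it queries the edge whose removal minimizes the weight of the heavier remaining component, then recurses on both sides. The names `greedy_search_cost` and `_side` are hypothetical, and the sketch makes no claim to the 2-approximation guarantee stated above for the paper's greedy algorithm.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def _side(root: int, edges: List[Edge], cut: Edge) -> Set[int]:
    """Nodes reachable from root once the edge `cut` is removed."""
    adj: Dict[int, List[int]] = defaultdict(list)
    for e in edges:
        if e != cut:
            u, v = e
            adj[u].append(v)
            adj[v].append(u)
    seen = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def greedy_search_cost(weights: Dict[int, int], edges: List[Edge]) -> int:
    """Sum of w(v) * (number of edge queries to find v) under a greedy rule.

    Hypothetical rule: query the edge that minimizes the heavier side's weight.
    """

    def solve(nodes: Set[int], sub_edges: List[Edge], depth: int) -> int:
        if len(nodes) == 1:
            (v,) = nodes
            return weights[v] * depth
        if not sub_edges:
            raise ValueError("input must be a tree")
        total = sum(weights[v] for v in nodes)
        best_side: Set[int] = set()
        best_heavy = None
        for e in sub_edges:
            side = _side(e[0], sub_edges, e)
            side_weight = sum(weights[v] for v in side)
            heavy = max(side_weight, total - side_weight)
            if best_heavy is None or heavy < best_heavy:
                best_side, best_heavy = side, heavy
        cost = 0
        for part in (best_side, nodes - best_side):
            # Keep only the edges lying entirely inside this component.
            part_edges = [(u, v) for (u, v) in sub_edges if u in part and v in part]
            cost += solve(part, part_edges, depth + 1)
        return cost

    return solve(set(weights), list(edges), 0)


uniform = {0: 1, 1: 1, 2: 1, 3: 1}
print(greedy_search_cost(uniform, [(0, 1), (1, 2), (2, 3)]))  # path: 8
print(greedy_search_cost(uniform, [(0, 1), (0, 2), (0, 3)]))  # star: 9
```

As with any weighted cost, dividing the returned total by w(T) gives the expected number of queries for this rule.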
Compression with graphical constraints: An interactive browser
Abstract—We study the problem of searching for a given element in a set of
objects using a membership oracle. Given a subset A of objects and a target
object t, the membership oracle determines whether A contains t. The goal is
to find the target object with the minimum number of questions posed to the
oracle. This problem is closely related to lossless source compression: the
optimum strategy is given by Huffman coding, whose average number of questions
lies within one of the entropy H(P) of the object set. The membership oracle
models interactive methods (i.e., methods that incorporate human feedback) and
has many real-life applications. Due to practical constraints imposed by such
applications, not every subset A of objects can be queried. It is known that,
in general, finding the optimum strategy under such constraints is
NP-complete. Given this negative result, we restrict attention to the cases
represented by graphical models: a graph G whose nodes are the database
objects is given, and the queries are restricted to subsets A that are
connected in G. We show that when G itself is connected, there is a search
algorithm that finds the target in 4H(P) + 2 queries on average. Since the
entropy is a trivial lower bound, our algorithm performs within a constant
factor of the optimum strategy.
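To make the Huffman connection concrete, the Python sketch below (function names `huffman_depths`, `entropy`, and `average_questions` are hypothetical) builds a Huffman tree for the unconstrained case, where every subset may be queried: each internal node corresponds to asking whether the target lies among the objects under one of its children. It then checks the bound H(P) ≤ L < H(P) + 1 for the average number of questions L. It ignores the graph constraint entirely and is not the 4H(P) + 2 connected-subset algorithm described above.

```python
import heapq
import math
from itertools import count
from typing import List, Sequence


def huffman_depths(p: Sequence[float]) -> List[int]:
    """Questions needed per object when any subset may be queried (Huffman depth)."""
    if len(p) == 1:
        return [0]
    tie = count()  # tie-breaker so the heap never compares the index lists
    heap = [(pi, next(tie), [i]) for i, pi in enumerate(p)]
    heapq.heapify(heap)
    depths = [0] * len(p)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)
        p2, _, b = heapq.heappop(heap)
        # Merging two subtrees adds one question ("is t in subset a?") above both.
        for i in a + b:
            depths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), a + b))
    return depths


def entropy(p: Sequence[float]) -> float:
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def average_questions(p: Sequence[float], depths: Sequence[int]) -> float:
    return sum(pi * d for pi, d in zip(p, depths))


p = [0.4, 0.3, 0.3]
d = huffman_depths(p)
print(d)  # [1, 2, 2]
L, H = average_questions(p, d), entropy(p)
assert H <= L < H + 1  # the Huffman average lies within one question of H(P)
```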