Upper Bound Approximations for BlockMaxWand
BlockMaxWand is a recent advance on the Wand dynamic pruning
technique, which allows efficient retrieval without any effectiveness
degradation to rank K. However, while BMW uses docid-sorted indices,
it relies on recording the upper bound of the term weighting
model scores for each block of postings in the inverted index. Such
a requirement can be disadvantageous in situations such as when
an index must be updated. In this work, we examine the appropriateness
of upper-bound approximations, which have previously
been shown suitable for Wand, in providing efficient retrieval for
BMW. Experiments on the ClueWeb12 category B13 corpus using
5000 queries from a real search engine’s query log demonstrate that
BMW still provides benefits w.r.t. Wand when approximate upper
bounds are used, and that, if approximations on upper bounds are
tight, BMW with approximate upper bounds can provide efficiency
gains w.r.t. Wand with exact upper bounds, in particular for queries
of short to medium length.
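The pruning idea behind the abstract above can be illustrated with a minimal sketch. It is not the Wand or BMW algorithm itself: there is no pivot selection and no per-block bound, only the shared safety check that a document is skipped when the sum of its terms' upper bounds cannot beat the current rank-K threshold. The toy index, the function `top_k`, and the factor-2 "approximate" bounds are all assumptions made for illustration.

```python
import heapq

def top_k(postings, upper_bounds, k):
    """Return (top-k list of (score, docid), number of fully scored documents).

    postings:     term -> {docid: score}, a hypothetical toy index
    upper_bounds: term -> value that is >= every score in that term's postings
    """
    heap = []            # min-heap; heap[0][0] is the current rank-k threshold
    fully_scored = 0
    docids = sorted({d for plist in postings.values() for d in plist})
    for d in docids:
        terms = [t for t, plist in postings.items() if d in plist]
        bound = sum(upper_bounds[t] for t in terms)
        if len(heap) == k and bound <= heap[0][0]:
            continue     # safe skip: this document cannot enter the top k
        fully_scored += 1
        score = sum(postings[t][d] for t in terms)
        if len(heap) < k:
            heapq.heappush(heap, (score, d))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, d))
    return sorted(heap, reverse=True), fully_scored

# Hypothetical toy data: two terms, six documents.
postings = {
    "a": {1: 2.0, 2: 0.5, 3: 0.4, 5: 0.3, 6: 0.2},
    "b": {1: 1.5, 3: 0.2, 4: 0.3, 6: 0.1},
}
exact = {t: max(p.values()) for t, p in postings.items()}
loose = {t: 2.0 * ub for t, ub in exact.items()}   # approximate (over-estimated) bounds

top_exact, n_exact = top_k(postings, exact, k=1)
top_loose, n_loose = top_k(postings, loose, k=1)
```

Under these assumptions both runs return the same ranking, since any valid upper bound keeps the skip safe, but the looser bounds force more documents to be fully scored. Tighter approximations recover more of the pruning, which is the trade-off the abstract measures.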
Curse of dimensionality reduction in max-plus based approximation methods: theoretical estimates and improved pruning algorithms
Max-plus based methods have been recently developed to approximate the value
function of possibly high dimensional optimal control problems. A critical step
of these methods consists in approximating a function by a supremum of a small
number of functions (max-plus "basis functions") taken from a prescribed
dictionary. We study several variants of this approximation problem, which we
show to be continuous versions of the facility location and k-center
combinatorial optimization problems, in which the connection costs arise from a
Bregman distance. We give theoretical error estimates, quantifying the number
of basis functions needed to reach a prescribed accuracy. We derive from our
approach a refinement of the curse of dimensionality free method introduced
previously by McEneaney, with a higher accuracy for a comparable computational
cost. Comment: 8 pages, 5 figures
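As a one-dimensional illustration of approximating a function by a supremum of a few basis functions, the sketch below approximates f(x) = x^2 on [0, 1] by the maximum of k tangent lines. The choice of f, the affine dictionary, and the names `tangent`, `sup_approx` and `worst_gap` are assumptions for illustration, not the paper's setting. For this f, the gap between f and the tangent at p is exactly (x - p)^2, which is the Bregman distance induced by f, so choosing the k tangent points to minimize the worst-case gap is a k-center problem with that cost.

```python
def tangent(p):
    """Affine minorant of f(x) = x**2 that touches f at p (illustrative basis function)."""
    return lambda x: 2.0 * p * x - p * p

def sup_approx(points):
    """Pointwise supremum of the basis functions chosen at the given points."""
    basis = [tangent(p) for p in points]
    return lambda x: max(g(x) for g in basis)

def worst_gap(points, grid):
    """Largest value of f(x) - approximation(x) over the evaluation grid."""
    approx = sup_approx(points)
    return max(x * x - approx(x) for x in grid)

k = 4
# Midpoints of k equal cells of [0, 1]: the k-center solution for the cost (x - p)**2.
centers = [(2 * i + 1) / (2 * k) for i in range(k)]
grid = [j / 1000 for j in range(1001)]
gap = worst_gap(centers, grid)
```

With these centers the worst-case gap is (1 / (2k))^2, so in this toy case halving the error requires roughly a factor of sqrt(2) more basis functions.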
Efficient & Effective Selective Query Rewriting with Efficiency Predictions
To enhance effectiveness, a user's query can be rewritten internally by the search engine in many ways, for example by applying proximity, or by expanding the query with related terms. However, approaches that benefit effectiveness often have a negative impact on efficiency, which in turn harms user satisfaction if the query is excessively slow. In this paper, we propose a novel framework for using the predicted execution time of various query rewritings to select between alternatives on a per-query basis, in a manner that ensures both effectiveness and efficiency. In particular, we propose the prediction of the execution time of ephemeral (e.g., proximity) posting lists generated from uni-gram inverted index posting lists, which are used in establishing the permissible query rewriting alternatives that may execute in the allowed time. Experiments examining both the effectiveness and efficiency of the proposed approach demonstrate that a 49% decrease in mean response time (and 62% decrease in 95th-percentile response time) can be attained without significantly hindering the effectiveness of the search engine.
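The per-query selection step described above might be sketched as follows. The `Rewriting` record, its fields, the 100 ms budget, and the fallback rule are hypothetical; the sketch assumes that some efficiency predictor has already produced `predicted_ms` for each alternative and does not model the predictor itself.

```python
from dataclasses import dataclass

@dataclass
class Rewriting:
    name: str
    expected_gain: float   # hypothetical estimate of effectiveness benefit
    predicted_ms: float    # predicted execution time, assumed to be given

def select_rewriting(candidates, budget_ms):
    """Pick the most beneficial rewriting predicted to run within the time budget.

    If no alternative fits the budget, fall back to the one predicted to be
    fastest (an assumed policy, chosen here for illustration only).
    """
    feasible = [c for c in candidates if c.predicted_ms <= budget_ms]
    if not feasible:
        return min(candidates, key=lambda c: c.predicted_ms)
    return max(feasible, key=lambda c: c.expected_gain)

# Hypothetical alternatives for a single query.
candidates = [
    Rewriting("original", expected_gain=0.00, predicted_ms=40.0),
    Rewriting("proximity", expected_gain=0.03, predicted_ms=90.0),
    Rewriting("expansion", expected_gain=0.05, predicted_ms=260.0),
]
chosen = select_rewriting(candidates, budget_ms=100.0)
```

Here the expansion alternative has the highest assumed gain but is predicted to exceed the budget, so the proximity rewriting is selected for this query.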
Anytime Point-Based Approximations for Large POMDPs
The Partially Observable Markov Decision Process has long been recognized as
a rich framework for real-world planning and control problems, especially in
robotics. However, exact solutions in this framework are typically
computationally intractable for all but the smallest problems. A well-known
technique for speeding up POMDP solving involves performing value backups at
specific belief points, rather than over the entire belief simplex. The
efficiency of this approach, however, depends greatly on the selection of
points. This paper presents a set of novel techniques for selecting informative
belief points which work well in practice. The point selection procedure is
combined with point-based value backups to form an effective anytime POMDP
algorithm called Point-Based Value Iteration (PBVI). The first aim of this
paper is to introduce this algorithm and present a theoretical analysis
justifying the choice of belief selection technique. The second aim of this
paper is to provide a thorough empirical comparison between PBVI and other
state-of-the-art POMDP methods, in particular the Perseus algorithm, in an
effort to highlight their similarities and differences. Evaluation is performed
using both standard POMDP domains and realistic robotic tasks.
Center-based Clustering under Perturbation Stability
Clustering under most popular objective functions is NP-hard, even to
approximate well, and so unlikely to be efficiently solvable in the worst case.
Recently, Bilu and Linial suggested an approach aimed at
bypassing this computational barrier by using properties of instances one might
hope to hold in practice. In particular, they argue that instances in practice
should be stable to small perturbations in the metric space and give an
efficient algorithm for clustering instances of the Max-Cut problem that are
stable to perturbations of size O(√n). In addition, they conjecture that
instances stable to as little as O(1) perturbations should be solvable in
polynomial time. In this paper we prove that this conjecture is true for any
center-based clustering objective (such as k-median, k-means, and
k-center). Specifically, we show we can efficiently find the optimal
clustering assuming only stability to factor-3 perturbations of the underlying
metric in spaces without Steiner points, and stability to factor (2 + √3)
perturbations for general metrics. In particular, we show for such instances
that the popular Single-Linkage algorithm combined with dynamic programming
will find the optimal clustering. We also present NP-hardness results under a
weaker but related condition.
A* Orthogonal Matching Pursuit: Best-First Search for Compressed Sensing Signal Recovery
Compressed sensing is a developing field aiming at reconstruction of sparse
signals acquired in reduced dimensions, which make the recovery process
under-determined. The required solution is the one with minimum ℓ0 norm
due to sparsity; however, it is not practical to solve the ℓ0 minimization
problem. Commonly used techniques include ℓ1 minimization, such as Basis
Pursuit (BP) and greedy pursuit algorithms such as Orthogonal Matching Pursuit
(OMP) and Subspace Pursuit (SP). This manuscript proposes a novel semi-greedy
recovery approach, namely A* Orthogonal Matching Pursuit (A*OMP). A*OMP
performs A* search to look for the sparsest solution on a tree whose paths grow
similar to the Orthogonal Matching Pursuit (OMP) algorithm. Paths on the tree
are evaluated according to a cost function, which should compensate for
different path lengths. For this purpose, three different auxiliary structures
are defined, including novel dynamic ones. A*OMP also incorporates pruning
techniques which enable practical applications of the algorithm. Moreover, the
adjustable search parameters provide means for a complexity-accuracy trade-off.
We demonstrate the reconstruction ability of the proposed scheme on both
synthetically generated data and images using Gaussian and Bernoulli
observation matrices, where A*OMP yields less reconstruction error and higher
exact recovery frequency than BP, OMP and SP. Results also indicate that novel
dynamic cost functions provide improved results as compared to a conventional
choice. Comment: accepted for publication in Digital Signal Processing
Online Row Sampling
Finding a small spectral approximation for a tall matrix is
a fundamental numerical primitive. For a number of reasons, one often seeks an
approximation whose rows are sampled from those of the original matrix. Row sampling improves
interpretability, saves space when the matrix is sparse, and preserves row structure,
which is especially important, for example, when the matrix represents a graph.
However, correctly sampling rows can be costly when the matrix is
large and cannot be stored and processed in memory. Hence, a number of recent
publications focus on row sampling in the streaming setting, using little more
space than what is required to store the outputted approximation [KL13,
KLM+14].
Inspired by a growing body of work on online algorithms for machine learning
and data analysis, we extend this work to a more restrictive online setting: we
read the rows of the matrix one by one and immediately decide whether each row should be
kept in the spectral approximation or discarded, without ever retracting these
decisions. We present an extremely simple algorithm that approximates up to
multiplicative error and additive error using online samples, with memory overhead
proportional to the cost of storing the spectral approximation. We also present
an algorithm that uses ) memory but only requires
samples, which we show is
optimal.
Our methods are clean and intuitive, allow for lower memory usage than prior
work, and expose new theoretical properties of leverage score based matrix
approximation.