16,398 research outputs found
Document Retrieval on Repetitive Collections
Document retrieval aims at finding the most important documents where a
pattern appears in a collection of strings. Traditional pattern-matching
techniques yield brute-force document retrieval solutions, which has motivated
the research on tailored indexes that offer near-optimal performance. However,
an experimental study establishing which alternatives are actually better than
brute force, and which perform best depending on the collection
characteristics, has not been carried out. In this paper we address this
shortcoming by exploring the relationship between the nature of the underlying
collection and the performance of current methods. Via extensive experiments we
show that established solutions are often beaten in practice by brute-force
alternatives. We also design new methods that offer superior time/space
trade-offs, particularly on repetitive collections.Comment: Accepted to ESA 2014. Implementation and experiments at
http://www.cs.helsinki.fi/group/suds/rlcsa
Amorphous Placement and Retrieval of Sensory Data in Sparse Mobile Ad-Hoc Networks
Abstract—Personal communication devices are increasingly being equipped with sensors that are able to passively collect information from their surroundings – information that could be stored in fairly small local caches. We envision a system in which users of such devices use their collective sensing, storage, and communication resources to query the state of (possibly remote) neighborhoods. The goal of such a system is to achieve the highest query success ratio using the least communication overhead (power). We show that the use of Data Centric Storage (DCS), or directed placement, is a viable approach for achieving this goal, but only when the underlying network is well connected. Alternatively, we propose, amorphous placement, in which sensory samples are cached locally and informed exchanges of cached samples is used to diffuse the sensory data throughout the whole network. In handling queries, the local cache is searched first for potential answers. If unsuccessful, the query is forwarded to one or more direct neighbors for answers. This technique leverages node mobility and caching capabilities to avoid the multi-hop communication overhead of directed placement. Using a simplified mobility model, we provide analytical lower and upper bounds on the ability of amorphous placement to achieve uniform field coverage in one and two dimensions. We show that combining informed shuffling of cached samples upon an encounter between two nodes, with the querying of direct neighbors could lead to significant performance improvements. For instance, under realistic mobility models, our simulation experiments show that amorphous placement achieves 10% to 40% better query answering ratio at a 25% to 35% savings in consumed power over directed placement.National Science Foundation (CNS Cybertrust 0524477, CNS NeTS 0520166, CNS ITR 0205294, EIA RI 0202067
Stochastic Query Covering for Fast Approximate Document Retrieval
We design algorithms that, given a collection of documents and a distribution over user queries, return a
small subset of the document collection in such a way that we can efficiently provide high-quality answers
to user queries using only the selected subset. This approach has applications when space is a constraint
or when the query-processing time increases significantly with the size of the collection. We study our
algorithms through the lens of stochastic analysis and prove that even though they use only a small fraction
of the entire collection, they can provide answers to most user queries, achieving a performance close to the
optimal. To complement our theoretical findings, we experimentally show the versatility of our approach
by considering two important cases in the context of Web search. In the first case, we favor the retrieval of
documents that are relevant to the query, whereas in the second case we aim for document diversification.
Both the theoretical and the experimental analysis provide strong evidence of the potential value of query
covering in diverse application scenarios
Query processing of spatial objects: Complexity versus Redundancy
The management of complex spatial objects in applications, such as geography and cartography,
imposes stringent new requirements on spatial database systems, in particular on efficient
query processing. As shown before, the performance of spatial query processing can be improved
by decomposing complex spatial objects into simple components. Up to now, only decomposition
techniques generating a linear number of very simple components, e.g. triangles or trapezoids, have
been considered. In this paper, we will investigate the natural trade-off between the complexity of
the components and the redundancy, i.e. the number of components, with respect to its effect on
efficient query processing. In particular, we present two new decomposition methods generating
a better balance between the complexity and the number of components than previously known
techniques. We compare these new decomposition methods to the traditional undecomposed representation
as well as to the well-known decomposition into convex polygons with respect to their
performance in spatial query processing. This comparison points out that for a wide range of query
selectivity the new decomposition techniques clearly outperform both the undecomposed representation
and the convex decomposition method. More important than the absolute gain in performance
by a factor of up to an order of magnitude is the robust performance of our new decomposition
techniques over the whole range of query selectivity
Learning a Policy for Opportunistic Active Learning
Active learning identifies data points to label that are expected to be the
most useful in improving a supervised model. Opportunistic active learning
incorporates active learning into interactive tasks that constrain possible
queries during interactions. Prior work has shown that opportunistic active
learning can be used to improve grounding of natural language descriptions in
an interactive object retrieval task. In this work, we use reinforcement
learning for such an object retrieval task, to learn a policy that effectively
trades off task completion with model improvement that would benefit future
tasks.Comment: EMNLP 2018 Camera Read
Recommended from our members
Local search: A guide for the information retrieval practitioner
There are a number of combinatorial optimisation problems in information retrieval in which the use of local search methods are worthwhile. The purpose of this paper is to show how local search can be used to solve some well known tasks in information retrieval (IR), how previous research in the field is piecemeal, bereft of a structure and methodologically flawed, and to suggest more rigorous ways of applying local search methods to solve IR problems. We provide a query based taxonomy for analysing the use of local search in IR tasks and an overview of issues such as fitness functions, statistical significance and test collections when conducting experiments on combinatorial optimisation problems. The paper gives a guide on the pitfalls and problems for IR practitioners who wish to use local search to solve their research issues, and gives practical advice on the use of such methods. The query based taxonomy is a novel structure which can be used by the IR practitioner in order to examine the use of local search in IR
- …