432,105 research outputs found
Automatic construction of known-item finding test beds
This work is an initial study on the utility of automatically generated queries for evaluating known-item retrieval and how such queries compare to real queries. The main advantage of automatically generating queries is that for any given test collection numerous queries can be produced at minimal cost. For evaluation, this has huge ramifications as state-of-the-art algorithms can be tested on different types of generated queries which mimic particular querying styles that a user may adopt. Our approach draws upon previous research in IR which has probabilistically generated simulated queries for other purposes [2, 3]
Exploiting Query Structure and Document Structure to Improve Document Retrieval Effectiveness
In this paper we present a systematic analysis of document
retrieval using unstructured and structured queries within
the score region algebra (SRA) structured retrieval framework. The behavior of diĀ®erent retrieval models, namely
Boolean, tf.idf, GPX, language models, and Okapi, is tested
using the transparent SRA framework in our three-level structured retrieval system called TIJAH. The retrieval models are implemented along four elementary retrieval aspects: element and term selection, element score computation, score combination, and score propagation.
The analysis is performed on a numerous experiments
evaluated on TREC and CLEF collections, using manually
generated unstructured and structured queries. Unstructured queries range from the short title queries to long title
+ description + narrative queries. For generating structured
queries we exploit the knowledge of the document structure
and the content used to semantically describe or classify
documents. We show that such structured information can
be utilized in retrieval engines to give more precise answers to user queries then when using unstructured queries
Learning Boolean Halfspaces with Small Weights from Membership Queries
We consider the problem of proper learning a Boolean Halfspace with integer
weights from membership queries only. The best known
algorithm for this problem is an adaptive algorithm that asks
membership queries where the best lower bound for the number of membership
queries is [Learning Threshold Functions with Small Weights Using
Membership Queries. COLT 1999]
In this paper we close this gap and give an adaptive proper learning
algorithm with two rounds that asks membership queries. We also give
a non-adaptive proper learning algorithm that asks membership
queries
Completing Queries: Rewriting of IncompleteWeb Queries under Schema Constraints
Reactive Web systems, Web services, and Web-based publish/
subscribe systems communicate events as XML messages, and in
many cases require composite event detection: it is not sufficient to react
to single event messages, but events have to be considered in relation to
other events that are received over time.
Emphasizing language design and formal semantics, we describe the
rule-based query language XChangeEQ for detecting composite events.
XChangeEQ is designed to completely cover and integrate the four complementary
querying dimensions: event data, event composition, temporal
relationships, and event accumulation. Semantics are provided as
model and fixpoint theories; while this is an established approach for rule
languages, it has not been applied for event queries before
Differential Privacy and the Fat-Shattering Dimension of Linear Queries
In this paper, we consider the task of answering linear queries under the
constraint of differential privacy. This is a general and well-studied class of
queries that captures other commonly studied classes, including predicate
queries and histogram queries. We show that the accuracy to which a set of
linear queries can be answered is closely related to its fat-shattering
dimension, a property that characterizes the learnability of real-valued
functions in the agnostic-learning setting.Comment: Appears in APPROX 201
On Low Treewidth Approximations of Conjunctive Queries
We recently initiated the study of approximations of conjunctive queries within classes that admit tractable query evaluation (with respect to combined complexity). Those include classes of acyclic, bounded treewidth, or bounded hypertreewidth queries. Such approximations are always guaranteed to exist. However, while for acyclic and bounded hypertreewidth queries we have shown a number of examples of interesting approximations, for queries of bounded treewidth the study had been restricted to queries over graphs, where such approximations usually trivialize. In this note we show that for relations of arity greater than two, the notion of low treewidth approximations is a rich one, as many queries possess them. In fact we look at approximations of queries of maximum possible treewidth by queries of minimum possible treewidth (i.e., one), and show that even in this case the structure of approximations remain rather rich as long as input relations are not binary
- ā¦