Provenance Circuits for Trees and Treelike Instances (Extended Version)
Query evaluation in monadic second-order logic (MSO) is tractable on trees
and treelike instances, even though it is hard for arbitrary instances. This
tractability result has been extended to several tasks related to query
evaluation, such as counting query results [3] or performing query evaluation
on probabilistic trees [10]. These are two examples of the more general problem
of computing an augmented query output, commonly referred to as provenance. This
article presents a provenance framework for trees and treelike instances, by
describing a linear-time construction of a circuit provenance representation
for MSO queries. We show how this provenance can be connected to the usual
definitions of semiring provenance on relational instances [20], even though we
compute it in an unusual way, using tree automata; we do so via intrinsic
definitions of provenance for general semirings, independent of the operational
details of query evaluation. We show applications of this provenance to capture
existing counting and probabilistic results on trees and treelike instances,
and give novel consequences for probability evaluation.
Comment: 48 pages. Presented at ICALP'1
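To make the semiring view concrete, here is a minimal sketch (not the paper's tree-automaton construction) of a provenance circuit with input, +, and x gates, evaluated in two different commutative semirings. The gate representation and the tiny example circuit are illustrative assumptions.

```python
# A provenance circuit over input variables annotating hypothetical facts.
class Gate:
    def __init__(self, op, children=(), var=None):
        self.op = op              # '+', 'x', or 'var'
        self.children = children
        self.var = var

def evaluate(gate, valuation, zero, one, add, mul):
    """Evaluate the circuit in an arbitrary commutative semiring."""
    if gate.op == 'var':
        return valuation[gate.var]
    vals = [evaluate(c, valuation, zero, one, add, mul) for c in gate.children]
    acc = zero if gate.op == '+' else one
    for v in vals:
        acc = add(acc, v) if gate.op == '+' else mul(acc, v)
    return acc

# Example circuit for (x1 AND x2) OR x3.
x1, x2, x3 = Gate('var', var='x1'), Gate('var', var='x2'), Gate('var', var='x3')
root = Gate('+', [Gate('x', [x1, x2]), x3])

# Boolean semiring: does the query hold when only x1 and x2 are present?
holds = evaluate(root, {'x1': True, 'x2': True, 'x3': False},
                 False, True, lambda a, b: a or b, lambda a, b: a and b)

# Counting semiring: number of derivations when all three facts are present.
count = evaluate(root, {'x1': 1, 'x2': 1, 'x3': 1},
                 0, 1, lambda a, b: a + b, lambda a, b: a * b)
```

The point of the intrinsic, semiring-independent definition is exactly this: one circuit, many specializations obtained by swapping the semiring operations.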
Compact Labelings For Efficient First-Order Model-Checking
We consider graph properties that can be checked from labels, i.e., bit
sequences, of logarithmic length attached to vertices. We prove that there
exists such a labeling for checking a first-order formula with free set
variables in the graphs of every class that is \emph{nicely locally
cwd-decomposable}. This notion generalizes that of a \emph{nicely locally
tree-decomposable} class. The graphs of such classes can be covered by graphs
of bounded \emph{clique-width} with limited overlaps. We also consider such
labelings for \emph{bounded} first-order formulas on graph classes of
\emph{bounded expansion}. Some of these results are extended to counting
queries.
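For intuition about checking a property from short vertex labels alone, here is the classic DFS-interval ancestry labeling for trees, which uses O(log n)-bit labels. It is only an illustration of the labeling-scheme idea, not the clique-width-based construction of the paper.

```python
def label_tree(children, root):
    """Assign each node an interval (enter, exit) from an iterative DFS."""
    labels, clock, stack = {}, 0, [(root, False)]
    while stack:
        node, done = stack.pop()
        if done:
            labels[node] = (labels[node], clock)  # close the interval
            continue
        labels[node] = clock                      # temporarily store enter time
        clock += 1
        stack.append((node, True))
        for c in children.get(node, []):
            stack.append((c, False))
    return labels

def is_ancestor(lu, lv):
    """Decide 'u is an ancestor of v' from the two labels alone."""
    return lu[0] <= lv[0] and lv[1] <= lu[1]

tree = {'r': ['a', 'b'], 'a': ['c']}
L = label_tree(tree, 'r')
```

Once the labels are assigned, `is_ancestor` never consults the tree itself; that locality is what makes such schemes useful for distributed model-checking.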
Improving document representation by accumulating relevance feedback: the relevance feedback accumulation (RFA) algorithm
Document representation (indexing) techniques are dominated by variants of the term-frequency analysis approach, based on the assumption that the more occurrences a term has throughout a document the more important the term is in that document. Inherent drawbacks associated with this approach include: poor index quality, high document representation size and the word mismatch problem. To tackle these drawbacks, a document representation improvement method called the Relevance Feedback Accumulation (RFA) algorithm is presented. The algorithm provides a mechanism to continuously accumulate relevance assessments over time and across users. It also provides a document representation modification function, or document representation learning function that gradually improves the quality of the document representations. To improve document representations, the learning function uses a data mining measure called support for analyzing the accumulated relevance feedback.
Evaluation is done by comparing the RFA algorithm to four other algorithms. The four measures used for evaluation are (a) the average number of index terms per document; (b) the quality of the document representations as assessed by human judges; (c) retrieval effectiveness; and (d) the quality of the document representation learning function. The evaluation results show that (1) the algorithm is able to substantially reduce document representation size while maintaining retrieval effectiveness; (2) the algorithm provides a smooth and steady document representation learning function; and (3) the algorithm improves the quality of the document representations. The RFA algorithm's approach is consistent with efficiency considerations that hold in real information retrieval systems.
The major contribution made by this research is the design and implementation of a novel, simple, efficient, and scalable technique for document representation improvement.
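The accumulation idea can be sketched as follows. This is a hedged reconstruction under stated assumptions: the per-(document, term) vote counter, the support formula, and the 0.5 threshold are illustrative choices, not the dissertation's actual learning function.

```python
from collections import defaultdict

class FeedbackIndex:
    """Accumulate relevance judgments and prune index terms by support."""

    def __init__(self, min_support=0.5):
        self.min_support = min_support
        self.votes = defaultdict(int)        # (doc, term) -> relevant-use count
        self.assessments = defaultdict(int)  # doc -> total assessments seen

    def record(self, doc, query_terms, relevant):
        """Accumulate one relevance judgment for a document."""
        self.assessments[doc] += 1
        if relevant:
            for t in query_terms:
                self.votes[(doc, t)] += 1

    def support(self, doc, term):
        """Fraction of this doc's assessments in which the term co-occurred
        with a relevant judgment (a support-style data-mining measure)."""
        n = self.assessments[doc]
        return self.votes[(doc, term)] / n if n else 0.0

    def representation(self, doc, candidate_terms):
        """Keep only terms whose accumulated support clears the threshold."""
        return [t for t in candidate_terms
                if self.support(doc, t) >= self.min_support]

idx = FeedbackIndex(min_support=0.5)
idx.record('d1', ['semiring', 'provenance'], relevant=True)
idx.record('d1', ['provenance'], relevant=True)
rep = idx.representation('d1', ['semiring', 'provenance', 'noise'])
```

Because judgments accumulate across users and over time, the representation shrinks gradually rather than being recomputed from term frequencies alone.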
Querying the Guarded Fragment
Evaluating a Boolean conjunctive query Q against a guarded first-order theory
F is equivalent to checking whether "F and not Q" is unsatisfiable. This
problem is relevant to the areas of database theory and description logic.
Since Q may not be guarded, well known results about the decidability,
complexity, and finite-model property of the guarded fragment do not obviously
carry over to conjunctive query answering over guarded theories, and these
questions had been left open in general. By investigating finite guarded
bisimilar covers of
hypergraphs and relational structures, and by substantially generalising
Rosati's finite chase, we prove for guarded theories F and (unions of)
conjunctive queries Q that (i) Q is true in each model of F iff Q is true in
each finite model of F and (ii) determining whether F implies Q is
2EXPTIME-complete. We further show the following results: (iii) the existence
of polynomial-size conformal covers of arbitrary hypergraphs; (iv) a new proof
of the finite model property of the clique-guarded fragment; (v) the small
model property of the guarded fragment with optimal bounds; (vi) a
polynomial-time solution to the canonisation problem modulo guarded
bisimulation, which yields (vii) a capturing result for guarded bisimulation
invariant PTIME.
Comment: This is an improved and extended version of the paper of the same
title presented at LICS 201
Twenty-One at TREC-8: using Language Technology for Information Retrieval
This paper describes the official runs of the Twenty-One group for TREC-8. The Twenty-One group participated in the Ad-hoc, CLIR, Adaptive Filtering and SDR tracks. The main focus of our experiments is the development and evaluation of retrieval methods that are motivated by natural language processing techniques. The following new techniques are introduced in this paper. In the Ad-hoc and CLIR tasks we experimented with automatic sense disambiguation followed by query expansion or translation. We used a combination of thesaurial and corpus information for the disambiguation process. We continued research on CLIR techniques which exploit the target corpus for an implicit disambiguation, by importing the translation probabilities into the probabilistic term-weighting framework. In filtering we extended the use of language models for document ranking with a relevance feedback algorithm for query term reweighting.
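A rough sketch of what folding translation probabilities into a probabilistic term weight can look like, in the spirit of the CLIR approach described above. The query-likelihood form, the toy translation table, and the smoothing constant are assumptions for illustration, not the group's exact model.

```python
import math

def score(query, doc_tf, doc_len, coll_prob, trans, lam=0.5):
    """Query-likelihood score in which each source-language query term is
    matched via a probability-weighted sum over its candidate translations."""
    s = 0.0
    for q in query:
        # P(q | doc) via translations t of q: sum_t P(q -> t) * P(t | doc)
        p_doc = sum(p * doc_tf.get(t, 0) / doc_len for t, p in trans[q])
        # Smooth with the collection model, mixed the same way.
        p_coll = sum(p * coll_prob.get(t, 0.0) for t, p in trans[q])
        s += math.log(lam * p_doc + (1 - lam) * p_coll + 1e-12)
    return s

# Hypothetical French query term with two English translation candidates.
trans = {'chat': [('cat', 0.8), ('chat', 0.2)]}
doc_tf = {'cat': 2, 'mat': 1}          # a document mentioning 'cat' twice
coll_prob = {'cat': 0.01, 'chat': 0.001}
s = score(['chat'], doc_tf, doc_len=3, coll_prob=coll_prob, trans=trans)
```

The key effect is implicit disambiguation: translations that actually occur in the target document dominate the weighted sum, so the corpus itself resolves the ambiguity.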