On Type-Aware Entity Retrieval
Today, the practice of returning entities from a knowledge base in response
to search queries has become widespread. One of the distinctive characteristics
of entities is that they are typed, i.e., assigned to some hierarchically
organized type system (type taxonomy). The primary objective of this paper is
to gain a better understanding of how entity type information can be utilized
in entity retrieval. We perform this investigation in an idealized "oracle"
setting, assuming that we know the distribution of target types of the relevant
entities for a given query. We perform a thorough analysis of three main
aspects: (i) the choice of type taxonomy, (ii) the representation of
hierarchical type information, and (iii) the combination of type-based and
term-based similarity in the retrieval model. Using a standard entity search
test collection based on DBpedia, we find that type information proves most
useful when using large type taxonomies that provide very specific types. We
provide further insights on the extensional coverage of entities and on the
utility of target types.
Comment: Proceedings of the 3rd ACM International Conference on the Theory of
Information Retrieval (ICTIR '17), 201
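To make aspects (ii) and (iii) more concrete, the Python sketch below shows one possible way to represent hierarchical type information and combine it with a term-based score. It is an illustration under stated assumptions, not the paper's retrieval model: the toy taxonomy, the ancestor-expansion representation, the coverage-based type score and the interpolation weight `lam` are all hypothetical choices. As in the oracle setting described above, the distribution over target types is simply given.

```python
# Hypothetical toy taxonomy: each type maps to its parent (None for the root).
TAXONOMY = {
    "Thing": None,
    "Agent": "Thing",
    "Person": "Agent",
    "Scientist": "Person",
    "Place": "Thing",
    "City": "Place",
}


def ancestors(type_name, taxonomy):
    """Return the type together with all of its ancestors up to the root."""
    path = []
    while type_name is not None:
        path.append(type_name)
        type_name = taxonomy[type_name]
    return path


def expand_types(entity_types, taxonomy):
    """Assumed representation: an entity's types plus every ancestor type."""
    expanded = set()
    for t in entity_types:
        expanded.update(ancestors(t, taxonomy))
    return expanded


def type_score(target_dist, entity_types, taxonomy):
    """Share of the (oracle) target-type probability mass covered by the entity."""
    expanded = expand_types(entity_types, taxonomy)
    return sum(p for t, p in target_dist.items() if t in expanded)


def combined_score(term_score, target_dist, entity_types, taxonomy, lam=0.5):
    """Linear interpolation of a term-based score and the type-based score."""
    return (1 - lam) * term_score + lam * type_score(target_dist, entity_types, taxonomy)


# Usage: an assumed oracle distribution over target types for one query.
target = {"Scientist": 0.7, "Person": 0.3}
for name, types in [("entity_a", ["Scientist"]), ("entity_b", ["City"])]:
    print(name, combined_score(0.4, target, types, TAXONOMY))
```

Under these assumptions, an entity typed `Scientist` inherits `Person` through the hierarchy and collects the full target-type mass, while an entity typed `City` receives no type-based credit and falls back on its term score alone.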
Target Type Identification for Entity-Bearing Queries
Identifying the target types of entity-bearing queries can help improve
retrieval performance as well as the overall search experience. In this work,
we address the problem of automatically detecting the target types of a query
with respect to a type taxonomy. We propose a supervised learning approach with
a rich variety of features. Using a purpose-built test collection, we show that
our approach outperforms existing methods by a remarkable margin. This is an
extended version of the article published with the same title in the
Proceedings of SIGIR'17.
Comment: Extended version of SIGIR'17 short paper, 5 pages
POLIS: a probabilistic summarisation logic for structured documents
PhD
As the availability of structured documents, formatted in markup languages such as SGML, RDF,
or XML, increases, retrieval systems increasingly focus on retrieving document-elements
rather than entire documents. Additionally, abstraction layers in the form of formalised retrieval
logics have allowed developers to include search facilities in numerous applications without
requiring detailed knowledge of retrieval models.
Although automatic document summarisation has been recognised as a useful tool for reducing
the workload of information system users, very few such abstraction layers have been developed
for the task of automatic document summarisation. This thesis describes the development
of an abstraction logic for summarisation, called POLIS, which provides users (such as developers
or knowledge engineers) with high-level access to summarisation facilities. Furthermore,
POLIS allows users to exploit the hierarchical information provided by structured documents.
POLIS is developed step by step. We start by defining a series
of probabilistic summarisation models, which assign weights to document-elements at a
user-selected level. These summarisation models are the ones accessible through POLIS. The formal
definition of POLIS is performed in three steps. We start by providing a syntax for POLIS,
through which users/knowledge engineers interact with the logic. This is followed by a definition
of the logic's semantics. Finally, we provide details of an implementation of POLIS.
The final chapters of this dissertation are concerned with the evaluation of POLIS, which is
conducted in two stages. Firstly, we evaluate the performance of the summarisation models by
applying POLIS to two test collections: the DUC AQUAINT corpus and the INEX IEEE corpus.
Secondly, we present application scenarios for POLIS, in which we discuss how POLIS can be used in specific IR tasks.
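The idea of weighting document-elements at a user-selected level can be illustrated with a small Python sketch. The scoring rule here is an assumption chosen for simplicity (each element's raw score is the mean relative frequency, in the whole document, of the terms it contains, normalised so that the weights over the chosen level sum to 1); it is not one of the POLIS summarisation models, and the function names `weight_elements` and `summarise` are hypothetical. The "level" is selected by element tag.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter


def tokens(text):
    """Lower-cased alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def weight_elements(xml_text, level_tag):
    """Assign a probability-style weight to each element with tag `level_tag`.

    Assumed model: an element's raw score is the mean P(t|d) of its tokens,
    where P(t|d) is the term's relative frequency in the whole document;
    raw scores are then normalised so the weights sum to 1.
    """
    root = ET.fromstring(xml_text)
    doc_counts = Counter(tokens(" ".join(root.itertext())))
    doc_len = sum(doc_counts.values())
    scored = []
    for elem in root.iter(level_tag):
        text = " ".join(" ".join(elem.itertext()).split())
        toks = tokens(text)
        if not toks:
            continue  # skip elements without any text
        score = sum(doc_counts[t] / doc_len for t in toks) / len(toks)
        scored.append((text, score))
    total = sum(score for _, score in scored)
    weighted = [(text, score / total) for text, score in scored]
    return sorted(weighted, key=lambda pair: pair[1], reverse=True)


def summarise(xml_text, level_tag, k=1):
    """Return the text of the k highest-weighted elements at the chosen level."""
    return [text for text, _ in weight_elements(xml_text, level_tag)[:k]]


doc = ("<article><sec><p>retrieval of elements</p>"
       "<p>retrieval logic for retrieval</p></sec>"
       "<sec><p>unrelated note</p></sec></article>")
print(summarise(doc, "p", k=1))
print(summarise(doc, "sec", k=1))
```

Changing `level_tag` from `"p"` to `"sec"` re-weights the same document at a coarser granularity, which is the kind of hierarchical choice that structured documents make available.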
What's Decidable About Sequences?
We present a first-order theory of sequences with integer elements,
Presburger arithmetic, and regular constraints, which can model significant
properties of data structures such as arrays and lists. We give a decision
procedure for the quantifier-free fragment, based on an encoding into the
first-order theory of concatenation; the procedure has PSPACE complexity. The
quantifier-free fragment of the theory of sequences can express properties such
as sortedness and injectivity, as well as Boolean combinations of periodic and
arithmetic facts relating the elements of the sequence and their positions
(e.g., "for all even i's, the element at position i has value i+3 or 2i"). The
resulting expressive power is orthogonal to that of the most expressive
decidable logics for arrays. Some examples demonstrate that the fragment is
also suitable for reasoning about sequence-manipulating programs within the
standard framework of axiomatic semantics.
Comment: Fixed a few lapses in the Mergesort example
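The properties quoted above (sortedness, injectivity, and the periodic/arithmetic example) can be stated operationally. The Python sketch below only evaluates them on one concrete, finite sequence, assuming 0-based positions; it is not the paper's decision procedure, which decides such quantifier-free formulas over all sequences via an encoding into the first-order theory of concatenation.

```python
def is_sorted(seq):
    """Non-decreasing order: seq[i] <= seq[i + 1] for every adjacent pair."""
    return all(seq[i] <= seq[i + 1] for i in range(len(seq) - 1))


def is_injective(seq):
    """No value occurs at two different positions."""
    return len(set(seq)) == len(seq)


def even_position_property(seq):
    """For all even i, the element at position i has value i + 3 or 2 * i.

    Positions are assumed to be 0-based here.
    """
    return all(seq[i] == i + 3 or seq[i] == 2 * i for i in range(0, len(seq), 2))


for s in ([3, 7, 4, 1, 8], [0, 5, 5], [1, 2]):
    print(s, is_sorted(s), is_injective(s), even_position_property(s))
```

Checking one sequence is straightforward; the point of the paper's result is that satisfiability of Boolean combinations of such constraints remains decidable when the sequence is left unknown.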
From Logic Programming to Human Reasoning: How to be Artificially Human
Results of psychological experiments have shown that humans make assumptions that are not necessarily valid, that they are influenced by their background knowledge, and that they reason non-monotonically. These observations indicate that classical logic does not seem to be adequate for modeling human reasoning. Instead of assuming that humans do not reason logically at all, we take the view that humans do not reason according to classical logic. Our goal is to model episodes of human reasoning, and for this purpose we investigate the so-called Weak Completion Semantics. The Weak Completion Semantics is a Logic Programming approach that considers the least model of the weak completion of logic programs under the three-valued Łukasiewicz logic.
As the Weak Completion Semantics is relatively new and has not yet been extensively investigated, we first motivate why this approach is interesting for modeling human reasoning. After that, we show the formal correspondence to the already established Stable Model Semantics and Well-founded Semantics. Next, we present an extension with an additional context operator that allows us to express negation as failure. Finally, we propose a contextual abductive reasoning approach, in which the context of observations is relevant; some properties no longer hold under this extension. Besides discussing two well-known psychological experiments, Byrne’s suppression task and Wason’s selection task, we investigate an experiment in spatial reasoning, an experiment in syllogistic reasoning, and an experiment that examines the belief-bias effect. We show that the results of these experiments can be adequately modeled under the Weak Completion Semantics. One result stands out: in modeling the syllogistic reasoning experiment, our predictions match the participants’ answers more closely than those of any of twelve current cognitive theories.
We present an abstract evaluation system for conditionals and discuss well-known examples from the literature. We show that in this system, conditionals can be evaluated in various ways, and we put forward the hypothesis that humans use a particular evaluation strategy, namely that they prefer abduction to revision. We also discuss how relevance plays a role in the evaluation of conditionals. For this purpose we propose a semantic definition of relevance and justify why it is preferable to an exclusively syntactic definition. Finally, we show that our system is more general than another system recently presented in the literature.
Altogether, this thesis shows one possible path for bridging the gap between Cognitive Science and Computational Logic. We investigated findings from psychological experiments and modeled their results within one formal approach, the Weak Completion Semantics. Furthermore, we proposed a general evaluation system for conditionals, for which we suggest a specific evaluation strategy. The outcome is not an ultimate solution, but rather a starting point for new open questions in both areas.
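The least model of the weak completion can be computed for finite propositional programs by iterating a semantic operator from the empty interpretation until a fixpoint is reached. The Python sketch below assumes the operator as it is commonly stated in the Weak Completion Semantics literature (an atom becomes true if some clause for it has a true body, false if every clause for it has a false body, and stays unknown otherwise); the thesis may present it differently. The program encoding, the names `phi` and `least_model`, and the suppression-task program are illustrative assumptions, with truth values true, unknown and false represented as 1, 0.5 and 0 so that Łukasiewicz conjunction is `min` and negation is `1 - v`.

```python
# Truth values of three-valued Lukasiewicz logic: true, unknown, false.
T, U, F = 1.0, 0.5, 0.0
TOP, BOTTOM = "TOP", "BOTTOM"  # bodies of facts (A <- TOP) and assumptions (A <- BOTTOM)


def eval_body(body, interp):
    """Value of a clause body; atoms missing from `interp` are unknown."""
    if body == TOP:
        return T
    if body == BOTTOM:
        return F
    values = []
    for atom, positive in body:
        v = interp.get(atom, U)
        values.append(v if positive else 1.0 - v)  # negation: 1 - v
    return min(values, default=T)  # conjunction: minimum


def phi(program, interp):
    """One application of the semantic operator to a propositional program."""
    result = {}
    for head in {h for h, _ in program}:
        values = [eval_body(b, interp) for h, b in program if h == head]
        if any(v == T for v in values):
            result[head] = T  # some body is true
        elif all(v == F for v in values):
            result[head] = F  # every body is false
    return result  # all other atoms stay unknown


def least_model(program):
    """Iterate the operator from the empty interpretation to its least fixpoint."""
    interp = {}
    while True:
        nxt = phi(program, interp)
        if nxt == interp:
            return interp
        interp = nxt


# Assumed encoding of the suppression task (e: she has an essay, l: she studies
# late in the library, o: the library is open; ab1/ab2 are abnormality atoms).
P_e = [("l", [("e", True), ("ab1", False)]), ("ab1", BOTTOM), ("e", TOP)]
P_eo = [
    ("l", [("e", True), ("ab1", False)]), ("ab1", [("o", False)]),
    ("l", [("o", True), ("ab2", False)]), ("ab2", [("e", False)]),
    ("e", TOP),
]
print(least_model(P_e))   # l is true
print(least_model(P_eo))  # l is absent, i.e. unknown
```

With the single conditional, `l` is concluded; once the additional condition about the library is added, `l` stays unknown, which mirrors the suppression effect that the abstract refers to.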
Visual exploration and retrieval of XML document collections with the generic system X2
This article reports on the XML retrieval system X2, which has been developed at the University of Munich over the last five years. In a typical session with X2, the user first browses a structural summary of the XML database in order to select interesting elements and keywords occurring in documents. Using this intermediate result, queries combining structure and textual references are composed semiautomatically.
After query evaluation, the full set of answers is presented in a visual and structured way. X2 largely exploits the structure found in documents, queries and answers to enable new interactive visualization and exploration techniques that support mixed IR and database-oriented querying, thus bridging the gap between these three views on the data to be retrieved. Another salient characteristic of X2 that distinguishes it from other visual query systems for XML is that it supports various levels of detail in the presentation of answers, as well as techniques for dynamically reordering and grouping retrieved elements once the complete answer set has been computed.
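A structural summary of the kind a user browses in such a session can be approximated by collecting every distinct root-to-element tag path together with its frequency and the keywords found directly under it. The Python sketch below is a DataGuide-style summary chosen as an assumption for illustration; it is not X2's actual data structure, and the function name `structural_summary` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def structural_summary(xml_text):
    """DataGuide-style summary: each distinct root-to-element tag path,
    how often it occurs, and the keywords found directly under it."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    keywords = defaultdict(set)

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        counts[path] += 1
        if elem.text and elem.text.strip():
            keywords[path].update(elem.text.lower().split())
        for child in elem:
            walk(child, path)

    walk(root, "")
    return counts, keywords


DOC = ("<lib><book><title>XML retrieval</title></book>"
       "<book><title>Visual queries</title><note>draft</note></book></lib>")
counts, keywords = structural_summary(DOC)
for path, n in sorted(counts.items()):
    print(path, n, sorted(keywords[path]))
```

Selecting a path such as `/lib/book/title` and one of its keywords from this summary gives exactly the two ingredients (structure plus text) from which a combined query could then be composed.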
Knowledge-rich Image Gist Understanding Beyond Literal Meaning
We investigate the problem of understanding the message (gist) conveyed by
images and their captions as found, for instance, on websites or in news articles.
To this end, we propose a methodology to capture the meaning of image-caption
pairs on the basis of large amounts of machine-readable knowledge that has
previously been shown to be highly effective for text understanding. Our method
identifies the connotation of objects beyond their denotation: where most
approaches to image understanding focus on the denotation of objects, i.e.,
their literal meaning, our work addresses the identification of connotations,
i.e., iconic meanings of objects, to understand the message of images. We view
image understanding as the task of representing an image-caption pair on the
basis of a wide-coverage vocabulary of concepts such as the one provided by
Wikipedia, and cast gist detection as a concept-ranking problem with
image-caption pairs as queries. To enable a thorough investigation of the
problem of gist understanding, we produce a gold standard of over 300
image-caption pairs and over 8,000 gist annotations covering a wide variety of
topics at different levels of abstraction. We use this dataset to
experimentally benchmark the contribution of signals from heterogeneous
sources, namely image and text. The best result, a Mean Average Precision
(MAP) of 0.69, indicates that by combining both dimensions we are able to better
understand the meaning of our image-caption pairs than when using language or
vision information alone. We test the robustness of our gist detection approach
when receiving automatically generated input, i.e., using automatically
generated image tags or generated captions, and demonstrate the feasibility of an
end-to-end automated process.
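Since gist detection is cast as concept ranking and evaluated by MAP, a short Python sketch of the metric may help when reading the 0.69 figure. It assumes binary relevance (a concept either is or is not an annotated gist) and no rank cutoff; the paper's exact evaluation setup may differ, and the concept names and rankings below are invented toy data.

```python
def average_precision(ranking, relevant):
    """Sum of precision@k over the ranks k that hold a relevant item, divided
    by the number of relevant items (relevant items never retrieved add 0)."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Average AP over queries; `runs` is a list of (ranking, relevant_set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy concept rankings for two hypothetical image-caption pairs.
runs = [
    (["Climate_change", "Polar_bear", "Global_warming", "Snow"],
     {"Climate_change", "Global_warming"}),
    (["Sport", "Victory"], {"Victory"}),
]
print(mean_average_precision(runs))
```

In this toy case the first pair scores (1/1 + 2/3) / 2 and the second 1/2, so the mean over the two queries is 2/3; each image-caption pair plays the role of one query.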