1,229 research outputs found

    On Type-Aware Entity Retrieval

    Full text link
    Today, the practice of returning entities from a knowledge base in response to search queries has become widespread. One of the distinctive characteristics of entities is that they are typed, i.e., assigned to some hierarchically organized type system (type taxonomy). The primary objective of this paper is to gain a better understanding of how entity type information can be utilized in entity retrieval. We perform this investigation in an idealized "oracle" setting, assuming that we know the distribution of target types of the relevant entities for a given query. We perform a thorough analysis of three main aspects: (i) the choice of type taxonomy, (ii) the representation of hierarchical type information, and (iii) the combination of type-based and term-based similarity in the retrieval model. Using a standard entity search test collection based on DBpedia, we find that type information proves most useful when using large type taxonomies that provide very specific types. We provide further insights on the extensional coverage of entities and on the utility of target types.Comment: Proceedings of the 3rd ACM International Conference on the Theory of Information Retrieval (ICTIR '17), 201

    Target Type Identification for Entity-Bearing Queries

    Full text link
    Identifying the target types of entity-bearing queries can help improve retrieval performance as well as the overall search experience. In this work, we address the problem of automatically detecting the target types of a query with respect to a type taxonomy. We propose a supervised learning approach with a rich variety of features. Using a purpose-built test collection, we show that our approach outperforms existing methods by a remarkable margin. This is an extended version of the article published with the same title in the Proceedings of SIGIR'17.Comment: Extended version of SIGIR'17 short paper, 5 page

    POLIS: a probabilistic summarisation logic for structured documents

    Get PDF
    PhDAs the availability of structured documents, formatted in markup languages such as SGML, RDF, or XML, increases, retrieval systems increasingly focus on the retrieval of document-elements, rather than entire documents. Additionally, abstraction layers in the form of formalised retrieval logics have allowed developers to include search facilities into numerous applications, without the need of having detailed knowledge of retrieval models. Although automatic document summarisation has been recognised as a useful tool for reducing the workload of information system users, very few such abstraction layers have been developed for the task of automatic document summarisation. This thesis describes the development of an abstraction logic for summarisation, called POLIS, which provides users (such as developers or knowledge engineers) with a high-level access to summarisation facilities. Furthermore, POLIS allows users to exploit the hierarchical information provided by structured documents. The development of POLIS is carried out in a step-by-step way. We start by defining a series of probabilistic summarisation models, which provide weights to document-elements at a user selected level. These summarisation models are those accessible through POLIS. The formal definition of POLIS is performed in three steps. We start by providing a syntax for POLIS, through which users/knowledge engineers interact with the logic. This is followed by a definition of the logics semantics. Finally, we provide details of an implementation of POLIS. The final chapters of this dissertation are concerned with the evaluation of POLIS, which is conducted in two stages. Firstly, we evaluate the performance of the summarisation models by applying POLIS to two test collections, the DUC AQUAINT corpus, and the INEX IEEE corpus. This is followed by application scenarios for POLIS, in which we discuss how POLIS can be used in specific IR tasks

    What's Decidable About Sequences?

    Full text link
    We present a first-order theory of sequences with integer elements, Presburger arithmetic, and regular constraints, which can model significant properties of data structures such as arrays and lists. We give a decision procedure for the quantifier-free fragment, based on an encoding into the first-order theory of concatenation; the procedure has PSPACE complexity. The quantifier-free fragment of the theory of sequences can express properties such as sortedness and injectivity, as well as Boolean combinations of periodic and arithmetic facts relating the elements of the sequence and their positions (e.g., "for all even i's, the element at position i has value i+3 or 2i"). The resulting expressive power is orthogonal to that of the most expressive decidable logics for arrays. Some examples demonstrate that the fragment is also suitable to reason about sequence-manipulating programs within the standard framework of axiomatic semantics.Comment: Fixed a few lapses in the Mergesort exampl

    From Logic Programming to Human Reasoning:: How to be Artificially Human

    Get PDF
    Results of psychological experiments have shown that humans make assumptions, which are not necessarily valid, that they are influenced by their background knowledge and that they reason non-monotonically. These observations show that classical logic does not seem to be adequate for modeling human reasoning. Instead of assuming that humans do not reason logically at all, we take the view that humans do not reason classical logically. Our goal is to model episodes of human reasoning and for this purpose we investigate the so-called Weak Completion Semantics. The Weak Completion Semantics is a Logic Programming approach and considers the least model of the weak completion of logic programs under the three-valued Łukasiewicz logic. As the Weak Completion Semantics is relatively new and has not yet been extensively investigated, we first motivate why this approach is interesting for modeling human reasoning. After that, we show the formal correspondence to the already established Stable Model Semantics and Well-founded Semantics. Next, we present an extension with an additional context operator, that allows us to express negation as failure. Finally, we propose a contextual abductive reasoning approach, in which the context of observations is relevant. Some properties do not hold anymore under this extension. Besides discussing the well-known psychological experiments Byrne’s suppression task and Wason’s selection task, we investigate an experiment in spatial reasoning, an experiment in syllogistic reasoning and an experiment that examines the belief-bias effect. We show that the results of these experiments can be adequately modeled under the Weak Completion Semantics. A result which stands out here, is the outcome of modeling the syllogistic reasoning experiment, as we have a higher prediction match with the participants’ answers than any of twelve current cognitive theories. We present an abstract evaluation system for conditionals and discuss well-known examples from the literature. We show that in this system, conditionals can be evaluated in various ways and we put up the hypothesis that humans use a particular evaluation strategy, namely that they prefer abduction to revision. We also discuss how relevance plays a role in the evaluation process of conditionals. For this purpose we propose a semantic definition of relevance and justify why this is preferable to a exclusively syntactic definition. Finally, we show that our system is more general than another system, which has recently been presented in the literature. Altogether, this thesis shows one possible path on bridging the gap between Cognitive Science and Computational Logic. We investigated findings from psychological experiments and modeled their results within one formal approach, the Weak Completion Semantics. Furthermore, we proposed a general evaluation system for conditionals, for which we suggest a specific evaluation strategy. Yet, the outcome cannot be seen as the ultimate solution but delivers a starting point for new open questions in both areas

    Visual exploration and retrieval of XML document collections with the generic system X2

    Get PDF
    This article reports on the XML retrieval system X2 which has been developed at the University of Munich over the last five years. In a typical session with X2, the user first browses a structural summary of the XML database in order to select interesting elements and keywords occurring in documents. Using this intermediate result, queries combining structure and textual references are composed semiautomatically. After query evaluation, the full set of answers is presented in a visual and structured way. X2 largely exploits the structure found in documents, queries and answers to enable new interactive visualization and exploration techniques that support mixed IR and database-oriented querying, thus bridging the gap between these three views on the data to be retrieved. Another salient characteristic of X2 which distinguishes it from other visual query systems for XML is that it supports various degrees of detailedness in the presentation of answers, as well as techniques for dynamically reordering and grouping retrieved elements once the complete answer set has been computed

    Knowledge-rich Image Gist Understanding Beyond Literal Meaning

    Full text link
    We investigate the problem of understanding the message (gist) conveyed by images and their captions as found, for instance, on websites or news articles. To this end, we propose a methodology to capture the meaning of image-caption pairs on the basis of large amounts of machine-readable knowledge that has previously been shown to be highly effective for text understanding. Our method identifies the connotation of objects beyond their denotation: where most approaches to image understanding focus on the denotation of objects, i.e., their literal meaning, our work addresses the identification of connotations, i.e., iconic meanings of objects, to understand the message of images. We view image understanding as the task of representing an image-caption pair on the basis of a wide-coverage vocabulary of concepts such as the one provided by Wikipedia, and cast gist detection as a concept-ranking problem with image-caption pairs as queries. To enable a thorough investigation of the problem of gist understanding, we produce a gold standard of over 300 image-caption pairs and over 8,000 gist annotations covering a wide variety of topics at different levels of abstraction. We use this dataset to experimentally benchmark the contribution of signals from heterogeneous sources, namely image and text. The best result with a Mean Average Precision (MAP) of 0.69 indicate that by combining both dimensions we are able to better understand the meaning of our image-caption pairs than when using language or vision information alone. We test the robustness of our gist detection approach when receiving automatically generated input, i.e., using automatically generated image tags or generated captions, and prove the feasibility of an end-to-end automated process
    corecore