61,279 research outputs found

    Extended path-indexing

    No full text
    The performance of a theorem prover crucially depends on the speed of the basic retrieval operations, such as finding terms that are unifiable with (instances of, or more general than) some query term. Among the known indexing methods for term retrieval in deduction systems, Path-Indexing exhibits good performance in general. However, as Path-Indexing is not a perfect filter, the candidates found by this method must still be subjected to a unification algorithm in order to detect occur-check failures and indirect clashes. As perfect filters, discrimination trees and abstraction trees thus outperform Path-Indexing in some cases. We present an improved version of Path-Indexing that provides both the query trees and the Path-Index with indirect-clash and occur-check information. Compared to the standard method, we can thus dismiss many more terms as candidates.
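
    To make the underlying data structure concrete, here is a minimal Python sketch of classic path-indexing, the baseline the paper improves on. It is an assumption-laden simplification: it only filters candidates for instances of a query (unifiable-term retrieval would additionally need the variable-path sets), and it carries none of the occur-check or indirect-clash information that the paper's extension adds.

        # Terms: variables are strings starting with '?', constants are plain
        # strings, compound terms are tuples (functor, arg1, arg2, ...).
        from collections import defaultdict

        class PathIndex:
            def __init__(self):
                self.paths = defaultdict(set)   # path -> ids of terms containing it
                self.terms = {}                 # term id -> term

            def _walk(self, term, prefix):
                # Yield every root-to-leaf path, e.g. f(a, ?x) yields
                # (('f',0),('a',)) and (('f',1),('*',)), '*' marking variables.
                if isinstance(term, str):
                    sym = '*' if term.startswith('?') else term
                    yield prefix + ((sym,),)
                else:
                    functor = term[0]
                    for i, arg in enumerate(term[1:]):
                        yield from self._walk(arg, prefix + ((functor, i),))

            def insert(self, tid, term):
                self.terms[tid] = term
                for p in self._walk(term, ()):
                    self.paths[p].add(tid)

            def candidates(self, query):
                # Intersect the term sets of the query's non-variable paths;
                # the result is a superset of the true instances (imperfect
                # filter), so full unification must still follow.
                result = None
                for p in self._walk(query, ()):
                    if p[-1] == ('*',):         # a query variable matches anything
                        continue
                    ids = self.paths.get(p, set())
                    result = ids if result is None else result & ids
                return set(self.terms) if result is None else result

        idx = PathIndex()
        idx.insert(1, ('f', 'a', '?x'))            # f(a, X)
        idx.insert(2, ('f', 'b', 'c'))             # f(b, c)
        print(idx.candidates(('f', 'a', '?y')))    # {1}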

    Thesaurus-based index term extraction for agricultural documents

    Get PDF
    This paper describes a new algorithm for automatically extracting index terms from documents relating to the domain of agriculture. The domain-specific Agrovoc thesaurus developed by the FAO is used both as a controlled vocabulary and as a knowledge base for semantic matching. The automatically assigned terms are evaluated against a manually indexed 200-item sample of the FAO’s document repository, and the performance of the new algorithm is compared with a state-of-the-art system for keyphrase extraction.
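
    In its simplest form, thesaurus-based extraction reduces to looking up document n-grams in the controlled vocabulary and ranking by frequency. The Python sketch below shows only that baseline; the thesaurus here is a plain set of strings, whereas the paper additionally exploits Agrovoc's semantic relations, which are not modelled.

        import re
        from collections import Counter

        def extract_index_terms(text, thesaurus, max_ngram=3, top_k=10):
            # Return the top_k controlled-vocabulary terms occurring in text;
            # `thesaurus` is a set of lower-cased (possibly multi-word) terms.
            tokens = re.findall(r"[a-z]+", text.lower())
            counts = Counter()
            for n in range(1, max_ngram + 1):
                for i in range(len(tokens) - n + 1):
                    candidate = " ".join(tokens[i:i + n])
                    if candidate in thesaurus:
                        counts[candidate] += 1
            return [term for term, _ in counts.most_common(top_k)]

        vocab = {"soil erosion", "wheat"}   # toy stand-in for Agrovoc
        print(extract_index_terms("Wheat yields drop where soil erosion is severe.", vocab))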

    Storing and Indexing Plan Derivations through Explanation-based Analysis of Retrieval Failures

    Full text link
    Case-Based Planning (CBP) provides a way of scaling up domain-independent planning to solve large problems in complex domains. It replaces the detailed and lengthy search for a solution with the retrieval and adaptation of previous planning experiences. In general, CBP has been demonstrated to improve performance over generative (from-scratch) planning. However, the performance improvements it provides depend on adequate judgements as to problem similarity. In particular, although CBP may substantially reduce planning effort overall, it is subject to a mis-retrieval problem, and its success depends on these retrieval errors being relatively rare. This paper describes the design and implementation of a replay framework for the case-based planner DERSNLP+EBL. DERSNLP+EBL extends current CBP methodology by incorporating explanation-based learning techniques that allow it to explain and learn from the retrieval failures it encounters. These techniques are used to refine judgements about case similarity in response to feedback when a wrong decision has been made. The same failure analysis is used in building the case library, through the addition of repairing cases. Large problems are split and stored as single-goal subproblems; multi-goal problems are stored only when these smaller cases fail to be merged into a full solution. An empirical evaluation of this approach demonstrates the advantage of learning from experienced retrieval failure. Comment: See http://www.jair.org/ for any accompanying file.
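
    The retrieval-censoring idea can be pictured with a small sketch. Everything below is hypothetical scaffolding, not DERSNLP+EBL's actual machinery: real explanation-based learning derives its censors from an analysis of the failed plan derivation, whereas here a failure simply records a feature that blocks future retrieval of the case.

        class CaseLibrary:
            def __init__(self):
                self.cases = []      # (goal set, stored plan)
                self.censors = []    # (goal set, feature) pairs learned from failures

            def add_case(self, goals, plan):
                self.cases.append((frozenset(goals), plan))

            def learn_failure(self, goals, bad_feature):
                # A case retrieved for `goals` failed on a problem exhibiting
                # `bad_feature`; remember not to repeat that retrieval.
                self.censors.append((frozenset(goals), bad_feature))

            def retrieve(self, goals, features):
                goals = set(goals)
                best, best_overlap = None, 0
                for case_goals, plan in self.cases:
                    if any(cg == case_goals and f in features
                           for cg, f in self.censors):
                        continue                 # censored by a past mis-retrieval
                    overlap = len(goals & case_goals)
                    if overlap > best_overlap:
                        best, best_overlap = plan, overlap
                return best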

    Heaviest Induced Ancestors and Longest Common Substrings

    Full text link
    Suppose we have two trees on the same set of leaves, in which nodes are weighted such that children are heavier than their parents. We say a node from the first tree and a node from the second tree are induced together if they have a common leaf descendant. In this paper we describe data structures that efficiently support the following heaviest-induced-ancestor query: given a node from the first tree and a node from the second tree, find an induced pair of their ancestors with maximum combined weight. Our solutions are based on a geometric interpretation that enables us to find heaviest induced ancestors using range queries. We then show how to use these results to build an LZ-compressed index with which we can quickly find, with high probability, a longest substring common to the indexed string and a given pattern.
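
    A brute-force version of the query makes the definition concrete. The sketch below enumerates all ancestor pairs and tests leaf-set intersections, which is at best quadratic; the paper's contribution is precisely to avoid this via range queries over a geometric encoding, which is not reproduced here.

        def ancestors(parent, v):
            # v together with its ancestors up to the root (parent[root] is None).
            chain = []
            while v is not None:
                chain.append(v)
                v = parent[v]
            return chain

        def heaviest_induced_ancestors(parent1, parent2, leaves1, leaves2,
                                       weight1, weight2, u, v):
            # leaves1[x] / leaves2[x]: set of leaf labels below node x;
            # weights satisfy the stated monotonicity (children heavier).
            best = None
            for a in ancestors(parent1, u):
                for b in ancestors(parent2, v):
                    if leaves1[a] & leaves2[b]:          # induced: common leaf
                        w = weight1[a] + weight2[b]
                        if best is None or w > best[0]:
                            best = (w, a, b)
            return best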

    Word matching using single closed contours for indexing handwritten historical documents

    Get PDF
    Effective indexing is crucial for providing convenient access to scanned versions of large collections of historically valuable handwritten manuscripts. Since traditional handwriting recognizers based on optical character recognition (OCR) do not perform well on historical documents, holistic word recognition has recently gained in popularity as an attractive and more straightforward solution (Lavrenko et al., in Proc. Document Image Analysis for Libraries (DIAL’04), pp. 278–287, 2004). Such techniques attempt to recognize words based on scalar and profile-based features extracted from whole word images. In this paper, we propose a new approach to holistic word recognition for historical handwritten manuscripts based on matching word contours instead of whole images or word profiles. The new method consists of robust extraction of closed word contours and the application of an elastic contour-matching technique proposed originally for general shapes (Adamek and O’Connor, IEEE Trans. Circuits Syst. Video Technol., 2004). We demonstrate that multiscale contour-based descriptors can effectively capture intrinsic word features, avoiding any segmentation of words into smaller subunits. Our experiments show a recognition accuracy of 83%, which considerably exceeds the performance of other systems reported in the literature.
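
    Elastic contour matching of the kind referred to above is, at its core, a dynamic-programming alignment of two descriptor sequences. The sketch below uses plain dynamic time warping over per-point descriptors as a stand-in; the actual method of Adamek and O’Connor works on multiscale descriptors of closed contours (which would additionally require trying every starting point), neither of which is modelled here.

        def dtw_distance(seq_a, seq_b):
            # Elastic distance between two sequences of descriptor tuples.
            inf = float("inf")
            n, m = len(seq_a), len(seq_b)
            d = [[inf] * (m + 1) for _ in range(n + 1)]
            d[0][0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = sum((x - y) ** 2
                               for x, y in zip(seq_a[i - 1], seq_b[j - 1]))
                    d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
            return d[n][m]

        def match_word(query_desc, indexed):
            # Nearest-neighbour word retrieval over (word, descriptor) pairs.
            return min(indexed, key=lambda item: dtw_distance(query_desc, item[1]))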

    Models of Type Theory Based on Moore Paths

    Full text link
    This paper introduces a new family of models of intensional Martin-Löf type theory. We use constructive ordered algebra in toposes. Identity types in the models are given by a notion of Moore path. By considering a particular gros topos, we show that there is such a model that is non-truncated, i.e. contains non-trivial structure at all dimensions. In other words, in this model a type in a nested sequence of identity types can contain more than one element, no matter how great the degree of nesting. Although inspired by existing non-truncated models of type theory based on simplicial and cubical sets, the notion of model presented here is notable for avoiding any form of Kan filling condition in the semantics of types. Comment: This is a revised and expanded version of a paper with the same name that appeared in the proceedings of the 2nd International Conference on Formal Structures for Computation and Deduction (FSCD 2017).
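
    The appeal of Moore paths for modelling identity types is that their composition is strictly associative and unital. For orientation, here is the classical topological definition the name comes from (the paper itself works with a constructive ordered-algebra analogue inside a topos, which this does not capture):

        A Moore path in a space $X$ is a pair $(r, f)$ with length $r \ge 0$
        and $f \colon [0, \infty) \to X$ such that $f(t) = f(r)$ for all
        $t \ge r$. When $f(r) = g(0)$, concatenation is
        $$(r, f) \cdot (s, g) = \Bigl(r + s,\ t \mapsto
            \begin{cases} f(t) & t \le r \\ g(t - r) & t \ge r \end{cases}\Bigr),$$
        and, unlike reparametrised path composition, it is strictly
        associative with the length-zero constant paths as strict units.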