Extended path-indexing
The performance of a theorem prover depends crucially on the speed of its basic retrieval operations, such as finding terms that are unifiable with (instances of, or more general than) some query term. Among the known indexing methods for term retrieval in deduction systems, Path-Indexing exhibits good performance in general. However, as Path-Indexing is not a perfect filter, the candidates it finds must still be subjected to a unification algorithm in order to detect occur-check failures and indirect clashes. As perfect filters, discrimination trees and abstraction trees thus outperform Path-Indexing in some cases. We present an improved version of Path-Indexing that provides both the query trees and the path index with indirect-clash and occur-check information. Compared to the standard method, we can thus dismiss many more terms as candidates.
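To make the imperfect-filter behaviour concrete, here is a minimal path index sketched in Python. The term representation (nested tuples, `?`-prefixed variables) and all names are my own simplifying assumptions, and the sketch implements only the standard method, not the paper's extended version:

```python
# A minimal path-index sketch. Terms are nested tuples such as ('f', t1, t2);
# strings starting with '?' are variables. The index maps each (path, symbol)
# pair to the set of stored terms carrying that symbol at that position.

from collections import defaultdict

def paths(term, prefix=()):
    """Yield (path, symbol) pairs for every non-variable position of a term.
    A path is a tuple of (head, argument-index) steps from the root."""
    if isinstance(term, str):
        if not term.startswith('?'):          # constant
            yield prefix, term
        return
    head, *args = term
    yield prefix, head
    for i, arg in enumerate(args):
        yield from paths(arg, prefix + ((head, i),))

class PathIndex:
    def __init__(self):
        self.index = defaultdict(set)         # (path, symbol) -> term ids
        self.terms = []

    def insert(self, term):
        tid = len(self.terms)
        self.terms.append(term)
        for path, sym in paths(term):
            self.index[(path, sym)].add(tid)
        return tid

    def candidates(self, query):
        """Ids of stored terms compatible with the query at every non-variable
        query position. This is an imperfect filter: survivors must still be
        unified to rule out indirect clashes and occur-check failures."""
        result = set(range(len(self.terms)))
        for path, sym in paths(query):
            # compatible at this path: same symbol there, or no symbol at all
            # (the stored term has a variable at or above this position)
            ids_with_some_symbol, matching = set(), set()
            for (p, s), ids in self.index.items():
                if p == path:
                    ids_with_some_symbol |= ids
                    if s == sym:
                        matching |= ids
            result &= matching | (result - ids_with_some_symbol)
        return result
```

Querying `('f', 'a', '?x')` against stored terms `f(a,b)`, `f(b,b)`, and `f(?y,b)` keeps the first and third: the filter rejects on symbol clashes at shared paths but lets stored variables match anything.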
Thesaurus-based index term extraction for agricultural documents
This paper describes a new algorithm for automatically extracting index terms from documents relating to the domain of agriculture. The domain-specific Agrovoc thesaurus developed by the FAO is used both as a controlled vocabulary and as a knowledge base for semantic matching. The automatically assigned terms are evaluated against a manually indexed 200-item sample of the FAO’s document repository, and the performance of the new algorithm is compared with a state-of-the-art system for keyphrase extraction.
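The controlled-vocabulary matching step can be sketched as a longest-match scan of the text against thesaurus surface forms. The toy thesaurus below is a stand-in for Agrovoc, and the surface-form-to-descriptor mapping is an assumed simplification of what a real thesaurus provides (Agrovoc also carries broader/narrower relations used for semantic matching, which this sketch omits):

```python
# Longest-match extraction of controlled-vocabulary index terms.
# `thesaurus` maps a surface form to its preferred (descriptor) term,
# so synonyms collapse onto one index term.

import re
from collections import Counter

def extract_index_terms(text, thesaurus, top_k=5):
    """Rank thesaurus descriptors by how often their surface forms occur."""
    tokens = re.findall(r"[a-z]+", text.lower())
    max_len = max(len(t.split()) for t in thesaurus)
    counts = Counter()
    i = 0
    while i < len(tokens):
        # try the longest phrase first, so 'soil erosion' beats 'soil'
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in thesaurus:
                counts[thesaurus[phrase]] += 1
                i += n
                break
        else:
            i += 1
    return [term for term, _ in counts.most_common(top_k)]
```

With `{"soil erosion": "soil erosion", "maize": "maize", "corn": "maize", "soil": "soils"}`, the synonym "corn" counts toward the descriptor "maize", and "soil erosion" is preferred over the shorter match "soil".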
Storing and Indexing Plan Derivations through Explanation-based Analysis of Retrieval Failures
Case-Based Planning (CBP) provides a way of scaling up domain-independent planning to solve large problems in complex domains. It replaces the detailed and lengthy search for a solution with the retrieval and adaptation of previous planning experiences. In general, CBP has been demonstrated to improve performance over generative (from-scratch) planning. However, the performance improvements it provides depend on adequate judgements of problem similarity. In particular, although CBP may substantially reduce planning effort overall, it is subject to a mis-retrieval problem, and its success depends on such retrieval errors being relatively rare. This paper describes the design and implementation of a replay framework for the case-based planner DERSNLP+EBL. DERSNLP+EBL extends current CBP methodology by incorporating explanation-based learning techniques that allow it to explain and learn from the retrieval failures it encounters. These techniques are used to refine judgements about case similarity in response to feedback when a wrong decision has been made. The same failure analysis is used in building the case library, through the addition of repairing cases. Large problems are split and stored as single-goal subproblems; multi-goal problems are stored only when these smaller cases fail to be merged into a full solution. An empirical evaluation of this approach demonstrates the advantage of learning from experienced retrieval failure.
Comment: See http://www.jair.org/ for any accompanying file
Heaviest Induced Ancestors and Longest Common Substrings
Suppose we have two trees on the same set of leaves, in which nodes are weighted such that children are heavier than their parents. We say a node from the first tree and a node from the second tree are induced together if they have a common leaf descendant. In this paper we describe data structures that efficiently support the following heaviest-induced-ancestor query: given a node from the first tree and a node from the second tree, find an induced pair of their ancestors with maximum combined weight. Our solutions are based on a geometric interpretation that enables us to find heaviest induced ancestors using range queries. We then show how to use these results to build an LZ-compressed index with which we can quickly find, with high probability, a longest substring common to the indexed string and a given pattern.
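A brute-force baseline makes the query precise. This quadratic-per-query sketch (the `Tree` representation and all names are assumptions of mine) is what the paper's range-query data structures are designed to beat:

```python
# Naive heaviest-induced-ancestor queries: enumerate all ancestor pairs and
# keep the induced pair (sharing a leaf descendant) of maximum total weight.

class Tree:
    def __init__(self, parent, weight, leaves_below):
        self.parent = parent          # node -> parent node (root maps to None)
        self.weight = weight          # node -> weight (children heavier than parents)
        self.leaves = leaves_below    # node -> set of leaf labels below it

    def ancestors(self, v):
        """Yield v and all of its ancestors up to the root."""
        while v is not None:
            yield v
            v = self.parent[v]

def heaviest_induced_ancestors(t1, u, t2, v):
    """Return (weight, a, b): an induced ancestor pair of u and v with
    maximum combined weight. O(depth(u) * depth(v)) set intersections."""
    best = None
    for a in t1.ancestors(u):
        for b in t2.ancestors(v):
            if t1.leaves[a] & t2.leaves[b]:        # induced together
                w = t1.weight[a] + t2.weight[b]
                if best is None or w > best[0]:
                    best = (w, a, b)
    return best
```

Note the answer is always defined: the two roots cover every leaf, so the root pair is induced, and the children-heavier-than-parents condition is what makes deeper induced pairs preferable.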
Word matching using single closed contours for indexing handwritten historical documents
Effective indexing is crucial for providing convenient access to scanned versions of large collections of historically valuable handwritten manuscripts. Since traditional handwriting recognizers based on optical character recognition (OCR) do not perform well on historical documents, holistic word recognition has recently gained in popularity as an attractive and more straightforward alternative (Lavrenko et al. in Proc. Document Image Analysis for Libraries (DIAL’04), pp. 278–287, 2004). Such techniques attempt to recognize words based on scalar and profile-based features extracted from whole word images. In this paper, we propose a new approach to holistic word recognition for historical handwritten manuscripts based on matching word contours instead of whole images or word profiles. The new method consists of robust extraction of closed word contours and the application of an elastic contour-matching technique proposed originally for general shapes (Adamek and O’Connor in IEEE Trans Circuits Syst Video Technol 5:2004). We demonstrate that multiscale contour-based descriptors can effectively capture intrinsic word features while avoiding any segmentation of words into smaller subunits. Our experiments show a recognition accuracy of 83%, which considerably exceeds the performance of other systems reported in the literature.
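The elastic-matching idea can be illustrated with a generic dynamic-time-warping sketch over one-dimensional descriptor sequences. The actual system uses Adamek and O'Connor's multiscale curvature descriptors extracted from closed contours; the scalar sequences and function names below are simplifying assumptions standing in for those descriptors:

```python
# Elastic matching of contour-descriptor sequences via dynamic time warping:
# a generic stand-in for the multiscale contour matching used by the system.

def dtw_distance(a, b):
    """Minimum cumulative |a[i]-b[j]| cost over monotone alignments of the
    two sequences, allowing local stretching and compression."""
    INF = float("inf")
    n, m = len(a), len(b)
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # stretch a
                                 d[i][j - 1],      # stretch b
                                 d[i - 1][j - 1])  # advance both
    return d[n][m]

def best_match(query, library):
    """Return the library word whose descriptor sequence is closest."""
    return min(library, key=lambda word: dtw_distance(query, library[word]))
```

Because DTW tolerates local stretching, two instances of the same handwritten word with slightly different contour lengths can still align at low cost, which is the point of matching elastically rather than position-by-position.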
Models of Type Theory Based on Moore Paths
This paper introduces a new family of models of intensional Martin-Löf type theory. We use constructive ordered algebra in toposes. Identity types in the models are given by a notion of Moore path. By considering a particular gros topos, we show that there is such a model that is non-truncated, i.e. contains non-trivial structure at all dimensions. In other words, in this model a type in a nested sequence of identity types can contain more than one element, no matter how great the degree of nesting. Although inspired by existing non-truncated models of type theory based on simplicial and cubical sets, the notion of model presented here is notable for avoiding any form of Kan filling condition in the semantics of types.
Comment: This is a revised and expanded version of a paper with the same name that appeared in the proceedings of the 2nd International Conference on Formal Structures for Computation and Deduction (FSCD 2017).
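The classical topological notion motivating the abstract can be stated concretely. A Moore path carries an explicit length, which makes composition strictly associative and unital, unlike unit-interval paths, where these laws hold only up to homotopy; the paper itself works with an abstract ordered-algebra version of this in a topos rather than with topological spaces:

```latex
% Moore paths in a topological space X: a length together with a map that is
% constant past that length
M X \;=\; \bigl\{\, (r, f) \;\bigm|\; r \in [0, \infty),\ f : [0, \infty) \to X,\ f(t) = f(r) \text{ for all } t \ge r \,\bigr\}

% composition, defined when f(r) = g(0): lengths add, the second path is shifted
(r, f) \cdot (s, g) \;=\; \Bigl( r + s,\ t \mapsto \begin{cases} f(t) & t \le r \\ g(t - r) & t > r \end{cases} \Bigr)

% this composition is associative and unital on the nose, with unit (0, \mathrm{const}_x)
```

Strict associativity is what lets such paths interpret identity types without imposing a Kan filling condition.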