1,955 research outputs found

    Unsupervised Named-Entity Recognition: Generating Gazetteers and Resolving Ambiguity

    Get PDF
    In this paper, we propose a named-entity recognition (NER) system that addresses two major limitations frequently discussed in the field. First, the system requires no human intervention such as manually labeling training data or creating gazetteers. Second, the system can handle more than the three classical named-entity types (person, location, and organization). We describe the system’s architecture and compare its performance with a supervised system. We experimentally evaluate the system on a standard corpus, with the three classical named-entity types, and also on a new corpus, with a new named-entity type (car brands)

    Decidability and Complexity of Tree Share Formulas

    Get PDF
    Fractional share models are used to reason about how multiple actors share ownership of resources. We examine the decidability and complexity of reasoning over the "tree share" model of Dockins et al. using first-order logic, or fragments thereof. We pinpoint a connection between the basic operations on trees union, intersection, and complement and countable atomless Boolean algebras, allowing us to obtain decidability with the precise complexity of both first-order and existential theories over the tree share model with the aforementioned operations. We establish a connection between the multiplication operation on trees and the theory of word equations, allowing us to derive the decidability of its existential theory and the undecidability of its full first-order theory. We prove that the full first-order theory over the model with both the Boolean operations and the restricted multiplication operation (with constants on the right hand side) is decidable via an embedding to tree-automatic structures

    Description of the LTG system used for MUC-7

    Get PDF
    The basic building blocks in our muc system are reusable text handling tools which wehave been developing and using for a number of years at the Language Technology Group. They are modular tools with stream input/output; each tooldoesavery speci c job, but can be combined with other tools in a unix pipeline. Di erent combinations of the same tools can thus be used in a pipeline for completing di erent tasks. Our architecture imposes an additional constraint on the input/output streams: they should have a common syntactic format. For this common format we chose eXtensible Markup Language (xml). xml is an o cial, simpli ed version of Standard Generalised Markup Language (sgml), simpli ed to make processing easier [3]. Wewere involved in the developmentofthexml standard, building on our expertise in the design of our own Normalised sgml (nsl) and nsl tool lt nsl [10], and our xml tool lt xml [11]. A detailed comparison of this sgml-oriented architecture with more traditional data-base oriented architectures can be found in [9]. A tool in our architecture is thus a piece of software which uses an api for all its access to xml and sgml data and performs a particular task: exploiting markup which has previously been added by other tools, removing markup, or adding new markup to the stream(s) without destroying the previously adde
    • …
    corecore