1,160 research outputs found

    From Frequency to Meaning: Vector Space Models of Semantics

    Full text link
    Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field

    Web 2.0, language resources and standards to automatically build a multilingual named entity lexicon

    Get PDF
    This paper proposes to advance in the current state-of-the-art of automatic Language Resource (LR) building by taking into consideration three elements: (i) the knowledge available in existing LRs, (ii) the vast amount of information available from the collaborative paradigm that has emerged from the Web 2.0 and (iii) the use of standards to improve interoperability. We present a case study in which a set of LRs for diļ¬€erent languages (WordNet for English and Spanish and Parole-Simple-Clips for Italian) are extended with Named Entities (NE) by exploiting Wikipedia and the aforementioned LRs. The practical result is a multilingual NE lexicon connected to these LRs and to two ontologies: SUMO and SIMPLE. Furthermore, the paper addresses an important problem which aļ¬€ects the Computational Linguistics area in the present, interoperability, by making use of the ISO LMF standard to encode this lexicon. The diļ¬€erent steps of the procedure (mapping, disambiguation, extraction, NE identiļ¬cation and postprocessing) are comprehensively explained and evaluated. The resulting resource contains 974,567, 137,583 and 125,806 NEs for English, Spanish and Italian respectively. Finally, in order to check the usefulness of the constructed resource, we apply it into a state-of-the-art Question Answering system and evaluate its impact; the NE lexicon improves the systemā€™s accuracy by 28.1%. Compared to previous approaches to build NE repositories, the current proposal represents a step forward in terms of automation, language independence, amount of NEs acquired and richness of the information represented

    An operational system for subject switching between controlled vocabularies: A computational linguistics approach

    Get PDF
    The NASA Lexical Dictionary (NLD), a system that automatically translates input subject terms to those of NASA, was developed in four phases. Phase One provided Phrase Matching, a context sensitive word-matching process that matches input phrase words with any NASA Thesaurus posting (i.e., index) term or Use reference. Other Use references have been added to enable the matching of synonyms, variant spellings, and some words with the same root. Phase Two provided the capability of translating any individual DTIC term to one or more NASA terms having the same meaning. Phase Three provided NASA terms having equivalent concepts for two or more DTIC terms, i.e., coordinations of DTIC terms. Phase Four was concerned with indexer feedback and maintenance. Although the original NLD construction involved much manual data entry, ways were found to automate nearly all but the intellectual decision-making processes. In addition to finding improved ways to construct a lexical dictionary, applications for the NLD have been found and are being developed

    Design of a Controlled Language for Critical Infrastructures Protection

    Get PDF
    We describe a project for the construction of controlled language for critical infrastructures protection (CIP). This project originates from the need to coordinate and categorize the communications on CIP at the European level. These communications can be physically represented by official documents, reports on incidents, informal communications and plain e-mail. We explore the application of traditional library science tools for the construction of controlled languages in order to achieve our goal. Our starting point is an analogous work done during the sixties in the field of nuclear science known as the Euratom Thesaurus.JRC.G.6-Security technology assessmen
    • ā€¦
    corecore