
    Web 2.0, language resources and standards to automatically build a multilingual named entity lexicon

    This paper proposes to advance the current state of the art in automatic Language Resource (LR) building by taking three elements into consideration: (i) the knowledge available in existing LRs, (ii) the vast amount of information available from the collaborative paradigm that has emerged from Web 2.0, and (iii) the use of standards to improve interoperability. We present a case study in which a set of LRs for different languages (WordNet for English and Spanish, and Parole-Simple-Clips for Italian) are extended with Named Entities (NEs) by exploiting Wikipedia and the aforementioned LRs. The practical result is a multilingual NE lexicon connected to these LRs and to two ontologies: SUMO and SIMPLE. Furthermore, the paper addresses interoperability, an important current problem in Computational Linguistics, by using the ISO LMF standard to encode this lexicon. The different steps of the procedure (mapping, disambiguation, extraction, NE identification and postprocessing) are comprehensively explained and evaluated. The resulting resource contains 974,567, 137,583 and 125,806 NEs for English, Spanish and Italian respectively. Finally, to check the usefulness of the constructed resource, we apply it to a state-of-the-art Question Answering system and evaluate its impact: the NE lexicon improves the system's accuracy by 28.1%. Compared to previous approaches to building NE repositories, the current proposal represents a step forward in terms of automation, language independence, number of NEs acquired and richness of the information represented.
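    To make the extraction and NE identification steps more concrete, here is a minimal Python sketch. It assumes that synsets have already been mapped to Wikipedia categories; the synset ids, the toy articles and the capitalization test used for NE identification are hypothetical stand-ins chosen for illustration, not the procedure evaluated in the paper.

```python
from collections import defaultdict

# Hypothetical toy data standing in for the real resources: a mapping from
# WordNet-style synset ids to Wikipedia categories, and the articles
# (title -> text) filed under each category.
SYNSET_TO_CATEGORY = {"city.n.01": "Cities", "river.n.01": "Rivers"}
CATEGORY_ARTICLES = {
    "Cities": {
        "Alicante": "Alicante is a port city in Spain. Alicante lies on the coast.",
        "Capital city": "A capital city hosts the government. Every country has a capital city.",
    },
    "Rivers": {
        "Ebro": "The Ebro flows into the Mediterranean. The Ebro delta is large.",
    },
}


def looks_like_named_entity(title: str, text: str) -> bool:
    """Assumed heuristic: treat a title as an NE if it occurs in its own
    article with its original capitalization more often than in lower case."""
    capitalized = text.count(title)
    lowercase = text.count(title.lower())
    return capitalized > lowercase


def build_ne_lexicon(synset_to_category, category_articles):
    """Extract the articles of each mapped category, keep those that pass
    the NE test, and link each NE back to the synset it instantiates."""
    lexicon = defaultdict(set)
    for synset, category in synset_to_category.items():
        for title, text in category_articles.get(category, {}).items():
            if looks_like_named_entity(title, text):
                lexicon[title].add(synset)
    return dict(lexicon)


print(build_ne_lexicon(SYNSET_TO_CATEGORY, CATEGORY_ARTICLES))
# {'Alicante': {'city.n.01'}, 'Ebro': {'river.n.01'}}
```

"Capital city" is filtered out because it mostly appears in lower case in its own article, which is the intuition behind separating common-noun articles from named entities.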

    Challenges and issues in terminology mapping: a digital library perspective

    In light of information retrieval problems caused by the use of different subject schemes, this paper provides an overview of the terminology problem within the digital library field. Various proposed solutions are outlined, and issues within one approach, terminology mapping, are highlighted. The method is a desk-based review of existing research. The paper discusses the benefits of the mapping approach, which include improved retrieval effectiveness for users and an opportunity to overcome problems associated with the use of multilingual schemes. It also describes various drawbacks, such as the labour-intensive nature and expense of such an approach, the different levels of granularity in existing schemes, the high maintenance requirements due to scheme updates and, not least, the nature of user terminology. The paper offers a general review of mapping techniques as a potential solution to the terminology problem.

    Multiple terminologies: an obstacle to information retrieval

    An issue currently at the forefront of digital library research is the prevalence of disparate terminologies and the limitations they impose on user searching. It is thought that semantic interoperability is achievable by improving the compatibility between terminologies and classification schemes, enabling users to search multiple resources simultaneously and to improve retrieval effectiveness through the use of associated terms drawn from several schemes. This article considers the terminology issue before outlining various proposed methods of tackling it, with a particular focus on terminology mapping.

    Cross-concordances: terminology mapping and its effectiveness for information retrieval

    The German Federal Ministry for Education and Research funded a major terminology mapping initiative, which concluded in 2007. The task of this initiative was to organize, create and manage 'cross-concordances' between controlled vocabularies (thesauri, classification systems, subject heading lists), centred on the social sciences but quickly extending to other subject areas. In total, 64 crosswalks with more than 500,000 relations were established. In the final phase of the project, a major evaluation was conducted to test and measure the effectiveness of the vocabulary mappings in an information system environment. The paper reports on the cross-concordance work and the evaluation results. Comment: 19 pages, 4 figures, 11 tables, IFLA conference 200
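    A cross-concordance can be pictured as a list of typed relations between the terms of two vocabularies, which a search system uses to rewrite a query for another database. The following Python sketch assumes that representation; the relation names, the example terms and the `translate_query` helper are hypothetical and do not reproduce the project's actual data format.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """Relation types assumed for this sketch (target term relative to source term)."""
    EQUIVALENT = "equivalent"
    BROADER = "broader"
    NARROWER = "narrower"
    RELATED = "related"


@dataclass(frozen=True)
class Mapping:
    source_term: str  # term in the source vocabulary
    target_term: str  # term in the target vocabulary
    relation: Relation


# Hypothetical crosswalk between two social-science vocabularies.
CROSSWALK = [
    Mapping("unemployment", "joblessness", Relation.EQUIVALENT),
    Mapping("unemployment", "labour market", Relation.BROADER),
    Mapping("unemployment", "poverty", Relation.RELATED),
    Mapping("youth unemployment", "unemployment of young people", Relation.EQUIVALENT),
]


def translate_query(terms, crosswalk, allowed=frozenset({Relation.EQUIVALENT})):
    """Rewrite source-vocabulary query terms into target-vocabulary terms,
    following only mappings whose relation type is in `allowed`."""
    translated = []
    for term in terms:
        for mapping in crosswalk:
            if (mapping.source_term == term
                    and mapping.relation in allowed
                    and mapping.target_term not in translated):
                translated.append(mapping.target_term)
    return translated


print(translate_query(["unemployment"], CROSSWALK))
# ['joblessness']
print(translate_query(["unemployment"], CROSSWALK,
                      allowed={Relation.EQUIVALENT, Relation.BROADER}))
# ['joblessness', 'labour market']
```

Restricting the default to equivalence relations favours precision; admitting broader or related terms trades some precision for recall.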

    Automatic Alignment of Multilingual Resources in the Linguistic Linked Open Data Cloud

    The creation of Europe’s Digital Single Market requires interoperable multilingual resources in the Linguistic Linked Open Data (LLOD) cloud. The PMKI project aims to create a public multilingual knowledge management infrastructure able to establish and manage interoperability between multilingual classification systems (such as thesauri) and other language resources. This paper presents the standards used by PMKI and a methodology, based on an information retrieval framework, for automatic mapping between multilingual resources.
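    The abstract does not detail the IR framework, so the sketch below swaps in a common baseline: each concept is represented by the concatenation of its multilingual labels, weighted with TF-IDF, and matched to the target concept with the highest cosine similarity above a threshold. The concept ids, labels and the 0.3 threshold are assumptions for illustration, not PMKI's method.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenization (deliberately naive)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Map each doc id to a sparse TF-IDF vector {token: weight}, using smoothed IDF."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n = len(tokens)
    df = Counter(tok for toks in tokens.values() for tok in set(toks))
    return {
        doc_id: {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(toks).items()}
        for doc_id, toks in tokens.items()
    }


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def align(source, target, threshold=0.3):
    """For each source concept, return the best-scoring target concept
    if its similarity reaches `threshold`."""
    docs = {("src", k): v for k, v in source.items()}
    docs.update({("tgt", k): v for k, v in target.items()})
    vectors = tfidf_vectors(docs)  # IDF is computed over both resources together
    alignment = {}
    for s in source:
        best_id, best_score = None, 0.0
        for t in target:
            score = cosine(vectors[("src", s)], vectors[("tgt", t)])
            if score > best_score:
                best_id, best_score = t, score
        if best_id is not None and best_score >= threshold:
            alignment[s] = (best_id, round(best_score, 3))
    return alignment


# Hypothetical concepts: each is the concatenation of its English/French/German labels.
SOURCE = {
    "S1": "public health santé publique öffentliche gesundheit",
    "S2": "fisheries pêche fischerei",
    "S3": "agriculture landwirtschaft",
}
TARGET = {
    "T1": "health policy public health politique de santé",
    "T2": "fishing industry fisheries pêche",
    "T3": "taxation fiscalité",
}

print(align(SOURCE, TARGET))
```

Pooling labels from all languages lets a match in any one language contribute evidence; S3 has no overlapping label and is left unaligned rather than forced onto a poor match.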

    HILT IV: subject interoperability through building and embedding pilot terminology web services

    A report of work carried out within the JISC-funded HILT Phase IV project, the paper looks at the project's context against the background of other recent and ongoing terminology work, describes its outcome and conclusions, including technical outcomes and terminological characteristics, and considers possible future research and development directions. The Phase IV project has taken HILT to the point where launching an operational support service for subject interoperability is a feasible option, and where both investigation of specific needs in this area and practical collaborative work are sensible and feasible next steps. Moving forward requires detailed work not only on terminology interoperability and associated service delivery issues, but also on service and end-user needs and engagement, service sustainability, and the practicalities of interworking with other terminology services and projects in UK, European and global contexts.

    Predicate Matrix: an interoperable lexical knowledge base for predicates

    183 p. The Predicate Matrix (Matriz de Predicados in Spanish) is a new lexical-semantic resource resulting from the integration of multiple knowledge sources, including FrameNet, VerbNet, PropBank and WordNet. The Predicate Matrix provides an extensive and robust lexicon that improves interoperability among these semantic resources. Its creation is based on the integration of Semlink and of new mappings obtained with automatic methods that link semantic knowledge at the lexical and role levels. We have also extended the Predicate Matrix to cover nominal predicates (English, Spanish) and predicates in other languages (Spanish, Catalan and Basque). As a result, the Predicate Matrix provides a multilingual lexicon that enables interoperable semantic analysis across multiple languages.
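    At its core, the Predicate Matrix can be read as a table whose rows align a predicate's senses and roles across resources. The Python sketch below assumes a simplified row layout with a few illustrative rows; the field names and the `map_propbank_arg` helper are hypothetical and not the resource's actual file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredicateRow:
    """One aligned row (simplified, hypothetical layout)."""
    lemma: str
    vn_class: str    # VerbNet class
    vn_role: str     # VerbNet thematic role
    fn_frame: str    # FrameNet frame
    fn_element: str  # FrameNet frame element
    pb_roleset: str  # PropBank roleset
    pb_arg: str      # PropBank argument


# Illustrative rows for the verb "give".
ROWS = [
    PredicateRow("give", "give-13.1", "Agent", "Giving", "Donor", "give.01", "ARG0"),
    PredicateRow("give", "give-13.1", "Theme", "Giving", "Theme", "give.01", "ARG1"),
    PredicateRow("give", "give-13.1", "Recipient", "Giving", "Recipient", "give.01", "ARG2"),
]


def map_propbank_arg(rows, roleset, arg):
    """Return (VerbNet role, FrameNet frame.element) pairs aligned with a PropBank argument."""
    return [
        (row.vn_role, f"{row.fn_frame}.{row.fn_element}")
        for row in rows
        if row.pb_roleset == roleset and row.pb_arg == arg
    ]


print(map_propbank_arg(ROWS, "give.01", "ARG0"))
# [('Agent', 'Giving.Donor')]
```

A semantic role labeller that outputs PropBank arguments could use such a lookup to expose the corresponding VerbNet or FrameNet labels for the same argument, which is the kind of interoperability the resource aims to support.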

    Building a terminology network for search: the KoMoHe project

    The paper reports results of the GESIS-IZ project "Competence Center Modeling and Treatment of Semantic Heterogeneity" (KoMoHe). KoMoHe supervised a terminology mapping effort in which 'cross-concordances' between major controlled vocabularies were organized, created and managed. In this paper we describe the establishment and implementation of cross-concordances for search in a digital library (DL). Comment: 5 pages, 2 figures, Dublin Core Conference 200

    The Lexical Grid: Lexical Resources in Language Infrastructures

    Language Resources (LRs) are recognized as central and strategic to the development of any Human Language Technology system and application product. They play a critical role as a horizontal technology and have on many occasions been recognized as a priority by national and supra-national funding bodies, which have supported a number of initiatives (such as EAGLES, ISLE and ELRA) to establish some coordination of LR activities, as well as a number of large LR creation projects in both the written and the speech areas.