    Ontologies for a Global Language Infrastructure

    Given that human language technologies have matured considerably and that a rapidly growing range of language data resources is now available, together with natural language processing (NLP) tools and systems, the need for a global language infrastructure (GLI) is becoming increasingly evident if re-usability of these resources is to be ensured. A GLI is essentially an open, web-based software platform on which tailored language services can be efficiently composed, disseminated and consumed. An infrastructure of this sort is also expected to facilitate the further development of language data resources and NLP functionalities. The aims of this paper are twofold: (1) to discuss the necessity of ontologies for a GLI, and (2) to draw a high-level configuration of these ontologies, which are integrated into a comprehensive language service ontology. To these ends, the paper first explores the dimensions of a GLI and then draws a triangular view of a language service, from which the necessary ontologies are derived. The paper also examines relevant ongoing international standardization efforts such as LAF, MAF, SynAF, DCR and LMF, and discusses how these frameworks are incorporated into our comprehensive language service ontology. The paper concludes by stressing the need for international collaboration on the development of a standardized language service ontology.
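    To make the idea of composing tailored language services concrete, the sketch below shows, in Python, how atomic services annotated with ontology-style input/output categories could be chained into a composite pipeline. This is a minimal illustration only; the service names, data categories and planning strategy are assumptions, not taken from the paper.

```python
# Minimal sketch (assumption, not from the paper): atomic language services
# declare their input/output data categories, and a composite service is
# planned by chaining compatible services.

from typing import Callable, Dict, List, Tuple

# Registry of atomic services: name -> (input category, output category, function)
REGISTRY: Dict[str, Tuple[str, str, Callable[[object], object]]] = {}

def register(name: str, in_type: str, out_type: str):
    """Register an atomic service with ontology-style I/O categories."""
    def wrap(fn):
        REGISTRY[name] = (in_type, out_type, fn)
        return fn
    return wrap

@register("tokenizer", "text", "token_sequence")
def tokenize(text):
    return text.split()

@register("pos_tagger", "token_sequence", "morphosyntactic_annotation")
def pos_tag(tokens):
    # Toy tagger: every token is tagged as a noun.
    return [(tok, "NOUN") for tok in tokens]

def plan(source_type: str, target_type: str) -> List[str]:
    """Find a chain of services whose I/O categories connect source to target."""
    frontier = [(source_type, [])]
    seen = {source_type}
    while frontier:
        current, path = frontier.pop(0)
        if current == target_type:
            return path
        for name, (in_t, out_t, _) in REGISTRY.items():
            if in_t == current and out_t not in seen:
                seen.add(out_t)
                frontier.append((out_t, path + [name]))
    raise LookupError(f"no service chain from {source_type} to {target_type}")

def invoke(path: List[str], data):
    for name in path:
        data = REGISTRY[name][2](data)
    return data

if __name__ == "__main__":
    pipeline = plan("text", "morphosyntactic_annotation")
    print(pipeline)                                  # ['tokenizer', 'pos_tagger']
    print(invoke(pipeline, "language services compose"))
```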

    Ontologizing Lexicon Access Functions based on an LMF-based Lexicon Taxonomy

    This paper discusses the ontologization of lexicon access functions in the context of a service-oriented language infrastructure such as the Language Grid. In such a language infrastructure, an access function to a lexical resource, embodied as an atomic Web service, plays a crucially important role in composing a composite Web service tailored to a user's specific requirement. To facilitate the composition process, which involves service discovery, planning and invocation, the language infrastructure should be ontology-based; hence the ontologization of a range of lexicon functions is strongly required. In a service-oriented environment, however, lexical resources can be classified from a service-oriented perspective rather than according to a lexicographically motivated standard. To address the resulting issue of interoperability, the taxonomy for lexical resources should be grounded in a principled and shared lexicon ontology. To this end, we have ontologized the standardized lexicon modeling framework LMF and utilized it as a foundation to stipulate the service-oriented lexicon taxonomy and the corresponding ontology for lexicon access functions. This paper also examines a possible solution for filling the gap between the ontological descriptions and the actual Web service API by adopting the W3C recommendation SAWSDL, with which Web service descriptions can be linked to the domain ontology.
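    As a rough illustration of the SAWSDL-based linking the paper mentions, the Python sketch below attaches a sawsdl:modelReference annotation to an element of a hypothetical lexicon-access service interface, pointing it at a concept URI in a lexicon ontology. The schema fragment, element name and ontology URI are illustrative assumptions; only the SAWSDL namespace and the modelReference attribute come from the W3C recommendation.

```python
# Minimal sketch (assumption, not the paper's actual service descriptions):
# linking a WSDL/XSD element to a concept in a lexicon ontology via SAWSDL.
# The schema fragment and ontology URI are illustrative.

import xml.etree.ElementTree as ET

SAWSDL_NS = "http://www.w3.org/ns/sawsdl"
XSD_NS = "http://www.w3.org/2001/XMLSchema"

ET.register_namespace("sawsdl", SAWSDL_NS)
ET.register_namespace("xs", XSD_NS)

# A fragment of a (hypothetical) lexicon-access service interface.
schema_fragment = f"""
<xs:schema xmlns:xs="{XSD_NS}">
  <xs:element name="searchLexicalEntryRequest" type="xs:string"/>
</xs:schema>
"""

root = ET.fromstring(schema_fragment)
request = root.find(f"{{{XSD_NS}}}element")

# Link the message element to a concept in the (hypothetical) lexicon ontology.
request.set(
    f"{{{SAWSDL_NS}}}modelReference",
    "http://example.org/lexicon-ontology#SearchLexicalEntry",
)

print(ET.tostring(root, encoding="unicode"))
```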

    The Lexical Grid: Lexical Resources in Language Infrastructures

    Language Resources are recognized as central and strategic for the development of any Human Language Technology system and application product. They play a critical role as a horizontal technology and have been recognized on many occasions as a priority by national and supra-national funding bodies, which have supported a number of initiatives (such as EAGLES, ISLE and ELRA) to establish some coordination of LR activities, as well as a number of large LR creation projects, both in the written and in the speech areas.

    Approach to the Creation of a Multilingual, Medical Interface Terminology

    Health care professionals experience difficulties in the correct medical registration of clinical work and in the efficient searching for answers to clinical questions. These difficulties often arise from a deficient interface between human and machine language. Terminological solutions are often naive attempts to standardize language and terms with conceptual systems, which may overwhelm users with their complexity or be too restrictive to represent crucial details. Moreover, local, professional and cultural differences in vernacular expression are often not represented. We must take into account vocabulary differences between specialists and general practitioners talking about the same medical fact. There are even more differences between the languages of patients and physicians. In addition, the vocabulary in use evolves over time and space, and many local expressions exist to designate the same diseases or body parts.

    In contrast to the Relevance Theory of Communication

    As the role of ontology in a multilingual setting becomes important to Semantic Web development, it becomes necessary to understand and model how the original conceptual meaning of a Source Language word is conveyed into a Target Language translation. A terminological ontology [1] is a tool used for knowledge sharing and domain-specific translation, and could potentially be suitable for simulating the cognitive models that explain real-world inter-cultural communication scenarios. In this paper, a framework referred to as the Relevance Theory of Communication [2] is contrasted with an empirical study applying Tversky's contrast model [3] to datasets obtained from the terminological ontology. The results indicate that the alignment of two language-dependent terminological ontologies is a potential method for optimizing the relevance required in inter-cultural communication, in other words, for identifying corresponding concepts existing in two remote cultures.
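    For readers unfamiliar with Tversky's contrast model, the sketch below shows the weighted set-difference formula sim(A, B) = θ·|A ∩ B| − α·|A − B| − β·|B − A| applied to toy feature sets, in the spirit of matching a source-language concept against candidate concepts in a target-language terminological ontology. The feature sets, weights and concept names are invented for illustration and are not the paper's data.

```python
# Minimal sketch (assumption, not the paper's dataset): Tversky's contrast
# model, sim(A, B) = theta*|A & B| - alpha*|A - B| - beta*|B - A|, used to
# match a source-language concept against candidate concepts in a
# target-language terminological ontology.

def tversky_contrast(a: set, b: set, theta=1.0, alpha=0.5, beta=0.5) -> float:
    common = len(a & b)    # shared features
    a_only = len(a - b)    # distinctive features of the source concept
    b_only = len(b - a)    # distinctive features of the candidate concept
    return theta * common - alpha * a_only - beta * b_only

# Characteristic features of a source-language concept.
source_concept = {"fermented", "dairy", "drink", "sour"}

# Candidate concepts from a target-language terminological ontology.
target_ontology = {
    "kefir":  {"fermented", "dairy", "drink", "carbonated"},
    "yogurt": {"fermented", "dairy", "spoonable"},
    "juice":  {"fruit", "drink", "sweet"},
}

scores = {c: tversky_contrast(source_concept, feats) for c, feats in target_ontology.items()}
best = max(scores, key=scores.get)

print(scores)   # {'kefir': 2.0, 'yogurt': 0.5, 'juice': -1.5}
print(best)     # 'kefir'
```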

    Cross-Platform Text Mining and Natural Language Processing Interoperability - Proceedings of the LREC2016 conference

    No abstract available

    Learning for text mining: tackling the cost of feature and knowledge engineering

    Over the last decade, the state of the art in text mining has moved towards the adoption of machine learning as the main paradigm at the heart of approaches. Despite significant advances, machine learning based text mining solutions remain costly to design, develop and maintain for real-world problems. An important component of this cost (feature engineering) concerns the effort required to understand which features or characteristics of the data can be successfully exploited in inducing a predictive model of the data. Another important component of the cost (knowledge engineering) has to do with the effort of creating labelled data and of eliciting knowledge about the mining systems and the data itself. I present a series of approaches, methods and findings aimed at reducing the cost of creating and maintaining document classification and information extraction systems. They address the following questions: Which classes of features lead to improved classification accuracy in the document classification and entity extraction tasks? How can the number of labelled examples needed to train machine learning based document classification and information extraction systems be reduced, so as to relieve domain experts of this costly task? How can knowledge about these systems and the data they manipulate be represented effectively, in order to make systems interoperable and results replicable? I provide the reader with the background information necessary to understand these questions and the contributions to the state of the art contained herein. The contributions include: the identification of novel classes of features for the document classification task, which exploit the multimedia nature of documents and lead to improved classification accuracy; a novel approach to domain adaptation for text categorization which outperforms standard supervised and semi-supervised methods while requiring considerably less supervision; and a well-founded formalism for declaratively specifying text and multimedia mining systems.
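    As a concrete, if simplified, illustration of the feature-engineering cost discussed above, the following sketch builds a supervised document classifier in which the main feature-engineering decisions are confined to a text vectorizer. It assumes scikit-learn and uses an invented toy corpus; it is not the thesis's actual system.

```python
# Minimal sketch (assumption, not the thesis's actual systems): a supervised
# document classifier where "feature engineering" is reduced to the choice
# and configuration of a text vectorizer feeding a linear model.
# Corpus and labels are invented.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = [
    "invoice payment due account balance",
    "meeting agenda project schedule review",
    "overdue invoice reminder account statement",
    "team meeting notes and action items",
]
labels = ["finance", "planning", "finance", "planning"]

# The vectorizer is where the feature-engineering decisions live
# (n-gram range, term weighting, vocabulary pruning, ...).
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),
    LogisticRegression(max_iter=1000),
)
model.fit(docs, labels)

print(model.predict(["quarterly invoice and account review"]))
```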