314 research outputs found

    Thesaurus-based index term extraction for agricultural documents

    This paper describes a new algorithm for automatically extracting index terms from documents in the domain of agriculture. The domain-specific Agrovoc thesaurus developed by the FAO is used both as a controlled vocabulary and as a knowledge base for semantic matching. The automatically assigned terms are evaluated against a manually indexed 200-item sample of the FAO’s document repository, and the performance of the new algorithm is compared with that of a state-of-the-art keyphrase extraction system.

    Mining Domain-Specific Thesauri from Wikipedia: A case study

    Domain-specific thesauri are high-cost, high-maintenance, high-value knowledge structures. We show how the classic thesaurus structure of terms and links can be mined automatically from Wikipedia. In a comparison with a professional thesaurus for agriculture, we find that Wikipedia contains a substantial proportion of its concepts and semantic relations; furthermore, it has impressive coverage of contemporary documents in the domain. Thesauri derived using our techniques capitalize on existing public efforts and tend to reflect contemporary language usage better than their costly, painstakingly constructed manual counterparts.

    An Architecture for Data and Knowledge Acquisition for the Semantic Web: the AGROVOC Use Case

    We are surrounded by ever-growing volumes of unstructured and weakly structured information, and for a human being, domain expert or not, it is nearly impossible to read, understand and categorize such information in a reasonable amount of time. Moreover, different user categories have different expectations: end users need easy-to-use tools and services for specific tasks; knowledge engineers require robust tools for knowledge acquisition, knowledge categorization and semantic resource development; and semantic application developers demand flexible frameworks for fast, easy and standardized development of complex applications. This work is an experience report on the use of the CODA framework for rapid prototyping and deployment of knowledge acquisition systems for RDF. The system integrates independent NLP tools and custom libraries that comply with UIMA standards. For our experiment, a document set was processed to populate the AGROVOC thesaurus with two new relationships.

    SKOS Sources Transformations for Ontology Engineering: Agronomical Taxonomy Use Case

    Sources such as thesauri and taxonomies are already used as input to the ontology development process. Some of them are also published as Linked Open Data (LOD) using the SKOS format. Reusing this type of source to build an ontology is not an easy task: the ontology developer has to deal with different syntaxes and different modelling goals. In this paper we propose a new methodology for transforming several non-ontological sources into a single ontology. We take into account the redundancy of the knowledge extracted from the sources, in order to discover consensual knowledge, and Ontology Design Patterns (ODPs), to guide the transformation process. We evaluated our methodology by creating an ontology of wheat taxonomy from three sources: the Agrovoc thesaurus, the TaxRef taxonomy and the NCBI taxonomy.

    Qualitative terminology extraction: Identifying relational adjectives

    This paper presents the identification in corpora of French relational adjectives, a phenomenon that linguists consider highly informative. The approach applies a termer to a tagged and lemmatized corpus. Relational adjectives, and nominal compounds that include a relational adjective, are then quantified, and their informative status is evaluated using a thesaurus of the domain. We conclude with a discussion of the value of such adjectives and nominal compounds for terminology extraction and other automatic terminology tasks.

    Cross-concordances: terminology mapping and its effectiveness for information retrieval

    The German Federal Ministry for Education and Research funded a major terminology mapping initiative, which concluded in 2007. The task of this initiative was to organize, create and manage 'cross-concordances' between controlled vocabularies (thesauri, classification systems, subject heading lists), centred on the social sciences but quickly extending to other subject areas. 64 crosswalks with more than 500,000 relations were established. In the final phase of the project, a major evaluation was conducted to test and measure the effectiveness of the vocabulary mappings in an information system environment. The paper reports on the cross-concordance work and the evaluation results. Comment: 19 pages, 4 figures, 11 tables, IFLA conference 200

    Prédiction de la polysémie pour un terme biomédical [Polysemy prediction for a biomedical term]

    Polysemy is the capacity of a term to have multiple meanings. Polysemy prediction is a first step for Word Sense Induction (WSI), which makes it possible to find the different meanings of a term, as well as for Information Extraction (IE) systems. In addition, polysemy detection is important for building and enriching terminologies and ontologies. In this paper, we present a novel approach to detect whether a biomedical term is polysemic or not, with the long-term goal of enriching biomedical ontologies after disambiguation of candidate terms. This approach is based on meta-learning techniques, more precisely on meta-features. We propose the definition of novel meta-features, extracted directly from the text dataset as well as from a graph of co-occurring terms. Our method obtains very good results, with an accuracy and F-measure of 0.978.

    Etat de l'art : Extraction d'information à partir de thésaurus pour générer une ontologie [State of the art: information extraction from thesauri to generate an ontology]

    To contribute to the Web of Data for agriculture, we want to reuse AGROVOC, a multilingual thesaurus maintained by the FAO that contains more than 40,000 terms. We present a state of the art of thesaurus transformation techniques for obtaining a domain ontology. To this end, we studied ten approaches along three axes: class extraction, hierarchy extraction and relation extraction. We thereby highlight certain difficulties in thesaurus transformation, such as the disambiguation of relations and the validation of results. We observe that the most recent approaches rely on manual techniques to partly address these difficulties.