
    Toward the automation of business process ontology generation

    Semantic Business Process Management (SBPM) utilises semantic technologies (e.g., ontologies) to model and query process representations. At times such models must be reconstructed from existing textual documentation. In this scenario the automated generation of ontological models would be preferable; however, current methods and technology are not yet capable of automatically generating accurate semantic process models from textual descriptions. This research attempts to automate the process as far as possible by proposing a method that drives the transformation through the joint use of a foundational ontology and lexico-semantic analysis. The method is presented, demonstrated and evaluated. The original dataset represents 150 business activities related to the procurement processes of a case-study company. As the evaluation shows, the proposed method can map the linguistic patterns of the process descriptions to semantic patterns of the foundational ontology with a high level of accuracy. Further research is nevertheless required to reduce the level of human intervention, to extend the method so that it recognises further patterns of the foundational ontology, and to develop a tool that assists the business process modeller in the semi-automated generation of process models.
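    The kind of pattern-driven transformation the abstract describes can be illustrated with a minimal sketch, assuming a simplified "agent - verb - object" activity description; the verb list and the ontology relation names (participatesIn, hasParticipant) are illustrative inventions, not taken from the paper's foundational ontology.

```python
import re

# Hypothetical lexico-semantic pattern for a procurement activity description.
PATTERN = re.compile(
    r"^(?P<agent>[A-Z][\w ]*?) (?P<verb>sends|approves|receives) (?P<object>[\w ]+)$"
)

def activity_to_triples(description):
    """Map one activity description to ontology-style triples, or None."""
    m = PATTERN.match(description.strip())
    if m is None:
        return None  # unmatched descriptions still need human intervention
    event = m.group("verb").capitalize() + "Event"
    return [
        (m.group("agent"), "participatesIn", event),
        (event, "hasParticipant", m.group("object")),
    ]

triples = activity_to_triples("Purchasing clerk sends purchase order")
```

    Descriptions that do not fit the pattern fall through to `None`, which mirrors the abstract's point that human intervention is still required for unrecognised formulations.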

    Unsupervised Sense-Aware Hypernymy Extraction

    In this paper, we show how unsupervised sense representations can be used to improve hypernymy extraction. We present a method for extracting disambiguated hypernymy relationships that propagates hypernyms to sets of synonyms (synsets), constructs embeddings for these sets, and establishes sense-aware relationships between matching synsets. Evaluation on two gold-standard datasets for English and Russian shows that the method successfully recognizes hypernymy relationships that cannot be found with standard Hearst patterns and Wiktionary datasets for the respective languages. Comment: In Proceedings of the 14th Conference on Natural Language Processing (KONVENS 2018), Vienna, Austria.
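    The synset-embedding step can be sketched in miniature: represent each synset by the average of its members' word vectors and link a candidate synset to the closest one by cosine similarity. The tiny hand-made vectors and synset identifiers below are purely illustrative, and the real method's propagation and matching steps are considerably richer.

```python
from math import sqrt

word_vec = {  # toy 2-dimensional "embeddings", invented for illustration
    "apple": [1.0, 0.1], "pear": [0.9, 0.2],
    "fruit": [0.8, 0.3], "produce": [0.7, 0.4],
    "company": [0.0, 1.0], "firm": [0.1, 0.9],
}

def synset_embedding(members):
    """Average the member word vectors, dimension by dimension."""
    vecs = [word_vec[w] for w in members if w in word_vec]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm

synsets = {"fruit.n.1": ["fruit", "produce"], "company.n.1": ["company", "firm"]}
candidate = synset_embedding(["apple", "pear"])  # a hyponym synset
best = max(synsets, key=lambda s: cosine(candidate, synset_embedding(synsets[s])))
```

    Here `best` resolves to the fruit synset, which is the sense-aware choice a word-level method could miss.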

    Linking and Validating Nordic and Baltic Wordnets - A Multilingual Action in META-NORD

    This project report describes a multilingual wordnet initiative launched in the META-NORD project and concerned with the validation and pilot linking of Nordic and Baltic wordnets. The builders of these wordnets have applied very different compilation strategies: the Danish, Icelandic and Swedish wordnets are being developed via monolingual dictionaries and corpora and subsequently linked to Princeton WordNet. In contrast, the Finnish and Norwegian wordnets apply the expand method, translating from Princeton WordNet and the Danish wordnet, DanNet, respectively. The Estonian wordnet was built as part of the EuroWordNet project by translating the base concepts from English as a first basis for monolingual extension. The aim of the multilingual action is to test the prospects for a multilingual linking of the Nordic and Baltic wordnets and, via this (pilot) linking, to perform a tentative comparison and validation of the wordnets along the measures of taxonomical structure, coverage, granularity and completeness. Peer reviewed.

    Variation and Semantic Relation Interpretation: Linguistic and Processing Issues

    Studies in linguistics define lexico-syntactic patterns to characterize the linguistic utterances that can be interpreted with semantic relations. Because patterns are assumed to reflect linguistic regularities that have a stable interpretation, several software tools implement such patterns to extract semantic relations from text. Nevertheless, a thorough analysis of pattern occurrences in various corpora has shown that variation may affect their interpretation. In this paper, we report the linguistic variations that impact relation interpretation in language and may lead to errors in relation extraction systems. We analyze several features of state-of-the-art pattern-based relation extraction tools, mostly how patterns are represented and matched with text, and discuss their role in the tools' ability to manage variation.
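    A small example makes the variation problem concrete: a literal "NP such as NP" Hearst-style pattern misses an occurrence as soon as a comma intervenes, while a slightly relaxed version of the same pattern catches it. Both regular expressions are illustrative, not taken from any particular tool.

```python
import re

sentence = "Vector-borne diseases, such as malaria, remain widespread."

# A literal lexico-syntactic pattern: word, space, "such as", word.
rigid = re.compile(r"(\w+) such as (\w+)")
# The same pattern tolerating one common surface variation (a comma).
flexible = re.compile(r"(\w+),? such as (\w+)")

rigid_hit = rigid.search(sentence)        # the comma defeats the literal form
flexible_hit = flexible.search(sentence)  # the relaxed form matches
pair = flexible_hit.groups() if flexible_hit else None
```

    Real corpus variation (inserted modifiers, coordination, anaphora) is of course harder to absorb than a single optional comma, which is exactly the paper's point about how patterns are represented and matched.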

    Relation Extraction: Exploiting Complementary Techniques to Better Adapt to the Type of Text

    Extracting hypernymy relations from texts is one of the key steps in automatic ontology construction and knowledge-base population. Several types of methods (linguistic, statistical, combined) have been exploited in a variety of proposals in the literature. However, the respective contributions and the complementarity of these methods are still poorly understood, which makes it hard to optimise their combination. In this article, we focus on the complementarity of two methods of different natures, one based on linguistic patterns, the other on supervised learning, for identifying the hypernymy relation across different modes of expression. We applied these methods to a sub-corpus of French Wikipedia composed of disambiguation pages. This corpus lends itself well to both approaches, since its texts are particularly rich in hypernymy relations and contain both well-formed sentences and syntactically poor formulations. We compared the results of the two methods taken independently, in order to establish their respective performance, and compared them with the result of the two methods applied together. The best results were obtained in the latter configuration, with an F-measure of 0.68. Moreover, the Wikipedia extractor resulting from this work makes it possible to enrich the French DBpedia semantic resource: 55% of the relations identified by our extractor are not already present in DBpedia.
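    Why a combination can beat either method alone is easy to illustrate on toy data (invented here, not the article's): each extractor recovers relations the other misses, so their union trades a little precision for more recall and a higher F-measure.

```python
def f_measure(predicted, gold):
    """Harmonic mean of precision and recall over sets of (hyponym, hypernym) pairs."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

gold = {("malaria", "disease"), ("cholera", "disease"), ("paris", "city")}
pattern_based = {("malaria", "disease")}              # precise but low recall
learned = {("cholera", "disease"), ("rome", "city")}  # broader but noisier
combined = pattern_based | learned                    # simple union of outputs
```

    On this toy gold standard the union scores higher than either extractor in isolation; the article's combination of the two real methods is what reaches the reported F-measure of 0.68.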

    Learning of a multilingual bitaxonomy of Wikipedia and its application to semantic predicates

    The ability to extract hypernymy information on a large scale is becoming increasingly important in natural language processing, an area of artificial intelligence which deals with the processing and understanding of natural language. While initial studies extracted this type of information from textual corpora by means of lexico-syntactic patterns, over time researchers moved to alternative, more structured sources of knowledge, such as Wikipedia. After the first attempts to extract is-a information from Wikipedia categories, a full line of research gave birth to numerous knowledge bases containing information which, however, is either incomplete or irremediably bound to English. To this end we put forward MultiWiBi, the first approach to the construction of a multilingual bitaxonomy, which exploits the inner connection between Wikipedia pages and Wikipedia categories to induce a wide-coverage and fine-grained integrated taxonomy. A series of experiments show state-of-the-art results against all the taxonomic resources available in the literature, also with respect to two novel measures of comparison. Another dimension where existing resources usually fall short is their degree of multilingualism. While knowledge is typically language-agnostic, current resources can extract relevant information only in languages with high-quality tools. In contrast, MultiWiBi does not leave any language behind: we show how to taxonomize Wikipedia in an arbitrary language, in a way that is fully independent of additional resources.
    At the core of our approach lies the idea that the English version of Wikipedia can be linguistically exploited as a pivot to project the taxonomic information extracted from English to any other Wikipedia language, so as to obtain a bitaxonomy in a second, arbitrary language; as a result, the approach covers not only concepts which have an English equivalent, but also concepts which are not lexicalized in the source language. We also present the impact of embedding the taxonomized encyclopedic knowledge offered by MultiWiBi into a semantic model of predicates (SPred), which crucially leverages Wikipedia to generalize collections of related noun phrases and infer a probability distribution over expected semantic classes. We applied SPred to a word sense disambiguation task and show that, when MultiWiBi is plugged in to replace an internal component, SPred's generalization power increases, as do its precision and recall. Finally, we also published MultiWiBi as linked data, a paradigm which fosters interoperability and interconnection among resources and tools through the publication of data on the Web, and developed a public interface which lets users navigate MultiWiBi's taxonomic structure in a graphical, engaging manner.
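    The pivot idea can be sketched in its simplest form: project English is-a edges into another language through interlanguage links. The link table and edges below are toy inventions, and note that this naive sketch only covers linked pages, whereas the abstract stresses that the full approach also handles concepts not lexicalized in the source language.

```python
# Hypothetical interlanguage links: English page title -> Italian page title.
interlanguage = {"Dog": "Cane", "Mammal": "Mammifero", "Animal": "Animale"}

# Toy is-a edges extracted on the English side.
english_isa = [("Dog", "Mammal"), ("Mammal", "Animal"), ("Platypus", "Mammal")]

def project(edges, links):
    """Keep only edges whose endpoints both have a page in the target language."""
    return [(links[c], links[p]) for c, p in edges if c in links and p in links]

italian_isa = project(english_isa, interlanguage)
```

    The Platypus edge is dropped because it lacks a target-language page, which is exactly the coverage gap the real system's extra machinery is designed to close.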

    Data linking for the Semantic Web

    By specifying that published datasets must link to other existing datasets, the 4th linked data principle ensures a Web of data rather than a set of unconnected data islands. The authors propose in this paper the term data linking to name the problem of finding equivalent resources on the Web of linked data. To perform data linking, many techniques have been developed, finding their roots in statistics, databases, natural language processing and graph theory. The authors begin by providing background information and terminological clarifications related to data linking. Then a comprehensive survey of the various techniques available for data linking is provided. These techniques are classified along three criteria: granularity, type of evidence, and source of the evidence. Finally, the authors survey eleven recent tools performing data linking and classify them according to the surveyed techniques.
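    One of the simplest value-based techniques in this family is string similarity between resource labels; a minimal sketch using token-level Jaccard similarity follows. The tokenisation, labels and 0.5 threshold are arbitrary illustrative choices, not taken from the survey.

```python
import re

def tokens(label):
    """Lowercase word tokens of a label, ignoring punctuation."""
    return set(re.findall(r"\w+", label.lower()))

def jaccard(label_a, label_b):
    """Jaccard similarity of the two labels' token sets, in [0, 1]."""
    ta, tb = tokens(label_a), tokens(label_b)
    return len(ta & tb) / len(ta | tb)

def same_resource(label_a, label_b, threshold=0.5):
    """Crude equivalence decision based on label similarity alone."""
    return jaccard(label_a, label_b) >= threshold

match = same_resource("Tim Berners-Lee", "Berners-Lee, Tim")
```

    Real data-linking tools combine such value-level evidence with schema- and graph-level evidence, which is precisely the classification the survey develops.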

    The Lexical Grid: Lexical Resources in Language Infrastructures

    Language Resources are recognized as central and strategic for the development of any Human Language Technology system and application product. They play a critical role as a horizontal technology and have been recognized on many occasions as a priority by national and supra-national funding bodies, which have supported a number of initiatives (such as EAGLES, ISLE, ELRA) to establish some coordination of LR activities, as well as a number of large LR creation projects, in both the written and the speech areas.

    Syntactic and Contextual Variation in the Development of Semantic Relation Patterns

    For some fifteen years, linguistic markers have been used as a potential means of identifying conceptual relations in corpora. These are formal elements (typographic, lexical, syntactic) which are assumed to provide more or less systematic access, in texts, to a lexical or, better, conceptual relation determined a priori (most often hyperonymy, meronymy or causality). This process is neither immediate nor fully automatic. It first involves selecting a set of sentences, some of which conform to the schema defined by the pattern. It then requires an interpretation, possibly guided by suggestions based on hypotheses about the a priori meaning of these patterns, in order to identify the related terms and the semantics of their relation. Finally, the last step consists in moving a little further away from the text to decide on a conceptual representation of the relation. This chapter examines the variation in the behaviour of markers by crossing the perspectives of NLP, knowledge engineering and corpus linguistics.
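    The three-step process the abstract describes (selection, interpretation, conceptual representation) can be sketched as a small pipeline. The French marker "est une sorte de" and the relation label are illustrative choices, and the "validated" flag stands in for the human interpretation step the chapter insists is not automatic.

```python
import re

# An illustrative hyperonymy marker for French ("X est une sorte de Y").
MARKER = re.compile(r"(\w+) est une sorte de (\w+)")

def select(sentences):
    """Step 1: keep sentences whose form matches the marker's schema."""
    return [m for s in sentences if (m := MARKER.search(s))]

def interpret(match):
    """Step 2: propose related terms and a relation, pending human validation."""
    hypo, hyper = match.group(1), match.group(2)
    return {"terms": (hypo, hyper), "relation": "hyperonymy", "validated": False}

def conceptualize(proposal):
    """Step 3: commit a conceptual-level representation of the relation."""
    hypo, hyper = proposal["terms"]
    return (hypo.capitalize(), "isA", hyper.capitalize())

corpus = ["Le chat est une sorte de mammifere .", "Il pleut souvent ."]
proposals = [interpret(m) for m in select(corpus)]
concepts = [conceptualize(p) for p in proposals]
```

    The variation the chapter studies is exactly what this naive pipeline ignores: the same marker can occur in contexts where the hyperonymy reading does not hold, which is why the interpretation step cannot be skipped.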