8 research outputs found

    Methodology and evaluation of the Galician WordNet expansion with the WN-Toolkit

    In this paper the methodology and a detailed evaluation of the results of the expansion of the Galician WordNet using the WN-Toolkit are presented. This toolkit allows the creation and expansion of wordnets using the expand model. In our experiments we have used methodologies based on dictionaries and on parallel corpora. The evaluation of the results has been performed both automatically and manually, allowing a comparison of the precision values obtained with the two evaluation procedures. The manual evaluation provides details about the sources of the errors; this information has been very useful for improving the toolkit and for correcting some errors in the reference WordNet for Galician. This research has been carried out thanks to the SKATeR project (TIN2012-38584-C06-01 and TIN2012-38584-C06-04), supported by the Spanish Ministry of Economy and Competitiveness.
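
    As an illustration of the expand model mentioned above, the following minimal Python sketch attaches Galician variants to existing English synsets through a toy bilingual dictionary. All synset identifiers and dictionary entries are invented for the example; real runs over dictionaries and parallel corpora add candidate scoring and the manual validation the paper describes.

    ```python
    # Minimal sketch of the "expand model" for wordnet construction:
    # source-language synsets keep their identifiers, and target-language
    # (here Galician) variants are attached by translating the English
    # lemmas through a bilingual dictionary. Data is illustrative only.

    # Toy bilingual dictionary: English lemma -> candidate Galician lemmas.
    en_to_gl = {
        "dog": ["can", "cadelo"],
        "tree": ["árbore"],
    }

    # Toy source wordnet: synset id -> English lemmas.
    en_synsets = {
        "02084071-n": ["dog", "domestic_dog"],
        "13104059-n": ["tree"],
    }

    def expand_synsets(synsets, dictionary):
        """Propose target-language variants for each synset.

        A lemma with a single translation is unambiguous; lemmas with
        several translations are kept as lower-confidence candidates,
        which is why manual evaluation (as in the paper) is needed.
        """
        proposals = {}
        for synset_id, lemmas in synsets.items():
            candidates = []
            for lemma in lemmas:
                for translation in dictionary.get(lemma, []):
                    ambiguous = len(dictionary[lemma]) > 1
                    candidates.append((translation, ambiguous))
            proposals[synset_id] = candidates
        return proposals

    print(expand_synsets(en_synsets, en_to_gl))
    # {'02084071-n': [('can', True), ('cadelo', True)],
    #  '13104059-n': [('árbore', False)]}
    ```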

    Acquiring Domain-Specific Knowledge for WordNet from a Terminological Database

    In this research we explore a terminological database (Termoteca) in order to expand the Portuguese and Galician wordnets (PULO and Galnet) with new synset variants (word forms for a concept), usage examples for the variants, and synset glosses or definitions. The methodology applied in this experiment is based on the alignment between WordNet concepts (synsets) and concepts described in Termoteca (terminological records), taking into account the lexical forms in both resources, their morphological category and their knowledge domains. The information provided by the WordNet Domains Hierarchy and the Termoteca field domains is used to reduce the incidence of polysemy and homography in the results of the experiment. The results obtained confirm our hypothesis that the combined use of the semantic domain information included in both resources makes it possible to minimise the problem of lexical ambiguity and to obtain a very acceptable precision in terminological information extraction tasks, attaining a precision above 89% when two or more different languages share at least one lexical form between the Galnet synset and the Termoteca record.
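
    A hedged sketch of the alignment idea follows, with hypothetical data structures rather than the actual Termoteca or Galnet schemas: a record matches a synset only when part of speech agrees, the field domain maps onto a WordNet domain, and at least two languages share a lexical form, the condition the abstract associates with precision above 89%.

    ```python
    # Illustrative sketch: a terminological record is matched to a
    # multilingual synset when they share lexical forms, agree on part
    # of speech, and their domains are compatible. Requiring the match
    # to hold in two or more languages raises precision.

    from dataclasses import dataclass

    @dataclass
    class Record:      # simplified terminological record
        forms: dict    # language -> set of lexical forms
        pos: str
        domain: str

    @dataclass
    class Synset:      # simplified multilingual synset
        forms: dict    # language -> set of variants
        pos: str
        domains: set

    def align(record, synset, domain_map):
        """Return True if the record plausibly describes the synset."""
        if record.pos != synset.pos:
            return False
        # Map the terminological field domain onto WordNet Domains labels.
        if not domain_map.get(record.domain, set()) & synset.domains:
            return False
        # Count the languages sharing at least one lexical form.
        shared = sum(
            1 for lang in record.forms
            if record.forms[lang] & synset.forms.get(lang, set())
        )
        return shared >= 2

    rec = Record({"gl": {"célula"}, "pt": {"célula"}}, "n", "bioloxía")
    syn = Synset({"gl": {"célula"}, "pt": {"célula"}}, "n", {"biology"})
    print(align(rec, syn, {"bioloxía": {"biology"}}))  # True
    ```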

    Semantic role labeling of French in specific domains

    In this Natural Language Processing Ph.D. thesis, we aim to perform semantic role labeling on French domain-specific texts. This task disambiguates the sense of the predicates in a text and annotates their dependent chunks with semantic roles such as Agent, Patient or Destination. It helps many applications in domains where annotated corpora exist, but is difficult to use otherwise. We first evaluate on the FrameNet corpus an existing method based on VerbNet, which is what makes the method domain-independent. We show that substantial improvements can be obtained: syntactically, by handling the passive voice, and semantically, by taking advantage of the selectional restrictions present in VerbNet. To apply this method to French, we translate two English lexical resources. We begin with the WordNet lexical database. We then translate the VerbNet lexicon, in which verbs are grouped semantically on the basis of their syntactic behaviour. We obtain its translation, VerbeNet, by reusing two French verb lexicons (the Lexique-Grammaire and Les Verbes Français) and by manually modifying and reorganizing the resulting lexicon. Finally, once these building blocks are in place, we evaluate the feasibility of semantic role labeling of French and English in three specific domains. We study the pros and cons of using VerbNet and VerbeNet to annotate those domains, before outlining future work.
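
    The following Python sketch illustrates the two improvements described above, handling of the passive voice and VerbNet-style selectional restrictions, on a single toy frame. The frame, the animacy lexicon and all names are invented for the example and are not taken from VerbNet or VerbeNet.

    ```python
    # Toy VerbNet-like frame: syntactic slots map to thematic roles, and
    # a selectional restriction constrains which heads may fill a role.

    FRAME = {  # hypothetical frame for an agentive verb such as "break"
        "subject": "Agent",
        "object": "Patient",
        "restrictions": {"Agent": "+animate"},
    }

    ANIMATE = {"boy", "girl", "dog", "thief"}  # toy animacy lexicon

    def satisfies(restriction, head):
        """Check a VerbNet-style selectional restriction on a head word."""
        if restriction == "+animate":
            return head in ANIMATE
        return True  # unknown restrictions are not enforced in this toy

    def label_roles(subject, obj, frame, passive=False):
        """Assign thematic roles to the subject/object heads of a clause."""
        if passive:
            # Passive voice: the surface subject fills the object's role
            # and the by-phrase (if any) fills the subject's role.
            mapping = {frame["object"]: subject, frame["subject"]: obj}
        else:
            mapping = {frame["subject"]: subject, frame["object"]: obj}
        for role, restriction in frame["restrictions"].items():
            head = mapping.get(role)
            if head is not None and not satisfies(restriction, head):
                return None  # frame rejected: try another class or sense
        return mapping

    print(label_roles("boy", "window", FRAME))
    # {'Agent': 'boy', 'Patient': 'window'}
    print(label_roles("window", "boy", FRAME, passive=True))
    # {'Patient': 'window', 'Agent': 'boy'}  ("The window was broken by the boy")
    print(label_roles("window", "boy", FRAME))
    # None: "window" violates the +animate restriction on Agent
    ```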

    An ontology for human-like interaction systems

    This report proposes and describes the development of a Ph.D. thesis aimed at building an ontological knowledge model supporting human-like interaction systems. The main function of such a knowledge model in a human-like interaction system is to unify the representation of each concept, relating it to the appropriate terms as well as to other concepts with which it shares semantic relations. When developing human-like interactive systems, the inclusion of an ontological module can be valuable both for supporting interaction between participants and for enabling accurate cooperation of the diverse components of such an interaction system. On the one hand, during human communication, the relation between cognition and messages relies on the formalization of concepts, linked to terms (or words) in a language that enables their utterance (at the expressive layer). Moreover, each participant has a unique conceptualization (ontology), different from any other individual's; it is the intersection of both parties' conceptualizations that enables communication. Therefore, for human-like interaction it is crucial to have a strong conceptualization, backed by a vast net of terms linked to its concepts, and the ability to map it onto any interlocutor's ontology to support denotation. On the other hand, the diverse knowledge models comprising a human-like interaction system (situation model, user model, dialogue model, etc.) and its interface components (natural language processor, voice recognizer, gesture processor, etc.) will continuously exchange information during their operation. They too need to share a solid base of references to concepts, which provides consistency, completeness and quality to their processing. Besides, humans usually handle a certain range of similar concepts they can draw on when building messages. The subject of similarity has been and continues to be widely studied in the literature of computer science, psychology and sociolinguistics. Good similarity measures are necessary for several techniques from these fields, such as information retrieval, clustering, data mining, sense disambiguation, ontology translation and automatic schema matching. Furthermore, the ontological component should also be able to perform certain inferential processes, such as the calculation of semantic similarity between concepts. The principal benefit gained from this procedure is the ability to substitute one concept for another based on a calculation of their similarity, given specific circumstances. From the human's perspective, the procedure enables referring to a given concept in cases where the interlocutor either does not know the term(s) initially applied to that concept, or does not know the concept itself. In the first case, the use of synonyms suffices, while in the second it is necessary to refer to the concept through other similar (semantically related) concepts...
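
    As a concrete illustration of such an inferential process, the sketch below computes concept similarity with the off-the-shelf Wu-Palmer measure from NLTK's WordNet interface, one well-known measure among the many studied in this literature; the substitution threshold is an arbitrary illustration, not a value from the report.

    ```python
    # Sketch of a semantic-similarity service: given two words, compute
    # the best similarity over their noun senses and decide whether one
    # concept may substitute for the other.

    from nltk.corpus import wordnet as wn  # needs nltk.download("wordnet")

    def most_similar_senses(word_a, word_b):
        """Best Wu-Palmer score over all noun-sense pairs of two words."""
        best = 0.0
        for s1 in wn.synsets(word_a, pos=wn.NOUN):
            for s2 in wn.synsets(word_b, pos=wn.NOUN):
                best = max(best, s1.wup_similarity(s2) or 0.0)
        return best

    def can_substitute(word_a, word_b, threshold=0.8):
        """Allow substituting one concept for the other when similar enough."""
        return most_similar_senses(word_a, word_b) >= threshold

    print(most_similar_senses("car", "automobile"))  # 1.0: they share a synset
    print(most_similar_senses("car", "bicycle"))     # high: both wheeled vehicles
    print(can_substitute("car", "grammar"))          # False: unrelated concepts
    ```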

    New frontiers in supervised word sense disambiguation: building multilingual resources and neural models on a large scale

    Word Sense Disambiguation is a long-standing task in Natural Language Processing (NLP), lying at the core of human language understanding. While it has been studied from many different angles over the years, ranging from knowledge-based systems to semi-supervised and fully supervised models, the field seems to be slowing down with respect to other NLP tasks, e.g., part-of-speech tagging and dependency parsing. Despite the organization of several international competitions aimed at evaluating Word Sense Disambiguation systems, the evaluation of automatic systems has been problematic, mainly due to the lack of a reliable evaluation framework enabling direct quantitative comparison. To this end we develop a unified evaluation framework and analyze the performance of various Word Sense Disambiguation systems in a fair setup. The results show that supervised systems clearly outperform knowledge-based models. Among the supervised systems, a linear classifier trained on conventional local features still proves to be a hard baseline to beat. Nonetheless, recent approaches exploiting neural networks on unlabeled corpora achieve promising results, surpassing this hard baseline in most test sets. Even though supervised systems tend to perform best in terms of accuracy, they often lose ground to more flexible knowledge-based solutions, which do not require training for every disambiguation target. To bridge this gap we adopt a different perspective and rely on sequence learning to frame the disambiguation problem: we propose and study in depth a series of end-to-end neural architectures directly tailored to the task, from bidirectional Long Short-Term Memory networks to encoder-decoder models. Our extensive evaluation over standard benchmarks and in multiple languages shows that sequence learning enables more versatile all-words models that consistently lead to state-of-the-art results, even against models trained with engineered features. However, supervised systems need annotated training corpora, and the few available to date are of limited size: this is mainly due to the expensive and time-consuming process of annotating a wide variety of word senses at a reasonably large scale, i.e., the so-called knowledge acquisition bottleneck. To address this issue, we also present different strategies for automatically acquiring high-quality sense-annotated data in multiple languages, without any manual effort. We assess the quality of the sense annotations both intrinsically and extrinsically, achieving competitive results on multiple tasks.
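
    A minimal PyTorch sketch of the sequence-learning formulation described above: disambiguation is cast as sequence labeling, with a BiLSTM tagging every token with a sense label or a null label, rather than training one classifier per target word. The vocabulary, sense inventory and data are toy stand-ins; the thesis architectures are trained on large sense-annotated corpora.

    ```python
    import torch
    import torch.nn as nn

    # Toy vocabulary and sense inventory ("O" = no sense to assign).
    vocab = {"<pad>": 0, "the": 1, "bank": 2, "river": 3, "loan": 4, "by": 5}
    senses = {"O": 0, "bank%finance": 1, "bank%river": 2}

    class BiLSTMTagger(nn.Module):
        def __init__(self, n_words, n_senses, emb=32, hidden=64):
            super().__init__()
            self.emb = nn.Embedding(n_words, emb, padding_idx=0)
            self.lstm = nn.LSTM(emb, hidden, batch_first=True,
                                bidirectional=True)
            self.out = nn.Linear(2 * hidden, n_senses)

        def forward(self, tokens):
            h, _ = self.lstm(self.emb(tokens))
            return self.out(h)  # (batch, seq_len, n_senses)

    # Two toy sentences: "the bank by the river", "the bank loan".
    X = torch.tensor([[1, 2, 5, 1, 3], [1, 2, 4, 0, 0]])
    Y = torch.tensor([[0, 2, 0, 0, 0], [0, 1, 0, 0, 0]])

    model = BiLSTMTagger(len(vocab), len(senses))
    opt = torch.optim.Adam(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(100):  # overfits the toy data on purpose
        opt.zero_grad()
        logits = model(X)
        loss = loss_fn(logits.view(-1, len(senses)), Y.view(-1))
        loss.backward()
        opt.step()

    print(model(X).argmax(-1))  # "bank" should get a different sense per context
    ```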

    WoNeF, an improved, expanded and evaluated automatic French translation of WordNet

    7th Global WordNet Conference (GWC 2014), 25–29 January 2014. Automatic translation of WordNet has been attempted for many different target languages. JAWS is such a translation for French nouns, built using bilingual dictionaries and a syntactic language model. We improve its precision and coverage, complete it with translations of the other parts of speech, and enhance its evaluation method. The result is named WoNeF. We produce three final translations balanced between precision (up to 93%) and coverage (up to 109,447 (literal, synset) pairs).
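
    To make the dictionary-based approach concrete, here is a hedged Python sketch of one classic filtering heuristic: keep a French candidate when several English lemmas of the same synset translate to it, since agreement between translations is evidence the sense is right. JAWS and WoNeF instead score candidates with a syntactic language model; this simpler voting rule merely stands in for that step, and all dictionary entries are invented.

    ```python
    # Dictionary-based wordnet translation with a voting filter that
    # stands in for JAWS/WoNeF's language-model scoring (illustration only).

    from collections import Counter

    en_fr = {  # toy bilingual dictionary
        "river": ["rivière", "fleuve"],
        "stream": ["ruisseau", "rivière"],
        "watercourse": ["cours_d'eau", "rivière"],
    }

    def translate_synset(en_lemmas, dictionary, min_votes=2):
        """Return (literal, votes) pairs supported by enough lemmas."""
        votes = Counter()
        for lemma in en_lemmas:
            for fr in set(dictionary.get(lemma, [])):
                votes[fr] += 1
        return [(fr, n) for fr, n in votes.most_common() if n >= min_votes]

    # Synset {river, stream, watercourse}: "rivière" gets three votes.
    print(translate_synset(["river", "stream", "watercourse"], en_fr))
    # [('rivière', 3)]
    ```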