
    Modular resource development and diagnostic evaluation framework for fast NLP system improvement

    Natural Language Processing systems are large-scale software systems whose development requires many person-years of work, both for coding and for resource development. Given a dictionary of 110k lemmas, a few hundred syntactic analysis rules, 20k n-gram matrices and other resources, what will be the impact on a syntactic analyzer of adding a new possible category to a given verb? What will be the consequences of adding a new syntactic rule? Beyond its intended effect, any modification may have unforeseeable side effects, and the complexity of the system makes it difficult to predict the overall impact of even small changes. We present here a framework designed to effectively and iteratively improve the accuracy of our linguistic analyzer LIMA through iterative refinements of its linguistic resources. These improvements are continuously assessed by evaluating the analyzer's performance against a reference corpus. Our first results show that this framework is genuinely helpful towards this goal.
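    The core of such a framework is an evaluate-modify-evaluate loop. The sketch below illustrates the idea under stated assumptions: the analyzer command, file paths, and token-level accuracy metric are hypothetical stand-ins, not the actual LIMA tooling.

    ```python
    # Minimal sketch of the evaluate-modify-evaluate loop, assuming a
    # command-line analyzer and line-aligned gold annotations. The command
    # name, file paths and scoring scheme are hypothetical.
    import subprocess

    def score_against_reference(output_path: str, reference_path: str) -> float:
        """Token-level accuracy of the analyzer output vs. a gold reference."""
        with open(output_path) as out, open(reference_path) as ref:
            pairs = list(zip(out, ref))
        return sum(o.strip() == r.strip() for o, r in pairs) / len(pairs)

    def run_analyzer(corpus: str, output: str) -> None:
        # Hypothetical CLI invocation; replace with the real analyzer call.
        subprocess.run(["analyzer", "--input", corpus, "--output", output], check=True)

    # Baseline before touching the resources ...
    run_analyzer("reference_corpus.txt", "before.txt")
    baseline = score_against_reference("before.txt", "gold.txt")

    # ... then re-run after each resource edit and flag regressions.
    run_analyzer("reference_corpus.txt", "after.txt")
    modified = score_against_reference("after.txt", "gold.txt")
    if modified < baseline:
        print(f"Regression detected: {baseline:.4f} -> {modified:.4f}")
    ```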

    Revisiting knowledge-based Semantic Role Labeling

    Semantic role labeling has seen tremendous progress in recent years, for both supervised and unsupervised approaches. Knowledge-based approaches, however, have been neglected, even though they have been shown to bring the best results to the related word sense disambiguation task. We contribute a simple knowledge-based system with an easy-to-reproduce specification. We also present a novel approach to handling the passive voice in the context of semantic role labeling that reduces the error rate in F1 by 15.7%, showing that significant improvements can be made while retaining the key advantages of the approach: it is simple, which facilitates the analysis of individual errors; it needs no hand-annotated corpora; and it is not domain-specific.
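    The passive-voice idea lends itself to a short illustration. The sketch below shows one plausible normalization step, assuming dependency labels such as "subj" and "agent"; the function name and labels are illustrative, not the paper's actual implementation.

    ```python
    # Hedged sketch of passive-voice normalization: before looking up roles
    # in the predicate lexicon, remap grammatical functions so that a passive
    # clause is analyzed like its active counterpart. Labels are assumptions.

    def normalize_passive(dependents: dict, is_passive: bool) -> dict:
        """Map surface functions of a passive clause back to deep functions."""
        if not is_passive:
            return dependents
        remapped = dict(dependents)
        # The surface subject of a passive verb fills the deep-object slot.
        if "subj" in remapped:
            remapped["obj"] = remapped.pop("subj")
        # An agentive 'by'-phrase fills the deep-subject slot.
        if "agent" in remapped:
            remapped["subj"] = remapped.pop("agent")
        return remapped

    # "The window was broken by the storm."
    deps = {"subj": "window", "agent": "storm"}
    print(normalize_passive(deps, is_passive=True))
    # {'obj': 'window', 'subj': 'storm'}
    ```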

    WoNeF: Improvement, Extension and Evaluation of an Automatic French Translation of WordNet

    Identifying the possible senses of the words of a vocabulary is a difficult problem requiring substantial manual work. This work has been carried out for English: the result is the WordNet lexical database, which still has few equivalents in other languages. Nevertheless, automatic translations of WordNet into many target languages exist, notably for French. JAWS is one such automatic translation, built using dictionaries and a syntactic language model. We improve this translation, extend it with the verbs and adjectives of WordNet, and demonstrate the validity of our approach through a new manual evaluation. In addition to the main version, named WoNeF, we produce two additional versions: a high-precision version (93% precision, up to 97% for nouns) and a high-coverage version containing 109,447 (literal, synset) pairs.
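    One way to read the two extra versions is as different confidence thresholds over the same scored candidate pairs. The sketch below illustrates this reading; the scores, thresholds, and example pairs are invented, and the real selection combines several independent voters rather than a single score.

    ```python
    # Illustrative derivation of high-precision and high-coverage versions
    # from scored (literal, synset) candidates. All values are invented.
    candidates = [
        ("chat", "cat.n.01", 0.97),
        ("avocat", "lawyer.n.01", 0.71),
        ("avocat", "avocado.n.01", 0.40),
    ]

    HIGH_PRECISION_THRESHOLD = 0.90  # keep only near-certain pairs
    HIGH_COVERAGE_THRESHOLD = 0.30   # keep anything plausibly correct

    high_precision = [(lit, syn) for lit, syn, score in candidates
                      if score >= HIGH_PRECISION_THRESHOLD]
    high_coverage = [(lit, syn) for lit, syn, score in candidates
                     if score >= HIGH_COVERAGE_THRESHOLD]

    print(len(high_precision), len(high_coverage))  # 1 3
    ```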

    Adapting VerbNet to French using existing resources

    VerbNet is an English lexical resource for verbs that has proven useful for English NLP due to its high coverage and coherent classification. No such resource exists for other languages, despite some (mostly automatic and unsupervised) attempts. We show how to semi-automatically adapt VerbNet using existing resources designed for different purposes. This study focuses on French and uses two French resources: a semantic lexicon (Les Verbes Français) and a syntactic lexicon (Lexique-Grammaire).
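    One plausible reading of this semi-automatic adaptation is an intersection of the evidence from the two lexicons: a French verb is proposed for a VerbNet class only when its semantic and syntactic classifications both point there. The sketch below illustrates that idea with invented class identifiers and mapping tables; it is not the paper's actual algorithm.

    ```python
    # Illustrative intersection of evidence from a semantic lexicon
    # (Les Verbes Français) and a syntactic lexicon (Lexique-Grammaire).
    # Class identifiers and mapping tables are invented placeholders.

    lvf_to_verbnet = {"LVF-P1": {"hit-18.1"}, "LVF-S3": {"give-13.1"}}
    lg_to_verbnet = {"LG-32A": {"hit-18.1"}, "LG-36DT": {"give-13.1"}}

    def candidate_classes(lvf_class: str, lg_class: str) -> set:
        """VerbNet classes supported by both the semantic and syntactic lexicon."""
        return lvf_to_verbnet.get(lvf_class, set()) & lg_to_verbnet.get(lg_class, set())

    print(candidate_classes("LVF-P1", "LG-32A"))  # {'hit-18.1'}
    ```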

    Semantic Similarity To Improve Question Understanding in a Virtual Patient

    In medicine, a communicating virtual patient or doctor allows students to train in medical diagnosis and to develop the skills needed to conduct a medical consultation. In this paper, we describe a conversational virtual standardized patient system that allows medical students to simulate the diagnosis of an abdominal surgical emergency. We exploit the semantic properties captured by distributed word representations to search for similar questions in the virtual patient dialogue system. We created two dialogue systems that were evaluated on datasets collected during tests with students. The first system, based on hand-crafted rules, obtains an F1-score of 92.29% on the studied clinical case, while the second system, which combines rules and semantic similarity, achieves 94.88%. This represents an error reduction of 9.70% compared to the rules-only system.
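    The similarity component can be pictured as nearest-neighbor retrieval over question embeddings. The sketch below is a minimal illustration using toy vectors and cosine similarity; the actual system uses distributed word representations learned from data, and the questions shown here are invented.

    ```python
    # Minimal sketch of similarity-based question matching: embed the
    # incoming question and return the closest question known to the rule
    # base. Vectors and questions are toy examples.
    import numpy as np

    def cosine(u: np.ndarray, v: np.ndarray) -> float:
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    known_questions = {
        "Where does it hurt?": np.array([0.9, 0.1, 0.0]),
        "When did the pain start?": np.array([0.1, 0.8, 0.2]),
    }

    def closest_question(query_vec: np.ndarray) -> str:
        """Return the known question most similar to the student's query."""
        return max(known_questions, key=lambda q: cosine(query_vec, known_questions[q]))

    # A paraphrase like "Where is the pain located?" should land near the first entry.
    print(closest_question(np.array([0.85, 0.2, 0.05])))
    ```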

    Developing a French FrameNet: Methodology and First results

    The Asfalda project aims to develop a French corpus with frame-based semantic annotations and automatic tools for shallow semantic analysis. We present the first part of the project: focusing on a set of notional domains, we delimited a subset of English frames, adapted them to French data when necessary, and developed the corresponding French lexicon. We believe that working domain by domain helped us enforce the coherence of the resulting resource, and it also has the advantage that, although the number of frames is limited (around a hundred), we obtain full coverage within a given domain.

    Getting reliable answers by exploiting results from several sources of information

    A question-answering system is more convincing if it can give the user some indication of the reliability of its proposed answers. To address this problem, we chose to combine the results of several searches. First, we search for answers in a reliable document collection, and second, on the Web. When both sources of knowledge lead the system to the same answers, our confidence in those answers increases and we boost them to the top ranks.
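    The agreement heuristic can be sketched as a simple merge that promotes answers confirmed by both sources. The function below is an illustration under that assumption, not the system's actual ranking formula.

    ```python
    # Sketch of the agreement heuristic: answers found independently in the
    # reliable collection and on the Web are promoted ahead of the rest.
    # The merging rule is an illustrative assumption.

    def merge_answers(collection_answers: list, web_answers: list) -> list:
        """Rank answers, putting those confirmed by both sources first."""
        web_set = set(web_answers)
        common = [a for a in collection_answers if a in web_set]
        single_source = [a for a in collection_answers + web_answers
                         if a not in common]
        # Preserve order within each group; deduplicate while merging.
        seen, merged = set(), []
        for a in common + single_source:
            if a not in seen:
                seen.add(a)
                merged.append(a)
        return merged

    print(merge_answers(["1969", "1970"], ["1969", "1968"]))
    # ['1969', '1970', '1968']
    ```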

    Finding Answers on the Web and in a Closed Collection

    The question answering task, as defined in the TREC-11 evaluation, may rely on a Web search. However, this strategy alone is not sufficient, since Web results are not certified. Our system, QALC, searches both the Web and the AQUAINT text collection. This implies that the system exists in two versions, each of them dealing with one kind of resource. In particular, Web queries may be extremely precise and still be successful. Relying on results common to both kinds of search yields a better ranking of the answers, and hence better performance of the QALC system.

    What's New in the LIMA Linguistic Analyzer

    Since the last numbered version, 2.1 in 2015, more than 1,200 changes have been made. Most are only marginal improvements, bug fixes or infrastructure improvements. We present the most important changes in the following sections, but begin by summarizing a few other developments below. New unit tests have been added. The continuous integration (CI) system has been improved through the use of Docker containers on the Semaphore, AppVeyor and Travis platforms. We now use the GitHub release system to distribute the packages generated by the CI. We have also improved the multi-platform build by using the Ninja build system on all platforms. On the NLP side, we have begun the transition to labels from the Universal Dependencies project. Finally, we now use SVMTool as the default part-of-speech tagger.