
    Modular resource development and diagnostic evaluation framework for fast NLP system improvement

    Natural Language Processing systems are large-scale software systems whose development requires many person-years of work, both for coding and for resource development. Given a dictionary of 110k lemmas, a few hundred syntactic analysis rules, 20k n-gram matrices and other resources, what will be the impact on a syntactic analyzer of adding a new possible category to a given verb? What will be the consequences of adding a new syntactic rule? Beyond its intended effect, any modification may have unforeseeable side effects, and the complexity of the system makes it difficult to predict the overall impact of even small changes. We present here a framework designed to effectively and iteratively improve the accuracy of our linguistic analyzer LIMA through iterative refinements of its linguistic resources. These improvements are continuously assessed by evaluating the analyzer's performance against a reference corpus. Our first results show that this framework is genuinely helpful towards this goal.
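    The core of such a framework is an evaluate-modify-evaluate loop. The sketch below illustrates the idea under stated assumptions: the analyzer command, file paths, and token-level accuracy metric are hypothetical stand-ins, not the actual LIMA tooling.

    ```python
    # Minimal sketch of the evaluate-modify-evaluate loop, assuming a
    # command-line analyzer and line-aligned gold annotations. The command
    # name, file paths and scoring scheme are hypothetical.
    import subprocess

    def score_against_reference(output_path: str, reference_path: str) -> float:
        """Token-level accuracy of the analyzer output vs. a gold reference."""
        with open(output_path) as out, open(reference_path) as ref:
            pairs = list(zip(out, ref))
        return sum(o.strip() == r.strip() for o, r in pairs) / len(pairs)

    def run_analyzer(corpus: str, output: str) -> None:
        # Hypothetical CLI invocation; replace with the real analyzer call.
        subprocess.run(["analyzer", "--input", corpus, "--output", output], check=True)

    # Baseline before touching the resources ...
    run_analyzer("reference_corpus.txt", "before.txt")
    baseline = score_against_reference("before.txt", "gold.txt")

    # ... then re-run after each resource edit and flag regressions.
    run_analyzer("reference_corpus.txt", "after.txt")
    modified = score_against_reference("after.txt", "gold.txt")
    if modified < baseline:
        print(f"Regression detected: {baseline:.4f} -> {modified:.4f}")
    ```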

    Revisiting knowledge-based Semantic Role Labeling

    Semantic role labeling has seen tremendous progress in recent years, for both supervised and unsupervised approaches. Knowledge-based approaches, however, have been neglected, even though they have been shown to bring the best results to the related word sense disambiguation task. We contribute a simple knowledge-based system with an easy-to-reproduce specification. We also present a novel approach to handling the passive voice in the context of semantic role labeling that reduces the error rate in F1 by 15.7%, showing that significant improvements can be made while retaining the key advantages of the approach: it is simple, which facilitates the analysis of individual errors; it needs no hand-annotated corpora; and it is not domain-specific.
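    The passive-voice idea lends itself to a short illustration. The sketch below shows one plausible normalization step, assuming dependency labels such as "subj" and "agent"; the function name and labels are illustrative, not the paper's actual implementation.

    ```python
    # Hedged sketch of passive-voice normalization: before looking up roles
    # in the predicate lexicon, remap grammatical functions so that a passive
    # clause is analyzed like its active counterpart. Labels are assumptions.

    def normalize_passive(dependents: dict, is_passive: bool) -> dict:
        """Map surface functions of a passive clause back to deep functions."""
        if not is_passive:
            return dependents
        remapped = dict(dependents)
        # The surface subject of a passive verb fills the deep-object slot.
        if "subj" in remapped:
            remapped["obj"] = remapped.pop("subj")
        # An agentive 'by'-phrase fills the deep-subject slot.
        if "agent" in remapped:
            remapped["subj"] = remapped.pop("agent")
        return remapped

    # "The window was broken by the storm."
    deps = {"subj": "window", "agent": "storm"}
    print(normalize_passive(deps, is_passive=True))
    # {'obj': 'window', 'subj': 'storm'}
    ```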

    WoNeF: Improvement, Extension and Evaluation of an Automatic French Translation of WordNet

    Identifying the possible senses of the words of a vocabulary is a difficult problem requiring substantial manual work. This work has been carried out for English: the result is the WordNet lexical database, which still has few equivalents in other languages. Nevertheless, automatic translations of WordNet into many target languages exist, notably for French. JAWS is one such automatic translation, built using dictionaries and a syntactic language model. We improve this translation, extend it with the verbs and adjectives of WordNet, and demonstrate the validity of our approach through a new manual evaluation. In addition to the main version, named WoNeF, we produce two additional versions: a high-precision version (93% precision, up to 97% for nouns) and a high-coverage version containing 109,447 (literal, synset) pairs.
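    One way to read the two extra versions is as different confidence thresholds over the same scored candidate pairs. The sketch below illustrates this reading; the scores, thresholds, and example pairs are invented, and the real selection combines several independent voters rather than a single score.

    ```python
    # Illustrative derivation of high-precision and high-coverage versions
    # from scored (literal, synset) candidates. All values are invented.
    candidates = [
        ("chat", "cat.n.01", 0.97),
        ("avocat", "lawyer.n.01", 0.71),
        ("avocat", "avocado.n.01", 0.40),
    ]

    HIGH_PRECISION_THRESHOLD = 0.90  # keep only near-certain pairs
    HIGH_COVERAGE_THRESHOLD = 0.30   # keep anything plausibly correct

    high_precision = [(lit, syn) for lit, syn, score in candidates
                      if score >= HIGH_PRECISION_THRESHOLD]
    high_coverage = [(lit, syn) for lit, syn, score in candidates
                     if score >= HIGH_COVERAGE_THRESHOLD]

    print(len(high_precision), len(high_coverage))  # 1 3
    ```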

    Adapting VerbNet to French using existing resources

    VerbNet is an English lexical resource for verbs that has proven useful for English NLP due to its high coverage and coherent classification. No such resource exists for other languages, despite some (mostly automatic and unsupervised) attempts. We show how to semi-automatically adapt VerbNet using existing resources designed for different purposes. This study focuses on French and uses two French resources: a semantic lexicon (Les Verbes Français) and a syntactic lexicon (Lexique-Grammaire).
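    One plausible reading of this semi-automatic adaptation is an intersection of the evidence from the two lexicons: a French verb is proposed for a VerbNet class only when its semantic and syntactic classifications both point there. The sketch below illustrates that idea with invented class identifiers and mapping tables; it is not the paper's actual algorithm.

    ```python
    # Illustrative intersection of evidence from a semantic lexicon
    # (Les Verbes Français) and a syntactic lexicon (Lexique-Grammaire).
    # Class identifiers and mapping tables are invented placeholders.

    lvf_to_verbnet = {"LVF-P1": {"hit-18.1"}, "LVF-S3": {"give-13.1"}}
    lg_to_verbnet = {"LG-32A": {"hit-18.1"}, "LG-36DT": {"give-13.1"}}

    def candidate_classes(lvf_class: str, lg_class: str) -> set:
        """VerbNet classes supported by both the semantic and syntactic lexicon."""
        return lvf_to_verbnet.get(lvf_class, set()) & lg_to_verbnet.get(lg_class, set())

    print(candidate_classes("LVF-P1", "LG-32A"))  # {'hit-18.1'}
    ```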

    Semantic Similarity To Improve Question Understanding in a Virtual Patient

    In medicine, a communicating virtual patient or doctor allows students to train in medical diagnosis and to develop the skills needed to conduct a medical consultation. In this paper, we describe a conversational virtual standardized patient system that allows medical students to simulate the diagnosis of an abdominal surgical emergency. We exploit the semantic properties captured by distributed word representations to search for similar questions in the virtual patient dialogue system. We created two dialogue systems that were evaluated on datasets collected during tests with students. The first system, based on hand-crafted rules, obtains an F1-score of 92.29% on the studied clinical case, while the second system, which combines rules and semantic similarity, achieves 94.88%. This represents an error reduction of 9.70% compared to the rules-only system.
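    The similarity component can be pictured as nearest-neighbor retrieval over question embeddings. The sketch below is a minimal illustration using toy vectors and cosine similarity; the actual system uses distributed word representations learned from data, and the questions shown here are invented.

    ```python
    # Minimal sketch of similarity-based question matching: embed the
    # incoming question and return the closest question known to the rule
    # base. Vectors and questions are toy examples.
    import numpy as np

    def cosine(u: np.ndarray, v: np.ndarray) -> float:
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    known_questions = {
        "Where does it hurt?": np.array([0.9, 0.1, 0.0]),
        "When did the pain start?": np.array([0.1, 0.8, 0.2]),
    }

    def closest_question(query_vec: np.ndarray) -> str:
        """Return the known question most similar to the student's query."""
        return max(known_questions, key=lambda q: cosine(query_vec, known_questions[q]))

    # A paraphrase like "Where is the pain located?" should land near the first entry.
    print(closest_question(np.array([0.85, 0.2, 0.05])))
    ```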

    Developing a French FrameNet: Methodology and First results

    The Asfalda project aims to develop a French corpus with frame-based semantic annotations and automatic tools for shallow semantic analysis. We present the first part of the project: focusing on a set of notional domains, we delimited a subset of English frames, adapted them to French data when necessary, and developed the corresponding French lexicon. We believe that working domain by domain helped us enforce the coherence of the resulting resource, and it also has the advantage that, although the number of frames is limited (around a hundred), we obtain full coverage within a given domain.

    Getting reliable answers by exploiting results from several sources of information

    A question-answering system is more convincing if it can give the user some indication of the reliability of its proposed answers. To address this problem, we chose to combine the results of several searches. First, we search for answers in a reliable document collection, and second, on the Web. When both sources of knowledge lead the system to the same answers, our confidence in those answers increases and we boost them to the top ranks.
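    The agreement heuristic can be sketched as a simple merge that promotes answers confirmed by both sources. The function below is an illustration under that assumption, not the system's actual ranking formula.

    ```python
    # Sketch of the agreement heuristic: answers found independently in the
    # reliable collection and on the Web are promoted ahead of the rest.
    # The merging rule is an illustrative assumption.

    def merge_answers(collection_answers: list, web_answers: list) -> list:
        """Rank answers, putting those confirmed by both sources first."""
        web_set = set(web_answers)
        common = [a for a in collection_answers if a in web_set]
        single_source = [a for a in collection_answers + web_answers
                         if a not in common]
        # Preserve order within each group; deduplicate while merging.
        seen, merged = set(), []
        for a in common + single_source:
            if a not in seen:
                seen.add(a)
                merged.append(a)
        return merged

    print(merge_answers(["1969", "1970"], ["1969", "1968"]))
    # ['1969', '1970', '1968']
    ```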

    Finding Answers on the Web and in a Closed Collection

    The question answering task, as defined in the TREC-11 evaluation, may rely on a Web search. However, this strategy alone is not sufficient, since Web results are not certified. Our system, QALC, searches both the Web and the AQUAINT text collection. This implies that the system exists in two versions, each of them dealing with one kind of resource. In particular, Web queries may be extremely precise and still be successful. Relying on results common to both kinds of search yields a better ranking of the answers, and hence better performance of the QALC system.

    What's New in the LIMA Linguistic Analyzer

    Since the last numbered version, 2.1 in 2015, more than 1,200 changes have been made. Most are only marginal improvements, bug fixes or infrastructure improvements. We present the most important changes in the following sections, but begin by summarizing a few other developments below. New unit tests have been added. The continuous integration (CI) system has been improved through the use of Docker containers on the Semaphore, AppVeyor and Travis platforms. We now use the GitHub release system to distribute the packages generated by the CI. We have also improved the multi-platform build by using the Ninja build system on all platforms. On the NLP side, we have begun the transition to labels from the Universal Dependencies project. Finally, we now use SVMTool as the default part-of-speech tagger.