
CALOR-QUEST : un corpus d'entraînement et d'évaluation pour la compréhension automatique de textes (CALOR-QUEST: a training and evaluation corpus for machine reading comprehension)

Author: Aloui Cindy
Béchet Frédéric
Charlet Delphine
Damnati Geraldine
Heinecke Johannes
Herledan Frédéric
Nasr Alexis
Publication venue: HAL CCSD
Publication date: 01/07/2019

Machine reading comprehension is a task in the Question-Answering family in which questions are not generic in scope but relate to a particular document. Very large corpora (SQuAD, MS MARCO) containing (document, question, answer) triplets have recently been made available to the scientific community for developing supervised methods based on deep neural networks, with promising results. These methods, however, need very large training corpora to be effective, and such data currently exists only for English. The purpose of this study is to enable the development of such resources for other languages at lower cost by proposing a method that generates questions semi-automatically from a semantic frame analysis. The collection of natural questions is limited to a validation/test set. Applying this method to the French CALOR-Frame corpus produced the CALOR-QUEST resource presented in this paper.
KEYWORDS: Machine reading comprehension, Question Answering, Semantic frame analysis, Question generation

CALOR-QUEST : un corpus d'entraînement et d'évaluation pour la compréhension automatique de textes (CALOR-QUEST: a training and evaluation corpus for machine reading comprehension)

Author: Aloui Cindy
Béchet Frédéric
Charlet Delphine
Damnati Geraldine
Heinecke Johannes
Herledan Frédéric
Nasr Alexis
Publication venue: 'Associacio catalana de Salut Laboral'
Publication date: 01/07/2019

Machine reading comprehension is a task in the Question-Answering family in which questions are not generic in scope but relate to a particular document. Very large corpora (SQuAD, MS MARCO) containing (document, question, answer) triplets have recently been made available to the scientific community for developing supervised methods based on deep neural networks, with promising results. These methods, however, need very large training corpora to be effective, and such data currently exists only for English. The purpose of this study is to enable the development of such resources for other languages at lower cost by proposing a method that generates questions semi-automatically from a semantic frame analysis. The collection of natural questions is limited to a validation/test set. Applying this method to the French CALOR-Frame corpus produced the CALOR-QUEST resource presented in this paper.

HAL AMU

CALOR-QUEST : generating a training corpus for Machine Reading Comprehension models from shallow semantic annotations

Author: Aloui Cindy
Béchet Frédéric
Charlet Delphine
Damnati Geraldine
Heinecke Johannes
Herledan Frédéric
Nasr Alexis
Publication venue: HAL CCSD
Publication date: 01/01/2019

Machine reading comprehension is a task related to Question-Answering in which questions are not generic in scope but relate to a particular document. Very large corpora (SQuAD, MS MARCO) containing (document, question, answer) triplets have recently been made available to the scientific community for developing supervised methods based on deep neural networks, with promising results. These methods, however, need very large training corpora to be effective, and such data currently exists only for English and Chinese. The aim of this study is to develop such resources for other languages by generating questions semi-automatically from the semantic Frame analysis of large corpora. The collection of natural questions is limited to a validation/test set. We applied this method to the French CALOR-FRAME corpus to develop the CALOR-QUEST resource presented in this paper.

Crossref

HAL AMU

CALOR-QUEST : generating a training corpus for Machine Reading Comprehension models from shallow semantic annotations

Author: Aloui Cindy
Béchet Frédéric
Charlet Delphine
Damnati Geraldine
Heinecke Johannes
Herledan Frédéric
Nasr Alexis
Publication venue: HAL CCSD
Publication date: 04/11/2019

HAL AMU

ISICIL: Information Semantic Integration through Communities of Intelligence onLine

Author: Abdessalem Talel
Buffa Michel
Bugeaud Florie
Comos Sébastien
Corby Olivier
Delaforge Nicolas
Ereteo Guillaume
Gandon Fabien
Giboin Alain
Grohan Patrick
Herledan Frédéric
Le Meur Valérie
Leitzelman Mylène
Leloup Benoit
Limpens Freddy
Merle Anne
Soulier Eddie
Publication venue: HAL CCSD
Publication date: 09/10/2009

This is a collective position paper presenting the vision, motivations and approaches of the ISICIL project. The project proposes to study and experiment with new tools that assist corporate intelligence and technical watch tasks. These tools rely on advanced Web 2.0 interfaces (blogs, wikis, social bookmarking) for interaction and on Semantic Web technologies for interoperability and information processing.

HAL-UNICE

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

HAL-Rennes 1