28 research outputs found

    A hierarchical taxonomy for classifying hardness of inference tasks

    Exhibiting inferential capabilities is one of the major goals of many modern Natural Language Processing systems. However, while attempts have been made to define what textual inferences are, few seek to classify inference phenomena by difficulty. In this paper we propose a hierarchical taxonomy for inferences, relative to their hardness, designed with corpus annotation as well as system design and evaluation in mind. Indeed, a fine-grained assessment of the difficulty of a task allows us to design more appropriate systems and to evaluate them only on what they are designed to handle. Each of the seven classes is described and illustrated with examples from different tasks such as question answering, textual entailment, and coreference resolution. We then test the classes of our hierarchy on the specific task of question answering. Our annotation of the test data from the QA4MRE 2013 evaluation campaign shows that it is possible to quantify the contrasts in types of difficulty across datasets for the same task.
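    As an illustration of how such a hardness taxonomy could support corpus annotation and scoped evaluation, here is a minimal Python sketch. The abstract does not name the seven classes, so the labels, field names, and helper function below are hypothetical placeholders.

```python
# Minimal sketch: a seven-level hardness taxonomy backing an annotation
# schema. The level names are placeholders; the paper defines seven classes,
# but the abstract does not enumerate them.
from dataclasses import dataclass
from enum import IntEnum

class HardnessClass(IntEnum):
    LEVEL_1 = 1
    LEVEL_2 = 2
    LEVEL_3 = 3
    LEVEL_4 = 4
    LEVEL_5 = 5
    LEVEL_6 = 6
    LEVEL_7 = 7

@dataclass
class InferenceAnnotation:
    """One annotated inference instance from a task such as QA or entailment."""
    task: str                 # e.g. "question answering", "coreference resolution"
    text: str                 # premise / supporting passage
    hypothesis: str           # inference to be judged
    hardness: HardnessClass

def within_scope(corpus: list[InferenceAnnotation],
                 max_level: HardnessClass) -> list[InferenceAnnotation]:
    """Keep only instances a system is designed to handle, so evaluation
    stays within its intended scope (the motivation given in the abstract)."""
    return [ex for ex in corpus if ex.hardness <= max_level]
```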

    ArNLI: Arabic Natural Language Inference Entailment and Contradiction Detection

    Natural Language Inference (NLI) is an active research topic in natural language processing, and contradiction detection between sentences is a special case of NLI. It is considered a difficult NLP task, and it has a significant impact when added as a component of many NLP applications, such as question answering systems and text summarization. Arabic is one of the most challenging low-resource languages for contradiction detection due to its rich lexical and semantic ambiguity. We have created a dataset of more than 12k sentences, named ArNLI, which will be made publicly available. Moreover, we have applied a new model inspired by the solutions proposed for contradiction detection in English by the Stanford group. We propose an approach to detect contradictions between pairs of sentences in Arabic using a contradiction vector combined with a language-model vector as input to a machine learning model. We analyzed the results of different traditional machine learning classifiers and compared them on our dataset (ArNLI) and on automatic translations of the English PHEME and SICK datasets. The best results were achieved using a Random Forest classifier, with accuracies of 99%, 60%, and 75% on PHEME, SICK, and ArNLI respectively.
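    A minimal sketch of the pipeline the abstract describes: a hand-crafted contradiction vector concatenated with a language-model vector, fed to a Random Forest. Both feature extractors below are placeholder assumptions, since the abstract does not specify how either vector is computed.

```python
import hashlib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def contradiction_features(premise: str, hypothesis: str) -> np.ndarray:
    # Hypothetical hand-crafted features: negation mismatch and lexical overlap.
    neg_words = {"لا", "لم", "لن", "ليس"}  # a few Arabic negation particles
    p, h = set(premise.split()), set(hypothesis.split())
    return np.array([
        len(neg_words & p) != len(neg_words & h),   # negation mismatch
        len(p & h) / max(len(p | h), 1),            # Jaccard word overlap
    ], dtype=float)

def lm_vector(premise: str, hypothesis: str) -> np.ndarray:
    # Deterministic stand-in for a real language-model sentence embedding.
    seed = int.from_bytes(hashlib.md5((premise + hypothesis).encode()).digest()[:4], "little")
    return np.random.default_rng(seed).normal(size=16)

def featurize(pairs):
    # Concatenate both vectors, as the abstract describes.
    return np.stack([
        np.concatenate([contradiction_features(p, h), lm_vector(p, h)])
        for p, h in pairs
    ])

# Usage: train on labeled pairs (1 = contradiction, 0 = other).
train_pairs = [
    ("الطقس حار اليوم", "الطقس ليس حارا اليوم"),  # contradiction
    ("الطقس حار اليوم", "الجو دافئ"),             # no contradiction
]
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(featurize(train_pairs), [1, 0])
print(clf.predict(featurize([("السماء زرقاء", "السماء ليست زرقاء")])))
```

    Concatenating the two vectors lets the classifier weigh explicit contradiction cues against distributional evidence, which is one plausible reading of the combination the abstract describes.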

    Answering Reading Comprehension Tests

    In this article, we present an adaptation of an existing question-answering system to the task of answering reading-comprehension questions about texts. The proposed method for selecting the correct answers relies on recognizing textual entailment between the hypotheses and the texts. The distinctive features of this method are the generation of hypotheses by syntactic rewriting and the evaluation of several distance criteria adapted to handle term variants.
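    A minimal sketch of the answer-selection scheme described above: each candidate answer is rewritten into a declarative hypothesis, then scored against the text by simple distance criteria. The rewriting rule and the distance measures are illustrative stand-ins, not the paper's own.

```python
from difflib import SequenceMatcher

def rewrite_hypothesis(question: str, candidate: str) -> str:
    # Hypothetical syntactic rewriting: substitute the candidate answer for
    # the wh-word. Real systems apply proper syntactic transformations.
    for wh in ("Who", "What", "Which", "Where", "When"):
        if question.startswith(wh):
            return question.replace(wh, candidate, 1).rstrip("?") + "."
    return f"{candidate}. {question}"

def entailment_score(text: str, hypothesis: str) -> float:
    # Two toy distance criteria, combined; term-variant handling
    # (lemmatization, synonyms) is omitted for brevity.
    overlap = len(set(text.lower().split()) & set(hypothesis.lower().split()))
    fuzzy = SequenceMatcher(None, text.lower(), hypothesis.lower()).ratio()
    return overlap + fuzzy

def select_answer(text: str, question: str, candidates: list[str]) -> str:
    # Pick the candidate whose hypothesis is best "entailed" by the text.
    return max(candidates,
               key=lambda c: entailment_score(text, rewrite_hypothesis(question, c)))

passage = "Ada Lovelace wrote the first published computer program."
print(select_answer(passage, "Who wrote the first published computer program?",
                    ["Ada Lovelace", "Charles Babbage"]))
```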

    A book-oriented chatbot

    Automatically answering questions posed in natural language is an area that has been studied for many years. However, judging by existing question answering systems, the percentage of correct answers over a set of questions generated from a dataset shows that performance is still far from 100%, which is often the level achieved when the questions are answered by humans. This work develops the idea of a book-oriented chatbot, more precisely a question answering system designed to answer questions whose dataset is one or more books. To this end, we built a new system incorporating two existing projects, OpenBookQA and Question-Generation. We used two domain-specific datasets that had not been studied in either project, QA4MRE and RACE, and applied our main approach to them: enriching them with automatically generated questions. We ran many experiments, training neural network models, in order to study the impact of the generated questions and to obtain good accuracy on both datasets. The results suggest that a significant proportion of generated questions in a dataset leads to higher test accuracy. This makes it clear that enriching a dataset based on a book with questions generated about that book effectively gives the dataset the content of the book. This dissertation presents promising results obtained with datasets augmented with automatically generated questions.
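    A minimal sketch of the enrichment step described above: a book-based QA dataset is augmented with automatically generated questions before training. The generate_questions function is a trivial stand-in for the Question-Generation project's neural model, which the abstract names but does not detail.

```python
from dataclasses import dataclass

@dataclass
class QAExample:
    context: str       # passage from the book
    question: str
    answer: str
    generated: bool    # True if the question was machine-generated

def generate_questions(passage: str) -> list[tuple[str, str]]:
    # Placeholder for a neural question-generation model; here we emit a
    # single cloze-style question so the sketch runs end to end.
    words = passage.split()
    if len(words) < 2:
        return []
    return [(" ".join(words[:-1]) + " ___?", words[-1])]

def enrich(dataset: list[QAExample], passages: list[str]) -> list[QAExample]:
    """Append generated questions so they form a significant share of the
    training data, which is the effect the dissertation studies."""
    extra = [QAExample(p, q, a, generated=True)
             for p in passages
             for q, a in generate_questions(p)]
    return dataset + extra

book_passages = ["The whale pursued the ship relentlessly."]
print(enrich([], book_passages))
```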

    Towards Mitigating Hallucination in Large Language Models via Self-Reflection

    Large language models (LLMs) have shown promise for generative and knowledge-intensive tasks, including question-answering (QA). However, their practical deployment still faces challenges, notably the issue of "hallucination", where models generate plausible-sounding but unfaithful or nonsensical information. This issue becomes particularly critical in the medical domain due to the uncommon professional concepts and the potential social risks involved. This paper analyses the phenomenon of hallucination in medical generative QA systems using widely adopted LLMs and datasets. Our investigation centers on the identification and comprehension of common problematic answers, with a specific emphasis on hallucination. To tackle this challenge, we present an interactive self-reflection methodology that incorporates knowledge acquisition and answer generation. Through this feedback process, our approach steadily enhances the factuality, consistency, and entailment of the generated answers. Consequently, we harness the interactivity and multitasking ability of LLMs to produce progressively more precise and accurate answers. Experimental results from both automatic and human evaluation demonstrate that our approach reduces hallucination more effectively than the baselines. (Accepted to Findings of EMNLP.)
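    A minimal sketch of an interactive self-reflection loop of the kind the abstract describes: acquire background knowledge, draft an answer, then ask the model to critique and revise its own output until it judges the answer faithful. The llm callable and the prompts are assumptions, not the paper's own method or wording.

```python
from typing import Callable

def self_reflective_answer(llm: Callable[[str], str], question: str,
                           max_rounds: int = 3) -> str:
    # Knowledge acquisition, then an initial draft grounded in that knowledge.
    knowledge = llm(f"List the key background facts needed to answer: {question}")
    answer = llm(f"Using these facts:\n{knowledge}\nAnswer the question: {question}")
    # Feedback loop: the model critiques and revises its own answer.
    for _ in range(max_rounds):
        critique = llm(
            "Check the answer below for factual errors, inconsistencies, or "
            f"claims not entailed by the facts.\nFacts: {knowledge}\n"
            f"Question: {question}\nAnswer: {answer}\n"
            "Reply OK if it is faithful, otherwise describe the problems."
        )
        if critique.strip().upper().startswith("OK"):
            break  # the model judges its own answer faithful
        answer = llm(f"Revise the answer to fix these problems: {critique}\n"
                     f"Question: {question}\nPrevious answer: {answer}")
    return answer

# Usage: pass any chat-completion wrapper, e.g. a function that calls your
# favorite LLM API and returns the text of its reply.
```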

    Benchmarking Machine Reading Comprehension: A Psychological Perspective

    Machine reading comprehension (MRC) has received considerable attention as a benchmark for natural language understanding. However, the conventional task design of MRC lacks explainability beyond model interpretation; that is, reading comprehension by a model cannot be explained in human terms. To this end, this position paper provides a theoretical basis for the design of MRC datasets grounded in psychology as well as psychometrics, and summarizes it in terms of the prerequisites for benchmarking MRC. We conclude that future datasets should (i) evaluate the model's capability to construct a coherent and grounded representation for understanding context-dependent situations, and (ii) ensure substantive validity through shortcut-proof questions and explanation as part of the task design. (21 pages; accepted to EACL.)

    Precise Information Retrieval from Structured and Unstructured Information Sources: Challenges, Approaches, and Hybridization

    This article offers a synthesis of, on the one hand, the approaches developed for question answering (QA) over text, with particular emphasis on models exploiting structured representations of texts, and, on the other hand, recent approaches to QA over knowledge bases. Our objective is to show the problems these two types of answer retrieval share and how they might converge, building on the recognition of the relations present in textual statements and in knowledge bases. We present the few works that take this kind of approach in order to put into perspective the open questions on the road to truly hybrid systems grounded in semantic representations.
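    A minimal sketch of the hybrid direction the article points toward: recognize the relation a question expresses, then answer it from a knowledge base and fall back to text using that same relation. The toy KB, text, and relation pattern are invented for illustration and correspond to no surveyed system.

```python
# Structured and unstructured sources sharing one relation vocabulary.
KB = {("Paris", "capital_of"): "France"}
TEXT = "Paris is the capital of France."

def parse_relation(question: str):
    # Hypothetical mapping from surface form to a KB relation.
    if "capital" in question.lower():
        subject = question.rstrip("?").split()[-1]
        return subject, "capital_of"
    return None

def answer(question: str):
    parsed = parse_relation(question)
    if parsed and parsed in KB:
        return KB[parsed]                      # structured hit
    if parsed:
        subject, _ = parsed                    # textual fallback, same relation
        for sentence in TEXT.split("."):
            if subject in sentence and "capital" in sentence:
                return sentence.strip().split()[-1]
    return None

print(answer("What is the country whose capital is Paris?"))
```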