53,077 research outputs found

    Naive Bayes Classification in The Question and Answering System

    Abstract—A question answering (QA) system answers questions based on collections of unstructured text, i.e. text in natural language. In general, a QA system consists of four stages: question analysis, document selection, passage retrieval, and answer extraction. In this study we added two processes: classifying documents and classifying passages. We use Naïve Bayes for classification, Dynamic Passage Partitioning for finding answers, and Lucene for document selection. The experiment was done using 100 questions over 3,000 documents related to disease, and the results were compared with a system that does not use the classification process. From the test results, the system works best using the 10 most relevant documents, the 5 passages with the highest score, and the 10 answers with the closest distance. The Mean Reciprocal Rank (MRR) value for the QA system with classification is 0.41960, which is 4.9% better than the MRR value for the QA system without classification.
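The abstract above reports results as Mean Reciprocal Rank (MRR). As a quick illustration (not code from the paper), a minimal sketch of how MRR is computed over a set of ranked answer lists:

```python
def mean_reciprocal_rank(ranked_answers, gold):
    """MRR: average over questions of 1/rank of the first correct answer.

    ranked_answers: list of ranked answer lists, one per question.
    gold: the correct answer for each question.
    A question whose answer list contains no correct answer contributes 0.
    """
    total = 0.0
    for answers, correct in zip(ranked_answers, gold):
        for rank, ans in enumerate(answers, start=1):
            if ans == correct:
                total += 1.0 / rank
                break
    return total / len(ranked_answers)

# Two questions: correct answer at rank 1 and rank 2 -> (1 + 0.5) / 2 = 0.75
print(mean_reciprocal_rank([["a", "b"], ["x", "y"]], ["a", "y"]))
```
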

    GripRank: Bridging the Gap between Retrieval and Generation via the Generative Knowledge Improved Passage Ranking

    Retrieval-enhanced text generation, which aims to leverage passages retrieved from a large passage corpus to deliver a proper answer to the input query, has shown remarkable progress on knowledge-intensive language tasks such as open-domain question answering and knowledge-enhanced dialogue generation. However, the retrieved passages are not ideal for guiding answer generation because of the discrepancy between retrieval and generation, i.e., the candidate passages are all treated equally during the retrieval procedure, without considering their potential to generate the proper answers. This discrepancy makes a passage retriever deliver a sub-optimal collection of candidate passages for generating answers. In this paper, we propose the GeneRative Knowledge Improved Passage Ranking (GripRank) approach, which addresses the above challenge by distilling knowledge from a generative passage estimator (GPE) to a passage ranker, where the GPE is a generative language model used to measure how likely the candidate passages are to generate the proper answer. We realize the distillation procedure by teaching the passage ranker to rank the passages in the order given by the GPE. Furthermore, we improve the distillation quality by devising a curriculum knowledge distillation mechanism, which allows the knowledge provided by the GPE to be progressively distilled to the ranker through an easy-to-hard curriculum, enabling the passage ranker to correctly recognize the provenance of the answer among many plausible candidates. We conduct extensive experiments on four datasets across three knowledge-intensive language tasks. Experimental results show advantages over the state-of-the-art methods for both passage ranking and answer generation on the KILT benchmark. Comment: 11 pages, 4 figures
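As a rough illustration of the distillation idea described above (not the paper's implementation; all names are hypothetical), a common way to teach a student ranker to match a teacher's ordering is a listwise cross-entropy between the score distributions the GPE (teacher) and the ranker (student) induce over the candidate passages:

```python
import math


def softmax(scores, temp=1.0):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(scores)
    exps = [math.exp((s - m) / temp) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_distill_loss(ranker_scores, gpe_scores, temp=1.0):
    """Cross-entropy between the teacher (GPE) and student (ranker) distributions.

    Minimized when the ranker's distribution over candidate passages
    matches the one induced by the GPE's scores.
    """
    teacher = softmax(gpe_scores, temp)
    student = softmax(ranker_scores, temp)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))


# A ranker that agrees with the GPE's ordering incurs a lower loss
# than one that inverts it.
agree = listwise_distill_loss([1.0, 2.0], [1.0, 2.0])
invert = listwise_distill_loss([2.0, 1.0], [1.0, 2.0])
```
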

    ZusammenQA: Data Augmentation with Specialized Models for Cross-lingual Open-retrieval Question Answering System

    This paper introduces our proposed system for the MIA Shared Task on Cross-lingual Open-retrieval Question Answering (COQA). In this challenging scenario, given an input question, the system has to gather evidence documents from a multilingual pool and generate from them an answer in the language of the question. We devised several approaches combining different model variants for three main components: Data Augmentation, Passage Retrieval, and Answer Generation. For passage retrieval, we evaluated the monolingual BM25 ranker against an ensemble of re-rankers based on multilingual pretrained language models (PLMs), as well as variants of the shared-task baseline, re-training it from scratch using a recently introduced contrastive loss that maintains a strong gradient signal throughout training by means of mixed negative samples. For answer generation, we focused on language- and domain-specialization by means of continued language model (LM) pretraining of existing multilingual encoders. Additionally, for both passage retrieval and answer generation, we augmented the training data provided by the task organizers with automatically generated question-answer pairs created from Wikipedia passages to mitigate the issue of data scarcity, particularly for the low-resource languages for which no training data were provided. Our results show that language- and domain-specialization as well as data augmentation help, especially for low-resource languages.
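For readers unfamiliar with the BM25 ranker mentioned above, a self-contained sketch of Okapi BM25 scoring over pre-tokenized documents (this is the standard formula; the defaults k1=1.5, b=0.75 are conventional values, not parameters taken from the paper):

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document (a list of terms) for the query."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency of each term across the corpus.
    df = Counter()
    for d in docs:
        for term in set(d):
            df[term] += 1
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            )
            score += idf * norm
        scores.append(score)
    return scores
```
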

    Passage Retrieval Using Answer Type Profiles in Question Answering

    PACLIC 23 / City University of Hong Kong / 3-5 December 200

    Why Does This Entity Matter? Finding Support Passages for Entities in Search

    In this work, we propose a method to retrieve a human-readable explanation of how a retrieved entity is connected to the information need, analogous to search snippets in document retrieval. Such an explanation is called a support passage. Our approach is based on the idea that a good support passage contains many entities relevantly related to the target entity (the entity for which a support passage is needed). We define a relevantly related entity as one which (1) occurs frequently in the vicinity of the target entity, and (2) is relevant to the query. We use the relevance of a passage (induced by the relevantly related entities) to find a good support passage for the target entity. Moreover, we want the target entity to be central to the discussion in the support passage. Hence, we explore the utility of entity salience for support passage retrieval and study the conditions under which it can help. We show that our proposed method improves performance over the current state of the art for support passage retrieval on two datasets from TREC Complex Answer Retrieval.
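A minimal sketch of the scoring idea described above, assuming entity annotations for each passage and precomputed entity-to-query relevance scores are available as inputs (names and signature are hypothetical; this is not the authors' implementation):

```python
def support_score(passage_entities, target_entity, query_relevance):
    """Score a candidate support passage for a target entity.

    passage_entities: entities mentioned in the passage (co-occurrence
        with the target serves as a crude proxy for "vicinity").
    query_relevance: dict mapping each entity to its relevance to the
        query, assumed to come from an upstream entity ranking.
    A passage scores by the total query relevance of the entities it
    mentions alongside the target; passages without the target score 0.
    """
    if target_entity not in passage_entities:
        return 0.0
    return sum(
        query_relevance.get(e, 0.0)
        for e in passage_entities
        if e != target_entity
    )
```
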