
    Deep Fusion of Multiple Term-Similarity Measures For Biomedical Passage Retrieval

    Passage retrieval is an important stage of question answering systems. Closed-domain passage retrieval, e.g. biomedical passage retrieval, presents additional challenges such as specialized terminology, more complex and elaborate queries, and scarcity of available data, among others. However, closed domains also offer some advantages, such as the availability of specialized structured information sources, e.g. ontologies and thesauri, that can be used to improve retrieval performance. This paper presents a novel approach for biomedical passage retrieval that combines different information sources using a similarity matrix fusion strategy based on a convolutional neural network architecture. The method was evaluated on the standard BioASQ dataset, a dataset specialized for biomedical question answering. The results show that the method is an effective strategy for biomedical passage retrieval, able to outperform other state-of-the-art methods in this domain.
    Funding: COLCIENCIAS, REF. Agreement #727, 2016, provided financial as well as logistical and planning support. The Mindlab research group (Universidad Nacional de Colombia sede Bogota), with the cooperation of INAOE (Instituto Nacional de Astrofisica, Optica y Electronica) and Universitat Politecnica de Valencia, also provided technical support for this work. The work of Paolo Rosso was carried out in the framework of the research project PROMETEO/2019/121.
    Citation: Rosso-Mateus, A.; Montes Gomez, M.; Rosso, P.; González, F. (2020). Deep Fusion of Multiple Term-Similarity Measures For Biomedical Passage Retrieval. Journal of Intelligent & Fuzzy Systems. 39(2):2239-2248. https://doi.org/10.3233/JIFS-179887

    BERT with History Answer Embedding for Conversational Question Answering

    Conversational search is an emerging topic in the information retrieval community. One of the major challenges to multi-turn conversational search is to model the conversation history to answer the current question. Existing methods either prepend history turns to the current question or use complicated attention mechanisms to model the history. We propose a conceptually simple yet highly effective approach referred to as history answer embedding. It enables seamless integration of conversation history into a conversational question answering (ConvQA) model built on BERT (Bidirectional Encoder Representations from Transformers). We first explain our view that ConvQA is a simplified but concrete setting of conversational search, and then we provide a general framework to solve ConvQA. We further demonstrate the effectiveness of our approach under this framework. Finally, we analyze the impact of different numbers of history turns under different settings to provide new insights into conversation history modeling in ConvQA.
    Comment: Accepted to SIGIR 2019 as a short pape

    Retrieve-and-Read: Multi-task Learning of Information Retrieval and Reading Comprehension

    This study considers the task of machine reading at scale (MRS) wherein, given a question, a system first performs the information retrieval (IR) task of finding relevant passages in a knowledge source and then carries out the reading comprehension (RC) task of extracting an answer span from the passages. Previous MRS studies, in which the IR component was trained without considering answer spans, struggled to accurately find a small number of relevant passages in a large set of passages. In this paper, we propose a simple and effective approach that combines the IR and RC tasks through supervised multi-task learning, so that the IR component can be trained with answer spans taken into account. Experimental results on the standard benchmark, answering SQuAD questions using the full Wikipedia as the knowledge source, showed that our model achieved state-of-the-art performance. Moreover, we thoroughly evaluated the individual contributions of our model components with our new Japanese dataset and SQuAD. The results showed significant improvements in the IR task and provided a new perspective on IR for RC: it is effective to teach which part of the passage answers the question rather than to give only a relevance score to the whole passage.
    Comment: 10 pages, 6 figures. Accepted as a full paper at CIKM 201

    Controlling Risk of Web Question Answering

    Web question answering (QA) has become an indispensable component of modern search systems, and it can significantly improve users' search experience by providing a direct answer to their information need. This can be achieved by applying machine reading comprehension (MRC) models to the retrieved passages to extract answers with respect to the search query. With the development of deep learning techniques, recent deep methods have achieved state-of-the-art MRC performance. However, existing studies on MRC seldom address the issue of predictive uncertainty, i.e., how likely an MRC model's prediction is to be wrong, which leads to uncontrollable risks in real-world Web QA applications. In this work, we first conduct an in-depth investigation of the risk of Web QA. We then introduce a novel risk control framework, which consists of a qualify model for uncertainty estimation using the probe idea, and a decision model for selective output. For the evaluation of risk-aware Web QA, we introduce risk-related metrics rather than the traditional EM and F1 used in MRC. The empirical results on both the real-world Web QA dataset and the academic MRC benchmark collection demonstrate the effectiveness of our approach.
    Comment: 42nd International ACM SIGIR Conference on Research and Development in Information Retrieva

    The State-of-the-arts in Focused Search

    The continuous influx of varied text data on the Web requires search engines to improve their ability to retrieve more specific information. The need for results relevant to a user’s topic of interest has gone beyond searching for domain-specific or type-specific documents to more focused results (e.g. document fragments or answers to a query). The introduction of XML provides a standard format for data representation, storage, and exchange. It allows focused search to be carried out at different granularities of a structured document with XML markup. This report reviews the state of the art in focused search, particularly techniques for topic-specific document retrieval, passage retrieval, XML retrieval, and entity ranking. It concludes by highlighting open problems.