Deep Fusion of Multiple Term-Similarity Measures For Biomedical Passage Retrieval
Passage retrieval is an important stage of question answering systems. Closed-domain passage retrieval, e.g. biomedical passage retrieval, presents additional challenges such as specialized terminology, more complex and elaborate queries, and scarcity of available data. However, closed domains also offer advantages, such as the availability of specialized structured information sources, e.g. ontologies and thesauri, that can be used to improve retrieval performance. This paper presents a novel approach to biomedical passage retrieval that combines different information sources using a similarity matrix fusion strategy based on a convolutional neural network architecture. The method was evaluated on the standard BioASQ dataset, which is specialized in biomedical question answering. The results show that the method is an effective strategy for biomedical passage retrieval, able to outperform other state-of-the-art methods in this domain.
COLCIENCIAS, REF. Agreement #727, 2016 provided financial as well as logistical and planning support. The Mindlab research group (Universidad Nacional de Colombia sede Bogota), with the cooperation of INAOE (Instituto Nacional de Astrofisica, Optica y Electronica) and Universitat Politecnica de Valencia, also provided technical support for this work. The work of Paolo Rosso was carried out in the framework of the research project PROMETEO/2019/121.
Rosso-Mateus, A.; Montes Gomez, M.; Rosso, P.; González, F. (2020). Deep Fusion of Multiple Term-Similarity Measures For Biomedical Passage Retrieval. Journal of Intelligent & Fuzzy Systems. 39(2):2239-2248. https://doi.org/10.3233/JIFS-179887
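To make the idea of fusing several term-similarity measures more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each measure yields one question-by-passage similarity matrix and that the matrices are stacked as channels, the usual input layout for a convolutional network. The function names and both measures (exact match and character-trigram overlap, the latter standing in for an ontology-based measure) are hypothetical.

```python
# Hypothetical sketch: names and measures are illustrative, not taken from the paper.

def exact_match(a: str, b: str) -> float:
    """1.0 when the two terms are identical (case-insensitive), else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


def trigram_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of character trigrams; a stand-in for a richer measure."""
    def grams(s: str) -> set:
        padded = f"  {s.lower()} "
        return {padded[i:i + 3] for i in range(len(padded) - 2)}

    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga | gb)


def similarity_channels(question, passage, measures):
    """Return one |question| x |passage| matrix per measure (channel-first)."""
    return [
        [[measure(q, p) for p in passage] for q in question]
        for measure in measures
    ]


question = ["gene", "mutation"]
passage = ["mutations", "of", "the", "gene"]
channels = similarity_channels(question, passage, [exact_match, trigram_jaccard])
```

Here `channels` has shape 2 x 2 x 4 (measures x question terms x passage terms). A convolutional model could then learn how to weigh the measures against each other instead of relying on a hand-tuned combination.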
BERT with History Answer Embedding for Conversational Question Answering
Conversational search is an emerging topic in the information retrieval
community. One of the major challenges to multi-turn conversational search is
to model the conversation history to answer the current question. Existing
methods either prepend history turns to the current question or use complicated
attention mechanisms to model the history. We propose a conceptually simple yet
highly effective approach referred to as history answer embedding. It enables
seamless integration of conversation history into a conversational question
answering (ConvQA) model built on BERT (Bidirectional Encoder Representations
from Transformers). We first explain our view that ConvQA is a simplified but
concrete setting of conversational search, and then we provide a general
framework to solve ConvQA. We further demonstrate the effectiveness of our
approach under this framework. Finally, we analyze the impact of different
numbers of history turns under different settings to provide new insights into
conversation history modeling in ConvQA.
Comment: Accepted to SIGIR 2019 as a short paper
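One plausible reading of a history answer embedding can be sketched as follows. The sketch assumes that each passage token carries a 0/1 marker saying whether it appeared in a previous answer, and that the marker selects an extra vector added to the token's input vector. This is an illustration under those assumptions, not the paper's code; the helper names and the toy 2-dimensional vectors are hypothetical.

```python
# Hypothetical sketch: helper names and toy vectors are illustrative only.

def history_answer_markers(num_tokens, history_spans):
    """Return 1 for tokens inside any previous answer span, else 0.

    history_spans holds (start, end) token indices, with end inclusive.
    Spans reaching past the passage are clipped to its bounds.
    """
    markers = [0] * num_tokens
    for start, end in history_spans:
        for i in range(max(start, 0), min(end, num_tokens - 1) + 1):
            markers[i] = 1
    return markers


def add_history_embedding(token_vectors, markers, history_table):
    """Add the looked-up history vector to each token vector, element-wise."""
    return [
        [t + h for t, h in zip(vec, history_table[m])]
        for vec, m in zip(token_vectors, markers)
    ]


passage_tokens = ["the", "cat", "sat", "on", "the", "mat"]
# Assume an earlier turn was answered with the span "cat sat" (tokens 1..2).
markers = history_answer_markers(len(passage_tokens), [(1, 2)])

# Toy vectors; a trained model would learn both tables.
token_vectors = [[0.0, 0.0] for _ in passage_tokens]
history_table = {0: [0.0, 0.0], 1: [1.0, -1.0]}
inputs = add_history_embedding(token_vectors, markers, history_table)
```

Under this reading, the history enters the model through the input representation alone, so the question text does not need to be lengthened with earlier turns.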
Retrieve-and-Read: Multi-task Learning of Information Retrieval and Reading Comprehension
This study considers the task of machine reading at scale (MRS) wherein,
given a question, a system first performs the information retrieval (IR) task
of finding relevant passages in a knowledge source and then carries out the
reading comprehension (RC) task of extracting an answer span from the passages.
Previous MRS studies, in which the IR component was trained without considering
answer spans, struggled to accurately find a small number of relevant passages
from a large set of passages. In this paper, we propose a simple and effective
approach that incorporates the IR and RC tasks by using supervised multi-task
learning in order that the IR component can be trained by considering answer
spans. Experimental results on the standard benchmark, answering SQuAD
questions using the full Wikipedia as the knowledge source, showed that our
model achieved state-of-the-art performance. Moreover, we thoroughly evaluated
the individual contributions of our model components with our new Japanese
dataset and SQuAD. The results showed significant improvements in the IR task
and provided a new perspective on IR for RC: it is effective to teach which
part of the passage answers the question rather than to give only a relevance
score to the whole passage.
Comment: 10 pages, 6 figures. Accepted as a full paper at CIKM 201
Controlling Risk of Web Question Answering
Web question answering (QA) has become an indispensable component in modern
search systems, which can significantly improve users' search experience by
providing a direct answer to users' information need. This could be achieved by
applying machine reading comprehension (MRC) models over the retrieved passages
to extract answers with respect to the search query. With the development of
deep learning techniques, state-of-the-art MRC performances have been achieved
by recent deep methods. However, existing studies on MRC seldom address the
predictive uncertainty issue, i.e., how likely the prediction of an MRC model
is wrong, leading to uncontrollable risks in real-world Web QA applications. In
this work, we first conduct an in-depth investigation into the risk of Web QA.
We then introduce a novel risk control framework, which consists of a qualify
model for uncertainty estimation using the probe idea, and a decision model for
selective output. For evaluation, we introduce risk-related metrics, rather
than the traditional EM and F1 in MRC, for the evaluation of risk-aware Web QA.
The empirical results over both the real-world Web QA dataset and the academic
MRC benchmark collection demonstrate the effectiveness of our approach.
Comment: 42nd International ACM SIGIR Conference on Research and Development
in Information Retrieval
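The notion of selective output can be illustrated with a generic selective-prediction sketch: answer only when an estimated confidence clears a threshold, then measure how many questions were answered (coverage) and how often the given answers were wrong (risk). The threshold rule, the function names, and the toy data are assumptions made for illustration. The paper's decision model and its risk-related metrics may be defined differently.

```python
# Hypothetical sketch of threshold-based selective answering; not the paper's method.

def selective_answers(predictions, threshold):
    """Keep an answer only when its estimated confidence reaches the threshold."""
    return [
        answer if confidence >= threshold else None
        for answer, confidence in predictions
    ]


def coverage_and_risk(outputs, gold):
    """Coverage: share of questions answered. Risk: error rate among answered."""
    answered = [(o, g) for o, g in zip(outputs, gold) if o is not None]
    coverage = len(answered) / len(gold) if gold else 0.0
    risk = sum(o != g for o, g in answered) / len(answered) if answered else 0.0
    return coverage, risk


# Toy (answer, confidence) pairs and reference answers.
predictions = [("Paris", 0.9), ("1492", 0.4), ("Mars", 0.8), ("blue", 0.2)]
gold = ["Paris", "1942", "Venus", "blue"]

outputs = selective_answers(predictions, threshold=0.5)
coverage, risk = coverage_and_risk(outputs, gold)
```

Raising the threshold usually lowers coverage. Whether it also lowers risk depends on how well the confidence estimates track correctness, which is why a separate uncertainty estimator matters in this setting.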
The State-of-the-arts in Focused Search
The continuous influx of text data on the Web requires search engines to improve their ability to retrieve more specific information. The need for results relevant to a user's topic of interest has gone beyond searching for domain- or type-specific documents to more focused results (e.g. document fragments or answers to a query). The introduction of XML provides a standard format for data representation, storage, and exchange. It allows focused search to be carried out at different granularities of a structured document with XML markup. This report reviews the state of the art in focused search, particularly techniques for topic-specific document retrieval, passage retrieval, XML retrieval, and entity ranking. It concludes by highlighting open problems.