Promoting understandability in consumer health information search
Nowadays, in the area of Consumer Health Information Retrieval, techniques and methodologies are still far from being effective in answering complex health queries. One main challenge comes from the varying and limited medical knowledge background of consumers; the existing language gap between non-expert consumers and the complex medical resources confuses them. So, returning not only topically relevant but also understandable health information to the user is a significant and practical challenge in this area.
In this work, the main research goal is to study ways to promote understandability in Consumer Health Information Retrieval. To reach this goal, two research questions are posed: (i) how to bridge the existing language gap; (ii) how to return more understandable documents. Two modules are designed, each answering one research question. In the first module, a Medical Concept Model is proposed for use in health query processing; this model integrates Natural Language Processing techniques into state-of-the-art Information Retrieval. Moreover, aiming to integrate syntactic and semantic information, word embedding models are explored as query expansion resources. The second module is designed to learn understandability from past data; a two-stage learning to rank model is proposed, with rank aggregation methods applied to single field-based ranking models.
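The rank aggregation step mentioned above can be illustrated with a minimal sketch. The abstract does not say which aggregation method is used, so the Borda count below is an assumption chosen for simplicity, and the field names (title, body, headings) and document ids are hypothetical.

```python
from collections import defaultdict

def borda_aggregate(rankings):
    """Fuse several ranked lists of document ids with a Borda count.

    Each ranking is a list ordered from best to worst. A document earns
    (n_docs - position) points per list; documents missing from a list
    earn nothing from it. Borda is an illustrative choice, not
    necessarily the method used in the thesis.
    """
    n_docs = len({doc for ranking in rankings for doc in ranking})
    scores = defaultdict(float)
    for ranking in rankings:
        for position, doc in enumerate(ranking):
            scores[doc] += n_docs - position
    # Sort by descending score; break ties by document id for determinism.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))

# Hypothetical rankings produced by three single-field rankers.
title_ranking = ["d1", "d2", "d3"]
body_ranking = ["d2", "d1", "d3"]
headings_ranking = ["d2", "d3", "d1"]

fused = borda_aggregate([title_ranking, body_ranking, headings_ranking])
print(fused)
```

Here d2 collects the most points because two of the three field rankers place it first, so it leads the fused list.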
The proposed modules are assessed on the FIRE’2016 CHIS track data and the CLEF’2016-2018 eHealth IR data collections. Extensive experimental comparisons with state-of-the-art baselines on these collections confirmed the effectiveness of the proposed approaches: regarding understandability relevance, the improvement is 11.5%, 9.3% and 16.3% in the RBP, uRBP and uRBPgr evaluation metrics, respectively; regarding topical relevance, the improvement is 7.8%, 16.4% and 7.6% in the P@10, NDCG@10 and MAP evaluation metrics, respectively.
Summary:
Promoting Understandability in Consumer Health Information Search
Currently, the techniques and methodologies used in Health Information Retrieval are still far from effective in answering the questions posed by consumers. One of the main challenges is the varied and limited medical knowledge of consumers; the language gap between consumers and complex medical resources confuses non-expert consumers. Thus, providing health information that is not only relevant but also understandable is a significant and practical challenge in this area.
In this work, the goal is to study ways to promote understandability in Health Information Retrieval. To this end, two research questions are raised: (i) how to reduce the language differences between consumers and medical resources; (ii) how to retrieve more understandable texts. Two modules are proposed, each answering one of the questions. In the first module, a Medical Concept Model is proposed for inclusion in the query process; it integrates Natural Language Processing techniques into Information Retrieval. Furthermore, to incorporate syntactic and semantic information, word embedding models are also explored for query expansion. The second module is designed to learn understandability from past data; a two-stage learning to rank model is proposed, with aggregation methods applied to ranking models built from specific document fields.
The proposed modules are evaluated on the FIRE’2016 CHIS and CLEF’2016-2018 eHealth collections. Extensive experimental comparisons with current models (baselines) confirm the effectiveness of the proposed approaches: regarding understandability relevance, improvements of 11.5%, 9.3% and 16.3% were obtained in the RBP, uRBP and uRBPgr evaluation measures, respectively; regarding topical relevance, improvements of 7.8%, 16.4% and 7.6% were obtained in the P@10, NDCG@10 and MAP evaluation measures, respectively.
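As a rough illustration of the RBP-style measures named above, the sketch below computes rank-biased precision and an understandability-weighted variant in the spirit of uRBP. The persistence default `p=0.8`, the function names, and the toy gain values are assumptions made for this example; the abstract does not state the parameter settings used in the evaluation.

```python
def rbp(gains, p=0.8):
    """Rank-biased precision: (1 - p) * sum_k gain_k * p**(k - 1).

    `gains` lists the relevance of each retrieved document in rank
    order (values in [0, 1]); `p` models how persistently a user keeps
    reading down the ranking. The default p is an assumed value.
    """
    return (1 - p) * sum(g * p ** k for k, g in enumerate(gains))

def understandability_weighted_rbp(gains, understandability, p=0.8):
    """RBP where each gain is scaled by an understandability score in [0, 1].

    A sketch in the spirit of uRBP, not the official evaluation code.
    """
    pairs = zip(gains, understandability)
    return (1 - p) * sum(g * u * p ** k for k, (g, u) in enumerate(pairs))

# Toy ranking: documents at ranks 1 and 3 are relevant, but the one at
# rank 3 is judged not understandable.
relevance = [1, 0, 1]
readable = [1.0, 1.0, 0.0]
print(rbp(relevance, p=0.5))
print(understandability_weighted_rbp(relevance, readable, p=0.5))
```

The second score is lower than the first because the relevant document at rank 3 contributes nothing once it is weighted by its understandability.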
Semantic Interpretation of User Queries for Question Answering on Interlinked Data
The Web of Data contains a wealth of knowledge spanning a large number of domains, but retrieving data from such interlinked knowledge bases remains difficult. By taking the structure of the data into account, the next generation of search engines is expected to move toward question answering systems that directly answer user questions. Developing a question answering system over these interlinked data sources is still challenging because of two inherent characteristics. First, different datasets employ heterogeneous schemas, and each one may contain only part of the answer to a given question. Second, constructing a federated formal query across different datasets requires exploiting links between these datasets at both the schema and instance levels. This raises several challenges, such as resource disambiguation, vocabulary mismatch, inference, and link traversal. In this dissertation, we address these challenges in order to build a question answering system for Linked Data. We present our question answering system Sina, which transforms user-supplied queries (i.e. either natural language queries or keyword queries) into conjunctive SPARQL queries over a set of interlinked data sources. The contributions of this work are as follows:
1. A novel approach for determining the most suitable resources for a user-supplied query from different datasets (disambiguation approach). We employed a Hidden Markov Model, whose parameters were bootstrapped with different distribution functions.
2. A novel method for constructing federated formal queries using the disambiguated resources and leveraging the linking structure of the underlying datasets. This approach essentially relies on a combination of domain and range inference as well as a link traversal method for constructing a connected graph, which ultimately renders a corresponding SPARQL query.
3. Regarding the problem of vocabulary mismatch, our contribution is divided into two parts. First, we introduce a number of new query expansion features based on semantic and linguistic inferencing over Linked Data. We evaluate the effectiveness of each feature individually as well as their combinations, employing Support Vector Machines and Decision Trees. Second, we propose a novel method for automatic query expansion, which employs a Hidden Markov Model to obtain the optimal tuples of derived words.
4. We provide two benchmarks for two different tasks to the community of question answering systems. The first one is used for the task of question answering on interlinked datasets (i.e. federated queries over Linked Data). The second one is used for the vocabulary mismatch task.
We evaluate the accuracy of our approach using measures such as mean reciprocal rank, precision, recall, and F-measure on three interlinked life-science datasets as well as DBpedia. The results of our accuracy evaluation demonstrate the effectiveness of our approach. Moreover, we study the runtime of our approach in its sequential as well as parallel implementations and draw conclusions on the scalability of our approach on Linked Data.
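To make the Hidden Markov Model disambiguation idea concrete, here is a minimal sketch that decodes the most likely sequence of resources for a keyword query with the standard Viterbi algorithm. The abstract only says that an HMM is used; the decoding procedure, the `ex:` resource names, and all probabilities below are assumptions invented for illustration, not Sina's actual parameters.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for the observations."""
    # best[t][s]: probability of the best path that ends in state s at step t.
    best = [{s: start_p[s] * emit_p[s].get(observations[0], 0.0) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor that maximizes the path probability into s.
            prev, score = max(
                ((p, best[t - 1][p] * trans_p[p].get(s, 0.0)) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score * emit_p[s].get(observations[t], 0.0)
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

# Hypothetical candidate resources (hidden states) for two keywords.
states = ["ex:drug_aspirin", "ex:band_aspirin", "ex:side_effect", "ex:special_effect"]
start_p = {s: 0.25 for s in states}
emit_p = {
    "ex:drug_aspirin": {"aspirin": 0.4},
    "ex:band_aspirin": {"aspirin": 0.6},
    "ex:side_effect": {"effects": 0.7},
    "ex:special_effect": {"effects": 0.3},
}
trans_p = {
    "ex:drug_aspirin": {"ex:side_effect": 0.8, "ex:special_effect": 0.2},
    "ex:band_aspirin": {"ex:side_effect": 0.1, "ex:special_effect": 0.9},
    "ex:side_effect": {},
    "ex:special_effect": {},
}

print(viterbi(["aspirin", "effects"], states, start_p, trans_p, emit_p))
```

In this toy setup the band is the likelier reading of "aspirin" on its own, yet the joint decoding selects the drug because it connects much more strongly to the best reading of "effects". That is the kind of context-sensitive choice a sequence model can make and a keyword-by-keyword lookup cannot.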