Improving Personalized Consumer Health Search
The CLEF 2018 eHealth Consumer Health Search task investigates the effectiveness of information retrieval systems in providing health information to ordinary health consumers. Compared to previous years, this year's task includes five subtasks and adopts a new data corpus and a new set of queries. This paper presents the work of the University of Evora, which participated in two subtasks: IRtask-1 and IRtask-2. It explores learning-to-rank techniques as well as query expansion approaches. A number of field-based features are used to train a learning-to-rank model, and a medical concept model proposed in previous work is re-employed for this year's new task. Word vectors and UMLS are used as query expansion sources. Four runs were submitted to each task.
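The word-vector side of such query expansion can be sketched with a toy example: each query term is expanded with its nearest neighbours in an embedding space. The vocabulary and vector values below are made up purely for illustration; a real system would load embeddings trained on a medical corpus and also draw on UMLS.

```python
import math

# Toy word vectors; the vocabulary and values here are invented
# for illustration only.
VECTORS = {
    "heart":      [0.9, 0.1, 0.0],
    "cardiac":    [0.85, 0.15, 0.05],
    "attack":     [0.1, 0.9, 0.2],
    "infarction": [0.4, 0.7, 0.1],
    "weather":    [0.0, 0.1, 0.95],
}

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def expand_query(terms, k=1):
    """Append the k nearest vocabulary terms to each query term."""
    expanded = list(terms)
    for term in terms:
        if term not in VECTORS:
            continue
        neighbours = sorted(
            (w for w in VECTORS if w != term and w not in expanded),
            key=lambda w: cosine(VECTORS[term], VECTORS[w]),
            reverse=True,
        )
        expanded.extend(neighbours[:k])
    return expanded

print(expand_query(["heart", "attack"]))
# ['heart', 'attack', 'cardiac', 'infarction']
```

The expanded term list is then handed to the retrieval model in place of the original query; in practice the number of added terms per query term (`k`) is tuned on held-out topics.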
Overview of the CLEF 2018 Consumer Health Search Task
This paper details the collection, systems, and evaluation methods used in the CLEF 2018 eHealth Evaluation Lab, Consumer Health Search (CHS) task (Task 3). This task investigates the effectiveness of search engines in providing access to medical information on the Web for people with little or no medical knowledge. The task aims to foster advances in the development of search technologies for Consumer Health Search by providing resources and evaluation methods to test and validate search systems. Built upon the 2013-17 series of CLEF eHealth Information Retrieval tasks, the 2018 task considers both mono- and multilingual retrieval, embracing the Text REtrieval Conference (TREC)-style evaluation process: a shared collection of documents and queries, the contribution of runs from participants, and the subsequent formation of relevance assessments and evaluation of the participants' submissions.
This year, the CHS task uses a new Web corpus and a new set of queries. The new corpus consists of Web pages acquired from CommonCrawl, and the new set of queries consists of 50 queries issued by the general public to the Health on the Net (HON) search services. We then manually translated the 50 queries into French, German, and Czech, and obtained English query variations of the 50 original queries.
A total of 7 teams from 7 different countries participated in the 2018 CHS task: CUNI (Czech Republic), IMS Unipd (Italy), MIRACL (Tunisia), QUT (Australia), SINAI (Spain), UB-Botswana (Botswana), and UEvora (Portugal).
Overview of ImageCLEF 2018: Challenges, Datasets and Evaluation
This paper presents an overview of the ImageCLEF 2018 evaluation campaign, an event organized as part of the CLEF (Conference and Labs of the Evaluation Forum) Labs 2018. ImageCLEF is an ongoing initiative (started in 2003) that promotes the evaluation of technologies for annotation, indexing, and retrieval, with the aim of providing information access to collections of images in various usage scenarios and domains. In 2018, the 16th edition of ImageCLEF ran three main tasks and a pilot task: (1) a caption prediction task that aims at predicting the caption of a figure from the biomedical literature based only on the figure image; (2) a tuberculosis task that aims at detecting the tuberculosis type, severity, and drug resistance from CT (Computed Tomography) volumes of the lung; (3) a LifeLog task (videos, images, and other sources) about understanding daily activities and moment retrieval; and (4) a pilot task on visual question answering where systems are tasked with answering medical questions. The strong participation, with over 100 research groups registering and 31 submitting results for the tasks, shows an increasing interest in this benchmarking campaign.
Overview of the CLEF eHealth Evaluation Lab 2018
In this paper, we provide an overview of the sixth annual edition of the CLEF eHealth evaluation lab. CLEF eHealth 2018 continues our evaluation resource building efforts to support patients, their next of kin, clinical staff, and health scientists in understanding, accessing, and authoring eHealth information in a multilingual setting. This year's lab offered three tasks: Task 1 on multilingual information extraction, extending last year's task on French and English corpora to French, Hungarian, and Italian; Task 2 on technologically assisted reviews in empirical medicine, building on last year's pilot task in English; and Task 3 on Consumer Health Search (CHS) in mono- and multilingual settings, building on the 2013–17 Information Retrieval tasks. In total, 28 teams took part in these tasks (14 in Task 1, 7 in Task 2, and 7 in Task 3). Herein, we describe the resources created for these tasks, outline the evaluation methodology adopted, and provide a brief summary of the participants in this year's challenges and the results obtained. As in previous years, the organizers have made the data and tools associated with the lab tasks available for future research and development.
Promoting understandability in consumer health information search
Nowadays, in the area of Consumer Health Information Retrieval, techniques and methodologies are still far from being effective in answering complex health queries. One main challenge comes from the varying and limited medical knowledge background of consumers; the language gap between non-expert consumers and complex medical resources confuses them. So, returning not only topically relevant but also understandable health information to the user is a significant and practical challenge in this area.
In this work, the main research goal is to study ways to promote understandability in Consumer Health Information Retrieval. To help reach this goal, two research questions are posed: (i) how to bridge the existing language gap; (ii) how to return more understandable documents. Two modules are designed, each answering one research question. In the first module, a Medical Concept Model is proposed for use in health query processing; this model integrates Natural Language Processing techniques into state-of-the-art Information Retrieval. Moreover, aiming to integrate syntactic and semantic information, word embedding models are explored as query expansion resources. The second module is designed to learn understandability from past data; a two-stage learning-to-rank model is proposed, with rank aggregation methods applied on single field-based ranking models.
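The aggregation stage of such a two-stage setup can be illustrated with Borda count, one common rank-aggregation method (the abstract does not say which methods were used, so this is an assumed stand-in): each field-based ranking awards points to documents by position, and the fused ranking sorts by total points.

```python
def borda_aggregate(rankings):
    """Fuse several ranked lists of document ids with Borda count:
    each list awards len(list) - position points to a document."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for pos, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0) + (n - pos)
    # Higher total score first; break ties alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical per-field rankings (e.g. title, body, description fields).
by_title = ["d1", "d3", "d2"]
by_body  = ["d3", "d1", "d2"]
by_desc  = ["d3", "d2", "d1"]
print(borda_aggregate([by_title, by_body, by_desc]))
# ['d3', 'd1', 'd2']
```

Here `d3` wins because two of the three field-based rankers place it first; any positional voting rule (Borda, reciprocal rank fusion, etc.) follows the same pattern of pooling per-field evidence.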
These proposed modules are assessed on the FIRE'2016 CHIS track data and the CLEF'2016-2018 eHealth IR data collections. Extensive experimental comparisons with state-of-the-art baselines on the considered data collections confirmed the effectiveness of the proposed approaches: regarding understandability relevance, the improvement is 11.5%, 9.3%, and 16.3% in the RBP, uRBP, and uRBPgr evaluation metrics, respectively; regarding topical relevance, the improvement is 7.8%, 16.4%, and 7.6% in the P@10, NDCG@10, and MAP evaluation metrics, respectively.
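For context, the simpler of the metrics reported above reduce to short rank-based formulas. A minimal sketch of P@k and rank-biased precision (RBP = (1 - p) · Σₖ relₖ · p^(k-1)); the understandability-weighted variants uRBP and uRBPgr additionally fold understandability assessments into the per-rank gain, which is omitted here.

```python
def precision_at_k(rels, k):
    """Fraction of relevant documents in the top k (rels: 0/1 list)."""
    return sum(rels[:k]) / k

def rbp(rels, p=0.8):
    """Rank-biased precision: (1 - p) * sum_k rel_k * p^(k - 1),
    where p models the user's persistence in scanning the ranking."""
    return (1 - p) * sum(r * p ** i for i, r in enumerate(rels))

# Hypothetical 0/1 relevance judgements for a ranked list of 5 documents.
rels = [1, 0, 1, 1, 0]
print(precision_at_k(rels, 5))  # 0.6
print(round(rbp(rels), 4))      # 0.4304
```

A smaller persistence parameter `p` concentrates the score on the very top ranks, which is why RBP-family metrics suit consumer search, where users rarely scan deep.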
Summary: Promoting Understandability in Consumer Health Information Search
Currently, the techniques and methodologies used in Health Information Retrieval are still far from being effective in answering the questions posed by consumers. One of the main challenges is the varied and limited medical knowledge of consumers; the language gap between consumers and complex medical resources confuses non-expert consumers. Thus, providing health information that is not only relevant but also understandable is a significant and practical challenge in this area.
In this work, the goal is to study ways of promoting understandability in Health Information Retrieval. To this end, two research questions are raised: (i) how to reduce the language differences between consumers and medical resources; (ii) how to retrieve more understandable texts. Two modules are proposed, each answering one of the questions. In the first module, a Medical Concept Model is proposed for inclusion in the query process, integrating Natural Language Processing techniques into Information Retrieval. Furthermore, with the aim of incorporating syntactic and semantic information, word embedding models are also explored for query expansion. The second module is designed to learn understandability from past information; a two-stage learning-to-rank model is proposed, with aggregation methods applied over ranking models built from specific fields of the documents.
The proposed modules are evaluated on the FIRE'2016 CHIS and CLEF'2016-2018 eHealth collections. Extensive experimental comparisons with current baseline models confirm the effectiveness of the proposed approaches: regarding understandability relevance, improvements of 11.5%, 9.3%, and 16.3% were obtained in the RBP, uRBP, and uRBPgr evaluation measures, respectively; regarding the relevance of the retrieved topics, improvements of 7.8%, 16.4%, and 7.6% were obtained in the P@10, NDCG@10, and MAP evaluation measures, respectively.
Domain-specific language models for multi-label classification of medical text
Recent advancements in machine learning-based multi-label classification of medical text can be used to enhance the understanding of the human body and support patient care. This research frames the prediction of medical codes from electronic health records (EHRs) as multi-label problems, where the number of labels ranges from 15 to 923. It is motivated by advancements in domain-specific language models that better understand and represent electronic health records and improve the predictive accuracy of medical codes.
The thesis presents an extensive empirical study of language models for binary and multi-label medical text classification. Domain-specific, multi-sourced fastText pre-trained embeddings are introduced. Experimental results show considerable improvements in predictive accuracy when such embeddings are used to represent medical text. It is shown that domain-specific transformer models outperform other approaches on multi-label problems with fixed sequence length. If processing time is not an issue for long medical texts, TransformerXL is the best model to use. Experimental results show significant improvements over other models, including state-of-the-art results, when TransformerXL is used for downstream tasks such as predicting medical codes.
The thesis considers concatenated language models to handle long medical documents and text data from multiple sources of EHRs. Experimental results show improvements in overall micro and macro F1 scores, and such improvements are achieved with fewer resources. In addition, it is shown that concatenated domain-specific transformers improve the F1 scores of infrequent labels across several multi-label problems, especially long-tail labels.
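Micro and macro F1 differ in how per-label counts are pooled, which is why gains on rare, long-tail labels show up most clearly in macro F1: micro F1 sums true/false positives over all labels (so frequent labels dominate), while macro F1 averages per-label F1 scores with equal weight. A minimal sketch with hypothetical label sets, not the thesis's evaluation code:

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there are no true positives."""
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def micro_macro_f1(y_true, y_pred, n_labels):
    """y_true / y_pred: one set of label ids per document."""
    tp = [0] * n_labels
    fp = [0] * n_labels
    fn = [0] * n_labels
    for truth, pred in zip(y_true, y_pred):
        for label in range(n_labels):
            if label in pred and label in truth:
                tp[label] += 1
            elif label in pred:
                fp[label] += 1
            elif label in truth:
                fn[label] += 1
    micro = f1(sum(tp), sum(fp), sum(fn))
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in range(n_labels)) / n_labels
    return micro, macro

# Hypothetical predictions over 3 documents and 3 labels.
micro, macro = micro_macro_f1(
    y_true=[{0, 1}, {1}, {2}],
    y_pred=[{0}, {1, 2}, {2}],
    n_labels=3,
)
print(round(micro, 4), round(macro, 4))  # 0.75 0.7778
```

Improving a single infrequent label barely moves the pooled micro counts but shifts that label's full 1/n_labels share of the macro average, matching the long-tail effect described above.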