
    Characterizing Question Facets for Complex Answer Retrieval

    Complex answer retrieval (CAR) is the process of retrieving answers to questions that have multifaceted or nuanced answers. In this work, we present two novel approaches for CAR based on the observation that question facets can vary in utility: from structural (facets that can apply to many similar topics, such as 'History') to topical (facets that are specific to the question's topic, such as the 'Westward expansion' of the United States). We first explore a way to incorporate facet utility into ranking models during query term score combination. We then explore a general approach to reform the structure of ranking models to aid in learning of facet utility in the query-document term matching phase. When we use our techniques with a leading neural ranker on the TREC CAR dataset, our methods rank first in the 2017 TREC CAR benchmark, and yield up to 26% higher performance than the next best method. (4 pages; SIGIR 2018 short paper.)
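The idea of weighting query term scores by facet utility can be sketched as follows. This is a minimal illustration, not the paper's actual model: the facet list, weights, and scoring function are invented for the example.

```python
# Sketch: combine per-term document scores with a facet-utility weight.
# Structural facets (generic headings) are down-weighted relative to
# topical facets. All terms, weights, and scores here are illustrative.

STRUCTURAL_FACETS = {"history", "overview", "background"}

def facet_weight(term: str, structural: float = 0.5, topical: float = 1.0) -> float:
    """Return a lower weight for structural facet terms, a higher one otherwise."""
    return structural if term.lower() in STRUCTURAL_FACETS else topical

def combine_scores(term_scores: dict[str, float]) -> float:
    """Facet-utility-weighted sum of per-term matching scores."""
    return sum(facet_weight(term) * score for term, score in term_scores.items())

# A topical facet term contributes more than an equally scored structural one.
score = combine_scores({"History": 0.8, "Westward expansion": 0.6})
```

Here the structural term 'History' contributes only half of its raw matching score, so documents matching the topical facet are favoured.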

    Humans optional? Automatic large-scale test collections for entity, passage, and entity-passage retrieval

    Manually creating test collections is a time-, effort-, and cost-intensive process. This paper describes a fully automatic alternative for deriving large-scale test collections, where no human assessments are needed. The empirical experiments confirm that automatic test collection and manual assessments agree on the best performing systems. The collection includes relevance judgments for both text passages and knowledge base entities. Since test collections with relevance data for both entity and text passages are rare, this approach provides a cost-efficient way for training and evaluating ad hoc passage retrieval, entity retrieval, and entity-aware text retrieval methods
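One common way such collections are derived automatically — a simplified illustration, not necessarily this paper's exact pipeline — is to treat the headings of structured articles as queries and the passages filed under each heading as relevant for it. The article format below is hypothetical.

```python
# Illustrative sketch: derive (query, passage, relevance) judgments from
# structured articles, with no human assessment. The input schema is invented.

def derive_judgments(article: dict) -> list[tuple[str, str, int]]:
    """Treat each section heading as a query; its passages are judged relevant."""
    triples = []
    for section in article["sections"]:
        query = f"{article['title']} {section['heading']}"
        for passage_id in section["passage_ids"]:
            triples.append((query, passage_id, 1))  # under-heading => relevant
    return triples

article = {
    "title": "United States",
    "sections": [
        {"heading": "Westward expansion", "passage_ids": ["p1", "p2"]},
    ],
}
judgments = derive_judgments(article)
```

Entity judgments can be derived analogously, e.g. by treating entities linked from a section as relevant to that section's query.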

    Promoting understandability in consumer health information search

    Nowadays, in the area of Consumer Health Information Retrieval, techniques and methodologies are still far from being effective in answering complex health queries. One main challenge comes from the varying and limited medical knowledge background of consumers; the language gap between non-expert consumers and complex medical resources confuses them. Returning health information that is not only topically relevant but also understandable is therefore a significant and practical challenge in this area. In this work, the main research goal is to study ways to promote understandability in Consumer Health Information Retrieval. To help reach this goal, two research questions are posed: (i) how to bridge the existing language gap; (ii) how to return more understandable documents. Two modules are designed, each answering one research question. In the first module, a Medical Concept Model is proposed for use in health query processing; this model integrates Natural Language Processing techniques into state-of-the-art Information Retrieval. Moreover, aiming to integrate syntactic and semantic information, word embedding models are explored as query expansion resources. The second module is designed to learn understandability from past data; a two-stage learning-to-rank model is proposed, with rank aggregation methods applied on single field-based ranking models. These proposed modules are assessed on the FIRE’2016 CHIS track data and the CLEF’2016-2018 eHealth IR data collections.
Extensive experimental comparisons with state-of-the-art baselines on the considered data collections confirmed the effectiveness of the proposed approaches: regarding understandability relevance, the improvements are 11.5%, 9.3% and 16.3% in the RBP, uRBP and uRBPgr evaluation metrics, respectively; regarding topical relevance, the improvements are 7.8%, 16.4% and 7.6% in the P@10, NDCG@10 and MAP evaluation metrics, respectively.
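Embedding-based query expansion of the kind described above can be sketched with a nearest-neighbour lookup in vector space. The toy vectors below are invented for illustration; a real system would use trained embeddings (e.g. word2vec or fastText) over a medical corpus.

```python
# Sketch of word-embedding query expansion: expand a query term with its
# nearest neighbours by cosine similarity. Vectors here are toy 3-d values.
import math

EMBEDDINGS = {
    "hypertension": [0.9, 0.1, 0.0],
    "high":         [0.8, 0.2, 0.1],
    "blood":        [0.7, 0.3, 0.0],
    "banana":       [0.0, 0.1, 0.9],
}

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def expand(term: str, k: int = 2) -> list[str]:
    """Return the k vocabulary terms most similar to `term`."""
    sims = [(cosine(EMBEDDINGS[term], vec), word)
            for word, vec in EMBEDDINGS.items() if word != term]
    return [word for _, word in sorted(sims, reverse=True)[:k]]

expansion = expand("hypertension")  # nearest neighbours of the query term
```

The expanded terms are then appended to the original query before retrieval, which is one standard way to bridge the vocabulary gap between lay queries and expert documents.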

    Contributions to information extraction for Spanish-written biomedical text

    Healthcare practice and clinical research produce vast amounts of digitised, unstructured data in multiple languages that are currently underexploited, despite their potential applications in improving healthcare experiences, supporting trainee education, or enabling biomedical research, for example. To automatically transform those contents into relevant, structured information, advanced Natural Language Processing (NLP) mechanisms are required. In NLP, this task is known as Information Extraction. Our work takes place within this growing field of clinical NLP for the Spanish language, as we tackle three distinct problems. First, we compare several supervised machine learning approaches to the problem of sensitive data detection and classification. Specifically, we study the different approaches and their transferability in two corpora, one synthetic and the other authentic. Second, we present and evaluate UMLSmapper, a knowledge-intensive system for biomedical term identification based on the UMLS Metathesaurus. This system recognises and codifies terms without relying on annotated data or external Named Entity Recognition tools. Although technically naive, it performs on par with more evolved systems, and does not exhibit a considerable deviation from other approaches that rely on oracle terms. Finally, we present and exploit a new corpus of real health records manually annotated with negation and uncertainty information: NUBes. This corpus is the basis for two sets of experiments, one on cue and scope detection, and the other on assertion classification. Throughout the thesis, we apply and compare techniques of varying levels of sophistication and novelty, which reflects the rapid advancement of the field.
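A knowledge-intensive term identifier of this kind can be caricatured as a longest-match dictionary lookup. The lexicon and concept codes below are invented for illustration — they are not real UMLS CUIs, and the actual UMLSmapper system is certainly more elaborate.

```python
# Toy sketch of dictionary-based biomedical term identification:
# greedily match the longest lexicon entry at each position of the text.
# The lexicon entries and concept codes are invented, not real UMLS CUIs.

LEXICON = {
    "infarto de miocardio": "C0001",  # myocardial infarction
    "miocardio": "C0002",             # myocardium
    "dolor": "C0003",                 # pain
}

def identify_terms(text: str) -> list[tuple[str, str]]:
    """Return (surface term, concept code) pairs found by longest match."""
    tokens = text.lower().split()
    found, i = [], 0
    while i < len(tokens):
        for j in range(len(tokens), i, -1):  # try the longest span first
            span = " ".join(tokens[i:j])
            if span in LEXICON:
                found.append((span, LEXICON[span]))
                i = j
                break
        else:
            i += 1  # no lexicon entry starts here; advance one token
    return found

matches = identify_terms("Paciente con infarto de miocardio y dolor")
```

Longest-match ensures the multi-word concept is preferred over its nested sub-term ('miocardio'), which is the behaviour one typically wants from such a mapper.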

    Leveraging Conversational User Interfaces and Digital Humans to Provide an Accessible and Supportive User Experience on an Ophthalmology Service

    Designing E-Health services that are accessible, engaging, and provide valuable information to patients is an endeavor that requires research and validation with potential users. The information needs to be perceived as trustworthy and reliable, in order to promote people’s ability to make informed decisions about their health. This Master’s thesis work focused on understanding the potential of conversational user interfaces (CUIs) featuring digital humans (DHs) as communication agents to provide healthcare-related information to users. The case study underlying the research was proposed by Roche: the company wanted to create an informational ophthalmology website featuring a digital human to replace the traditional text-based website. The main goal of this work was to understand whether CUIs and DHs can provide a higher level of accessibility and engagement for users, with a special focus on people starting to live with low vision (potential ophthalmology patients). Addressing these aspects would allow providing a better user experience for people visiting the website. Since digital humans are not yet extensively adopted in the healthcare domain, few design guidelines are available. The work employed a human-centered design approach, to gather requirements and feedback from users, and led to defining six guidelines and an extensive set of observations about user experience and accessibility. These guidelines are: ensure that the digital human is as realistic as possible; create a clear and easy-to-follow conversation; present options simply and allow flexibility in choice methods; provide a text version of the content; ensure easy and self-explanatory navigation; and ensure compatibility with assistive technologies, providing flexibility, personalization, and integration. The user research was divided into two phases.
First, an exploratory research session was conducted, in which ten participants were recruited to investigate the needs and expectations of people living with eye conditions towards an informative service, and their first impressions of DHs. This session employed the semi-structured interview methodology, and the results informed the further development of the service. When the first proof-of-concept prototype version of the website was built, an evaluative research phase with eighteen participants was conducted. This session used the participant observation methodology paired with semi-structured elicitation interviews. Afterwards, focus group sessions were organized so that participants could further discuss their experience. The user-based research was paired with expert evaluation using the cognitive walkthrough methodology and a simplified WCAG 2.1 accessibility assessment. Combining the two approaches gave a good overview of the merits and issues of the approach. The results of the research allowed building a good understanding of the positive and negative aspects of using a digital human as an agent in a conversational user interface. Users generally appreciated the concept: they found it engaging, trustworthy, and easy to use. However, some aspects could not be addressed during this research and need further understanding. The primary areas that need to be addressed are guidance, navigation, and error management. Nonetheless, the positive feedback gathered from the participants of the evaluation sessions shows that it is worth investing in the research and development of this relatively new kind of service. In fact, the results of the work show that having a digital human as the agent for a conversation-based informative service in healthcare has strong potential, in terms of both accessibility and engagement.

    Geographic information extraction from texts

    A large volume of unstructured texts, containing valuable geographic information, is available online. This information — provided implicitly or explicitly — is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although substantial progress has been made in geographic information extraction from texts, there are still unsolved challenges and issues, ranging from methods, systems, and data to applications and privacy. Therefore, this workshop will provide a timely opportunity to discuss recent advances, new ideas, and concepts, but also to identify research gaps in geographic information extraction.

    Preface
