
    DCU@TRECMed 2012: Using ad-hoc baselines for domain-specific retrieval

    This paper describes the first participation of DCU in the TREC Medical Records Track (TRECMed). We performed some initial experiments on the 2011 TRECMed data based on the BM25 retrieval model. Surprisingly, we found that the standard BM25 model with default parameters performs comparably to the best automatic runs submitted to TRECMed 2011 and would have placed fourth out of 29 participating groups. We expected that some form of domain adaptation would increase performance. However, results on the 2011 data proved otherwise: concept-based query expansion decreased performance, and filtering and reranking by term proximity also decreased performance slightly. We submitted four runs based on the BM25 retrieval model to TRECMed 2012, using standard BM25, standard query expansion, result filtering, and concept-based query expansion. Official results for 2012 confirm that, as applied by us, domain-specific knowledge does not increase performance over the BM25 baseline.
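    The baseline in question is plain BM25 with stock parameters. A minimal sketch of that scoring function on a toy record collection follows; the corpus, function name, and example query are our own illustration, not the authors' implementation, though k1=1.2 and b=0.75 are the usual defaults the abstract alludes to.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    """Score one tokenised document against a query with standard BM25.

    `corpus` is a list of token lists and `doc` is one of its entries.
    k1 and b are left at their stock defaults, as in the paper's baseline.
    """
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    tf = Counter(doc)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)
        if df == 0 or tf[term] == 0:
            continue
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
        norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * tf[term] * (k1 + 1) / norm
    return score

# Toy example: a record mentioning both query terms outranks one
# mentioning only one of them, and unrelated records score zero.
records = [["chest", "pain", "fever"],
           ["chest", "xray"],
           ["diabetes", "care", "plan"]]
query = ["chest", "pain"]
ranked = sorted(records, key=lambda d: -bm25_score(query, d, records))
```

    Ranking each medical record by this score is the entire baseline; the abstract's point is that the domain-specific additions layered on top of it did not help.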

    Improving patient record search: A meta-data based approach

    The International Classification of Diseases (ICD) is a type of meta-data found in many Electronic Patient Records. Research to explore the utility of these codes in medical Information Retrieval (IR) applications is new, and many areas of investigation remain, including the question of how reliable the assignment of the codes has been. This paper proposes two uses of the ICD codes in two different contexts of search: Pseudo-Relevance Judgments (PRJ) and Pseudo-Relevance Feedback (PRF). We find that our approach to evaluating the TREC challenge runs using simulated relevance judgments has a positive correlation with the TREC official results, and our proposed technique for performing PRF based on the ICD codes significantly outperforms a traditional PRF approach. The results are found to be consistent over the two years of queries from the TREC medical test collection.
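    One plausible reading of ICD-guided PRF, sketched below with invented data structures: instead of taking expansion terms from all top-ranked records, restrict feedback to the top-ranked records that share the ICD code most common among them. The function name, parameters, and toy data are ours, not the paper's.

```python
from collections import Counter

def icd_prf_expand(query, ranking, doc_terms, doc_icd, k=3, n_terms=5):
    """Pseudo-relevance feedback guided by ICD codes (illustrative sketch).

    `ranking` is the initial retrieval order, `doc_terms` maps a record id
    to its tokens, and `doc_icd` maps it to its set of assigned ICD codes.
    Feedback terms come only from top-k records carrying the ICD code
    that is most frequent among those top-k records.
    """
    top = ranking[:k]
    code_counts = Counter(c for d in top for c in doc_icd[d])
    if not code_counts:
        return query
    best_code = code_counts.most_common(1)[0][0]
    feedback_docs = [d for d in top if best_code in doc_icd[d]]
    term_counts = Counter(t for d in feedback_docs for t in doc_terms[d]
                          if t not in query)
    expansion = [t for t, _ in term_counts.most_common(n_terms)]
    return query + expansion

# Toy data: J18 (pneumonia) dominates the top 3, so the E11 (diabetes)
# record contributes no expansion terms.
ranking = ["r1", "r2", "r3", "r4"]
doc_icd = {"r1": {"J18"}, "r2": {"J18"}, "r3": {"E11"}, "r4": {"J18"}}
doc_terms = {"r1": ["pneumonia", "cough"], "r2": ["pneumonia", "fever"],
             "r3": ["insulin"], "r4": ["cough"]}
expanded = icd_prf_expand(["lung", "infection"], ranking, doc_terms, doc_icd)
```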

    Modelling the usefulness of document collections for query expansion in patient search

    Dealing with medical terminology is a challenge when searching for patients based on the relevance of their medical records to a given query. Existing work used query expansion (QE) to extract expansion terms from different document collections to improve query representation. However, the usefulness of particular document collections for QE was not measured and taken into account during retrieval. In this work, we investigate two automatic approaches that measure and leverage the usefulness of document collections when exploiting multiple document collections to improve query representation. These two approaches are based on resource selection and learning to rank techniques, respectively. We evaluate our approaches using the TREC Medical Records track’s test collection. Our results show the potential of the proposed approaches, since they can effectively exploit 14 different document collections, including both domain-specific (e.g. MEDLINE abstracts) and generic (e.g. blogs and webpages) collections, and significantly outperform existing effective baselines, including the best systems participating at the TREC Medical Records track. Our analysis shows that the different collections are not equally useful for QE, while our two approaches can automatically and effectively weight the usefulness of expansion terms extracted from different document collections. This is the author accepted manuscript. The final version is available from ACM via http://dx.doi.org/10.1145/2806416.280661
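    The core idea — weight each collection's contribution to QE by how useful it looks for the query — can be sketched with a simple query-likelihood heuristic standing in for the paper's resource-selection and learning-to-rank models, which we do not reproduce. All names and the toy collections below are assumptions for illustration.

```python
import math
from collections import Counter

def collection_weights(query_terms, collections):
    """Score each collection's usefulness for a query with a smoothed
    query log-likelihood (a stand-in for full resource selection)."""
    weights = {}
    for name, docs in collections.items():
        counts = Counter(t for d in docs for t in d)
        total = sum(counts.values()) or 1
        weights[name] = sum(math.log((counts[t] + 1) / (total + 1))
                            for t in query_terms)
    return weights

def weighted_expansion(query_terms, collections, n_terms=5):
    """Mix candidate expansion terms from all collections, scaling each
    collection's term counts by its (shifted) usefulness weight."""
    w = collection_weights(query_terms, collections)
    lo = min(w.values())  # shift so all weights are positive
    scores = Counter()
    for name, docs in collections.items():
        counts = Counter(t for d in docs for t in d if t not in query_terms)
        for t, c in counts.items():
            scores[t] += (w[name] - lo + 1e-6) * c
    return [t for t, _ in scores.most_common(n_terms)]

# A medical query should weight the domain-specific collection above
# the generic one, so its terms dominate the expansion.
collections = {
    "medline": [["fever", "sepsis"], ["fever", "infection"]],
    "blogs":   [["movie", "review"], ["travel", "blog"]],
}
w = collection_weights(["fever"], collections)
terms = weighted_expansion(["fever"], collections)
```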

    On the Impact of Entity Linking in Microblog Real-Time Filtering

    Microblogging is a model of content sharing in which the temporal locality of posts with respect to important events, either of foreseeable or unforeseeable nature, makes applications of real-time filtering of great practical interest. We propose the use of Entity Linking (EL) in order to improve retrieval effectiveness by enriching the representation of microblog posts and filtering queries. EL is the process of recognizing in an unstructured text the mention of relevant entities described in a knowledge base. EL of short pieces of text is a difficult task, but it is also a scenario in which the information EL adds to the text can have a substantial impact on the retrieval process. We implement a state-of-the-art filtering method, based on the best systems from the TREC Microblog track real-time ad hoc retrieval and filtering tasks, and extend it with a Wikipedia-based EL method. Results show that the use of EL significantly improves over non-EL-based versions of the filtering methods. Comment: 6 pages, 1 figure, 1 table. SAC 2015, Salamanca, Spain - April 13 - 17, 201
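    To make the enrichment step concrete, here is a toy dictionary-based linker: surface forms are matched longest-first against the post and the linked entity identifiers are appended to its bag-of-words representation. The paper uses a proper Wikipedia-based EL method; this greedy lookup, its names, and the sample knowledge base are our own simplification.

```python
def link_entities(text, kb):
    """Greedy longest-match entity linking against a surface-form
    dictionary `kb` mapping lowercase phrases to entity identifiers."""
    tokens = text.lower().split()
    surfaces = sorted(kb, key=lambda s: -len(s.split()))
    entities = []
    i = 0
    while i < len(tokens):
        for s in surfaces:
            n = len(s.split())
            if tokens[i:i + n] == s.split():
                entities.append(kb[s])
                i += n
                break
        else:
            i += 1
    return entities

def enrich(text, kb):
    """Represent a post as its tokens plus the linked entity IDs, so a
    filtering query and a post can also match at the entity level."""
    return text.lower().split() + link_entities(text, kb)

kb = {"new york": "Q60", "marathon": "Q40244"}
post = enrich("New York marathon results", kb)
```

    With this representation, a filtering query enriched the same way matches the post on "Q60" even if it used a different surface form for the same entity.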

    Test collections for medical information retrieval evaluation

    The web has rapidly become one of the main resources for medical information for many people: patients, clinicians, medical doctors, etc. Measuring the effectiveness with which information can be retrieved from web resources for these users is crucial: it brings better information to professionals for better diagnosis, treatment, and patient care, and helps patients and relatives get informed on their condition. Several existing information retrieval (IR) evaluation campaigns have been developed to assess and improve medical IR methods, for example the TREC Medical Record Track [11] and TREC Genomics Track [10]. These campaigns only target certain types of users, mainly clinicians and some medical professionals: queries are mainly centered on cohorts of records describing specific patient cases or on biomedical reports. Evaluating search effectiveness over the many heterogeneous online medical information sources now available, which are increasingly used by a diverse range of medical professionals and, very importantly, the general public, is vital to the understanding and development of medical IR. We describe the development of two benchmarks for medical IR evaluation from the Khresmoi project. The first of these has been developed using existing medical query logs for internal research within the Khresmoi project and targets both medical professionals and the general public; the second has been created in the framework of a new CLEFeHealth evaluation campaign and is designed to evaluate patient search in context.

    Wikipedia-Based Automatic Diagnosis Prediction in Clinical Decision Support Systems

    When making clinical decisions, physicians often consult the biomedical literature for reference. In this case, an effective clinical decision support system, provided with a patient’s health information, should be able to generate accurate queries and return useful articles to the physicians. Related work in the Clinical Decision Support (CDS) track of TREC 2015 demonstrated the usefulness of knowing patients’ diagnosis information for supporting more effective retrieval, but the diagnosis information is missing in most cases. Furthermore, it is still a great challenge to perform large-scale automatic diagnosis prediction. This motivates us to propose an automatic diagnosis prediction method to enhance retrieval in a clinical decision support system, where the evidence for the prediction is extracted from Wikipedia. In the evaluation conducted on the 2014 CDS tasks, our method achieves the best performance among all submitted runs. As a next step, graph-structured evidence will be integrated to make the prediction more accurate.
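    The abstract does not spell out the prediction mechanism, but its shape — score candidate diagnoses by evidence extracted from Wikipedia — can be illustrated with a deliberately simple term-overlap proxy. The scoring rule, names, and article term lists below are entirely our own assumptions.

```python
def predict_diagnosis(case_terms, wiki_articles):
    """Pick the candidate diagnosis whose (toy) Wikipedia article shares
    the most terms with the patient case description.  A bag-of-words
    stand-in for the paper's Wikipedia evidence extraction."""
    return max(wiki_articles,
               key=lambda dx: len(set(case_terms) & set(wiki_articles[dx])))

# Invented case and article term lists, for illustration only.
case = ["polyuria", "thirst", "fatigue", "weight", "loss"]
articles = {
    "diabetes mellitus": ["polyuria", "thirst", "insulin", "glucose"],
    "influenza": ["fever", "cough", "fatigue"],
}
prediction = predict_diagnosis(case, articles)
```

    The predicted diagnosis can then be appended to the query before retrieval, which is the enhancement the abstract describes.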

    Improving search over Electronic Health Records using UMLS-based query expansion through random walks

    Objective: Most of the information in Electronic Health Records (EHRs) is represented in free textual form. Practitioners searching EHRs need to phrase their queries carefully, as the record might use synonyms or other related words. In this paper we show that an automatic query expansion method based on the Unified Medical Language System (UMLS) Metathesaurus improves the results of a robust baseline when searching EHRs. Materials and methods: The method uses a graph representation of the lexical units, concepts and relations in the UMLS Metathesaurus. It is based on random walks over the graph, which start on the query terms. Random walks are a well-studied technique on both Web and knowledge-base datasets. Results: Our experiments on the TREC Medical Records track show improvements on both the 2011 and 2012 datasets over a strong baseline. Discussion: Our analysis shows that the success of our method is due to the automatic expansion of the query with extra terms, even when they are not directly related in the UMLS Metathesaurus. The terms added in the expansion go beyond simple synonyms and include other kinds of topically related terms. Conclusions: Expansion of queries using related terms in the UMLS Metathesaurus beyond synonymy is an effective way to overcome the gap between query and document vocabularies when searching for patient cohorts.
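    Random walks with restart from the query terms amount to personalized PageRank over the concept graph. A minimal power-iteration sketch on a four-node toy graph follows; the graph content, restart probability, and function name are our assumptions, not the UMLS Metathesaurus or the authors' parameters.

```python
def random_walk_expansion(graph, seeds, restart=0.15, iters=50, n_terms=2):
    """Personalized PageRank from the `seeds` (query terms) over a toy
    concept graph; returns the top non-seed nodes as expansion terms.
    `graph` maps each node to its list of neighbours."""
    nodes = list(graph)
    p = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    restart_vec = dict(p)
    for _ in range(iters):
        nxt = {n: restart * restart_vec[n] for n in nodes}
        for n in nodes:
            nbrs = graph[n]
            if not nbrs or p[n] == 0.0:
                continue
            share = (1 - restart) * p[n] / len(nbrs)
            for m in nbrs:
                nxt[m] += share
        p = nxt
    ranked = sorted((n for n in nodes if n not in seeds), key=lambda n: -p[n])
    return ranked[:n_terms]

# Invented mini-graph: mass from the query term flows first to its
# synonym, then onward to a topically related (non-synonym) term,
# while the unconnected node gets nothing.
umls_like = {
    "heart attack": ["myocardial infarction"],
    "myocardial infarction": ["heart attack", "troponin"],
    "troponin": ["myocardial infarction"],
    "metformin": [],
}
expansion = random_walk_expansion(umls_like, ["heart attack"])
```

    Note how "troponin" is reached even though it is not directly linked to the query term — the multi-hop behaviour the Discussion credits for going beyond simple synonymy.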