12 research outputs found

    ShARe/CLEF eHealth evaluation lab 2014, task 3: user-centred health information retrieval

    This paper presents the results of task 3 of the ShARe/CLEF eHealth Evaluation Lab 2014. This evaluation lab focuses on improving access to medical information on the web. The objective of the task was to investigate how additional information, such as a related discharge summary, and external resources, such as medical ontologies, affect IR effectiveness in monolingual and multilingual settings. Participants could submit up to seven runs per language: one mandatory run using neither additional information nor external resources, and three runs each with and without discharge summaries.

    Medical Information Retrieval for Patients: Impact of Terminological Resources

    The right of patients to access their clinical health records is granted by the French Code de la Santé Publique (Public Health Code). Yet this content remains difficult to understand. We propose an experiment in which queries defined by patients are used to find relevant documents. We use the Indri search engine, which is based on statistical language modeling, together with semantic resources. We focus on terminological variation (e.g., synonyms, abbreviations) to bridge expert and patient language. Various combinations of resources and Indri settings are explored, mostly based on query expansion. Our system achieves up to 0.7660 P@10 and up to 0.6793 NDCG@10.
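    The abstract does not give the exact query formulation. As a minimal sketch, assuming a hypothetical two-entry synonym/abbreviation table (`VARIANTS`) and an assumed 0.7/0.3 weighting between original and expansion terms, query expansion for Indri can be expressed with its `#weight`, `#combine` and `#1` operators. The helper names are illustrative and are not the authors' code.

```python
from typing import Dict, List

# Hypothetical toy terminology mapping patient-side terms and abbreviations
# to expert/lay variants; a real system would derive this from resources such as UMLS.
VARIANTS: Dict[str, List[str]] = {
    "mi": ["myocardial infarction", "heart attack"],
    "htn": ["hypertension", "high blood pressure"],
}


def as_indri_term(phrase: str) -> str:
    """Wrap a multi-word phrase in Indri's exact ordered-window operator #1(...)."""
    words = phrase.split()
    return words[0] if len(words) == 1 else "#1(" + " ".join(words) + ")"


def expand_query(query: str, variants: Dict[str, List[str]], orig_weight: float = 0.7) -> str:
    """Return an Indri query that keeps the original terms and adds their variants.

    The original terms and the expansion terms form two #combine groups mixed
    by #weight; orig_weight is an assumed value, not one taken from the paper.
    """
    terms = query.lower().split()
    expansions = [as_indri_term(v) for t in terms for v in variants.get(t, [])]
    original = "#combine(" + " ".join(terms) + ")"
    if not expansions:
        return original  # nothing to expand: plain query-likelihood query
    expansion = "#combine(" + " ".join(expansions) + ")"
    return f"#weight({orig_weight:g} {original} {1 - orig_weight:g} {expansion})"


print(expand_query("htn treatment", VARIANTS))
# prints: #weight(0.7 #combine(htn treatment) 0.3 #combine(hypertension #1(high blood pressure)))
```

    Keeping the original terms in their own weighted group means a poor expansion can lower, but not replace, the score of the patient's own wording.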

    An enhanced concept based approach medical information retrieval to address readability, vocabulary and presentation issues

    Searching the Internet for health advice has become a common task for individuals. However, existing approaches still fail to integrate program modules that address the information needs of all categories of end-users. This study proposes an improved framework and designs an enhanced concept based approach (ECBA) for medical information retrieval that better addresses readability, vocabulary mismatch and presentation issues by generating medical discharge documents and medical search query results in both expert and layman forms. Three dedicated program modules were designed and integrated into the ECBA, namely a medical terms control module, a vocabulary controlled module and a readability module, to address the information needs of both medical experts and lay end-users. Eight benchmark datasets, namely Medline, UMLS, MeSH, MetaMap, Metathesaurus, Diagnosia 7, Khresmoi Project 6 and Genetics Home Reference, were used to validate the system's performance. The ECBA was also compared with three existing approaches: a concept based approach (CBA), the query likelihood model (QLM) and latent semantic indexing (LSI). The evaluation used the performance and statistical measures P@40, NDCG@40, MAP, analysis of variance (ANOVA) and Tukey HSD tests. The final experimental results show that the ECBA consistently achieved accuracy above 93% on the Medline, UMLS and MeSH datasets, 92% on the MetaMap, Metathesaurus and Diagnosia 7 datasets and 91% on the Khresmoi Project 6 and Genetics Home Reference datasets. The statistical analysis of the four approaches (ECBA, CBA, QLM and LSI) also shows a significant difference among their mean scores, so the null hypothesis of no significant difference was rejected.
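    For readers unfamiliar with the reported measures, the sketch below computes P@k, NDCG@k and average precision (averaged over queries to obtain MAP) for one ranked list. It assumes graded relevance judgements stored in a plain dict and uses the linear-gain, log2-discount form of DCG; other gain variants (e.g. 2^rel - 1) exist, and the abstract does not state which one the study used.

```python
import math
from typing import Dict, List


def precision_at_k(ranking: List[str], qrels: Dict[str, int], k: int) -> float:
    """Fraction of the top-k retrieved documents judged relevant (grade > 0)."""
    return sum(1 for d in ranking[:k] if qrels.get(d, 0) > 0) / k


def dcg_at_k(gains: List[int], k: int) -> float:
    """Discounted cumulative gain: linear gain, discounted by log2(rank + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranking: List[str], qrels: Dict[str, int], k: int) -> float:
    """DCG of the ranking divided by the DCG of the ideal ordering of judged documents."""
    gains = [qrels.get(d, 0) for d in ranking]
    ideal = sorted(qrels.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


def average_precision(ranking: List[str], qrels: Dict[str, int]) -> float:
    """Mean of the precision values at each relevant rank; MAP averages this over queries."""
    n_rel = sum(1 for g in qrels.values() if g > 0)
    hits, total = 0, 0.0
    for i, d in enumerate(ranking, start=1):
        if qrels.get(d, 0) > 0:
            hits += 1
            total += hits / i
    return total / n_rel if n_rel else 0.0
```

    With `ranking = ["d1", "d2", "d3"]` and `qrels = {"d1": 2, "d3": 1, "d4": 1}`, P@2 is 0.5 and average precision is (1/1 + 2/3) / 3, since the unretrieved relevant document `d4` still counts in the denominator.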

    Knowledge-driven entity recognition and disambiguation in biomedical text

    Entity recognition and disambiguation (ERD) for the biomedical domain are notoriously difficult problems due to the variety of entities and their often long names with many variants. Existing work focuses heavily on the molecular level in two ways. First, it targets scientific literature as the input text genre. Second, it targets single, highly specialized entity types such as chemicals, genes and proteins. However, a wealth of biomedical information is also buried in the vast universe of Web content. To fully utilize all available information, Web content needs to be tapped as an additional input. Moreover, other entity types such as symptoms and risk factors must be covered, since Web content focuses on consumer health. The goal of this thesis is to investigate ERD methods that are applicable to all entity types in scientific literature as well as Web content. In addition, we focus on under-explored aspects of biomedical ERD: scalability, long noun phrases and out-of-knowledge-base (OOKB) entities. This thesis makes four main contributions, all of which leverage knowledge in UMLS (Unified Medical Language System), the largest and most authoritative knowledge base (KB) of the biomedical domain. The first contribution is a fast dictionary lookup method for entity recognition that maximizes throughput while balancing the loss of precision and recall. The second contribution is a semantic type classification method targeting common words in long noun phrases. We develop a custom set of semantic types to capture word usages; besides biomedical usage, these types also cope with non-biomedical usage and with generic, non-informative usage. The third contribution is a fast heuristics method for entity disambiguation in MEDLINE abstracts, again maximizing throughput but this time maintaining accuracy. The fourth contribution is a corpus-driven entity disambiguation method that addresses OOKB entities. The method first captures the entities expressed in a corpus as latent representations that comprise in-KB and OOKB entities alike, and then performs entity disambiguation.
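    The abstract does not detail its dictionary lookup algorithm. One common way to do fast dictionary-based recognition is greedy longest-match over a hashed lexicon of token sequences, sketched below under that assumption. The lexicon and the `concept:*` identifiers are hypothetical stand-ins for UMLS entries, and this simple variant ignores the precision/recall trade-offs the thesis studies.

```python
from typing import Dict, List, Tuple

Index = Dict[Tuple[str, ...], str]


def build_index(lexicon: Dict[str, str]) -> Tuple[Index, int]:
    """Map lowercased token tuples to concept IDs and record the longest entry length."""
    index = {tuple(name.lower().split()): cid for name, cid in lexicon.items()}
    max_len = max((len(k) for k in index), default=0)
    return index, max_len


def recognise(tokens: List[str], index: Index, max_len: int) -> List[Tuple[int, int, str]]:
    """Greedy left-to-right longest match; returns (start, end, concept_id), end exclusive."""
    spans: List[Tuple[int, int, str]] = []
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        # Try the longest candidate n-gram first, shrinking until a lexicon hit.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            cid = index.get(tuple(lowered[i:i + n]))
            if cid is not None:
                spans.append((i, i + n, cid))
                i += n  # skip past the matched span
                break
        else:
            i += 1  # no entry starts here; move to the next token


    return spans


# Hypothetical lexicon; real entries would come from UMLS concept names.
LEXICON = {
    "blood pressure": "concept:bp",
    "high blood pressure": "concept:htn",
    "headache": "concept:headache",
}
index, max_len = build_index(LEXICON)
print(recognise("Patient reports high blood pressure and headache".split(), index, max_len))
# prints: [(2, 5, 'concept:htn'), (6, 7, 'concept:headache')]
```

    Each position costs at most `max_len` hash lookups, which keeps throughput high; preferring the longest match is why "high blood pressure" wins over the shorter entry "blood pressure".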

    Proceedings of the XXIV Workshop de Investigadores en Ciencias de la Computación: WICC 2022

    A compilation of the papers presented at the XXIV Workshop de Investigadores en Ciencias de la Computación (WICC), held in Mendoza in April 2022. Red de Universidades con Carreras en Informática.