12 research outputs found

    Sélection Robuste de Mesures de Similarité Sémantique à partir de Données Incertaines d'Expertise

    National audience. Knowledge-based semantic measures are a cornerstone for exploiting ontologies, not only for exact inference or retrieval but also for data analysis and inexact search; they estimate the degree of similarity between lexical or conceptual entities. Abstract theoretical frameworks have recently been proposed to study the large diversity of available measures; they show that groups of measures are particular instantiations of general parameterized functions. In this paper, we study how such frameworks can be used to support the selection and design of measures. Based on (i) a theoretical framework that unifies the measures by expressing them through a limited set of primitives, (ii) a software solution implementing this framework, and (iii) a domain-specific benchmark, we define a semi-supervised learning technique to identify the best measures for a concrete application. Next, considering uncertainty in both the experts' judgments and the measure selection process, we extend this proposal to the robust selection of semantic measures, i.e. those among the best measures that are least perturbed by errors in expert evaluation. We illustrate our approach through a real use case in the biomedical domain.
    Keywords: unifying framework, robustness of measures, expert uncertainty, semantic similarity measures, ontologies

    BioLORD-2023: Semantic Textual Representations Fusing LLM and Clinical Knowledge Graph Insights

    In this study, we investigate the potential of Large Language Models to complement biomedical knowledge graphs in the training of semantic models for the biomedical and clinical domains. Drawing on the wealth of the UMLS knowledge graph and harnessing cutting-edge Large Language Models, we propose a new state-of-the-art approach for obtaining high-fidelity representations of biomedical concepts and sentences, consisting of three steps: an improved contrastive learning phase, a novel self-distillation phase, and a weight averaging phase. Through rigorous evaluations via the extensive BioLORD testing suite and diverse downstream tasks, we demonstrate consistent and substantial performance improvements over the previous state of the art (e.g. +2pts on MedSTS, +2.5pts on MedNLI-S, +6.1pts on EHR-Rel-B). Besides our new state-of-the-art biomedical model for English, we also distill and release a multilingual model compatible with 50+ languages and finetuned on 7 European languages. Many clinical pipelines can benefit from our latest models. Our new multilingual model enables a range of languages to benefit from our advancements in biomedical semantic representation learning, opening a new avenue for bioinformatics researchers around the world. As a result, we hope to see BioLORD-2023 becoming a precious tool for future biomedical applications. Comment: Preprint of upcoming journal article

    Inter-Coder Agreement for Computational Linguistics

    This article is a survey of methods for measuring agreement among corpus annotators. It exposes the mathematics and underlying assumptions of agreement coefficients, covering Krippendorff's alpha as well as Scott's pi and Cohen's kappa; discusses the use of coefficients in several annotation tasks; and argues that weighted, alpha-like coefficients, traditionally less used than kappa-like measures in computational linguistics, may be more appropriate for many corpus annotation tasks, but that their use makes the interpretation of the value of the coefficient even harder.
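
    As a rough illustration of two of the coefficients named in this abstract, the sketch below computes Cohen's kappa and Scott's pi for two annotators using the standard textbook definitions. It is not taken from the article; the function names and the toy labels are assumptions made for this example. The two coefficients share the same observed agreement and differ only in how chance agreement is estimated: kappa uses each annotator's own label distribution, pi uses a single pooled distribution.

```python
from collections import Counter


def _check(labels_a, labels_b):
    # Both annotators must have labelled the same, non-empty set of items.
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("need two non-empty label lists of equal length")


def _chance_corrected(observed, expected):
    # Shared form (A_o - A_e) / (1 - A_e); undefined when A_e equals 1.
    if expected == 1.0:
        raise ValueError("coefficient undefined: expected agreement is 1")
    return (observed - expected) / (1.0 - expected)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: chance agreement from each annotator's own distribution."""
    _check(labels_a, labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return _chance_corrected(observed, expected)


def scott_pi(labels_a, labels_b):
    """Scott's pi: chance agreement from the pooled label distribution."""
    _check(labels_a, labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    pooled = Counter(labels_a) + Counter(labels_b)
    expected = sum((count / (2 * n)) ** 2 for count in pooled.values())
    return _chance_corrected(observed, expected)


if __name__ == "__main__":
    # Hypothetical toy annotations, invented for this example.
    annotator_1 = ["y", "y", "n", "n"]
    annotator_2 = ["y", "n", "n", "n"]
    print(cohen_kappa(annotator_1, annotator_2))
    print(scott_pi(annotator_1, annotator_2))
```

    On the toy labels the annotators agree on 3 of 4 items, yet the two coefficients differ because the annotators' individual label distributions differ. Weighted, alpha-like coefficients, which the abstract argues for, additionally require a distance function between labels and are not covered by this sketch.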

    Assessing Clinical Software User Needs for Improved Clinical Decision Support Tools

    Consolidating patient and clinical data to support better-informed clinical decisions remains a primary function of electronic health records (EHRs). In the United States, nearly 6 million patients receive care from an accountable care organization (ACO). Knowledge of clinical decision support (CDS) tool design for use by physicians participating in ACOs remains limited. The purpose of this quantitative study was to examine whether a significant correlation exists between characteristics of alert content and alert timing (the independent variables) and physician perceptions of improved ACO quality measure adherence during electronic ordering (the dependent variable). Sociotechnical theory supported the theoretical framework for this research. Sixty-nine physician executives using either a Cerner Incorporated or Epic Systems EHR in a hospital- or health-system-affiliated ACO participated in the online survey. The results of the regression analysis were statistically significant, R² = .108, F(2,66) = 3.99, p = .023, indicating that characteristics of alert content and timing affect physician perceptions for improving their adherence to ACO quality measures. However, analysis of each independent variable showed alert content highly correlated with the dependent variable (p = .007), with no significant correlation found between workflow timing and the dependent variable (p = .724). Understanding the factors that support physician acceptance of alerts is essential to third-party software developers and health care organizations designing CDS tools. Providing physicians with improved EHR-integrated CDS tools supports the population health goal of ACOs in delivering better patient care.
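
    The reported regression statistics can be cross-checked with the standard relation between R² and the overall F statistic of a linear regression with an intercept. The sketch below is only a consistency check under that assumption, not the study's analysis; the function name `f_from_r2` is made up for this example, and the inputs are the rounded values quoted in the abstract (69 respondents, 2 predictors, R² = .108).

```python
def f_from_r2(r2, n_predictors, n_obs):
    """Overall F statistic of a linear regression with intercept, from R^2.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), with k predictors and
    n observations. Returns (F, model df, residual df).
    """
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must lie in [0, 1)")
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1
    if df_model <= 0 or df_resid <= 0:
        raise ValueError("degrees of freedom must be positive")
    f_stat = (r2 / df_model) / ((1.0 - r2) / df_resid)
    return f_stat, df_model, df_resid


if __name__ == "__main__":
    # Values quoted in the abstract; R^2 is rounded there, so the last
    # digit of the recomputed F may differ slightly from the reported one.
    f_stat, df_model, df_resid = f_from_r2(0.108, n_predictors=2, n_obs=69)
    print(f"F({df_model},{df_resid}) = {f_stat:.3f}")
```

    With 69 respondents and 2 predictors the degrees of freedom come out as (2, 66), matching the reported F(2,66), and the recomputed F agrees with the reported 3.99 to within rounding of R².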