406 research outputs found

    An Eye on Clinical BERT: Investigating Language Model Generalization for Diabetic Eye Disease Phenotyping

    Diabetic eye disease is a major cause of blindness worldwide. The ability to monitor relevant clinical trajectories and detect lapses in care is critical to managing the disease and preventing blindness. Alas, much of the information necessary to support these goals is found only in the free text of the electronic medical record. To fill this information gap, we introduce a system for extracting evidence from clinical text of 19 clinical concepts related to diabetic eye disease and inferring relevant attributes for each. In developing this ophthalmology phenotyping system, we are also afforded a unique opportunity to evaluate the effectiveness of clinical language models at adapting to new clinical domains. Across multiple training paradigms, we find that BERT language models pretrained on out-of-distribution clinical data offer no significant improvement over BERT language models pretrained on non-clinical data for our domain. Our study tempers recent claims that language models pretrained on clinical data are necessary for clinical NLP tasks and highlights the importance of not treating clinical language data as a single homogeneous domain. Comment: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2023, December 10th, 2023, New Orleans, United States, 24 pages

    Improved human disease candidate gene prioritization using mouse phenotype

    Background: The majority of common diseases are multi-factorial and modified by genetically and mechanistically complex polygenic interactions and environmental factors. High-throughput genome-wide studies such as linkage analysis and gene expression profiling tend to be most useful for classification and characterization but do not provide sufficient information to identify or prioritize specific disease causal genes. Results: Extending an earlier hypothesis that the majority of genes that impact or cause disease share membership in any of several functional relationships, we show for the first time the utility of mouse phenotype data in human disease gene prioritization. We study the effect of different data integration methods and, based on the validation studies, show that our approach, ToppGene (http://toppgene.cchmc.org), outperforms two existing candidate gene prioritization methods, SUSPECTS and ENDEAVOUR. Conclusion: The incorporation of phenotype information for mouse orthologs of human genes greatly improves human disease candidate gene analysis and prioritization.

    Predicting diabetes-related hospitalizations based on electronic health records

    OBJECTIVE: To derive a predictive model to identify patients likely to be hospitalized during the following year due to complications attributed to Type II diabetes. METHODS: A variety of supervised machine learning classification methods were tested and a new method that discovers hidden patient clusters in the positive class (hospitalized) was developed while, at the same time, sparse linear support vector machine classifiers were derived to separate positive samples from the negative ones (non-hospitalized). The convergence of the new method was established and theoretical guarantees were proved on how the classifiers it produces generalize to a test set not seen during training. RESULTS: The methods were tested on a large set of patients from the Boston Medical Center, the largest safety net hospital in New England. It is found that our new joint clustering/classification method achieves an accuracy of 89% (measured in terms of area under the ROC curve) and yields informative clusters which can help interpret the classification results, thus increasing the trust of physicians in the algorithmic output and providing some guidance towards preventive measures. While it is possible to increase accuracy to 92% with other methods, this comes with increased computational cost and lack of interpretability. The analysis shows that even a modest probability of preventive actions being effective (more than 19%) suffices to generate significant hospital care savings. CONCLUSIONS: Predictive models are proposed that can help avert hospitalizations, improve health outcomes and drastically reduce hospital expenditures. The scope for savings is significant, as it has been estimated that in the USA alone about $5.8 billion is spent each year on diabetes-related hospitalizations that could be prevented. Accepted manuscript

    Towards PACE-CAD Systems

    Despite phenomenal advancements in the availability of medical image datasets and the development of modern classification algorithms, Computer-Aided Diagnosis (CAD) has had limited practical exposure in the real-world clinical workflow. This is primarily because of the inherently demanding and sensitive nature of medical diagnosis, where a misdiagnosis can have far-reaching and serious repercussions. In this work, a paradigm called PACE (Pragmatic, Accurate, Confident, & Explainable) is presented as a set of must-have features for any CAD system. Diagnosis of glaucoma using Retinal Fundus Images (RFIs) is taken as the primary use case for development of various methods that may enrich an ordinary CAD system with PACE. However, depending on specific requirements for different methods, other application areas in ophthalmology and dermatology have also been explored. A pragmatic CAD system is one that can perform reliably in a day-to-day clinical setup. In this research, two of possibly many aspects of a pragmatic CAD system are addressed. Firstly, observing that the existing medical image datasets are small and not representative of images taken in the real world, a large RFI dataset for glaucoma detection is curated and published. Secondly, realising that a salient attribute of a reliable and pragmatic CAD system is its ability to perform in a range of clinically relevant scenarios, classification of 622 unique cutaneous diseases in one of the largest publicly available datasets of skin lesions is successfully performed. Accuracy is one of the most essential metrics of any CAD system's performance. Domain knowledge relevant to three types of diseases, namely glaucoma, Diabetic Retinopathy (DR), and skin lesions, is industriously utilised in an attempt to improve the accuracy.
For glaucoma, a two-stage framework for automatic Optic Disc (OD) localisation and glaucoma detection is developed, which sets a new state of the art for glaucoma detection and OD localisation. To identify DR, a model is proposed that combines coarse-grained classifiers with fine-grained classifiers and grades the disease in four stages with respect to severity. Lastly, different methods of modelling and incorporating metadata are also examined and their effect on a model's classification performance is studied. Confidence in diagnosing a disease is as important as the diagnosis itself. One of the biggest reasons hampering the successful deployment of CAD in the real world is that medical diagnosis cannot be readily decided based on an algorithm's output. Therefore, a hybrid CNN architecture is proposed with the convolutional feature extractor trained using point estimates and a dense classifier trained using Bayesian estimates. Evaluation on 13 publicly available datasets shows the superiority of this method in terms of classification accuracy and also provides an estimate of uncertainty for every prediction. Explainability of AI-driven algorithms has become a legal requirement after Europe’s General Data Protection Regulation came into effect. This research presents a framework for easy-to-understand textual explanations of skin lesion diagnosis. The framework is called ExAID (Explainable AI for Dermatology) and relies upon two fundamental modules. The first module uses any deep skin lesion classifier and performs detailed analysis on its latent space to map human-understandable disease-related concepts to the latent representation learnt by the deep model. The second module proposes Concept Localisation Maps, which extend Concept Activation Vectors by locating significant regions corresponding to a learned concept in the latent space of a trained image classifier. This thesis probes many viable solutions to equip a CAD system with PACE.
However, it is noted that some of these methods require specific attributes in datasets and, therefore, not all methods may be applied to a single dataset. Regardless, this work anticipates that consolidating PACE into a CAD system can not only increase the confidence of medical practitioners in such tools but also serve as a stepping stone for the further development of AI-driven technologies in healthcare.

    Integrating quantitative proteomics and metabolomics in a cellular model of diabetic retinopathy

    Course 2013-2014. Diabetic retinopathy is the leading cause of visual loss in individuals under the age of 55. Most investigations into the pathogenesis of diabetic retinopathy have concentrated on the neural retina, since this is where clinical lesions are manifested. Recently, however, various abnormalities in the structural and secretory functions of the retinal pigment epithelium (RPE), which are essential for neuroretina survival, have been found in diabetic retinopathy. In this context, we study the effect of hyperglycemic and hypoxic conditions on the metabolism of a human retinal pigment epithelial cell line (ARPE-19) by integrating quantitative proteomics using tandem mass tagging (TMT), untargeted metabolomics using MS and NMR, and 13C-glucose isotopic labeling for metabolic tracking. We observed a remarkable metabolic diversification under our simulated in vitro hyperglycemic conditions of diabetes, characterized by increased flux through polyol pathways and inhibition of the Krebs cycle and oxidative phosphorylation. Importantly, under low oxygen supply RPE cells seem to rapidly consume glycogen stores and stimulate anaerobic glycolysis. Our results therefore pave the way to new therapeutic strategies aimed at modulating RPE metabolic impairment, with the goal of regulating structural and secretory alterations of the RPE. Finally, this study shows the importance of tackling biomedical problems by integrating metabolomic and proteomic results. Supervisors: Oscar Yanes Torrado, Jordi Planas Cuch

    Improving the Quality and Utility of Electronic Health Record Data through Ontologies

    The translational research community, in general, and the Clinical and Translational Science Awards (CTSA) community, in particular, share the vision of repurposing electronic health records (EHRs) for research that will improve the quality of clinical practice. Many members of these communities are also aware that EHRs suffer limitations of data becoming poorly structured, biased, and unusable out of original context. This creates obstacles to the continuity of care, utility, quality improvement, and translational research. Analogous limitations to sharing objective data in other areas of the natural sciences have been successfully overcome by developing and using common ontologies. This White Paper presents the authors’ rationale for the use of ontologies with computable semantics for the improvement of clinical data quality and EHR usability, formulated for researchers who have a stake in clinical and translational science and who advocate the use of information technology in medicine but are at the same time concerned by current major shortfalls. This White Paper outlines pitfalls, opportunities, and solutions and recommends increased investment in research and development of ontologies with computable semantics for a new generation of EHRs.

    Applications of the Internet of Medical Things to Type 1 Diabetes Mellitus

    Type 1 Diabetes Mellitus (DM1) is a condition of the metabolism typified by persistent hyperglycemia as a result of insufficient pancreatic insulin synthesis. This requires patients to be aware of their blood glucose level oscillations every day to deduce a pattern and anticipate future glycemia, and hence, decide the amount of insulin that must be exogenously injected to maintain glycemia within the target range. This approach often suffers from a relatively high imprecision, which can be dangerous. Nevertheless, current developments in Information and Communication Technologies (ICT) and innovative sensors for biological signals that might enable a continuous, complete assessment of the patient’s health provide a fresh viewpoint on treating DM1. With this, we observe that current biomonitoring devices and Continuous Glucose Monitoring (CGM) units can easily obtain data that allow us to know at all times the state of glycemia and other variables that influence its oscillations. A complete review has been made of the variables that influence glycemia in a T1DM patient and that can be measured by the above means. The communications systems necessary to transfer the information collected to a more powerful computational environment, which can adequately handle the amounts of data collected, have also been described. From this point, intelligent data analysis extracts knowledge from the data and allows predictions to be made in order to anticipate risk situations. With all of the above, it is necessary to build a holistic proposal that allows the complete and smart management of T1DM. This approach evaluates a potential shortage of such suggestions and the obstacles that future intelligent IoMT-DM1 management systems must surmount. 
Lastly, we provide an outline of a comprehensive IoMT-based proposal for DM1 management that aims to address the limits of prior studies while also using the disruptive technologies highlighted before. Partial funding for open access charge: Universidad de Málaga