72,931 research outputs found

    Predicting diabetes-related hospitalizations based on electronic health records

    OBJECTIVE: To derive a predictive model to identify patients likely to be hospitalized during the following year due to complications attributed to Type II diabetes.
    METHODS: A variety of supervised machine learning classification methods were tested, and a new method was developed that discovers hidden patient clusters in the positive class (hospitalized) while simultaneously deriving sparse linear support vector machine classifiers to separate positive samples from negative ones (non-hospitalized). The convergence of the new method was established, and theoretical guarantees were proved on how the classifiers it produces generalize to a test set not seen during training.
    RESULTS: The methods were tested on a large set of patients from the Boston Medical Center, the largest safety-net hospital in New England. Our new joint clustering/classification method achieves an accuracy of 89% (measured as area under the ROC curve) and yields informative clusters that help interpret the classification results, thus increasing physicians' trust in the algorithmic output and providing some guidance towards preventive measures. While it is possible to increase accuracy to 92% with other methods, this comes at increased computational cost and with a loss of interpretability. The analysis shows that even a modest probability of preventive actions being effective (more than 19%) suffices to generate significant hospital-care savings.
    CONCLUSIONS: Predictive models are proposed that can help avert hospitalizations, improve health outcomes and drastically reduce hospital expenditures. The scope for savings is significant: it has been estimated that in the USA alone, about $5.8 billion is spent each year on diabetes-related hospitalizations that could be prevented.
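
    A minimal sketch of the kind of sparse linear SVM plus ROC-AUC evaluation the abstract describes, using scikit-learn; the synthetic X and y stand in for the EHR-derived features and hospitalization labels, and the paper's joint clustering step is omitted:

        # Sparse (L1-regularised) linear SVM scored by area under the ROC curve.
        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.svm import LinearSVC
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(0)
        X = rng.normal(size=(1000, 50))                         # stand-in features
        y = (X[:, 0] + rng.normal(size=1000) > 1).astype(int)   # stand-in labels

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

        # penalty="l1" drives many coefficients to exactly zero, yielding the
        # sparse, more interpretable classifier the abstract refers to.
        clf = LinearSVC(penalty="l1", dual=False, C=0.1).fit(X_tr, y_tr)

        # Rank test patients by signed distance to the separating hyperplane.
        auc = roc_auc_score(y_te, clf.decision_function(X_te))
        print(f"test AUC = {auc:.3f}")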

    Random projection to preserve patient privacy

    With the availability of accessible and widely used cloud services, it is natural that large components of healthcare systems migrate to them; for example, patient databases can be stored and processed in the cloud. Such cloud services provide enhanced flexibility and additional gains, such as availability, ease of data sharing, and so on. This trend, however, poses serious threats to the privacy of patients and to the trust that an individual must put in the healthcare system itself. There is therefore a strong need for privacy preservation, which can be achieved through a variety of approaches. In this paper, we study the application of a random projection-based approach to patient data as a means to achieve two goals: (1) provably mask the identity of users under some adversarial-attack settings, and (2) preserve enough information to allow for aggregate data analysis and the application of machine-learning techniques. As far as we know, such approaches have not previously been applied to and tested on medical data. We analyze the tradeoff between the loss of accuracy in the outcomes of machine-learning algorithms and the resilience against an adversary. We show that random projection is robust against known input/output attacks while still offering high-quality data, as long as the projected space is smaller than the original space and the amount of leaked data available to the adversary is limited.
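
    A minimal sketch of the random-projection idea, assuming a Gaussian projection matrix (the specific construction used in the paper may differ): records are multiplied by a secret random matrix R that maps them from d dimensions down to k < d, masking row identity while approximately preserving pairwise distances for downstream analysis.

        # Random projection: X (n x d) -> X_proj (n x k), with k < d and R secret.
        import numpy as np

        rng = np.random.default_rng(42)
        n, d, k = 500, 100, 30                 # projected space smaller than original
        X = rng.normal(size=(n, d))            # placeholder patient-feature matrix

        R = rng.normal(scale=1.0 / np.sqrt(k), size=(d, k))   # secret projection matrix
        X_proj = X @ R                         # data released for aggregate analysis

        # Johnson-Lindenstrauss effect: pairwise distances survive up to small error.
        orig = np.linalg.norm(X[0] - X[1])
        proj = np.linalg.norm(X_proj[0] - X_proj[1])
        print(f"distance before {orig:.2f}, after {proj:.2f}")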

    Understanding Learned Models by Identifying Important Features at the Right Resolution

    In many application domains, it is important to characterize how complex learned models make their decisions across the distribution of instances. One way to do this is to identify the features, and the interactions among them, that contribute to a model's predictive accuracy. We present a model-agnostic approach to this task that makes the following specific contributions. Our approach (i) tests feature groups, in addition to base features, and tries to identify the level of resolution at which important features can be determined, (ii) uses hypothesis testing to rigorously assess the effect of each feature on the model's loss, (iii) employs a hierarchical approach to control the false discovery rate when testing feature groups and individual base features for importance, and (iv) uses hypothesis testing to identify important interactions among features and feature groups. We evaluate our approach by analyzing random forest and LSTM neural network models learned in two challenging biomedical applications. (The first two authors contributed equally to this work. Accepted for presentation at the Thirty-Third AAAI Conference on Artificial Intelligence, AAAI-19.)
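
    A minimal sketch of one building block the abstract describes, assuming a permutation-style test statistic: jointly permuting a whole feature group and measuring the increase in the model's loss. The paper's hypothesis tests, hierarchical FDR control, and interaction tests are not reproduced here; model, X, and y are placeholders.

        # Importance of a feature *group*: loss increase when the group is permuted.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import log_loss

        rng = np.random.default_rng(0)
        X = rng.normal(size=(600, 10))
        y = (X[:, 0] + X[:, 1] > 0).astype(int)
        model = RandomForestClassifier(random_state=0).fit(X, y)

        def group_loss_increase(model, X, y, group, rng):
            """Loss increase when the columns in `group` are jointly permuted."""
            base = log_loss(y, model.predict_proba(X)[:, 1])
            Xp = X.copy()
            Xp[:, group] = X[rng.permutation(len(X))][:, group]   # break association
            return log_loss(y, model.predict_proba(Xp)[:, 1]) - base

        # A large increase suggests the group matters; the paper turns repeated
        # draws of this kind of statistic into a formal hypothesis test.
        print(group_loss_increase(model, X, y, [0, 1], rng))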

    A Fuzzy Association Rule Mining Expert-Driven (FARME-D) approach to Knowledge Acquisition

    A Fuzzy Association Rule Mining Expert-Driven (FARME-D) approach to knowledge acquisition is proposed in this paper as a viable solution to the challenges of rule-base unwieldiness and the sharp-boundary problem in building a fuzzy rule-based expert system. The fuzzy models are based on domain experts' opinions about the data description. The proposed approach is committed to the modelling of compact fuzzy rule-based expert systems, and is also aimed at providing a platform for instant updates of the knowledge base when new knowledge is discovered. The strategies and underlying assumptions of the new approach, the structure of FARME-D, and its practical application in the medical domain are discussed, as are the modalities for validating the FARME-D approach.
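
    A minimal illustration, not the paper's system, of how an expert-driven fuzzy rule can fire: triangular membership functions (with hypothetical expert-chosen breakpoints) turn crisp readings into membership degrees, and a rule combines them with min as the fuzzy AND.

        # One fuzzy rule: IF glucose IS high AND bmi IS high THEN risk IS high.
        def tri(x, a, b, c):
            """Triangular membership function peaking at b over the support [a, c]."""
            if x <= a or x >= c:
                return 0.0
            return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

        # Hypothetical expert-supplied descriptions of two medical variables.
        def glucose_high(x): return tri(x, 110.0, 160.0, 210.0)
        def bmi_high(x):     return tri(x, 25.0, 32.0, 40.0)

        def rule_risk_high(glucose, bmi):
            # Firing strength in [0, 1]; min implements the fuzzy conjunction.
            return min(glucose_high(glucose), bmi_high(bmi))

        print(rule_risk_high(150.0, 30.0))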

    SCAMP: standardised, concentrated, additional macronutrients, parenteral nutrition in very preterm infants: a phase IV randomised, controlled exploratory study of macronutrient intake, growth and other aspects of neonatal care

    Background: Infants born <29 weeks gestation are at high risk of neurocognitive disability. Early postnatal growth failure, particularly of head growth, is an important and potentially reversible risk factor for impaired neurodevelopmental outcome. Inadequate nutrition is a major factor in this postnatal growth failure: optimal protein and calorie (macronutrient) intakes are rarely achieved, especially in the first week. Infants <29 weeks are dependent on parenteral nutrition for the bulk of their nutrient needs for the first 2-3 weeks of life, to allow gut adaptation to milk digestion. The prescription, formulation and administration of neonatal parenteral nutrition are critical to achieving optimal protein and calorie intake but have received little scientific evaluation. Current neonatal parenteral nutrition regimens often rely on individualised prescription to manage the labile, unpredictable biochemical and metabolic control characteristic of the early neonatal period. Individualised prescription frequently fails to translate into optimal macronutrient delivery. We have previously shown that a standardised, concentrated neonatal parenteral nutrition regimen can optimise macronutrient intake.
    Methods: We propose a single-centre, randomised controlled exploratory trial of two standardised, concentrated neonatal parenteral nutrition regimens, comparing a standard macronutrient content (maximum protein 2.8 g/kg/day; lipid 2.8 g/kg/day; dextrose 10%) with a higher macronutrient content (maximum protein 3.8 g/kg/day; lipid 3.8 g/kg/day; dextrose 12%) over the first 28 days of life. 150 infants of 24-28 completed weeks gestation and birthweight <1200 g will be recruited. The primary outcome will be head growth velocity in the first 28 days of life. Secondary outcomes will include: a) auxological data between birth and 36 weeks corrected gestational age; b) actual macronutrient intake in the first 28 days; c) biomarkers of biochemical and metabolic tolerance; d) infection biomarkers and other intravascular-line complications; e) incidence of major complications of prematurity, including mortality; f) neurodevelopmental outcome at 2 years corrected gestational age.
    Trial registration: Current Controlled Trials ISRCTN76597892 (http://www.controlled-trials.com/ISRCTN76597892); EudraCT Number: 2008-008899-14.
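
    A small sketch of the per-kilogram dose arithmetic implied by the two regimens, using the maximum values quoted in the abstract; this is an illustration of the numbers only, not a clinical prescribing tool, and the linear weight-scaling assumption is ours:

        # Maximum daily macronutrient targets for the two trial regimens.
        REGIMENS = {
            "standard": {"protein_g_per_kg": 2.8, "lipid_g_per_kg": 2.8, "dextrose_pct": 10},
            "higher":   {"protein_g_per_kg": 3.8, "lipid_g_per_kg": 3.8, "dextrose_pct": 12},
        }

        def daily_macronutrients(weight_kg, regimen):
            """Scale the per-kg maxima to one infant's weight (assumed linear)."""
            r = REGIMENS[regimen]
            return {
                "protein_g": r["protein_g_per_kg"] * weight_kg,
                "lipid_g": r["lipid_g_per_kg"] * weight_kg,
                "dextrose_pct": r["dextrose_pct"],
            }

        # Example: a 1.0 kg infant (the trial recruits birthweight <1200 g).
        print(daily_macronutrients(1.0, "higher"))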

    A spatial scan statistic for zero-inflated Poisson process

    The scan statistic is widely used in spatial cluster detection applications for inhomogeneous Poisson processes. However, real data may depart substantially from the underlying Poisson process; one possible departure is an excess of zeros. Some studies point out that, when applied to data with excess zeros, the spatial scan statistic may produce biased inferences. In this work, we develop a closed-form scan statistic for cluster detection in spatial zero-inflated count data. We apply our methodology to simulated and real data. Our simulations reveal that the Scan-Poisson statistic steadily deteriorates as the number of zeros increases, producing biased inferences, whereas our proposed Scan-ZIP and Scan-ZIP+EM statistics are, most of the time, either superior or comparable to the Scan-Poisson statistic.
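
    A minimal sketch of the classical Kulldorff-style Poisson scan statistic the paper builds on: the log-likelihood ratio of a candidate cluster given observed and expected counts. The paper's Scan-ZIP and Scan-ZIP+EM variants replace this Poisson likelihood with a zero-inflated Poisson one, which is not reproduced here.

        # Poisson scan log-likelihood ratio for one candidate zone.
        import numpy as np

        def poisson_scan_llr(c_in, e_in, c_tot, e_tot):
            """LLR that the zone (c_in observed, e_in expected) is a hotspot."""
            c_out, e_out = c_tot - c_in, e_tot - e_in
            if c_in <= e_in:            # only elevated-rate clusters are of interest
                return 0.0
            llr = c_in * np.log(c_in / e_in)
            if c_out > 0:
                llr += c_out * np.log(c_out / e_out)
            return llr

        # Toy example: 40 cases observed where 20 were expected, out of 200 total.
        print(poisson_scan_llr(40, 20, 200, 200))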