
    Scalable and accurate deep learning for electronic health records

    Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient's record. We propose a representation of patients' entire, raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two U.S. academic medical centers with 216,221 adult patients hospitalized for at least 24 hours. In the sequential format we propose, this volume of EHR data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for tasks such as predicting in-hospital mortality (AUROC across sites 0.93-0.94), 30-day unplanned readmission (AUROC 0.75-0.76), prolonged length of stay (AUROC 0.85-0.86), and all of a patient's final discharge diagnoses (frequency-weighted AUROC 0.90). These models outperformed state-of-the-art traditional predictive models in all cases. We also present a case study of a neural-network attribution system, which illustrates how clinicians can gain some transparency into the predictions. We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios, complete with explanations that directly highlight evidence in the patient's chart.
    Comment: Published version from https://www.nature.com/articles/s41746-018-0029-
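
    The core modeling idea, feeding a patient's entire time-ordered FHIR record to a sequence model, can be sketched as follows. This is a minimal illustration rather than the authors' architecture (which combined several model families); the vocabulary size, dimensions, and the PatientEventEncoder name are invented for the example, with a single LSTM and a sigmoid head standing in for one prediction task such as in-hospital mortality.

    # Minimal sketch (PyTorch): a patient's record as a time-ordered sequence of
    # FHIR-derived event tokens, fed to an LSTM that outputs a risk probability.
    # Vocabulary, dimensions, and the model itself are illustrative, not the paper's.
    import torch
    import torch.nn as nn

    class PatientEventEncoder(nn.Module):  # hypothetical name
        def __init__(self, vocab_size: int, embed_dim: int = 64, hidden_dim: int = 128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.head = nn.Linear(hidden_dim, 1)  # binary outcome, e.g. mortality

        def forward(self, event_ids: torch.Tensor) -> torch.Tensor:
            # event_ids: (batch, seq_len) integer tokens for observations, medications,
            # note tokens, etc., ordered by timestamp and padded with 0.
            embedded = self.embed(event_ids)
            _, (last_hidden, _) = self.lstm(embedded)
            return torch.sigmoid(self.head(last_hidden[-1])).squeeze(-1)

    # Toy usage: two padded patient sequences drawn from a 1,000-token vocabulary.
    model = PatientEventEncoder(vocab_size=1000)
    batch = torch.randint(1, 1000, (2, 50))
    risk = model(batch)  # per-patient predicted probability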

    Privacy-Aware Personalization for Mobile Advertising

    Mobile advertising is an increasingly important driver in the Internet economy. We point out fundamental trade-offs between three important variables in the mobile advertisement ecosystem: ad relevance, privacy, and efficiency. In order to increase relevance, ad campaigns tend to become more targeted and personalized, using context information extracted from users' interactions and smartphone sensors. This raises privacy concerns that are hard to overcome due to the limited resources (energy and bandwidth) available on the phones. We point out that, in the absence of a trusted third party, it is impossible to maximize all three of these variables in a single system. This leads to the natural question: can we formalize a common framework for personalized ad delivery that can be instantiated to any desired trade-off point? We propose such a flexible ad-delivery framework in which personalization is done jointly by the server and the phone. We show that the underlying optimization problem is NP-hard and present an efficient algorithm with a tight approximation guarantee. Since tuning personalization rules requires implicit user feedback, such as clicks, we ask how we can gather statistics over a dynamic population of mobile users in an efficient and privacy-preserving way; this is needed for end-to-end privacy of an ad system. We propose the first differentially private distributed protocol that works even in the presence of a dynamic and malicious set of users. We evaluate our methods with a large click log of location-aware searches in Microsoft Bing for mobile. Our experiments show that our framework can simultaneously achieve reasonable levels of privacy, efficiency, and ad relevance, and can efficiently support a high churn rate of users while gathering the statistics required for personalization.
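
    The statistics-gathering step can be illustrated with the classic randomized-response mechanism, in which each phone perturbs its one-bit click report locally before sending it, so the server only ever sees noisy values. This is a sketch of the general idea under an assumed privacy parameter epsilon; the paper's actual protocol additionally handles a dynamic and malicious user population, which is not captured here.

    # Sketch: locally randomized one-bit reports for aggregate click statistics.
    # Each user keeps their true bit with probability e^eps / (1 + e^eps), otherwise
    # flips it; the server debiases the mean of the noisy reports.
    # Illustrative only: the paper's protocol also handles churn and malicious users.
    import math
    import random

    def randomized_response(true_bit: int, epsilon: float) -> int:
        """Report the true bit with probability e^eps / (1 + e^eps), else flip it."""
        p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
        return true_bit if random.random() < p_truth else 1 - true_bit

    def estimate_click_rate(reports: list, epsilon: float) -> float:
        """Debias the mean of the noisy reports to estimate the true click rate."""
        p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
        noisy_mean = sum(reports) / len(reports)
        return (noisy_mean - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)

    # Toy usage: 10,000 users, 30% of whom clicked the ad, epsilon = 1.0.
    random.seed(0)
    true_bits = [1 if random.random() < 0.3 else 0 for _ in range(10_000)]
    reports = [randomized_response(bit, 1.0) for bit in true_bits]
    print(estimate_click_rate(reports, 1.0))  # close to 0.3 in expectation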

    Toward Falsifying Causal Graphs Using a Permutation-Based Test

    Understanding the causal relationships among the variables of a system is paramount to explain and control its behaviour. Inferring the causal graph from observational data without interventions, however, requires strong assumptions that are not always realistic. Even for domain experts it can be challenging to specify the causal graph. Therefore, metrics that quantitatively assess how well a causal graph fits the observed data provide helpful checks before it is used in downstream tasks. Existing metrics report an absolute number of inconsistencies between the graph and the observed data; without a baseline, practitioners are left to answer the hard question of how many such inconsistencies are acceptable or expected. Here, we propose a novel consistency metric that constructs a surrogate baseline through node permutations. By comparing the number of inconsistencies with those on the surrogate baseline, we derive an interpretable metric that captures whether the DAG fits the data significantly better than random. Evaluating on both simulated and real data sets from various domains, including biology and cloud monitoring, we demonstrate that the true DAG is not falsified by our metric, whereas wrong graphs given by a hypothetical user are likely to be falsified.
    Comment: 23 pages, 9 figures
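
    A rough sketch of the permutation idea: count how often the candidate DAG's implied independencies are violated in the data, then compare that count against the counts obtained when the node labels are randomly permuted. The simplification below checks only marginal independencies (two nodes are d-separated by the empty set exactly when they share no ancestor, counting each node as its own ancestor) and uses a Pearson correlation test as the consistency check; the paper's metric is more general, and the function names here are invented.

    # Sketch: compare a candidate DAG's independence violations against a baseline
    # of node-permuted DAGs. A DAG is represented as a list of (parent, child) edges.
    import itertools
    import random
    import numpy as np
    from scipy import stats

    def ancestors(edges, node):
        """All ancestors of a node, including the node itself."""
        result, stack = {node}, [node]
        while stack:
            current = stack.pop()
            for parent, child in edges:
                if child == current and parent not in result:
                    result.add(parent)
                    stack.append(parent)
        return result

    def count_violations(edges, data, columns, alpha=0.01):
        """Pairs the DAG claims marginally independent that are correlated in the data."""
        violations = 0
        for x, y in itertools.combinations(columns, 2):
            independent_in_dag = ancestors(edges, x).isdisjoint(ancestors(edges, y))
            _, p_value = stats.pearsonr(data[:, columns.index(x)], data[:, columns.index(y)])
            if independent_in_dag and p_value < alpha:
                violations += 1
        return violations

    def permutation_baseline(edges, data, columns, n_permutations=200, seed=0):
        """Fraction of node-permuted DAGs with at most as many violations as the
        candidate; a value well below 1 suggests the DAG beats a random relabeling."""
        rng = random.Random(seed)
        observed = count_violations(edges, data, columns)
        at_most = 0
        for _ in range(n_permutations):
            mapping = dict(zip(columns, rng.sample(columns, len(columns))))
            permuted = [(mapping[p], mapping[c]) for p, c in edges]
            if count_violations(permuted, data, columns) <= observed:
                at_most += 1
        return at_most / n_permutations

    # Toy usage: data generated from A -> C <- B, C -> D; the candidate DAG is correct.
    rng = np.random.default_rng(0)
    a, b = rng.normal(size=2000), rng.normal(size=2000)
    c = a + b + rng.normal(size=2000)
    d = c + rng.normal(size=2000)
    data = np.column_stack([a, b, c, d])
    candidate = [("A", "C"), ("B", "C"), ("C", "D")]
    print(permutation_baseline(candidate, data, ["A", "B", "C", "D"]))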

    Non-tracking Web Analytics

    Today, websites commonly use third-party web analytics services to obtain aggregate information about users who visit their sites. This information includes demographics and visits to other sites as well as user behavior within their own sites. Unfortunately, to obtain this aggregate information, web analytics services track individual user browsing behavior across the web. This violation of user privacy has been strongly criticized, resulting in tools that block such tracking as well as anti-tracking legislation and standards such as Do-Not-Track. These efforts, while improving user privacy, degrade the quality of web analytics. This paper presents the first design of a system that provides web analytics without tracking. The system gives users differential privacy guarantees, can provide better-quality analytics than current services, requires no new organizational players, and is practical to deploy. This paper describes and analyzes the design, gives performance benchmarks, and presents our implementation and deployment across several hundred users.
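
    The aggregation step can be illustrated with the Laplace mechanism: the publisher learns a noisy count of, say, how many visitors fall into a demographic bucket, with the noise calibrated so that any single user's presence is hidden. This is a minimal sketch under assumed parameters and bucket names; the system described in the paper distributes the computation so that no party tracks individual users.

    # Sketch: differentially private aggregate counts for a site's visitors.
    # Laplace noise with scale 1/epsilon hides any single user's contribution
    # (sensitivity 1 for counting queries). Parameters and buckets are illustrative.
    import numpy as np

    def private_count(true_count: int, epsilon: float, rng: np.random.Generator) -> float:
        """Release a count perturbed with Laplace noise."""
        return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

    # Toy usage: visitors bucketed by age range, each bucket released with epsilon = 0.5.
    rng = np.random.default_rng(0)
    visits_by_bucket = {"18-24": 1310, "25-34": 2840, "35-49": 1920, "50+": 760}
    noisy = {bucket: private_count(count, epsilon=0.5, rng=rng)
             for bucket, count in visits_by_bucket.items()}
    print(noisy)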

    Institutional strategies related to test-taking behavior in low stakes assessment

    Low stakes assessment, in which students' performance is not graded, has received increasing attention in educational systems in recent years. It is used in formative assessments to guide the learning process as well as in large-scale assessments to monitor educational programs. Yet such assessments suffer from high variation in students' test-taking effort. We aimed to identify institutional strategies related to serious test-taking behavior in low stakes assessment in order to provide medical schools with practical recommendations on how test-taking effort might be increased. First, we identified strategies already used by medical schools to increase serious test-taking behavior on the low stakes Berlin Progress Test (BPT). Strategies that could be assigned to Ryan and Deci's self-determination theory were chosen for analysis. We conducted the study at nine medical schools in Germany and Austria with a total of 108,140 observations in an established low stakes assessment. A generalized linear mixed-effects model was used to assess the association between institutional strategies and the odds that students take the BPT seriously. Overall, two institutional strategies were positively related to more serious test-taking behavior: discussing low test performance with a mentor and consequences for not participating. Giving students a choice was negatively related to serious test-taking behavior; at medical schools that presented the BPT as an evaluation, this effect was larger than at schools that presented it as an assessment.
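
    The reported association was estimated with a generalized linear mixed-effects model; as a rough simplification, the same question can be posed with an ordinary logistic regression in which medical school enters as a fixed rather than a random effect. The data frame, column names, and effect sizes below are invented for illustration, with signs chosen to match the directions reported in the abstract.

    # Simplified sketch: logistic regression of serious test-taking on institutional
    # strategies, with school as a fixed effect. The study itself used a generalized
    # linear mixed-effects model; all variables and data here are synthetic.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 5000
    df = pd.DataFrame({
        "school": rng.integers(1, 10, size=n),            # nine hypothetical schools
        "mentor_discussion": rng.integers(0, 2, size=n),  # strategy indicators
        "consequences": rng.integers(0, 2, size=n),
        "giving_choice": rng.integers(0, 2, size=n),
    })
    # Synthetic outcome with signs matching the abstract's reported directions.
    linpred = -0.5 + 0.6 * df.mentor_discussion + 0.8 * df.consequences - 0.4 * df.giving_choice
    df["serious"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-linpred)))

    model = smf.logit("serious ~ mentor_discussion + consequences + giving_choice + C(school)",
                      data=df).fit(disp=False)
    print(model.params[["mentor_discussion", "consequences", "giving_choice"]])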