
    Predicting Unethical Physician Behavior At Scale: A Distributed Computing Framework

    As the amount of publicly shared data increases, developing a robust pipeline to stream, store, and process data is critical, as the casual user often lacks the technology, hardware, and/or skills needed to work with such voluminous data. In this research, the authors employ Amazon EC2 and EMR, MongoDB, and Spark MLlib to explore 28.5 gigabytes of CMS Open Payments data in an attempt to identify physicians who may have a high propensity to act unethically, owing to significant transfers of wealth from medical companies. A Random Forest classifier is employed to predict the top decile of physicians who have the highest risk of unethical behavior in the following year, resulting in an F-score of 91%. The data is also analyzed by an anomaly detection algorithm that correctly identified a high-profile case of a physician leaving his prestigious position, as he failed to disclose anomalously large transfers of wealth from medical companies.
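The evaluation described in this abstract (labelling the top decile as the positive class and reporting an F-score) can be illustrated with a minimal sketch. It is not the authors' Spark MLlib pipeline: the helper names `top_decile_labels` and `f_score` and the toy labels are hypothetical, and the F-score is assumed to be the usual harmonic mean of precision and recall.

```python
def top_decile_labels(scores):
    """Mark the highest-scoring 10% of items as positive (1), the rest as 0.

    Hypothetical helper: the abstract does not say how ties or rounding
    were handled, so ties are broken here by original position.
    """
    n_top = max(1, len(scores) // 10)
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    top = set(ranked[:n_top])
    return [1 if i in top else 0 for i in range(len(scores))]


def f_score(actual, predicted):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (not from the paper): one true positive, one false positive,
# one false negative, so precision = recall = 0.5.
print(f_score([1, 1, 0, 0], [1, 0, 1, 0]))  # prints 0.5
```

In a setting like the one described, `actual` would come from the following year's observed payments and `predicted` from the classifier's output for the same physicians.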

    Quantum Criticism

    Quantum Criticism scrapes data from news articles and performs sentiment analysis on it.

    Forecasting Smart Meter Energy Usage Using Distributed Systems and Machine Learning

    In this research, we explore the technical and computational merits of a machine learning algorithm on a large data set, employing distributed systems. Using 167 million (10 GB) energy consumption observations collected by smart meters from residential consumers in London, England, we predict future residential energy consumption using a Random Forest machine learning algorithm. Distributed systems such as AWS S3 and EMR, MongoDB, and Apache Spark are used. Computational times and predictive accuracy are evaluated. We conclude that there are significant computational advantages to using distributed systems when applying machine learning algorithms to large-scale data. We also observe that distributed systems can be computationally burdensome when the amount of data being processed is below the threshold at which it can leverage the computational efficiencies that distributed systems provide.

    Effects of clinical characteristics on successful open access scheduling

    Many outpatient clinics are experimenting with open access scheduling. Under open access, patients see their physicians within a day or two of making their appointment request, and long-term patient booking is very limited. The hope is that these short appointment lead times will improve patient access and reduce uncertainty in clinic operations by reducing patient no-shows. Practice shows that successful implementation can be strongly influenced by clinic characteristics, indicating that open access policies must be designed to account for local clinical conditions. The effects of four variables on clinic performance are examined: (1) the fraction of patients being served on open access, (2) the scheduling horizon for patients on longer-term appointment scheduling, (3) provider care groups, and (4) overbooking. Discrete event simulation, designed experimentation, and data drawn from an intercity clinic in central Indiana are used to study the effects of these variables on clinic throughput and patient continuity of care. Results show that, if correctly configured, open access can lead to significant improvements in clinic throughput with little sacrifice in continuity of care. Copyright Springer Science+Business Media, LLC 2007.
    Keywords: open access, appointment scheduling, patient no-show, outpatient clinic, simulation.
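The interaction between no-shows and overbooking mentioned in this abstract can be illustrated with a small Monte Carlo sketch. This is a simplification, not the discrete event simulation the authors used: the function names, the slot count, the no-show rate, and the rule that arrivals beyond capacity are simply not seen are all hypothetical assumptions chosen for illustration.

```python
import random


def simulate_day(slots, overbooked, no_show_rate, rng):
    """Simulate one clinic day; return (patients seen, patients not seen).

    Assumption: each booked patient independently fails to show up with
    probability `no_show_rate`, and the clinic can see at most `slots`.
    """
    booked = slots + overbooked
    arrivals = sum(1 for _ in range(booked) if rng.random() >= no_show_rate)
    seen = min(arrivals, slots)
    return seen, arrivals - seen


def average_day(slots, overbooked, no_show_rate, days=10_000, seed=0):
    """Average the two daily outcomes over many simulated days."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    total_seen = 0
    total_unseen = 0
    for _ in range(days):
        seen, unseen = simulate_day(slots, overbooked, no_show_rate, rng)
        total_seen += seen
        total_unseen += unseen
    return total_seen / days, total_unseen / days


# Hypothetical clinic: 20 slots per day, 20% no-show rate.
for extra in (0, 2, 4):
    seen, unseen = average_day(slots=20, overbooked=extra, no_show_rate=0.2)
    print(f"overbooked={extra}: seen={seen:.2f}, not seen={unseen:.2f}")
```

The sketch shows only the basic trade-off: extra bookings raise average throughput but also the number of arriving patients who cannot be seen. It does not model the scheduling horizon, provider care groups, or continuity of care examined in the study.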

    Imperatives for health sector decision-support modelling
