
    Design and implementation of serverless architecture for i2b2 on AWS cloud and Snowflake data warehouse

    Informatics for Integrating Biology and the Bedside (i2b2) is an open-source medical tool for cohort discovery that allows researchers to explore and query clinical data. The i2b2 platform is designed to accommodate any patient-centric data model and is used at over 400 healthcare institutions worldwide for querying patient data. The platform consists of a webclient, core servers, and a database. Despite the available installation guidelines, the system's complex architecture, with numerous dependencies and configuration parameters, makes it difficult to install a functional i2b2 platform. Maintaining the scalability, security, and availability of the application is also challenging and requires a lot of resources. Our aim was to deploy i2b2 for the University of Missouri (UM) System in the cloud while reducing the complexity and effort of the installation and maintenance process. Our solution encapsulated the complete installation process of each component using Docker and deployed the containers in an AWS Virtual Private Cloud (VPC) using several AWS PaaS (Platform as a Service) and IaaS (Infrastructure as a Service) services. We deployed the application as a service on AWS Fargate, an on-demand, serverless, auto-scalable compute engine. We also enhanced the functionality of the i2b2 services and developed Snowflake JDBC driver support for the i2b2 backend services, which enables them to query the Snowflake analytical database directly. In addition, we created an i2b2-data-installer package to load PCORnet CDM and ACT ontology data into the i2b2 database. The i2b2 platform at the University of Missouri holds 1.26B facts for 2.2M patients from UM Cerner Millennium data. Includes bibliographical references.

    Collaborative Cloud Computing Framework for Health Data with Open Source Technologies

    The proliferation of sensor technologies and advancements in data collection methods have enabled the accumulation of very large amounts of data. Increasingly, these datasets are considered for scientific research. However, the design of a system architecture that achieves high performance in terms of parallelization, query processing time, and aggregation of heterogeneous data types (e.g., time series, images, structured data, among others), and the difficulty of reproducing scientific research, remain major challenges. This is especially true for health sciences research, where systems must be i) easy to use, with the flexibility to manipulate data at the most granular level, ii) agnostic of programming language kernel, iii) scalable, and iv) compliant with the HIPAA privacy law. In this paper, we review the existing literature on such big data systems for scientific research in health sciences and identify the gaps in the current system landscape. We propose a novel architecture for a software-hardware-data ecosystem using open source technologies such as Apache Hadoop, Kubernetes, and JupyterHub in a distributed environment. We also evaluate the system using a large clinical data set of 69M patients. Comment: This paper is accepted in ACM-BCB 202

    Analysis of financial and technical feasibilty of a clinicians generated data platform of fybromyalgia syndrome patients

    This master's thesis analyzes the technical and economic feasibility of a medical database based on clinically generated data from patients with fibromyalgia syndrome. The main idea is to collect patient data on a regular basis during standard visits to their doctor. It is therefore essential to provide a data collection platform that both patient and doctor can use easily. The collected information (no personal data) is to be shared among researchers to enhance collaborative studies, to make studies of rare diseases possible, and to reduce the cost and effort of gathering a large enough cohort for a study. Several medical databases that collect and share patient information for research are already in place. Yet, despite the significant socioeconomic impact of fibromyalgia, no large database on this disease exists. An introduction to fibromyalgia syndrome and its impact on society is given. Furthermore, medical database technologies and medical database projects for other diseases are described. The presented technologies are then analyzed for their usefulness in creating a database that collects information about fibromyalgia syndrome patients and supports research on the disease. Additionally, the legal requirements for maintaining such a platform, as well as its potential cost, are examined. Two possible business models for funding such a platform are presented. Finally, a possible use case has been created for collecting patient data via a survey built with REDCap and integrating it into i2b2, and suggestions for future improvements have been made to bring the platform to a release-ready state.

    MedCo: Enabling Secure and Privacy-Preserving Exploration of Distributed Clinical and Genomic Data

    The increasing number of health-data breaches is creating a complicated environment for medical-data sharing and, consequently, for medical progress. Therefore, the development of new solutions that can reassure clinical sites by enabling privacy-preserving sharing of sensitive medical data in compliance with stringent regulations (e.g., HIPAA, GDPR) is now more urgent than ever. In this work, we introduce MedCo, the first operational system that enables a group of clinical sites to federate and collectively protect their data in order to share them with external investigators without worrying about security and privacy concerns. MedCo uses (a) collective homomorphic encryption to provide trust decentralization and end-to-end confidentiality protection, and (b) obfuscation techniques to achieve formal notions of privacy, such as differential privacy. A critical feature of MedCo is that it is fully integrated within the i2b2 (Informatics for Integrating Biology and the Bedside) framework, currently used in more than 300 hospitals worldwide. Therefore, it is easily adoptable by clinical sites. We demonstrate MedCo’s practicality by testing it on data from The Cancer Genome Atlas in a simulated network of three institutions. Its performance is comparable to that of SHRINE (networked i2b2), which, in contrast, does not provide any data protection guarantee.

    MedCo: Enabling Privacy-Conscious Exploration of Distributed Clinical and Genomic Data

    Being able to share large amounts of sensitive clinical and genomic data across several institutions is crucial for precision medicine to scale up. Unfortunately, existing solutions only partially address this challenge and are still unable to provide the strong privacy and security guarantees required by regulations (e.g., HIPAA, GDPR). As a result, currently only very limited datasets of non-sensitive and moderately useful information can be shared. In this paper, we introduce MedCo, the first operational system that enables an investigator to explore sensitive medical information distributed at several sites and protected with collective homomorphic encryption. MedCo builds on top of established and widespread technology from the biomedical informatics community, such as i2b2 and SHRINE, and relies on state-of-the-art secure protocols for processing encrypted distributed data and complying with regulations. As such, MedCo can be easily adopted by clinical sites, thus paving the way to new unexplored data-sharing use cases. We tested MedCo in a real network of three institutions (EPFL, UNIL and CHUV) by focusing on an oncology use-case with real somatic mutations and clinical tumor data. The relatively low overhead introduced by MedCo shows that it represents a concrete and scalable solution for sharing privacy-conscious medical data.

    Towards AI-assisted Healthcare: System Design and Deployment for Machine Learning based Clinical Decision Support

    Over the last decade, American hospitals have adopted electronic health records (EHRs) widely. In the next decade, incorporating EHRs together with clinical decision support (CDS) into the process of medicine has the potential to change the way medicine is practiced and to advance the quality of patient care. It is a unique opportunity for machine learning (ML), with its ability to process massive datasets beyond the scope of human capability, to provide new clinical insights that aid physicians in planning and delivering care, ultimately leading to better outcomes, lower costs of care, and increased patient satisfaction. However, applying ML-based CDS faces steep system and application challenges. No open platform exists to support ML and domain experts in developing, deploying, and monitoring ML-based CDS, and no end-to-end solution is available for machine learning algorithms to consume heterogeneous EHRs and deliver CDS in real time. Building ML-based CDS from scratch can be expensive and time-consuming. In this dissertation, CDS-Stack, an open cloud-based platform, is introduced to help ML practitioners deploy ML-based CDS into healthcare practice. The CDS-Stack integrates various components into the infrastructure for the development, deployment, and monitoring of ML-based CDS. It provides an ETL engine to transform heterogeneous EHRs, either historical or online, into a common data model (CDM) in parallel so that ML algorithms can directly consume health data for training or prediction. It introduces both pull-based and push-based online CDS pipelines to deliver CDS in real time. The CDS-Stack has been adopted by Johns Hopkins Medical Institute (JHMI) to deliver a sepsis early warning score since November 2017 and has begun to show promising results. Furthermore, we believe CDS-Stack can be extended to outpatients too.
A case study of outpatient CDS has been conducted which utilizes smartphones and machine learning to quantify the severity of Parkinson disease. In this study, a mobile Parkinson disease severity score (mPDS) is generated using a novel machine learning approach. The results show it can detect response to dopaminergic therapy, correlate strongly with traditional rating scales, and capture intraday symptom fluctuation.

    Privacy-Enhancing Technologies for Medical and Genomic Data: From Theory to Practice

    The impressive technological advances in genomic analysis and the significant drop in the cost of genome sequencing are paving the way to a variety of revolutionary applications in modern healthcare. In particular, the increasing understanding of the human genome, and of its relation to diseases, health, and responses to treatments, brings the promise of better preventive and personalized medicine. Unfortunately, the impact on privacy and security is unprecedented. The genome is our ultimate identifier and, if leaked, it can unveil sensitive and personal information such as our genetic diseases, our propensity to develop certain conditions (e.g., cancer or Alzheimer's) or the health issues of our family. Even though legislation, such as the EU General Data Protection Regulation (GDPR) or the US Health Insurance Portability and Accountability Act (HIPAA), aims at mitigating abuses based on genomic and medical data, it is clear that this information also needs to be protected by technical means. In this thesis, we investigate the problem of developing new and practical privacy-enhancing technologies (PETs) for the protection of medical and genomic data. Our goal is to accelerate the adoption of PETs in the medical field in order to address the privacy and security concerns that prevent personalized medicine from reaching its full potential. We focus on two main areas of personalized medicine: clinical care and medical research. For clinical care, we first propose a system for securely storing and selectively retrieving raw genomic data that is indispensable for in-depth diagnoses and treatments of complex genetic diseases such as cancer. Then, we focus on genetic variants and devise a new model based on additively-homomorphic encryption for privacy-preserving genetic testing in clinics. Our model, implemented in the context of HIV treatment, is the first to be tested and evaluated by practitioners in a real operational setting.
For medical research, we first propose a method that combines somewhat-homomorphic encryption with differential privacy to enable secure feasibility studies on genetic data stored at an untrusted central repository. Second, we address the problem of sharing genomic and medical data when the data is distributed across multiple mistrustful institutions. We begin by analyzing the risks that threaten patients' privacy in systems for the discovery of genetic variants, and we propose practical mitigations to the re-identification risk. Then, for clinical sites to be able to share the data without worrying about the risk of data breaches, we develop a new system based on collective homomorphic encryption: it achieves trust decentralization and enables researchers to securely find eligible patients for clinical studies. Finally, we design a new framework, complementary to the previous ones, for quantifying the risk of unintended disclosure caused by potential inference attacks that are jointly combined by a malicious adversary, when exact genomic data is shared. In summary, in this thesis we demonstrate that PETs, still believed impractical and immature, can be made practical and can become real enablers for overcoming the privacy and security concerns blocking the advancement of personalized medicine. Addressing privacy issues in healthcare remains a great challenge that will increasingly require long-term collaboration among geneticists, healthcare providers, ethicists, lawmakers, and computer scientists.

    Front-Line Physicians' Satisfaction with Information Systems in Hospitals

    Day-to-day operations management in hospital units is difficult due to continuously varying situations, the several actors involved, and the vast number of information systems in use. The aim of this study was to describe front-line physicians' satisfaction with the existing information systems needed to support day-to-day operations management in hospitals. A cross-sectional survey was used, and data chosen with stratified random sampling were collected in nine hospitals. Data were analyzed with descriptive and inferential statistical methods. The response rate was 65% (n = 111). The physicians reported that information systems support their decision making to some extent, but that they neither improve access to information nor are tailored for physicians. The respondents also reported that they need to use several information systems to support decision making and that they would prefer a single information system for accessing important information. Improved information access would better support physicians' decision making and has the potential to improve the quality of decisions and speed up the decision-making process. Peer reviewed