409 research outputs found

    AGILE: ARBITRARY GRID LOGISTIC REGRESSION USING INTEL SOFTWARE GUARD EXTENSIONS

    Biomedical data are often collected and stored at different sites. Making the most of these data to improve patient care and support academic research is increasingly important, and increasingly challenging given the privacy regulations associated with the data. Several barriers hinder sharing and exchanging information, such as complex data formats, information leakage during data transmission, and big-data issues. In this thesis, I focus on how to conduct integrated data analysis while ensuring data privacy and security during both data transmission and integration. In a small experiment with GLORE[1] implemented on both garbled circuits[2] and Intel® Software Guard Extensions (Intel® SGX), I found that Intel® SGX ran faster than garbled circuits. I therefore believe that Intel® SGX has the potential to advance secure multiparty computation considerably. Using Intel® SGX, I not only built a framework but also devised a more flexible model that lets participants cooperate with each other more freely. My model, AGILE, leverages Intel® SGX to deliver trustworthy computations. Unlike existing models such as GLORE and VERTIGO[3], which address the integration problem only when data are either horizontally or vertically partitioned, AGILE handles arbitrarily partitioned data. Furthermore, to demonstrate AGILE's performance, I evaluated the model using two real datasets. The experimental results show that AGILE provides secure and accurate computation much faster than GLORE and VERTIGO.
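
Logistic regression on horizontally partitioned data (each site holds different patients with the same features) rests on the fact that the log-likelihood gradient is a sum over patient rows, so each site can compute its share locally and send only that aggregate. The sketch below illustrates just this decomposition using plain gradient ascent. It is not the GLORE or AGILE algorithm, it does not involve Intel SGX, and the toy data, function names, learning rate, and iteration count are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def local_gradient(X, y, beta):
    """Log-likelihood gradient computed from one site's rows only."""
    grad = [0.0] * len(beta)
    for row, label in zip(X, y):
        p = sigmoid(sum(b * x for b, x in zip(beta, row)))
        for j, x in enumerate(row):
            grad[j] += (label - p) * x
    return grad

def aggregate(gradients):
    """Coordinator sums per-site gradients; no patient rows are exchanged."""
    return [sum(parts) for parts in zip(*gradients)]

def fit(sites, n_features, lr=0.1, iterations=500):
    """Gradient ascent where every step uses only aggregated site gradients."""
    beta = [0.0] * n_features
    for _ in range(iterations):
        grad = aggregate([local_gradient(X, y, beta) for X, y in sites])
        beta = [b + lr * g for b, g in zip(beta, grad)]
    return beta

# Hypothetical toy data: column 0 is the intercept, column 1 a single feature.
site_a = ([[1.0, 0.5], [1.0, 1.5], [1.0, 3.0]], [0, 1, 1])
site_b = ([[1.0, 1.0], [1.0, 2.5], [1.0, 3.5]], [0, 0, 1])

# Pooling the rows centrally gives the same coefficients (up to rounding).
pooled = (site_a[0] + site_b[0], site_a[1] + site_b[1])
beta_distributed = fit([site_a, site_b], n_features=2)
beta_pooled = fit([pooled], n_features=2)
```

Comparing `beta_distributed` with `beta_pooled` shows that the distributed fit matches the centralized one up to floating-point rounding, which is why horizontal partitioning is the simpler case. Vertically or arbitrarily partitioned data do not decompose this way, because the linear predictor of a single patient spans several sites.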

    ModelChain: Decentralized Privacy-Preserving Healthcare Predictive Modeling Framework on Private Blockchain Networks

    Cross-institutional healthcare predictive modeling can accelerate research and facilitate quality improvement initiatives, and thus is important for national healthcare delivery priorities. For example, a model that predicts risk of re-admission for a particular set of patients will be more generalizable if developed with data from multiple institutions. While privacy-protecting methods to build predictive models exist, most are based on a centralized architecture, which presents security and robustness vulnerabilities such as single-point-of-failure (and single-point-of-breach) and accidental or malicious modification of records. In this article, we describe a new framework, ModelChain, to adapt Blockchain technology for privacy-preserving machine learning. Each participating site contributes to model parameter estimation without revealing any patient health information (i.e., only model data, no observation-level data, are exchanged across institutions). We integrate privacy-preserving online machine learning with a private Blockchain network, apply transaction metadata to disseminate partial models, and design a new proof-of-information algorithm to determine the order of the online learning process. We also discuss the benefits and potential issues of applying Blockchain technology to solve the privacy-preserving healthcare predictive modeling task and to increase interoperability between institutions, to support the Nationwide Interoperability Roadmap and national healthcare delivery priorities such as Patient-Centered Outcomes Research (PCOR).
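
The idea of passing partial models between sites through a tamper-evident ledger can be sketched in a few lines. The example below is a loose illustration only: the hash-linked list stands in for a private blockchain, and the rule for choosing the next site (the one whose local data the current model fits worst) is an assumption standing in for the proof-of-information algorithm, whose details the abstract does not give. All names and the toy data are hypothetical.

```python
import hashlib
import json
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def local_loss(model, X, y):
    """Mean log loss of the shared model on one site's private rows."""
    eps = 1e-12
    total = 0.0
    for row, label in zip(X, y):
        p = sigmoid(sum(w * x for w, x in zip(model, row)))
        total -= label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps)
    return total / len(y)

def local_update(model, X, y, lr=0.1):
    """One online-learning pass over the site's rows; returns a new model."""
    model = list(model)
    for row, label in zip(X, y):
        p = sigmoid(sum(w * x for w, x in zip(model, row)))
        model = [w + lr * (label - p) * x for w, x in zip(model, row)]
    return model

def make_block(prev_hash, site, model):
    """A block carries only model coefficients, never observation-level data."""
    body = {"prev": prev_hash, "site": site, "model": model}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {**body, "hash": digest}

def run(sites, n_features, rounds=10):
    model = [0.0] * n_features
    chain = [make_block("0" * 64, "genesis", model)]
    for _ in range(rounds):
        # Assumed selection rule: the site with the highest local loss goes next.
        site = max(sites, key=lambda name: local_loss(model, *sites[name]))
        model = local_update(model, *sites[site])
        chain.append(make_block(chain[-1]["hash"], site, model))
    return chain

# Hypothetical toy data: column 0 is the intercept, column 1 a single feature.
sites = {
    "site_a": ([[1.0, 0.5], [1.0, 1.5], [1.0, 3.0]], [0, 1, 1]),
    "site_b": ([[1.0, 1.0], [1.0, 2.5], [1.0, 3.5]], [0, 0, 1]),
}
chain = run(sites, n_features=2)
```

Each block stores the previous block's hash, so altering an earlier partial model would invalidate every later link. A real deployment would also need consensus, authentication, and network transport, none of which are modeled here.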

    Psychometric evaluation of the Simulator Sickness Questionnaire as a measure of cybersickness

    Some users of virtual reality (VR) technology experience negative symptoms, known as cybersickness, sometimes severe enough to cause discontinuation of VR use. Despite decades of research, there has been relatively little progress understanding the underlying causal mechanisms of cybersickness. Review of the measures used to assess cybersickness symptoms, particularly the subjective psychological components of cybersickness, indicated that extant questionnaires may exhibit psychometric problems that could affect interpretation of results. In the present study, new data were collected (N = 202) to evaluate the psychometric properties of the Simulator Sickness Questionnaire (SSQ), the most commonly reported measure of cybersickness symptoms, in the context of virtual reality. Findings suggest that the SSQ, as commonly used, is not applicable to VR. An alternative approach to measure cybersickness is suggested. Overall, incidence and severity of cybersickness were very low and participants rated the VR experience as highly entertaining.

    Implementing Vertical Federated Learning Using Autoencoders: Practical Application, Generalizability, and Utility Study

    Background: Machine learning (ML) is now widely deployed in our everyday lives. Building robust ML models requires a massive amount of data for training. Traditional ML algorithms require training data centralization, which raises privacy and data governance issues. Federated learning (FL) is an approach to overcome this issue. We focused on applying FL on vertically partitioned data, in which an individual's record is scattered among different sites. Objective: The aim of this study was to perform FL on vertically partitioned data to achieve performance comparable to that of centralized models without exposing the raw data. Methods: We used three different datasets (Adult income, Schwannoma, and eICU datasets) and vertically divided each dataset into different pieces. Following the vertical division of data, overcomplete autoencoder-based model training was performed for each site. Following training, each site's data were transformed into latent data, which were aggregated for training. A tabular neural network model with categorical embedding was used for training. A centrally based model was used as a baseline model, which was compared to that of FL in terms of accuracy and area under the receiver operating characteristic curve (AUROC). Results: The autoencoder-based network successfully transformed the original data into latent representations with no domain knowledge applied. These altered data were different from the original data in terms of the feature space and data distributions, indicating appropriate data security. The loss of performance was minimal when using an overcomplete autoencoder; accuracy loss was 1.2%, 8.89%, and 1.23%, and AUROC loss was 1.1%, 0%, and 1.12% in the Adult income, Schwannoma, and eICU dataset, respectively. Conclusions: We proposed an autoencoder-based ML model for vertically incomplete data. Since our model is based on unsupervised learning, no domain-specific knowledge is required in individual sites. Under the circumstances where direct data sharing is not available, our approach may be a practical solution enabling both data protection and building a robust model.
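
The data flow described in the Methods (vertical split, one overcomplete autoencoder per site, latent vectors aggregated by record) can be sketched with a tiny hand-written autoencoder. This is a minimal illustration under stated assumptions, not the study's implementation: the network has a single tanh hidden layer without biases, training uses plain per-record gradient descent, and the record IDs, feature values, layer sizes, and hyperparameters are hypothetical. The downstream tabular classifier is omitted.

```python
import math
import random

def encode(W, x):
    """Map one record's local features to its latent vector."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

def decode(V, h):
    return [sum(v * hk for v, hk in zip(row, h)) for row in V]

def mse(W, V, rows):
    """Mean squared reconstruction error over a site's records."""
    total = 0.0
    for x in rows:
        r = decode(V, encode(W, x))
        total += sum((rj - xj) ** 2 for rj, xj in zip(r, x))
    return total / len(rows)

def train_autoencoder(rows, hidden, epochs=200, lr=0.05, seed=0):
    """Train encoder W and decoder V by per-record gradient descent."""
    rng = random.Random(seed)
    d = len(rows[0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
    V = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(d)]
    for _ in range(epochs):
        for x in rows:
            h = encode(W, x)
            err = [rj - xj for rj, xj in zip(decode(V, h), x)]
            # Backpropagate through the decoder and tanh before updating V.
            dh = [sum(err[j] * V[j][k] for j in range(d)) * (1.0 - h[k] ** 2)
                  for k in range(hidden)]
            for j in range(d):
                for k in range(hidden):
                    V[j][k] -= lr * err[j] * h[k]
            for k in range(hidden):
                for i in range(d):
                    W[k][i] -= lr * dh[k] * x[i]
    return W, V

def latent_table(site, hidden):
    """Train on one site's columns and return latent vectors keyed by record ID."""
    W, _ = train_autoencoder(list(site.values()), hidden)
    return {rid: encode(W, x) for rid, x in site.items()}

# Hypothetical vertical split: the same four records, different columns per site.
site_a = {"p1": [0.1, 0.9], "p2": [0.8, 0.2], "p3": [0.5, 0.5], "p4": [0.9, 0.7]}
site_b = {"p1": [0.3], "p2": [0.6], "p3": [0.9], "p4": [0.1]}

latent_a = latent_table(site_a, hidden=3)  # overcomplete: 3 latent units for 2 features
latent_b = latent_table(site_b, hidden=2)  # overcomplete: 2 latent units for 1 feature

# Only latent vectors leave each site; they are joined on the shared record ID.
joined = {rid: latent_a[rid] + latent_b[rid] for rid in site_a}
```

The `joined` table is what a central model would train on in this sketch. Whether such latent vectors give adequate privacy protection depends on the encoder and the threat model, which this example does not assess.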