409 research outputs found
AGILE: ARBITRARY GRID LOGISTIC REGRESSION USING INTEL SOFTWARE GUARD EXTENSIONS
Biomedical data are often collected and stored at different sites. Making the most of these data to improve patient care and to support academic research is increasingly important, and increasingly challenging given the privacy regulations associated with the data. Several barriers stand in the way of sharing and exchanging information, such as complex data formats, information leakage during data transmission, and big data issues. In this thesis, I focus on how to conduct integrated data analysis while ensuring data privacy and security during both data transmission and integration. In a small experiment with GLORE[1] implemented on both garbled circuits[2] and Intel® Software Guard Extensions (Intel® SGX), I found that Intel® SGX ran faster than garbled circuits. I therefore believe that Intel® SGX has the potential to make great progress in secure multiparty computation. By applying Intel® SGX, I not only built a framework but also devised a more flexible model that lets participants cooperate with each other more freely. My model, AGILE, leverages Intel® SGX to deliver trustworthy computations. Unlike existing models such as GLORE and VERTIGO[3], which address the integration problem only when data are either horizontally or vertically partitioned, AGILE handles data that are arbitrarily partitioned. Furthermore, to demonstrate AGILE's performance, I evaluated the model using two real datasets. The experimental results show that AGILE provides secure and accurate computation much faster than GLORE and VERTIGO.
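To make the horizontally partitioned case concrete, the following is a minimal sketch of distributed logistic regression in the spirit of GLORE: each site computes a gradient and an information matrix on its own rows, and only those aggregates are summed for a Newton-Raphson step. It is not AGILE and involves no Intel® SGX enclave. The function names, the two-parameter model (intercept plus one feature), and the toy data are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def local_stats(rows, beta):
    """Computed inside one site; rows are (x, y) pairs that never leave it.
    Returns the log-likelihood gradient and the information matrix X^T W X."""
    g = [0.0, 0.0]
    h = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in rows:
        v = (1.0, x)                      # intercept term and the single feature
        p = sigmoid(beta[0] + beta[1] * x)
        w = p * (1.0 - p)
        for i in range(2):
            g[i] += (y - p) * v[i]
            for j in range(2):
                h[i][j] += w * v[i] * v[j]
    return g, h

def fit(sites, iters=25):
    """Coordinator: sums per-site aggregates and takes Newton-Raphson steps."""
    beta = [0.0, 0.0]
    for _ in range(iters):
        g = [0.0, 0.0]
        h = [[0.0, 0.0], [0.0, 0.0]]
        for rows in sites:
            gs, hs = local_stats(rows, beta)
            for i in range(2):
                g[i] += gs[i]
                for j in range(2):
                    h[i][j] += hs[i][j]
        # Solve the 2x2 system h * step = g in closed form.
        det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
        step0 = (h[1][1] * g[0] - h[0][1] * g[1]) / det
        step1 = (-h[1][0] * g[0] + h[0][0] * g[1]) / det
        beta = [beta[0] + step0, beta[1] + step1]
    return beta

# Hypothetical toy data: the same variables, different patients at each site.
site_a = [(0.5, 0), (1.5, 0), (2.0, 1), (3.0, 1)]
site_b = [(1.0, 1), (2.5, 0), (3.5, 1), (0.2, 0)]

beta_distributed = fit([site_a, site_b])
beta_pooled = fit([site_a + site_b])   # reference fit on centralized data
```

Because the gradient and information matrix are sums over records, the distributed fit matches the pooled fit up to floating-point rounding, which is why only aggregates need to cross site boundaries in this setting.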
ModelChain: Decentralized Privacy-Preserving Healthcare Predictive Modeling Framework on Private Blockchain Networks
Cross-institutional healthcare predictive modeling can accelerate research
and facilitate quality improvement initiatives, and thus is important for
national healthcare delivery priorities. For example, a model that predicts
risk of re-admission for a particular set of patients will be more
generalizable if developed with data from multiple institutions. While
privacy-protecting methods to build predictive models exist, most are based on
a centralized architecture, which presents security and robustness
vulnerabilities such as single-point-of-failure (and single-point-of-breach)
and accidental or malicious modification of records. In this article, we
describe a new framework, ModelChain, to adapt Blockchain technology for
privacy-preserving machine learning. Each participating site contributes to
model parameter estimation without revealing any patient health information
(i.e., only model data, no observation-level data, are exchanged across
institutions). We integrate privacy-preserving online machine learning with a
private Blockchain network, apply transaction metadata to disseminate partial
models, and design a new proof-of-information algorithm to determine the order
of the online learning process. We also discuss the benefits and potential
issues of applying Blockchain technology to solve the privacy-preserving
healthcare predictive modeling task and to increase interoperability between
institutions, to support the Nationwide Interoperability Roadmap and national
healthcare delivery priorities such as Patient-Centered Outcomes Research
(PCOR).
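The idea of passing only model parameters between sites during online learning can be sketched as follows. This is an illustration, not ModelChain itself: there is no blockchain here, and because the abstract does not spell out the proof-of-information rule, the ordering used below (the site where the current model has the highest loss updates next) is purely an assumption. All names and data are hypothetical.

```python
import math

def predict(w, x):
    return 1.0 / (1.0 + math.exp(-(w[0] + w[1] * x)))

def site_loss(w, rows):
    """Mean log loss of model w on one site's private rows."""
    eps = 1e-12
    total = 0.0
    for x, y in rows:
        p = predict(w, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(rows)

def local_update(w, rows, lr=0.05):
    """One online pass over a site's rows; only the updated w is returned."""
    w = list(w)
    for x, y in rows:
        err = predict(w, x) - y
        w[0] -= lr * err
        w[1] -= lr * err * x
    return w

def run(sites, rounds=40):
    w = [0.0, 0.0]
    order = []
    for _ in range(rounds):
        # Assumed ordering rule (not taken from the paper): the site whose
        # data the current model fits worst performs the next update.
        name = max(sites, key=lambda s: site_loss(w, sites[s]))
        order.append(name)
        w = local_update(w, sites[name])
    return w, order

def mean_loss(w, sites):
    rows = [r for site_rows in sites.values() for r in site_rows]
    return site_loss(w, rows)

# Hypothetical toy data held by two institutions.
sites = {
    "site_a": [(0.5, 0), (1.0, 0), (2.5, 1), (3.0, 1)],
    "site_b": [(0.2, 0), (1.5, 1), (2.0, 0), (3.5, 1)],
}
w_final, order = run(sites)
```

In this sketch the only values that move between sites are the two model weights and each site's scalar loss; observation-level rows stay where they are.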
Preservation of Patient Level Privacy: Federated Classification and Calibration Models
With the launch of the Precision Medicine Initiative in the United States by the National Institutes of Health, and the emergence of a large volume of electronic health records, there are many opportunities to improve clinical decision support systems. A large number of samples are needed to build predictive models that have adequate discrimination and calibration. However, protecting patient privacy is also an important issue. Patient data are typically protected in localized silos, and consolidation of datasets from different healthcare systems is difficult. Federated learning allows the training of a global model by amassing intermediate calculations from localized medical systems. The knowledge learned from the data can be transferred and aggregated to achieve better performance than that achieved by individual local models. Federated learning may help build better models, providing more accurate predictions. There are two types of measures to assess how well a model performs: discrimination and calibration. While most papers report discrimination measures, calibration has often been neglected, although it is a critical metric for evaluation. In this dissertation, I show a novel way to build classifiers and calibration models in a federated manner. I also show how model calibration can be evaluated and improved in this manner. Federated modeling enables the accumulation of knowledge and information that are otherwise locked behind local medical systems.
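As one illustration of evaluating calibration from intermediate calculations alone, the sketch below computes an expected calibration error (ECE) from per-bin counts that each site reports. ECE with equal-width bins is a stand-in chosen for brevity; the abstract does not say which calibration measures the dissertation uses, and the function names and data are hypothetical.

```python
def bin_stats(preds, labels, n_bins=5):
    """Run locally at a site: per-bin [count, sum of predictions, sum of labels]."""
    stats = [[0, 0.0, 0.0] for _ in range(n_bins)]
    for p, y in zip(preds, labels):
        b = min(int(p * n_bins), n_bins - 1)   # equal-width bins on [0, 1]
        stats[b][0] += 1
        stats[b][1] += p
        stats[b][2] += y
    return stats

def merge(all_stats):
    """Run at the coordinator: element-wise sum of the sites' bin statistics."""
    n_bins = len(all_stats[0])
    merged = [[0, 0.0, 0.0] for _ in range(n_bins)]
    for stats in all_stats:
        for b in range(n_bins):
            for k in range(3):
                merged[b][k] += stats[b][k]
    return merged

def ece(stats):
    """Expected calibration error: sum over bins of |sum(pred) - sum(label)| / N."""
    total = sum(s[0] for s in stats)
    return sum(abs(s[1] - s[2]) for s in stats if s[0]) / total

# Hypothetical predicted risks and observed outcomes at two sites.
preds_1, labels_1 = [0.10, 0.35, 0.80, 0.65], [0, 0, 1, 1]
preds_2, labels_2 = [0.20, 0.90, 0.55, 0.45], [0, 1, 0, 1]

federated_ece = ece(merge([bin_stats(preds_1, labels_1),
                           bin_stats(preds_2, labels_2)]))
pooled_ece = ece(bin_stats(preds_1 + preds_2, labels_1 + labels_2))
```

Since the bin statistics are plain sums, the federated value agrees with the pooled one up to rounding, and no patient-level prediction or outcome has to leave a site.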
Psychometric evaluation of the Simulator Sickness Questionnaire as a measure of cybersickness
Some users of virtual reality (VR) technology experience negative symptoms, known as cybersickness, sometimes severe enough to cause discontinuation of VR use. Despite decades of research, there has been relatively little progress in understanding the underlying causal mechanisms of cybersickness. A review of the measures used to assess cybersickness symptoms, particularly the subjective psychological components of cybersickness, indicated that extant questionnaires may exhibit psychometric problems that could affect interpretation of results. In the present study, new data were collected (N = 202) to evaluate the psychometric properties of the Simulator Sickness Questionnaire (SSQ), the most commonly reported measure of cybersickness symptoms, in the context of virtual reality. Findings suggest that the SSQ, as commonly used, is not applicable to VR. An alternative approach to measuring cybersickness is suggested. Overall, the incidence and severity of cybersickness were very low, and participants rated the VR experience as highly entertaining.
Implementing Vertical Federated Learning Using Autoencoders: Practical Application, Generalizability, and Utility Study
Background: Machine learning (ML) is now widely deployed in our everyday lives. Building robust ML models requires a massive amount of data for training. Traditional ML algorithms require training data centralization, which raises privacy and data governance issues. Federated learning (FL) is an approach to overcome this issue. We focused on applying FL on vertically partitioned data, in which an individual's record is scattered among different sites.
Objective: The aim of this study was to perform FL on vertically partitioned data to achieve performance comparable to that of centralized models without exposing the raw data.
Methods: We used three different datasets (Adult income, Schwannoma, and eICU datasets) and vertically divided each dataset into different pieces. Following the vertical division of data, overcomplete autoencoder-based model training was performed for each site. Following training, each site's data were transformed into latent data, which were aggregated for training. A tabular neural network model with categorical embedding was used for training. A centrally based model was used as a baseline model, which was compared to that of FL in terms of accuracy and area under the receiver operating characteristic curve (AUROC).
Results: The autoencoder-based network successfully transformed the original data into latent representations with no domain knowledge applied. These altered data were different from the original data in terms of the feature space and data distributions, indicating appropriate data security. The loss of performance was minimal when using an overcomplete autoencoder; accuracy loss was 1.2%, 8.89%, and 1.23%, and AUROC loss was 1.1%, 0%, and 1.12% in the Adult income, Schwannoma, and eICU datasets, respectively.
Conclusions: We proposed an autoencoder-based ML model for vertically incomplete data. Since our model is based on unsupervised learning, no domain-specific knowledge is required in individual sites. Under the circumstances where direct data sharing is not available, our approach may be a practical solution enabling both data protection and building a robust model.
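The data flow described in the Methods (vertical split, per-site encoding, aggregation of latent data) can be sketched as below. The encoder here is an untrained random projection with a tanh nonlinearity, used only as a placeholder for the trained overcomplete autoencoder in the study; the column assignment, record IDs, latent sizes, and function names are all assumptions for illustration.

```python
import math
import random

def make_encoder(n_in, n_latent, seed):
    """Placeholder encoder (NOT a trained autoencoder): fixed random weights
    followed by tanh. n_latent > n_in mimics the overcomplete setting."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_latent)]
    def encode(features):
        return [math.tanh(sum(w * f for w, f in zip(row, features)))
                for row in weights]
    return encode

# Hypothetical full records; in practice no single party holds these rows.
records = {
    "r1": [0.2, 1.0, 3.5, 0.0],
    "r2": [1.4, 0.0, 2.1, 1.0],
    "r3": [0.9, 1.0, 0.3, 1.0],
}

# Vertical partition: each site holds different columns of the same records.
site_columns = {"site_a": [0, 1], "site_b": [2, 3]}

def site_view(cols):
    return {rid: [row[c] for c in cols] for rid, row in records.items()}

def encode_site(name, seed):
    cols = site_columns[name]
    encoder = make_encoder(len(cols), len(cols) + 1, seed)
    return {rid: encoder(feats) for rid, feats in site_view(cols).items()}

# Each site shares only latent vectors, keyed by record ID.
latents = {name: encode_site(name, seed)
           for seed, name in enumerate(sorted(site_columns))}

# The aggregator joins latent vectors by record ID for downstream training.
aggregated = {rid: [v for name in sorted(latents) for v in latents[name][rid]]
              for rid in records}
```

A downstream classifier would then be trained on `aggregated` rather than on the raw columns; that training step, and the autoencoder training itself, are omitted here.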