Privacy-Preserving Generalized Linear Models using Distributed Block Coordinate Descent
Combining data from varied sources has considerable potential for knowledge
discovery: collaborating data parties can mine data in an expanded feature
space, allowing them to explore a larger range of scientific questions.
However, data sharing among different parties is highly restricted by legal
conditions, ethical concerns, and/or data volume. Fueled by these constraints,
the fields of cryptography and distributed learning have made great progress
towards privacy-preserving and distributed data mining. However, practical
implementations have been hampered by the limited scope or computational
complexity of these methods. In this paper, we greatly extend the range of
analyses available for vertically partitioned data, i.e., data collected by
separate parties with different features on the same subjects. To this end, we
present a novel approach for privacy-preserving generalized linear models, a
fundamental and powerful framework underlying many prediction and
classification procedures. We base our method on a distributed block coordinate
descent algorithm to obtain parameter estimates, and we develop an extension to
compute accurate standard errors without additional communication cost. We
critically evaluate the information transfer for semi-honest collaborators and
show that our protocol is secure against data reconstruction. Through both
simulated and real-world examples we illustrate the functionality of our
proposed algorithm. Without leaking information, our method performs as well on
vertically partitioned data as existing methods on combined data -- all within
mere minutes of computation time. We conclude that our method is a viable
approach for vertically partitioned data analysis with a wide range of
real-world applications.
Comment: Fully reproducible code for all results and images can be found at
https://github.com/vankesteren/privacy-preserving-glm, and the software
package can be found at https://github.com/vankesteren/privre
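The core of the estimation scheme can be illustrated with a plain (non-private) block coordinate descent for a linear model on vertically partitioned features. Each party only ever exchanges its block's linear predictor, never its raw feature matrix; this is a minimal sketch under that simplification, with illustrative party names and dimensions, whereas the actual protocol adds cryptographic protections and general GLM link functions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X1 = rng.normal(size=(n, 3))  # features held by party 1
X2 = rng.normal(size=(n, 2))  # features held by party 2
beta_true = np.array([1.0, -2.0, 0.5, 3.0, -1.5])
y = np.hstack([X1, X2]) @ beta_true + 0.1 * rng.normal(size=n)

b1, b2 = np.zeros(3), np.zeros(2)
for _ in range(100):
    # Each party shares only its block's linear predictor (an n-vector),
    # never X1 or X2 themselves; the other party then solves a local
    # least-squares problem against the partial residual.
    b1 = np.linalg.lstsq(X1, y - X2 @ b2, rcond=None)[0]
    b2 = np.linalg.lstsq(X2, y - X1 @ b1, rcond=None)[0]

beta_hat = np.hstack([b1, b2])  # converges to the full-data OLS estimate
```

Because the blocks are updated against each other's current predictions, the iterates converge to the same coefficients a pooled analysis would give, which is the property the abstract reports for the privacy-preserving version.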
Strategies for including cloud-computing into an engineering modeling workflow
With the advent of cloud computing, high-end computing, networking, and storage resources are available on demand at a relatively low price point. Internet applications in the consumer and, increasingly, the enterprise space use these resources to upgrade existing applications and build new ones. This is made possible by building decentralized applications that can be integrated with one another through web-enabled application programming interfaces (APIs). In the fields of engineering and computational science, however, cloud computing resources have been used primarily to augment existing high-performance computing hardware, and engineering model integration still occurs through software libraries.
In this research, a novel approach is proposed in which engineering models are constructed as independent services that publish web-enabled APIs. To enable this, the engineering models are built as stateless microservices, each solving a single computational problem. Composite services are then built from these independent component models, much as in the consumer application space, and interactions between component models are orchestrated by a federation management system. This approach is demonstrated by disaggregating an existing monolithic model of a cookstove into a set of component models, which are then reintegrated and compared with the original model for computational accuracy and run time.
Additionally, a novel engineering workflow is proposed that reuses computational data by constructing reduced-order models (ROMs). This framework is evaluated empirically, in terms of computation and data synchronization, for a number of producers and consumers of engineering models, and by simulating an engineering design workflow with multiple producers and consumers at various stages of the design process.
Finally, concepts from the federated system of models and ROMs are combined to propose the concept of a hybrid model (information artefact). The hybrid model is a web-enabled microservice that encapsulates information from multiple engineering models at varying fidelities and responds to queries based on the best available information. Rules for the construction of hybrid models have been proposed and evaluated in the context of engineering workflows.
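The disaggregation idea can be sketched with two stateless component "models" and a composite service that orchestrates them. All model names, equations, and constants below are illustrative stand-ins, not the thesis's actual cookstove model or federation system:

```python
# Each component model is a stateless function of its inputs only,
# mirroring a microservice endpoint that solves a single problem.
def combustion_model(fuel_rate_kg_s, efficiency):
    LHV = 16e6  # nominal lower heating value of wood fuel, J/kg (illustrative)
    return fuel_rate_kg_s * LHV * efficiency  # useful power, W

def heat_transfer_model(power_w, pot_mass_kg, duration_s):
    c_water = 4186.0  # specific heat of water, J/(kg K)
    return power_w * duration_s / (pot_mass_kg * c_water)  # temperature rise, K

# A composite service orchestrates the component calls, as the
# federation management system would across web-enabled APIs.
def cookstove_service(fuel_rate_kg_s, efficiency, pot_mass_kg, duration_s):
    power = combustion_model(fuel_rate_kg_s, efficiency)
    return heat_transfer_model(power, pot_mass_kg, duration_s)
```

Because each component is stateless, the same composition could be deployed behind independent HTTP endpoints and recomposed without sharing library code, which is the integration pattern the abstract advocates.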
Protecting privacy of users in brain-computer interface applications
Machine learning (ML) is revolutionizing research and industry. Many ML applications rely on large amounts of personal data for training and inference. Among the most intimate of these data sources is electroencephalogram (EEG) data, which is so rich in information that application developers can easily gain knowledge beyond the professed scope from unprotected EEG signals, including passwords, ATM PINs, and other intimate data. The challenge we address is how to engage in meaningful ML with EEG data while protecting the privacy of users. Hence, we propose cryptographic protocols based on secure multiparty computation (SMC) to perform linear regression over EEG signals from many users in a fully privacy-preserving (PP) fashion, i.e., such that each individual's EEG signals are not revealed to anyone else. To illustrate the potential of our secure framework, we show how it allows estimating the drowsiness of drivers from their EEG signals as would be possible in the unencrypted case, and at a very reasonable computational cost. Our solution is the first application of commodity-based SMC to EEG data, as well as the largest documented experiment of secret sharing-based SMC in general, namely, with 15 players involved in all the computations.
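The secret-sharing principle behind such SMC protocols can be sketched with plain additive sharing over a prime field. The modulus choice and three-player setup below are illustrative, and the commodity-based protocol in the paper involves additional machinery (e.g., for secure multiplication) to carry out a full regression:

```python
import secrets

P = 2**61 - 1  # a public prime modulus (illustrative choice)

def share(x, n_players):
    """Split x into n_players additive shares that sum to x mod P.

    Any subset of fewer than n_players shares is uniformly random
    and reveals nothing about x.
    """
    shares = [secrets.randbelow(P) for _ in range(n_players - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

# three users secret-share their private EEG-derived values
values = [42, 17, 99]
all_shares = [share(v, 3) for v in values]

# each player locally adds the one share it received from every user ...
partials = [sum(col) % P for col in zip(*all_shares)]
# ... and only the aggregate is reconstructed, never the inputs
total = sum(partials) % P
```

Sums (and, with extra protocol steps, products) computed this way let the players evaluate the normal equations of a linear regression without any player seeing another's raw signals.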
Healthcare Data Analytics and Privacy Preservation by DCNN Algorithm
Data has become an integral part of the digital world with the advancement of computing technologies, and its collection is crucial for data analytics. Every industry makes use of data analytics, from financial to other commercial applications, but it is especially important in the healthcare domain for the analysis of healthcare data. The present research work focuses on classification and prediction problems for healthcare data using supervised deep learning approaches together with data mining techniques. There is a need to design an intelligent, deep learning-based model that can classify the data stored in our databases: human analytical capacity is far smaller than the amount of data stored. Classification becomes even more critical for healthcare data, as it can help to detect, diagnose, and treat patients. The main goal of the thesis is to develop a deep learning-based model for classification tasks; the introduced DDS can be used in the healthcare domain to improve diagnostic speed, accuracy, and reliability.
Convolutional Monge Mapping Normalization for learning on sleep data
In many machine learning applications on signals and biomedical data,
especially electroencephalogram (EEG), one major challenge is the variability
of the data across subjects, sessions, and hardware devices. In this work, we
propose a new method called Convolutional Monge Mapping Normalization (CMMN),
which consists of filtering the signals so as to adapt their power spectral
density (PSD) to a Wasserstein barycenter estimated on training data. CMMN
relies on novel closed-form solutions for optimal transport mappings and
barycenters and provides individual test time adaptation to new data without
needing to retrain a prediction model. Numerical experiments on sleep EEG data
show that CMMN leads to significant and consistent performance gains
independent of the neural network architecture when adapting between
subjects, sessions, and even datasets collected with different hardware.
Notably, our performance gain is on par with much more numerically intensive
Domain Adaptation (DA) methods and can be used in conjunction with them for
even better performance.
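Under a Gaussian signal model, the closed-form mapping reduces to a frequency-domain filter that moves each signal's PSD onto the barycenter. The toy signals and periodogram-based PSD estimate below are illustrative; the paper's estimator and filter parameterization differ in detail:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 512
# two "subjects" with different spectral content (toy stand-ins for EEG)
signals = [
    rng.normal(size=n),
    np.convolve(rng.normal(size=n), [1.0, 0.5], mode="same"),
]

def psd(x):
    # simple periodogram estimate of the power spectral density
    return np.abs(np.fft.rfft(x)) ** 2 / len(x)

psds = [psd(x) for x in signals]
# closed-form Wasserstein barycenter of the PSDs:
# average the amplitude spectra, then square
bary = (sum(np.sqrt(p) for p in psds) / len(psds)) ** 2

def cmmn_map(x, p_src):
    # frequency-domain filter H(f) = sqrt(bary(f) / p_src(f));
    # tiny epsilon guards against division by zero
    h = np.sqrt(bary / (p_src + 1e-15))
    return np.fft.irfft(np.fft.rfft(x) * h, n=len(x))

mapped = [cmmn_map(x, p) for x, p in zip(signals, psds)]
```

After mapping, every signal shares the barycenter's spectrum, so a predictor trained on normalized data can be applied to a new subject at test time without retraining: only the new subject's PSD and filter need to be estimated.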