426 research outputs found

    Privacy-Preserving Generalized Linear Models using Distributed Block Coordinate Descent

    Combining data from varied sources has considerable potential for knowledge discovery: collaborating data parties can mine data in an expanded feature space, allowing them to explore a larger range of scientific questions. However, data sharing among different parties is highly restricted by legal conditions, ethical concerns, and/or data volume. Fueled by these concerns, the fields of cryptography and distributed learning have made great progress towards privacy-preserving and distributed data mining. However, practical implementations have been hampered by the limited scope or computational complexity of these methods. In this paper, we greatly extend the range of analyses available for vertically partitioned data, i.e., data collected by separate parties with different features on the same subjects. To this end, we present a novel approach for privacy-preserving generalized linear models, a fundamental and powerful framework underlying many prediction and classification procedures. We base our method on a distributed block coordinate descent algorithm to obtain parameter estimates, and we develop an extension to compute accurate standard errors without additional communication cost. We critically evaluate the information transfer for semi-honest collaborators and show that our protocol is secure against data reconstruction. Through both simulated and real-world examples we illustrate the functionality of our proposed algorithm. Without leaking information, our method performs as well on vertically partitioned data as existing methods on combined data, all within mere minutes of computation time. We conclude that our method is a viable approach for vertically partitioned data analysis with a wide range of real-world applications. Comment: Fully reproducible code for all results and images can be found at https://github.com/vankesteren/privacy-preserving-glm, and the software package can be found at https://github.com/vankesteren/privre
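
To make the idea of block coordinate descent on vertically partitioned data concrete, here is a minimal sketch for the simplest case, a Gaussian model with identity link. It is not the authors' protocol. It assumes two hypothetical parties, A and B, who both know the outcome `y` and exchange only their current linear predictors, never their raw feature columns. All function names are illustrative, and other GLM families would need reweighted updates that are not shown.

```python
def solve(A, b):
    """Solve a small linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def local_fit(X, target):
    """Least-squares fit of `target` on one party's own columns X."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * t for row, t in zip(X, target)) for i in range(p)]
    return solve(XtX, Xty)


def predict(X, beta):
    """Linear predictor X @ beta, the only quantity a party shares."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


def block_coordinate_descent(X_a, X_b, y, iters=200):
    """Alternate block updates; each block is fitted to the other's residual."""
    beta_a = [0.0] * len(X_a[0])
    beta_b = [0.0] * len(X_b[0])
    eta_b = predict(X_b, beta_b)
    for _ in range(iters):
        # Party A sees only B's linear predictor eta_b (assumption).
        beta_a = local_fit(X_a, [yi - e for yi, e in zip(y, eta_b)])
        eta_a = predict(X_a, beta_a)
        # Party B sees only A's linear predictor eta_a (assumption).
        beta_b = local_fit(X_b, [yi - e for yi, e in zip(y, eta_a)])
        eta_b = predict(X_b, beta_b)
    return beta_a, beta_b


# Toy data: party A holds an intercept and x1, party B holds x2.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0, 0.0]
X_a = [[1.0, v] for v in x1]
X_b = [[v] for v in x2]
y = [1.0 + 2.0 * a - 3.0 * b for a, b in zip(x1, x2)]
beta_a, beta_b = block_coordinate_descent(X_a, X_b, y)
```

On this noise-free toy data the alternating updates should recover the coefficients that a single least-squares fit on the combined columns would give, which is the sense in which a partitioned analysis can match a pooled one.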

    Strategies for including cloud-computing into an engineering modeling workflow

    With the advent of cloud computing, high-end computing, networking, and storage resources are available on-demand at a relatively low price point. Internet applications in the consumer and increasingly in the enterprise space are making use of these resources to upgrade existing applications and build new ones. This is made possible by building decentralized applications that can be integrated with one another through web-enabled application programming interfaces (APIs). However, in the fields of engineering and computational science, cloud computing resources have been utilized primarily to augment existing high-performance computing hardware, while engineering model integrations still occur through software libraries. In this research, a novel approach is proposed where engineering models are constructed as independent services that publish web-enabled APIs. To enable this, the engineering models are built as stateless microservices that each solve a single computational problem. Composite services are then built utilizing these independent component models, much like in the consumer application space. Interactions between component models are orchestrated by a federation management system. This proposed approach is then demonstrated by disaggregating an existing monolithic model for a cookstove into a set of component models. The component models are then reintegrated and compared with the original model for computational accuracy and run-time. Additionally, a novel engineering workflow is proposed that reuses computational data by constructing reduced-order models (ROMs). This framework is evaluated empirically for a number of producers and consumers of engineering models based on computation and data synchronization aspects. The framework is also evaluated by simulating an engineering design workflow with multiple producers and consumers at various stages during the design process.
    Finally, concepts from the federated system of models and ROMs are combined to propose the concept of a hybrid model (information artefact). The hybrid model is a web-enabled microservice that encapsulates information from multiple engineering models at varying fidelities, and responds to queries based on the best available information. Rules for the construction of hybrid models have been proposed and evaluated in the context of engineering workflows.

    Protecting privacy of users in brain-computer interface applications

    Machine learning (ML) is revolutionizing research and industry. Many ML applications rely on the use of large amounts of personal data for training and inference. Among the most intimate exploited data sources is electroencephalogram (EEG) data, a kind of data that is so rich with information that application developers can easily gain knowledge beyond the professed scope from unprotected EEG signals, including passwords, ATM PINs, and other intimate data. The challenge we address is how to engage in meaningful ML with EEG data while protecting the privacy of users. Hence, we propose cryptographic protocols based on secure multiparty computation (SMC) to perform linear regression over EEG signals from many users in a fully privacy-preserving (PP) fashion, i.e., such that each individual's EEG signals are not revealed to anyone else. To illustrate the potential of our secure framework, we show how it allows estimating the drowsiness of drivers from their EEG signals as would be possible in the unencrypted case, and at a very reasonable computational cost. Our solution is the first application of commodity-based SMC to EEG data, as well as the largest documented experiment of secret sharing-based SMC in general, namely, with 15 players involved in all the computations.
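
As a rough illustration of the secret-sharing building block mentioned above, the following sketch shows additive secret sharing used to compute a sum without revealing any individual input. It is a generic textbook construction, not the paper's protocol. The modulus `PRIME` and all function names are assumptions for illustration, and inputs are taken to be non-negative integers (real-valued EEG features would first need a fixed-point encoding, which is not shown).

```python
import secrets

# Illustrative modulus (a Mersenne prime); an assumption, not from the paper.
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split `value` into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares add up to the value.
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    """Recombine shares; any incomplete subset looks uniformly random."""
    return sum(shares) % PRIME


def secure_sum(private_values):
    """Each party shares its input; parties add the shares they hold locally."""
    n = len(private_values)
    all_shares = [share(v, n) for v in private_values]
    # Party j holds the j-th share of every input and sums them locally.
    partial_sums = [sum(s[j] for s in all_shares) % PRIME for j in range(n)]
    # Only the partial sums are published, revealing just the total.
    return reconstruct(partial_sums)


total = secure_sum([3, 5, 7])
```

Addition is the easy case because shares can be combined locally. Multiplying shared values, which regression requires, needs extra machinery such as pre-distributed correlated randomness, and that is omitted here.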

    Healthcare Data Analytics and Privacy Preservation by DCNN Algorithm

    Data has become an integral part of the digital world with the advancement in computing technologies. The collection of data is crucial for data analytics. Every industry makes use of data analytics, from finance to other commercial applications, but it becomes even more important in the healthcare domain for the analysis of healthcare data. The present research work is mainly focused on classification/prediction problems of healthcare data based on deep learning (supervised) approaches using data mining techniques. There is a need to design an intelligent model (based on deep learning) which can classify the large volumes of data stored in our databases. The human capacity to analyze data is far smaller than the amount of data that is stored. This classification becomes even more critical when it comes to healthcare data, as it can help to detect, diagnose and treat patients based on the classified data. The main goal of the thesis is to develop a deep learning-based model for classification tasks, and the introduced DDS can be used in the healthcare domain to improve diagnostic speed, accuracy and reliability.

    Convolutional Monge Mapping Normalization for learning on sleep data

    In many machine learning applications on signals and biomedical data, especially electroencephalogram (EEG), one major challenge is the variability of the data across subjects, sessions, and hardware devices. In this work, we propose a new method called Convolutional Monge Mapping Normalization (CMMN), which consists of filtering the signals in order to adapt their power spectral density (PSD) to a Wasserstein barycenter estimated on training data. CMMN relies on novel closed-form solutions for optimal transport mappings and barycenters and provides individual test-time adaptation to new data without needing to retrain a prediction model. Numerical experiments on sleep EEG data show that CMMN leads to significant and consistent performance gains independent of the neural network architecture when adapting between subjects, sessions, and even datasets collected with different hardware. Notably, our performance gain is on par with much more numerically intensive Domain Adaptation (DA) methods, and CMMN can be used in conjunction with those for even better performance.