
    Comprehensive Privacy Analysis of Deep Learning: Passive and Active White-box Inference Attacks against Centralized and Federated Learning

    Deep neural networks are susceptible to various inference attacks as they remember information about their training data. We design white-box inference attacks to perform a comprehensive privacy analysis of deep learning models. We measure the privacy leakage through parameters of fully trained models as well as the parameter updates of models during training. We design inference algorithms for both centralized and federated learning, with respect to passive and active inference attackers, and assuming different adversary prior knowledge. We evaluate our novel white-box membership inference attacks against deep learning algorithms to trace their training data records. We show that a straightforward extension of the known black-box attacks to the white-box setting (through analyzing the outputs of activation functions) is ineffective. We therefore design new algorithms tailored to the white-box setting by exploiting the privacy vulnerabilities of the stochastic gradient descent algorithm, which is the algorithm used to train deep neural networks. We investigate the reasons why deep learning models may leak information about their training data. We then show that even well-generalized models are significantly susceptible to white-box membership inference attacks, by analyzing state-of-the-art pre-trained and publicly available models for the CIFAR dataset. We also show how adversarial participants, in the federated learning setting, can successfully run active membership inference attacks against other participants, even when the global model achieves high prediction accuracies. Comment: 2019 IEEE Symposium on Security and Privacy (SP)
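The gradient-based signal this attack exploits can be illustrated with a minimal sketch (the toy logistic model, sizes, and thresholding are illustrative assumptions, not the paper's code): once SGD has memorized a small high-dimensional training set, the per-example loss gradients of members are much smaller than those of fresh points, giving a white-box membership signal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "overfit" setting: 50 points in 100 dimensions with random labels,
# so a linear model can memorize its training (member) set.
d, n = 100, 50
X_mem = rng.normal(size=(n, d))
y_mem = rng.integers(0, 2, size=n).astype(float)

w = np.zeros(d)
for _ in range(4000):                      # plain SGD, the attack's target
    i = rng.integers(n)
    p = 1 / (1 + np.exp(-X_mem[i] @ w))
    w -= 0.1 * (p - y_mem[i]) * X_mem[i]

def grad_norm(w, x, y):
    """Per-example gradient norm of the logistic loss: the white-box
    membership signal (members tend to have near-zero gradients)."""
    p = 1 / (1 + np.exp(-x @ w))
    return np.linalg.norm((p - y) * x)

X_non = rng.normal(size=(n, d))            # fresh non-member points
y_non = rng.integers(0, 2, size=n).astype(float)

mem_mean = np.mean([grad_norm(w, x, y) for x, y in zip(X_mem, y_mem)])
non_mean = np.mean([grad_norm(w, x, y) for x, y in zip(X_non, y_non)])
```

Thresholding the gradient norm then separates members from non-members; the paper builds far richer attack models on top of this kind of SGD leakage.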

    White learning methodology: a case study of cancer-related disease factors analysis in real-time PACS environment

    Bayesian networks are probabilistic models whose prediction accuracy may not be among the highest in the machine learning family. Deep learning (DL), on the other hand, possesses higher predictive power than many other models, but how reliable its results are, how they are deduced, and what its predictions mean to users remain obscure: DL functions like a black box. As a result, many medical practitioners are reluctant to use deep learning as the only tool for critical machine learning applications, such as an aiding tool for cancer diagnosis. In this paper, a framework of white learning is proposed which takes advantage of both black-box and white-box learning: black-box learning usually gives a high standard of accuracy, while white-box learning provides an explainable directed acyclic graph. In our design there are three stages of White Learning (loosely coupled, semi-coupled and tightly coupled WL), based on the degree of fusion of the white-box and black-box learners, and a case of loosely coupled WL is tested on a breast cancer dataset using deep learning and an incremental version of a Naïve Bayes network. White learning is broadly defined as a systematic fusion of machine learning models that yields both an explainable Bayesian network, which can uncover hidden relations between the features and the class, and a deep learning model, which gives higher prediction accuracy than other algorithms. We designed a series of experiments for this loosely coupled WL model. The simulation results show that, compared with standard black-box deep learning, WL can improve accuracy and kappa statistics by up to 50%. The performance of WL is also more stable in extreme conditions such as noise and high-dimensional data, and the relations found by its Bayesian network are more concise and stronger in affinity.
The experimental results deliver positive signals that WL can output both high classification accuracy and an explainable relation graph between the features and the class. [Abstract copyright: Copyright © 2020. Published by Elsevier B.V.]
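A minimal sketch of the loosely coupled idea, under assumed details (a hand-rolled Gaussian Naïve Bayes stands in for the incremental Bayesian network, and a per-feature relevance ranking stands in for the explainable graph): the white box supplies an inspectable structure while a stronger black box would consume its output.

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny dataset: the class is determined almost entirely by feature 0.
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)

class GaussianNB:
    """White box: per-class Gaussians make the fitted model inspectable."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mu = np.array([X[y == c].mean(0) for c in self.classes])
        self.var = np.array([X[y == c].var(0) + 1e-6 for c in self.classes])
        self.prior = np.array([(y == c).mean() for c in self.classes])
        return self
    def predict(self, X):
        # Log-likelihood of each class under the per-feature Gaussians.
        ll = -0.5 * (((X[:, None, :] - self.mu) ** 2) / self.var
                     + np.log(2 * np.pi * self.var)).sum(-1)
        return self.classes[np.argmax(ll + np.log(self.prior), axis=1)]

# Loose coupling: the white box ranks features by class separation; a
# black-box learner would then be trained on the informative features.
nb = GaussianNB().fit(X, y)
relevance = np.abs(nb.mu[1] - nb.mu[0]) / np.sqrt(nb.var.mean(0))
```

Here `relevance` is the explainable by-product (which features drive the class), independent of whichever black-box model is coupled in.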

    Improving the expressiveness of black-box models for predicting student performance

    Early prediction systems of student performance can be very useful to guide student learning. For a prediction model to be truly useful as an effective aid for learning, it must provide tools to adequately interpret progress, detect trends and behaviour patterns, and identify the causes of learning problems. White-box and black-box techniques have been described in the literature to implement prediction models. White-box techniques require a priori models to explore, which makes them easy to interpret but difficult to generalize and unable to detect unexpected relationships between data. Black-box techniques are easier to generalize and suitable for discovering unsuspected relationships, but they are cryptic and difficult for most teachers to interpret. In this paper, a black-box technique is proposed that takes advantage of the power and versatility of these methods while making decisions about the input data and the design of the classifier that yield a rich output data set. A set of graphical tools is also proposed to exploit the output information and provide meaningful guidance to teachers and students. From our experience, a set of tips on how to design a prediction system and represent the output information is also provided.
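One way to make a black-box predictor expressive in this sense (the milestone features, model, and names below are hypothetical, not the paper's design) is to emit a pass probability after each graded milestone, so teachers can plot a student's trajectory instead of seeing a single label:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 4                          # students, graded milestones
scores = rng.uniform(0, 10, size=(n, k))
passed = (scores.mean(1) > 5).astype(float)

def fit_logistic(X, y, lr=0.05, epochs=500):
    """Plain full-batch gradient descent on the logistic loss."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # add bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

# One model per milestone, each using only the data available at that
# point in the course; the result is a probability trajectory per student.
trajectory = []
for m in range(1, k + 1):
    w = fit_logistic(scores[:, :m], passed)
    Xb = np.hstack([scores[:1, :m], np.ones((1, 1))])  # first student
    trajectory.append(float(1 / (1 + np.exp(-Xb @ w))))
```

Plotting `trajectory` over milestones gives the kind of trend view the paper's graphical tools are meant to support.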

    Learning optimization models in the presence of unknown relations

    In a sequential auction with multiple bidding agents, it is highly challenging to determine the ordering of the items to sell in order to maximize the revenue due to the fact that the autonomy and private information of the agents heavily influence the outcome of the auction. The main contribution of this paper is two-fold. First, we demonstrate how to apply machine learning techniques to solve the optimal ordering problem in sequential auctions. We learn regression models from historical auctions, which are subsequently used to predict the expected value of orderings for new auctions. Given the learned models, we propose two types of optimization methods: a black-box best-first search approach, and a novel white-box approach that maps learned models to integer linear programs (ILP) which can then be solved by any ILP-solver. Although the studied auction design problem is hard, our proposed optimization methods obtain good orderings with high revenues. Our second main contribution is the insight that the internal structure of regression models can be efficiently evaluated inside an ILP solver for optimization purposes. To this end, we provide efficient encodings of regression trees and linear regression models as ILP constraints. This new way of using learned models for optimization is promising. As the experimental results show, it significantly outperforms the black-box best-first search in nearly all settings. Comment: 37 pages. Working paper
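The tree-to-ILP encoding can be sketched as follows, assuming a standard big-M formulation with one binary indicator per leaf (the constant `M`, variable names, and symbolic string output are illustrative; a real encoding would also handle the strict branch inequality with a small epsilon):

```python
# Symbolic sketch: a depth-1 regression tree "if x[f] <= t then L else R"
# becomes linear constraints over leaf binaries z_left, z_right.
# M is assumed large enough to cover the feature's range.
M = 1e4

def encode_tree(split_feature, threshold, left_value, right_value):
    """Emit ILP constraints whose feasible points reproduce the tree's
    input-output behaviour: exactly one leaf is active, the big-M terms
    deactivate the guard of the inactive branch, and y takes the
    active leaf's value."""
    return [
        "z_left + z_right == 1",                                   # one leaf
        f"x[{split_feature}] <= {threshold} + {M}*(1 - z_left)",   # left guard
        f"x[{split_feature}] >= {threshold} - {M}*(1 - z_right)",  # right guard
        f"y == {left_value}*z_left + {right_value}*z_right",       # leaf output
    ]

cons = encode_tree(0, 3.5, 10.0, 20.0)
```

Feeding such constraints to any ILP solver lets the solver search over inputs while the learned model's prediction `y` appears directly in the objective, which is the white-box approach the paper describes.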

    Experiment Design and Training Data Quality of Inverse Model for Short-term Building Energy Forecasting

    For data-driven building energy forecasting modeling, the quality of the training data strongly affects a model's accuracy and cost-effectiveness. In order to obtain high-quality training data within a short time period, experiment design, active learning, or excitation is becoming increasingly important, especially for nonlinear systems such as building energy systems. Experiment design and system excitation have been widely studied and applied for model development in fields such as robotics and the automobile industry, but these methods have hardly been applied to building energy modeling. This paper presents an overall discussion of applying system excitation to the development of building energy forecasting models. For gray-box and white-box models, a model's physical representations and underlying theory can guide training data collection. For black-box (pure data-driven) models, however, the quality of the training data is sensitive to the model structure, so there is no universal theory for data collection. The focus of black-box modeling has traditionally been on how to represent a data set well; how well such a data set represents the real system, and how the quality of a training data set affects the performance of black-box models, have not been well studied. In this paper, the system excitation method used in the system identification area is applied to excite zone temperature set-points and generate training data. These excited training data are then used to train a variety of black-box building energy forecasting models. The models' performances (accuracy and extendibility) are compared across different model structures, and, for the same model structure, between training on typical building operational data and training on excited data.
Results show that black-box models trained on normal operation data achieve better accuracy than those trained on excited data but have worse extendibility; training data obtained from excitation help to improve the performance of system identification models.
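A common excitation signal in system identification is a pseudo-random binary sequence; a minimal sketch of exciting a zone-temperature set-point with a PRBS-style random-hold binary signal (the base temperature, amplitude, interval count, and hold lengths are assumed values, not the paper's experiment):

```python
import numpy as np

rng = np.random.default_rng(3)

def prbs_setpoints(base=22.0, amplitude=1.0, steps=96, min_hold=4):
    """Set-point schedule switching between base +/- amplitude, holding
    each level for a random 4-12 intervals (e.g. 15-min steps over a
    day) so the building's dynamics are excited across time scales."""
    sp, level = [], 1.0
    while len(sp) < steps:
        hold = int(rng.integers(min_hold, 3 * min_hold))
        sp.extend([base + level * amplitude] * hold)
        level = -level                       # toggle to the other level
    return np.array(sp[:steps])

schedule = prbs_setpoints()
```

Logging the zone's temperature and power response to `schedule` yields the richer, excitation-based training data the paper compares against normal operational data.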

    Machine Learning Models that Remember Too Much

    Machine learning (ML) is becoming a commodity. Numerous ML frameworks and services are available to data holders who are not ML experts but want to train predictive models on their data. It is important that ML models trained on sensitive inputs (e.g., personal images or documents) not leak too much information about the training data. We consider a malicious ML provider who supplies model-training code to the data holder, does not observe the training, but then obtains white- or black-box access to the resulting model. In this setting, we design and implement practical algorithms, some of them very similar to standard ML techniques such as regularization and data augmentation, that "memorize" information about the training dataset in the model, while the model remains as accurate and predictive as a conventionally trained one. We then explain how the adversary can extract the memorized information from the model. We evaluate our techniques on standard ML tasks for image classification (CIFAR10), face recognition (LFW and FaceScrub), and text analysis (20 Newsgroups and IMDB). In all cases, we show how our algorithms create models that have high predictive power yet allow accurate extraction of subsets of their training data.
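A much-simplified variant of this kind of memorization (not the paper's exact algorithms, and with hypothetical function names) hides data in the low-order mantissa bits of float32 parameters, leaving predictions essentially unchanged while remaining extractable by anyone with white-box access:

```python
import numpy as np

def embed(params, secret: bytes):
    """Overwrite the least-significant mantissa bit of each float32
    parameter with one bit of the secret (needs 8 params per byte).
    The perturbation is ~2**-23 relative, so accuracy is unaffected."""
    bits = np.unpackbits(np.frombuffer(secret, dtype=np.uint8))
    p = params.astype(np.float32).copy()
    raw = p.view(np.uint32)                       # reinterpret bit pattern
    raw[: len(bits)] = (raw[: len(bits)] & ~np.uint32(1)) | bits
    return p

def extract(params, nbytes):
    """Read the secret back out of the parameters' low-order bits."""
    bits = params.view(np.uint32)[: nbytes * 8] & 1
    return np.packbits(bits.astype(np.uint8)).tobytes()

weights = np.random.default_rng(4).normal(size=64).astype(np.float32)
stego = embed(weights, b"hi")          # model weights look ~identical
```

The paper's attacks are subtler (e.g. steering training itself via regularization-like terms), but this illustrates why "the model is accurate" says nothing about what else its parameters carry.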

    An information presentation method based on tree-like super entity component

    Information systems are increasingly oriented towards large-scale integration due to the explosion of multi-source information. It is therefore important to discuss how to reasonably organize and present information from multiple structures and sources on the same information system platform. In this study, we propose a 3C (Components, Connections, Container) component model by combining white-box and black-box methods, design a tree-like super entity based on the model, present its construction and related algorithms, and adopt the tree-like super entity as the information organization method for multi-level entities. In order to represent structured, semi-structured and unstructured data on the same information system platform, an information presentation method based on an editable e-book component has been developed by combining the tree-like super entity component, a QQ-style menu and a 1/K switch connection component; it has been successfully applied in the Flood Protection Project Information System of the Yangtze River in China. © 2011 Elsevier Inc.
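A minimal sketch of the 3C idea (the class and method names are assumptions, not the paper's design): a Container composes Components and Connections, and nesting Containers yields the tree-like super entity for multi-level information.

```python
class Component:
    """Leaf of the 3C model: a named piece of information."""
    def __init__(self, name, payload=None):
        self.name, self.payload = name, payload

class Container(Component):
    """Container holds child Components/Containers plus Connections
    between them; nested Containers form the tree-like super entity."""
    def __init__(self, name):
        super().__init__(name)
        self.children, self.connections = [], []
    def add(self, child):
        self.children.append(child)
        return child
    def connect(self, a, b):
        self.connections.append((a.name, b.name))
    def walk(self):
        # Pre-order traversal of the super entity's tree.
        yield self
        for c in self.children:
            if isinstance(c, Container):
                yield from c.walk()
            else:
                yield c

root = Container("project")
levee = root.add(Container("levee-section"))
levee.add(Component("sensor-readings"))
```

Traversing `root.walk()` visits every level of the entity tree, which is the kind of uniform organization the presentation method builds on.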