    Composable probabilistic inference with BLAISE

    Thesis (Ph.D.) -- Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (p. 185-190).

    If we are to understand human-level cognition, we must understand how the mind finds the patterns that underlie the incomplete, noisy, and ambiguous data from our senses and that allow us to generalize our experiences to new situations. A wide variety of commercial applications face similar issues: industries from health services to business intelligence to oil field exploration depend critically on their ability to find patterns in vast amounts of data and to use those patterns to make accurate predictions. Probabilistic inference provides a unified, systematic framework for specifying and solving these problems. Recent work has demonstrated the great value of probabilistic models defined over complex, structured domains. However, our ability to imagine probabilistic models has far outstripped our ability to programmatically manipulate them and to effectively implement inference, limiting the complexity of the problems we can solve in practice.

    This thesis presents BLAISE, a novel framework for composable probabilistic modeling and inference, designed to address these limitations. BLAISE has three components:

    * The BLAISE State-Density-Kernel (SDK) graphical modeling language, which generalizes factor graphs by (1) explicitly representing inference algorithms (and their locality) using a new type of graph node, (2) representing hierarchical composition and repeated substructures in the state space, the interest distribution, and the inference procedure, and (3) permitting the structure of the model to change during algorithm execution.
    * A suite of SDK graph transformations that may be used to extend a model (e.g. to construct a mixture model from a model of a mixture component) or to make inference more effective (e.g. by automatically constructing a parallel tempered version of an algorithm or by exploiting conjugacy in a model).
    * The BLAISE Virtual Machine, a runtime environment that can efficiently execute the stochastic automata represented by BLAISE SDK graphs.

    BLAISE encourages the construction of sophisticated models by composing simpler models, allowing the designer to implement and verify small portions of the model and inference method, and to reuse model components from one task to another. BLAISE decouples the implementation of the inference algorithm from the specification of the interest distribution, even in cases (such as Gibbs sampling) where the shape of the interest distribution guides the inference. This gives modelers the freedom to explore alternate models without slow, error-prone reimplementation. The compositional nature of BLAISE enables novel reinterpretations of advanced Monte Carlo inference techniques (such as parallel tempering) as simple transformations of BLAISE SDK graphs.

    In this thesis, I describe each of the components of the BLAISE modeling framework and validate the framework by highlighting a variety of sophisticated contemporary models developed by the BLAISE user community. I also present several surprising findings stemming from the BLAISE modeling framework: that an Infinite Relational Model can be built using exactly the same inference methods as a simple mixture model; that constructing a parallel tempered inference algorithm should be a point-and-click, one-line-of-code operation; and that Markov chain Monte Carlo for probabilistic models with complicated long-distance dependencies, such as a stochastic version of Scheme, can be managed using standard BLAISE mechanisms.

    by Keith Allen Bonawitz, Ph.D.
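    To make the State-Density-Kernel decomposition concrete, below is a minimal Python sketch of the idea. The class names and interfaces are hypothetical, not Blaise's actual API: it only illustrates how the state, the interest distribution (density), and the inference algorithm (kernel) can live as separate, composable objects, so that a kernel such as Metropolis-Hastings is guided by whatever densities are attached to it rather than hard-coded against one model.

```python
# Minimal, illustrative sketch of the State-Density-Kernel (SDK) split.
# Hypothetical names -- NOT Blaise's actual API.
import math
import random


class State:
    """A node holding the current value of one model variable."""
    def __init__(self, value):
        self.value = value


class Density:
    """A node scoring states under one factor of the interest distribution."""
    def __init__(self, log_density_fn, states):
        self.log_density_fn = log_density_fn
        self.states = states  # locality: which State nodes this factor touches

    def log_density(self):
        return self.log_density_fn(*(s.value for s in self.states))


class MetropolisKernel:
    """A node that mutates a State, guided only by attached Density nodes."""
    def __init__(self, state, densities, step=0.5):
        self.state, self.densities, self.step = state, densities, step

    def apply(self):
        old = self.state.value
        old_score = sum(d.log_density() for d in self.densities)
        self.state.value = old + random.gauss(0.0, self.step)  # propose
        new_score = sum(d.log_density() for d in self.densities)
        accept_prob = math.exp(min(0.0, new_score - old_score))
        if random.random() >= accept_prob:
            self.state.value = old  # reject: restore the previous value


# Compose a tiny model, x ~ Normal(0, 1), entirely from reusable pieces.
# Swapping the Density for another model leaves the kernel unchanged.
x = State(0.0)
prior = Density(lambda v: -0.5 * v * v, [x])
kernel = MetropolisKernel(x, [prior])

samples = []
for _ in range(10_000):
    kernel.apply()
    samples.append(x.value)
print("sample mean ~", sum(samples) / len(samples))  # close to 0
```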

    Discrete Distribution Estimation under Local Privacy

    The collection and analysis of user data drives improvements in the app and web ecosystems, but comes with risks to privacy. This paper examines discrete distribution estimation under local privacy, a setting in which service providers can learn the distribution of a categorical statistic of interest without collecting the underlying data. We present new mechanisms, including hashed k-ary Randomized Response (k-RR), that empirically meet or exceed the utility of existing mechanisms at all privacy levels. New theoretical results demonstrate the order-optimality of k-RR and the existing RAPPOR mechanism in different privacy regimes.
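    For intuition, here is a hedged Python sketch of plain (unhashed) k-ary randomized response, the basic mechanism underlying the k-RR variants above; function and parameter names are mine, not the paper's. Each user reports their true category with probability e^eps / (e^eps + k - 1) and a uniformly random other category otherwise, and the server debiases the empirical frequencies.

```python
# Sketch of k-ary Randomized Response (k-RR); names are illustrative.
import math
import random
from collections import Counter


def krr_respond(value, k, eps):
    """Locally privatize one categorical value in {0, ..., k-1}."""
    p_true = math.exp(eps) / (math.exp(eps) + k - 1)
    if random.random() < p_true:
        return value
    other = random.randrange(k - 1)        # uniform over the k-1 other values
    return other if other < value else other + 1


def krr_estimate(reports, k, eps):
    """Unbiased estimate of the true distribution from noisy reports."""
    n = len(reports)
    counts = Counter(reports)
    e = math.exp(eps)
    # Invert E[freq_j] = 1/(e+k-1) + pi_j * (e-1)/(e+k-1)
    return [((counts[j] / n) * (e + k - 1) - 1) / (e - 1) for j in range(k)]


# Example: 100k users, true distribution [0.5, 0.3, 0.15, 0.05], eps = 1.
k, eps = 4, 1.0
truth = [0.5, 0.3, 0.15, 0.05]
data = random.choices(range(k), weights=truth, k=100_000)
reports = [krr_respond(v, k, eps) for v in data]
print(krr_estimate(reports, k, eps))  # approximately recovers `truth`
```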

    Practical Secure Aggregation for Privacy Preserving Machine Learning

    We design a novel, communication-efficient, failure-robust protocol for secure aggregation of high-dimensional data. Our protocol allows a server to compute the sum of large, user-held data vectors from mobile devices in a secure manner (i.e. without learning each user's individual contribution), and can be used, for example, in a federated learning setting to aggregate user-provided model updates for a deep neural network. We prove the security of our protocol in the honest-but-curious and malicious settings, and show that security is maintained even if an arbitrarily chosen subset of users drop out at any time. We evaluate the efficiency of our protocol and show, by complexity analysis and a concrete implementation, that its runtime and communication overhead remain low even on large data sets and client pools. For 16-bit input values, our protocol offers 1.73× communication expansion for 2^10 users and 2^20-dimensional vectors, and 1.98× expansion for 2^14 users and 2^24-dimensional vectors, over sending data in the clear.
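    The core trick is pairwise masking. Below is a hedged, heavily simplified Python sketch of that one idea (helper names are mine; the actual protocol adds key agreement, secret sharing to survive dropouts, and authenticated channels): each pair of users derives a shared random mask, one adds it and the other subtracts it, so every mask cancels in the server's sum and the server learns only the total.

```python
# Toy pairwise-masking sketch for secure aggregation; illustrative only.
import random

MOD = 2**16  # 16-bit inputs, arithmetic modulo 2^16


def masked_input(uid, x, all_uids, pairwise_seeds, dim):
    """Return user `uid`'s vector with all pairwise masks applied."""
    y = list(x)
    for other in all_uids:
        if other == uid:
            continue
        # Both endpoints of a pair derive the same mask from a shared seed.
        rng = random.Random(pairwise_seeds[frozenset((uid, other))])
        mask = [rng.randrange(MOD) for _ in range(dim)]
        sign = 1 if uid < other else -1  # lower id adds, higher id subtracts
        y = [(v + sign * m) % MOD for v, m in zip(y, mask)]
    return y


dim, uids = 5, [0, 1, 2]
seeds = {frozenset(p): random.getrandbits(64)
         for p in [(0, 1), (0, 2), (1, 2)]}
inputs = {u: [random.randrange(100) for _ in range(dim)] for u in uids}
uploads = [masked_input(u, inputs[u], uids, seeds, dim) for u in uids]

# The server sums the masked vectors; every pairwise mask cancels.
server_sum = [sum(col) % MOD for col in zip(*uploads)]
assert server_sum == [sum(col) % MOD for col in zip(*inputs.values())]
print(server_sum)
```

    Note that in this complete-graph form every client masks against every other client, which is exactly the linear per-client overhead that the next paper removes.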

    Secure Single-Server Aggregation with (Poly)Logarithmic Overhead

    Secure aggregation is a cryptographic primitive that enables a server to learn the sum of the vector inputs of many clients. Bonawitz et al. (CCS 2017) presented a construction in which each client's computation and communication are linear in the number of parties. While this functionality enables a broad range of privacy-preserving computational tasks, scaling concerns limit its scope of use. We present the first constructions for secure aggregation that achieve polylogarithmic communication and computation per client. Our constructions provide security in the semi-honest and the semi-malicious settings, where the adversary controls the server and a γ-fraction of the clients, and correctness with up to a δ-fraction of dropouts among the clients. Our constructions show how to replace the complete communication graph of Bonawitz et al., which entails the linear overheads, with a k-regular graph of logarithmic degree while maintaining the security guarantees. Beyond improving the known asymptotics for secure aggregation, our constructions also achieve very efficient concrete parameters. The semi-honest secure aggregation can handle a billion clients at the per-client cost of the protocol of Bonawitz et al. for a thousand clients. In the semi-malicious setting with 10^4 clients, each client needs to communicate with only 3% of the clients to have a guarantee that its input has been added together with the inputs of at least 5000 other clients, while withstanding up to 5% corrupt clients and 5% dropouts. We also show an application of secure aggregation to the task of secure shuffling, which enables the first cryptographically secure instantiation of the shuffle model of differential privacy.
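    To show the structural change concretely, the following hedged Python sketch reruns the pairwise-masking toy from the previous abstract, but over a sparse constant-degree graph instead of the complete graph, so each client masks against a handful of neighbors rather than all n-1 others. The graph here is a simple deterministic ring for readability; the actual construction uses a random k-regular graph of logarithmic degree chosen so that security and dropout recovery survive corruptions, none of which is modeled below.

```python
# Toy sparse-graph masking sketch; names and graph choice are illustrative.
import random

MOD = 2**16


def ring_edges(n, k):
    """A simple 2k-regular 'nearest neighbors on a ring' graph, standing in
    illustratively for the paper's random k-regular graph."""
    edges = set()
    for i in range(n):
        for d in range(1, k + 1):
            edges.add(frozenset((i, (i + d) % n)))
    return edges


def masked_input(uid, x, edges, seeds, dim):
    """Mask user `uid`'s vector only against its graph neighbors."""
    y = list(x)
    for e in edges:
        if uid not in e:
            continue
        (other,) = e - {uid}
        rng = random.Random(seeds[e])  # shared per-edge seed
        mask = [rng.randrange(MOD) for _ in range(dim)]
        sign = 1 if uid < other else -1
        y = [(v + sign * m) % MOD for v, m in zip(y, mask)]
    return y


n, k, dim = 100, 3, 4              # each client talks to only 2k = 6 others
edges = ring_edges(n, k)
seeds = {e: random.getrandbits(64) for e in edges}
inputs = [[random.randrange(100) for _ in range(dim)] for _ in range(n)]
uploads = [masked_input(u, inputs[u], edges, seeds, dim) for u in range(n)]

# Masks still cancel edge by edge, so the server recovers the exact sum.
server_sum = [sum(col) % MOD for col in zip(*uploads)]
assert server_sum == [sum(col) % MOD for col in zip(*inputs)]
print(server_sum)
```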