
    Reliability and Interpretability in Science and Deep Learning

    In recent years, the question of the reliability of Machine Learning (ML) methods has acquired significant importance, and the analysis of the associated uncertainties has motivated a growing amount of research. However, most of these studies have applied standard error analysis to ML models, and in particular Deep Neural Network (DNN) models, which represent a rather significant departure from standard scientific modelling. It is therefore necessary to integrate standard error analysis with a deeper epistemological analysis of the possible differences between DNN models and standard scientific modelling, and of the implications of these differences for the assessment of reliability. This article offers several contributions. First, it emphasises the ubiquitous role of model assumptions (both in ML and traditional science) against the illusion of theory-free science. Second, model assumptions are analysed from the point of view of their (epistemic) complexity, which is shown to be language-independent. It is argued that the high epistemic complexity of DNN models hinders the estimation of their reliability and also their prospects for long-term progress. Some potential ways forward are suggested. Third, this article identifies the close relation between a model's epistemic complexity and its interpretability, as introduced in the context of responsible AI. This clarifies in which sense, and to what extent, the lack of understanding of a model (the black-box problem) impacts its interpretability in a way that is independent of individual skills. It also clarifies how interpretability is a precondition for assessing the reliability of any model, which cannot be based on statistical analysis alone. This article focuses on the comparison between traditional scientific models and DNN models, but Random Forest (RF) and Logistic Regression (LR) models are also briefly considered.

    Colloidal patchy particle architectures: Simulations of accurate models in and out of equilibrium

    Some material properties and functionalities arise from a collective organization mediated by non-covalent bonds between constituent building blocks such as molecules or colloidal particles. Understanding such complex matter, specifically when these systems are out of equilibrium, remains a grand challenge. In this thesis, we use patchy colloidal particles interacting via critical Casimir interactions, which can act as mesoscopic structural analogues of molecular, supramolecular and bio-inspired architectures. These particles can make directed bonds, follow Boltzmann statistics and are directly observable via e.g. confocal microscopy. By means of simulations, we give microscopic insight into the structural behaviors and responses of colloidal matter both in and out of equilibrium. First, we developed an accurate patchy particle potential in a hybrid bottom-up/top-down coarse-graining approach. Based on the particle's geometry and universal scaling theory, we benchmarked simulation outcomes against experimental measurements of divalent patchy particles. As an alternative to explicit simulations, Wertheim's theory predicts the thermodynamic equilibrium of these systems. In chapter 3, we adapted Wertheim's theory to accurately predict strongly confined systems, inspired by the effect of gravity on the patchy particle distributions. Finally, we investigated the effect of activity on patchy particle architectures such as dimers, decamers, rings and networks. We find that activity can enhance as well as reduce the stability of architectures, deform intact structures, alter the mechanisms of fragmentation, and increase bond formation. In activated networks, we observe three distinct global structures upon increasing activity: a homogeneous, an inhomogeneous, and a phase-separated structure.
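
    As a concrete illustration of what a coarse-grained patchy particle simulation operates on, the sketch below implements a generic Kern-Frenkel-style patchy pair potential in Python: a hard core plus a short-ranged attraction that acts only when both patches point at each other. This is a minimal stand-in, not the thesis's critical-Casimir-calibrated potential; the function name and all parameter values are illustrative assumptions.

    # Illustrative Kern-Frenkel-style patchy pair potential (a common coarse-grained
    # model); NOT the thesis's critical-Casimir-calibrated potential. All parameter
    # values below are placeholders.
    import numpy as np

    def patchy_pair_energy(r_ij, patch_i, patch_j, sigma=1.0, delta=0.2,
                           eps=1.0, cos_theta_max=0.95):
        """Square-well attraction modulated by patch orientation.

        r_ij    : vector from particle i to particle j
        patch_i : unit vector of the patch on particle i
        patch_j : unit vector of the patch on particle j
        """
        r = np.linalg.norm(r_ij)
        if r < sigma:                       # hard-core overlap
            return np.inf
        if r > sigma + delta:               # beyond the attractive well
            return 0.0
        r_hat = r_ij / r
        # Both patches must point towards the other particle within the patch cone.
        aligned_i = np.dot(patch_i, r_hat) >= cos_theta_max
        aligned_j = np.dot(patch_j, -r_hat) >= cos_theta_max
        return -eps if (aligned_i and aligned_j) else 0.0

    # Example: two particles at bonding distance with patches facing each other.
    print(patchy_pair_energy(np.array([1.1, 0.0, 0.0]),
                             np.array([1.0, 0.0, 0.0]),
                             np.array([-1.0, 0.0, 0.0])))   # -> -1.0 (bonded)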

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    A First Course in Causal Inference

    I developed these lecture notes based on my "Causal Inference" course at the University of California, Berkeley, over the past seven years. Since half of the students were undergraduates, the notes require only basic knowledge of probability theory, statistical inference, and linear and logistic regression.
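
    To give a flavour of the level such a course targets, here is a minimal sketch (not taken from the notes) of two standard estimators of the average treatment effect in a completely randomized experiment, difference in means and regression adjustment. The simulated data and variable names are assumptions for illustration only.

    # Minimal sketch (not from the lecture notes) of two standard ATE estimators in a
    # completely randomized experiment: difference in means and regression adjustment.
    # The data are simulated purely for illustration; the true ATE is 2.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000
    x = rng.normal(size=n)                                   # pre-treatment covariate
    z = rng.binomial(1, 0.5, size=n)                         # randomized treatment
    y = 1.0 + 2.0 * z + 0.5 * x + rng.normal(size=n)         # observed outcome

    # 1) Difference in means between treated and control units.
    ate_dim = y[z == 1].mean() - y[z == 0].mean()

    # 2) Regression adjustment: OLS of y on an intercept, treatment, and covariate.
    X = np.column_stack([np.ones(n), z, x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ate_ols = beta[1]                                        # coefficient on treatment

    print(ate_dim, ate_ols)                                  # both should be close to 2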

    Binding processes in episodic memory: Measurement, structure, and moderators

    Episodic memory enables people to remember personally experienced events. While these events consist of different elements, people are able to form coherent memory representations. This requires that an event’s constituent elements are bound together in memory. Despite the importance of these binding processes for episodic memory, they are still only poorly understood and our abilities to measure them are limited. In this thesis, comprising three articles, I provide a new approach for measuring binding effects and use this measure to probe properties of binding processes in episodic memory. In the first article, I introduce the new measurement approach and evaluate its suitability for measuring binding effects in comparison to previous approaches. I show that the approach has good measurement properties and is better suited for measuring binding effects than previous approaches. In the second article, I examine the structure in which event elements are bound together and whether animacy influences binding processes. I show that different binding structures are possible, such as an integrated binding structure, in which event elements are bound into a unitary representation, and a hierarchical binding structure, in which event elements are preferentially bound to particular types of elements. These may lie on a continuum of memory representations with varying degrees of integration. I further show that the presence of an animate element in an event facilitates binding, enabling more coherent memory representations with a higher degree of integration. In addition, awareness regarding commonalities of types of event elements across events may facilitate binding. In the third article, I examine whether agency influences binding processes. I show that the presence of an agentic element in an event may facilitate binding, but evidence was not conclusive and effects may have been concealed due to low memory performance. Agency may thus underlie the previously found facilitating effect of animacy on binding, since animate elements may exert their influence by providing a potential agent in an event. One aim of my thesis is to provide a new tool for investigating binding processes in episodic memory. An additional aim is to extend our current understanding of binding structures that link together the elements of an event, as well as the factors that moderate binding processes. In doing so, I hope to advance our understanding of binding processes and enable and inform future exploration, as well as theory development and refinement, of this fundamental property underlying episodic memory.

    DeepMem: ML Models as storage channels and their (mis-)applications

    Machine learning (ML) models are overparameterized to support generality and avoid overfitting. Prior works have shown that these additional parameters can be used for both malicious (e.g., hiding a model covertly within a trained model) and beneficial purposes (e.g., watermarking a model). In this paper, we propose a novel information-theoretic perspective on the problem: we consider the ML model as a storage channel with a capacity that increases with overparameterization. Specifically, we consider a sender that embeds arbitrary information in the model at training time, which can be extracted by a receiver with black-box access to the deployed model. We derive an upper bound on the capacity of the channel based on the number of available parameters. We then explore black-box write and read primitives that allow the attacker to: (i) store data in an optimized way within the model by augmenting the training data at the transmitter side, and (ii) read it by querying the model after it is deployed. We also analyze the detectability of the writing primitive and consider a new version of the problem which takes information storage covertness into account. Specifically, to obtain storage covertness, we introduce a new constraint such that the data augmentation used for the write primitives minimizes the distribution shift with the initial (baseline task) distribution. This constraint introduces a level of "interference" with the initial task, thereby limiting the channel's effective capacity. Therefore, we develop optimizations to improve the capacity in this case, including a novel ML-specific substitution-based error correction protocol. We believe that the proposed modeling of the problem offers new tools to better understand and mitigate potential vulnerabilities of ML, especially in the context of increasingly large models.
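
    A toy sketch of the general write/read mechanism described above, assuming a scikit-learn classifier as the "channel": bits are written by augmenting the training set with labeled probe inputs and read back with black-box queries. It illustrates the idea only, not the paper's optimized primitives, covertness constraint, or error-correction protocol; all names and parameters are placeholders.

    # Toy sketch of the "ML model as a storage channel" idea: write bits by augmenting
    # the training set so that the model's label on chosen probe inputs encodes each
    # bit, then read them back with black-box queries. Not the paper's protocol.
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)

    # Baseline binary task.
    X_task = rng.normal(size=(2000, 10))
    y_task = (X_task[:, 0] + X_task[:, 1] > 0).astype(int)

    # Write primitive: one out-of-distribution probe point per bit, labeled by the bit.
    bits = rng.integers(0, 2, size=16)
    probes = rng.normal(loc=8.0, size=(len(bits), 10))     # far from the task data
    X_aug = np.vstack([X_task] + [np.repeat(p[None], 20, axis=0) for p in probes])
    y_aug = np.concatenate([y_task] + [np.full(20, b) for b in bits])

    model = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=300,
                          random_state=0).fit(X_aug, y_aug)

    # Read primitive: black-box queries at the probe points recover the bits.
    decoded = model.predict(probes)
    print((decoded == bits).mean())     # fraction of bits recovered correctly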

    Decision-making with Gaussian processes: sampling strategies and Monte Carlo methods

    We study Gaussian processes and their application to decision-making in the real world. We begin by reviewing the foundations of Bayesian decision theory and show how these ideas give rise to methods such as Bayesian optimization. We investigate practical techniques for carrying out these strategies, with an emphasis on estimating and maximizing acquisition functions. Finally, we introduce pathwise approaches to conditioning Gaussian processes and demonstrate key benefits for representing random variables in this manner.
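
    As a small illustration of the acquisition-function machinery the thesis is concerned with estimating and maximizing, the sketch below computes the expected-improvement acquisition under a Gaussian process posterior on a toy one-dimensional problem. The kernel, data, and hyperparameters are placeholder assumptions, and this does not use the thesis's pathwise conditioning approach.

    # Minimal sketch of the expected-improvement (EI) acquisition function under a
    # Gaussian process posterior. Kernel, data, and hyperparameters are placeholders.
    import numpy as np
    from scipy.stats import norm

    def rbf(a, b, lengthscale=0.5):
        d2 = (a[:, None] - b[None, :]) ** 2
        return np.exp(-0.5 * d2 / lengthscale ** 2)

    # Observed data (minimization problem).
    x_obs = np.array([0.1, 0.4, 0.9])
    y_obs = np.sin(6 * x_obs)
    noise = 1e-6

    # GP posterior mean and standard deviation on a grid of candidates.
    x_new = np.linspace(0, 1, 200)
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_star = rbf(x_new, x_obs)
    alpha = np.linalg.solve(K, y_obs)
    mu = K_star @ alpha
    var = 1.0 - np.einsum('ij,ij->i', K_star, np.linalg.solve(K, K_star.T).T)
    sigma = np.sqrt(np.maximum(var, 1e-12))

    # Expected improvement over the incumbent best observation.
    best = y_obs.min()
    z = (best - mu) / sigma
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

    print(x_new[np.argmax(ei)])   # candidate the EI strategy would evaluate next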

    Image Reconstruction via Deep Image Prior Subspaces

    Deep learning has been widely used for solving image reconstruction tasks, but its deployability has been held back by the shortage of high-quality training data. Unsupervised learning methods, such as the deep image prior (DIP), naturally fill this gap, but bring a host of new issues: susceptibility to overfitting due to a lack of robust early-stopping strategies, and unstable convergence. We present a novel approach to tackle these issues by restricting DIP optimisation to a sparse linear subspace of its parameters, employing a synergy of dimensionality reduction techniques and second-order optimisation methods. The low dimensionality of the subspace reduces DIP's tendency to fit noise and allows the use of stable second-order optimisation methods, e.g., natural gradient descent or L-BFGS. Experiments across both image restoration and tomographic tasks of different geometry and ill-posedness show that second-order optimisation within a low-dimensional subspace is favourable in terms of the trade-off between optimisation stability and reconstruction fidelity.
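
    A minimal sketch of the core idea, assuming a recent PyTorch (torch.func): the DIP network's parameters are restricted to an affine subspace theta = theta_0 + P c spanned by a random basis P, and only the low-dimensional coefficients c are optimised with L-BFGS. The tiny network, random basis, and toy measurement are illustrative assumptions, not the paper's dimensionality-reduction or natural-gradient machinery.

    # Minimal sketch: deep-image-prior optimisation restricted to a low-dimensional
    # linear subspace of the parameters, with L-BFGS on the subspace coefficients.
    # Requires a recent PyTorch (torch.func). Network and sizes are illustrative.
    import torch
    from torch.func import functional_call

    torch.manual_seed(0)

    net = torch.nn.Sequential(                      # tiny stand-in for a DIP network
        torch.nn.Conv2d(1, 16, 3, padding=1), torch.nn.ReLU(),
        torch.nn.Conv2d(16, 1, 3, padding=1),
    )
    names, theta0 = zip(*[(n, p.detach().clone()) for n, p in net.named_parameters()])
    flat0 = torch.cat([t.reshape(-1) for t in theta0])          # anchor point theta_0
    k = 32                                                       # subspace dimension
    P = torch.randn(flat0.numel(), k) / flat0.numel() ** 0.5     # random linear basis

    z = torch.randn(1, 1, 32, 32)                                # fixed DIP input noise
    y = torch.rand(1, 1, 32, 32)                                 # toy noisy measurement

    c = torch.zeros(k, requires_grad=True)                       # subspace coefficients
    opt = torch.optim.LBFGS([c], max_iter=50)

    def unflatten(flat):
        out, i = {}, 0
        for name, ref in zip(names, theta0):
            out[name] = flat[i:i + ref.numel()].reshape(ref.shape)
            i += ref.numel()
        return out

    def closure():
        opt.zero_grad()
        params = unflatten(flat0 + P @ c)            # theta = theta_0 + P c
        loss = ((functional_call(net, params, (z,)) - y) ** 2).mean()
        loss.backward()
        return loss

    opt.step(closure)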

    Utilitarian Welfare Optimization in the Generalized Vertex Coloring Games: An Implication to Venue Selection in Events Planning

    We consider a general class of multi-agent games on networks, namely the generalized vertex coloring games (G-VCGs), inspired by real-life applications of the venue selection problem in events planning. Under a particular mechanism, each agent receives a utility that depends on the current coloring assignment; striving to maximize its own utility, each agent is restricted to local information and therefore self-organizes when choosing a new color. Our focus is on maximizing, in a decentralized fashion, a utilitarian welfare objective function defined by the cumulative utilities across the network. First, we investigate a special class of G-VCGs, namely Identical Preference VCGs (IP-VCGs), which recovers the rudimentary work of Chaudhuri et al. (2008). We reveal its convergence even under a completely greedy policy and completely synchronous settings, and provide a stochastic bound on the convergence rate. Second, for general G-VCGs, we propose a greediness-preserving Metropolis-Hastings based policy that each agent can initiate with only limited local information, and prove its optimality under asynchronous settings using the theory of regular perturbed Markov processes. The policy is also empirically observed to be robust under independently synchronous settings. Third, in the spirit of "robust coloring", we include an expected loss term in the objective function to balance between utility and robustness. An optimal coloring for this robust welfare optimization is derived through a second-stage MH-policy driven algorithm. Simulation experiments are presented to showcase the efficiency of the proposed strategy.
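
    The sketch below illustrates the flavour of such a decentralized dynamic on a toy identical-preference instance: each agent's utility is the number of neighbours coloured differently, agents wake asynchronously, and a proposed colour is accepted greedily or with a Boltzmann-type probability. The graph, temperature, and schedule are assumptions for illustration; this is not the paper's exact mechanism or its optimality-carrying MH policy.

    # Toy decentralized Metropolis-Hastings-style dynamics for an identical-preference
    # vertex coloring game: an agent's utility is the number of neighbours with a
    # different colour, agents wake asynchronously, and proposals are accepted greedily
    # or with a Boltzmann-type probability. Graph, temperature (beta) and schedule are
    # placeholders, not the paper's exact mechanism.
    import math
    import random

    random.seed(0)
    n_agents, n_colors, beta = 30, 3, 3.0

    # Random neighbourhood structure stored as adjacency sets (local information only).
    adj = {i: set() for i in range(n_agents)}
    for i in range(n_agents):
        for j in range(i + 1, n_agents):
            if random.random() < 0.15:
                adj[i].add(j)
                adj[j].add(i)

    color = [random.randrange(n_colors) for _ in range(n_agents)]

    def utility(i, c):
        """Agent i's utility if it held colour c: neighbours coloured differently."""
        return sum(1 for j in adj[i] if color[j] != c)

    for _ in range(20000):                          # asynchronous local updates
        i = random.randrange(n_agents)              # wake a single agent at random
        proposal = random.randrange(n_colors)       # propose a uniformly random colour
        delta = utility(i, proposal) - utility(i, color[i])
        if delta >= 0 or random.random() < math.exp(beta * delta):
            color[i] = proposal

    # Utilitarian welfare: cumulative utilities across the network.
    print(sum(utility(i, color[i]) for i in range(n_agents)))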

    Thermodynamic AI and the fluctuation frontier

    Many Artificial Intelligence (AI) algorithms are inspired by physics and employ stochastic fluctuations. We connect these physics-inspired AI algorithms by unifying them under a single mathematical framework that we call Thermodynamic AI. Seemingly disparate algorithmic classes can be described by this framework, for example, (1) Generative diffusion models, (2) Bayesian neural networks, (3) Monte Carlo sampling and (4) Simulated annealing. Such Thermodynamic AI algorithms are currently run on digital hardware, ultimately limiting their scalability and overall potential. Stochastic fluctuations naturally occur in physical thermodynamic systems, and such fluctuations can be viewed as a computational resource. Hence, we propose a novel computing paradigm, where software and hardware become inseparable. Our algorithmic unification allows us to identify a single full-stack paradigm, involving Thermodynamic AI hardware, that could accelerate such algorithms. We contrast Thermodynamic AI hardware with quantum computing, where noise is a roadblock rather than a resource. Thermodynamic AI hardware can be viewed as a novel form of computing, since it uses a novel fundamental building block. We identify stochastic bits (s-bits) and stochastic modes (s-modes) as the respective building blocks for discrete and continuous Thermodynamic AI hardware. In addition to these stochastic units, Thermodynamic AI hardware employs a Maxwell's demon device that guides the system to produce non-trivial states. We provide a few simple physical architectures for building these devices and develop a formalism for programming the hardware via gate sequences. We hope to stimulate discussion around this new computing paradigm. Beyond acceleration, we believe it will impact the design of both hardware and algorithms, while also deepening our understanding of the connection between physics and intelligence.
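
    As a digital illustration of the kind of noise-driven dynamics the framework treats as a computational resource, the sketch below simulates overdamped Langevin dynamics in a double-well potential (a continuous, s-mode-like stochastic unit); in the proposed paradigm such fluctuations would be supplied natively by analog hardware rather than simulated. The potential, step size, and temperature are illustrative assumptions.

    # Minimal digital simulation of overdamped Langevin dynamics, the kind of
    # noise-driven continuous-state process that Thermodynamic AI treats as a
    # computational resource. Potential, step size, and temperature are illustrative.
    import numpy as np

    rng = np.random.default_rng(0)

    def grad_U(x):
        """Gradient of a double-well potential U(x) = (x^2 - 1)^2."""
        return 4 * x * (x ** 2 - 1)

    dt, temperature, n_steps = 1e-3, 0.3, 100_000
    x = 0.0
    samples = np.empty(n_steps)
    for t in range(n_steps):
        # Euler-Maruyama step: drift down the potential plus a thermal fluctuation.
        x += -grad_U(x) * dt + np.sqrt(2 * temperature * dt) * rng.standard_normal()
        samples[t] = x

    # The long-run histogram approximates the Boltzmann distribution exp(-U(x)/T),
    # concentrating around the two wells at x = +/- 1.
    print(samples.mean(), np.histogram(samples, bins=5, range=(-2, 2))[0])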