Reliability and Interpretability in Science and Deep Learning
In recent years, the question of the reliability of Machine Learning (ML) methods has acquired significant importance, and the analysis of the associated uncertainties has motivated a growing amount of research. However, most of these studies have applied standard error analysis to ML models---and in particular Deep Neural Network (DNN) models---which represent a rather significant departure from standard scientific modelling. It is therefore necessary to integrate standard error analysis with a deeper epistemological analysis of the possible differences between DNN models and standard scientific modelling, and of the possible implications of these differences for the assessment of reliability. This article offers several contributions. First, it emphasises the ubiquitous role of model assumptions (in both ML and traditional science) against the illusion of theory-free science. Second, model assumptions are analysed from the point of view of their (epistemic) complexity, which is shown to be language-independent. It is argued that the high epistemic complexity of DNN models hinders the estimation of their reliability and also their prospect of long-term progress; some potential ways forward are suggested. Third, this article identifies the close relation between a model's epistemic complexity and its interpretability, as introduced in the context of responsible AI. This clarifies in which sense---and to what extent---the lack of understanding of a model (the black-box problem) impacts its interpretability in a way that is independent of individual skills. It also clarifies how interpretability is a precondition for assessing the reliability of any model, which cannot be based on statistical analysis alone. This article focuses on the comparison between traditional scientific models and DNN models; however, Random Forest (RF) and Logistic Regression (LR) models are also briefly considered.
Colloidal patchy particle architectures: Simulations of accurate models in and out of equilibrium
Some material properties and functionalities arise from a collective organization mediated by non-covalent bonds between constituent building blocks such as molecules or colloidal particles. Understanding such complex matter, specifically when these systems are out of equilibrium, remains a grand challenge. In this thesis, we use patchy colloidal particles interacting via critical Casimir interactions, which can act as mesoscopic structural analogues of molecular, supramolecular and bio-inspired architectures. These particles can make directed bonds, follow Boltzmann statistics and are directly observable via e.g. confocal microscopy. By means of simulations, we give microscopic insight into the structural behaviors and responses of colloidal matter both in and out of equilibrium. First, we developed an accurate patchy particle potential in a hybrid bottom-up/top-down coarse-graining approach. Based on the particle's geometry and universal scaling theory, we benchmarked simulation outcomes against experimental measurements of divalent patchy particles. As an alternative to explicit simulations, Wertheim's theory predicts the thermodynamic equilibrium of these systems. In chapter 3, we adapted Wertheim's theory to accurately predict extremely confined systems, inspired by the effect of gravity on patchy particle distributions. Finally, we investigated the effect of activity on patchy particle architectures such as dimers, decamers, rings and networks. We find that activity can enhance as well as reduce the stability of architectures, deform intact structures, alter the mechanisms of fragmentation, and increase bond formation. In activated networks, we observe three distinct global structures upon increasing activity: a homogeneous, an inhomogeneous, and a phase-separated structure.
LIPIcs, Volume 251, ITCS 2023, Complete Volume
A First Course in Causal Inference
I developed these lecture notes based on my ``Causal Inference'' course at the
University of California, Berkeley, over the past seven years. Since half of the
students were undergraduates, the notes require only basic knowledge of
probability theory, statistical inference, and linear and logistic regression.
Binding processes in episodic memory: Measurement, structure, and moderators
Episodic memory enables people to remember personally experienced events. While these events consist of different elements, people are able to form coherent memory representations. This requires that an event’s constituent elements are bound together in memory. Despite the importance of these binding processes for episodic memory, they are still only poorly understood and our abilities to measure them are limited. In this thesis, comprising three articles, I provide a new approach for measuring binding effects and use this measure to probe properties of binding processes in episodic memory. In the first article, I introduce the new measurement approach and evaluate its suitability for measuring binding effects in comparison to previous approaches. I show that the approach has good measurement properties and is better suited for measuring binding effects than previous approaches. In the second article, I examine the structure in which event elements are bound together and whether animacy influences binding processes. I show that different binding structures are possible, such as an integrated binding structure, in which event elements are bound into a unitary representation, and a hierarchical binding structure, in which event elements are preferentially bound to particular types of elements. These may lie on a continuum of memory representations with varying degrees of integration. I further show that the presence of an animate element in an event facilitates binding, enabling more coherent memory representations with a higher degree of integration. In addition, awareness regarding commonalities of types of event elements across events may facilitate binding. In the third article, I examine whether agency influences binding processes. I show that the presence of an agentic element in an event may facilitate binding, but evidence was not conclusive and effects may have been concealed due to low memory performance. 
Agency may thus underlie the previously found facilitating effect of animacy on binding, since animate elements may exert their influence by providing a potential agent in an event. One aim of my thesis is to provide a new tool for investigating binding processes in episodic memory. An additional aim is to extend our current understanding of binding structures that link together the elements of an event, as well as the factors that moderate binding processes. In doing so, I hope to advance our understanding of binding processes and enable and inform future exploration, as well as theory development and refinement, of this fundamental property underlying episodic memory.
DeepMem: ML Models as storage channels and their (mis-)applications
Machine learning (ML) models are overparameterized to support generality and
avoid overfitting. Prior works have shown that these additional parameters can
be used for both malicious (e.g., hiding a model covertly within a trained
model) and beneficial purposes (e.g., watermarking a model). In this paper, we
propose a novel information theoretic perspective of the problem; we consider
the ML model as a storage channel with a capacity that increases with
overparameterization. Specifically, we consider a sender that embeds arbitrary
information in the model at training time, which can be extracted by a receiver
with black-box access to the deployed model. We derive an upper bound on the
capacity of the channel based on the number of available parameters. We then
explore black-box write and read primitives that allow the attacker to: (i)
store data in an optimized way within the model by augmenting the training data
at the transmitter side, and (ii) read it by querying the model after it is
deployed. We also analyze the detectability of the writing primitive and
consider a new version of the problem which takes information storage
covertness into account. Specifically, to obtain storage covertness, we
introduce a new constraint such that the data augmentation used for the write
primitives minimizes the distribution shift with the initial (baseline task)
distribution. This constraint introduces a level of "interference" with the
initial task, thereby limiting the channel's effective capacity. Therefore, we
develop optimizations to improve the capacity in this case, including a novel
ML-specific substitution-based error-correction protocol. We believe that the
proposed modeling of the problem offers new tools to better understand and
mitigate potential vulnerabilities of ML, especially in the context of
increasingly large models.
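The write/read primitives can be sketched with a toy model. Here a 1-nearest-neighbour classifier and scalar key locations stand in for the paper's actual construction (both are illustrative assumptions): the sender augments the training data with (key, bit) pairs at out-of-distribution locations, and the receiver recovers the bits through black-box queries.

```python
import random

random.seed(0)

def train_1nn(data):
    # A toy "model": 1-nearest-neighbour prediction over the training set.
    def predict(x):
        return min(data, key=lambda p: abs(p[0] - x))[1]
    return predict

# Baseline task: classify the sign of a scalar input.
xs = [random.uniform(-1, 1) for _ in range(50)]
baseline = [(x, int(x > 0)) for x in xs]

# Write primitive: the sender augments the training data with (key, bit)
# pairs placed at out-of-distribution key locations.
message = [1, 0, 1, 1, 0]
keys = [10.0 + i for i in range(len(message))]   # hypothetical key locations
model = train_1nn(baseline + list(zip(keys, message)))

# Read primitive: the receiver recovers the bits by querying the deployed
# model at the agreed key locations.
recovered = [model(k) for k in keys]
print(recovered)  # → [1, 0, 1, 1, 0]
```

Covertness, the capacity bound, and the substitution-based error-correction protocol from the paper are not modelled here; the sketch only shows how augmentation-based writing and query-based reading fit together while leaving the baseline task intact.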
Decision-making with Gaussian processes: sampling strategies and Monte Carlo methods
We study Gaussian processes and their application to decision-making in the real world. We begin by reviewing the foundations of Bayesian decision theory and show how these ideas give rise to methods such as Bayesian optimization. We investigate practical techniques for carrying out these strategies, with an emphasis on estimating and maximizing acquisition functions. Finally, we introduce pathwise approaches to conditioning Gaussian processes and demonstrate key benefits of representing random variables in this manner.
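One way to make the pathwise view concrete is Matheron's rule: a posterior sample is obtained by correcting a joint prior sample with a kernel-weighted residual at the observations. A minimal NumPy sketch under illustrative assumptions (an RBF kernel, noiseless 1-D data):

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(a, b, lengthscale=1.0):
    # Squared-exponential kernel matrix between 1-D input sets.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

X = np.array([-1.0, 0.0, 1.0])     # training inputs
y = np.array([0.5, -0.2, 0.3])     # noiseless observations
Xs = np.linspace(-2.0, 2.0, 9)     # test inputs

# Draw one joint prior sample over training and test inputs.
Z = np.concatenate([X, Xs])
Kzz = rbf(Z, Z) + 1e-8 * np.eye(len(Z))     # small jitter for stability
f = np.linalg.cholesky(Kzz) @ rng.standard_normal(len(Z))
fX, fXs = f[:len(X)], f[len(X):]

# Matheron's rule: posterior sample = prior sample + kernel-weighted residual.
Kxx = rbf(X, X) + 1e-8 * np.eye(len(X))
f_post = fXs + rbf(Xs, X) @ np.linalg.solve(Kxx, y - fX)
```

Because the observations are noiseless, the conditioned sample interpolates the data wherever a test input coincides with a training input, while remaining a random draw elsewhere.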
Image Reconstruction via Deep Image Prior Subspaces
Deep learning has been widely used for solving image reconstruction tasks but
its deployability has been held back due to the shortage of high-quality
training data. Unsupervised learning methods, such as the deep image prior
(DIP), naturally fill this gap, but bring a host of new issues: the
susceptibility to overfitting due to a lack of robust early stopping strategies
and unstable convergence. We present a novel approach to tackle these issues by
restricting DIP optimisation to a sparse linear subspace of its parameters,
employing a synergy of dimensionality-reduction techniques and second-order
optimisation methods. The low dimensionality of the subspace reduces DIP's
tendency to fit noise and allows the use of stable second-order optimisation
methods, e.g., natural gradient descent or L-BFGS. Experiments across both
image restoration and tomographic tasks of different geometry and ill-posedness
show that second-order optimisation within a low-dimensional subspace offers a
favourable trade-off between optimisation stability and reconstruction
fidelity.
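The core idea, restricting optimisation to a low-dimensional linear subspace of the parameters, can be sketched on a toy least-squares problem. The random subspace basis, the dimensions, and plain gradient descent below are illustrative stand-ins for the paper's DIP networks and second-order solvers:

```python
import numpy as np

rng = np.random.default_rng(1)

n_params, k = 200, 5                     # full parameter count vs. subspace dimension
A = rng.standard_normal((50, n_params))  # a toy linear "model"
y = A @ rng.standard_normal(n_params)    # synthetic reconstruction target

theta0 = np.zeros(n_params)                              # reference parameters
U, _ = np.linalg.qr(rng.standard_normal((n_params, k)))  # orthonormal subspace basis

def loss(c):
    # The loss as a function of only the k subspace coefficients.
    r = A @ (theta0 + U @ c) - y
    return 0.5 * r @ r

def grad(c):
    return U.T @ (A.T @ (A @ (theta0 + U @ c) - y))

# Plain gradient descent on the k coefficients; a second-order method
# such as L-BFGS would operate on the same small k-dimensional problem.
c = np.zeros(k)
for _ in range(200):
    c -= 1e-3 * grad(c)

print(loss(np.zeros(k)) > loss(c))  # → True: optimising in the subspace reduced the loss
```

The point of the construction is that the optimisation variable shrinks from `n_params` to `k`, which is what makes stable second-order updates affordable in the first place.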
Utilitarian Welfare Optimization in the Generalized Vertex Coloring Games: An Implication to Venue Selection in Events Planning
We consider a general class of multi-agent games in networks, namely the
generalized vertex coloring games (G-VCGs), inspired by real-life applications
of the venue selection problem in events planning. Under a particular
mechanism, each agent receives a utility that depends on the current coloring
assignment; striving to maximize its own utility while restricted to local
information, each agent self-organizes when choosing another color. Our focus
is on maximizing a utilitarian welfare objective, namely the cumulative
utility across the network, in a decentralized fashion. First, we investigate
a special class of G-VCGs, Identical Preference VCGs (IP-VCGs), which recovers
the rudimentary setting of \cite{chaudhuri2008network}. We prove convergence
even under a completely greedy policy in fully synchronous settings, and
provide a stochastic bound on the rate of convergence. Second, for general
G-VCGs, we propose a greediness-preserving, Metropolis-Hastings-based policy
that each agent can initiate with limited information, and prove its
optimality under asynchronous settings using the theory of regular perturbed
Markov processes. Empirically, the policy is also robust under independently
synchronous settings. Third, in the spirit of ``robust coloring'', we include
an expected-loss term in the objective function to balance utility against
robustness; an optimal coloring for this robust welfare optimization is
derived through a second-stage MH-policy-driven algorithm. Simulation
experiments showcase the efficiency of the proposed strategy.
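The decentralized flavour of these dynamics can be sketched with a greedy best-response rule. The graph, the palette, and the utility below (the number of neighbours holding a different colour) are illustrative choices; the paper's Metropolis-Hastings-based policy is not reproduced here.

```python
# A small graph as an adjacency list (illustrative; any undirected graph works).
neighbours = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
colours = {v: 0 for v in neighbours}   # every agent starts with the same colour
palette = [0, 1, 2]

def utility(v, c):
    # Illustrative utility: the number of neighbours holding a different colour.
    return sum(colours[u] != c for u in neighbours[v])

# Greedy best response: each agent, using only local information,
# repeatedly switches to a colour maximising its own utility.
for _ in range(5):
    for v in neighbours:
        colours[v] = max(palette, key=lambda c: utility(v, c))

welfare = sum(utility(v, colours[v]) for v in neighbours)
print(welfare)  # equals the total degree once the colouring is proper
```

On this small instance the welfare objective is maximised exactly when the colouring is proper, so the greedy dynamics double as a decentralized colouring procedure.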
Thermodynamic AI and the fluctuation frontier
Many Artificial Intelligence (AI) algorithms are inspired by physics and
employ stochastic fluctuations. We connect these physics-inspired AI algorithms
by unifying them under a single mathematical framework that we call
Thermodynamic AI. Seemingly disparate algorithmic classes can be described by
this framework, for example, (1) Generative diffusion models, (2) Bayesian
neural networks, (3) Monte Carlo sampling and (4) Simulated annealing. Such
Thermodynamic AI algorithms are currently run on digital hardware, ultimately
limiting their scalability and overall potential. Stochastic fluctuations
naturally occur in physical thermodynamic systems, and such fluctuations can be
viewed as a computational resource. Hence, we propose a novel computing
paradigm, where software and hardware become inseparable. Our algorithmic
unification allows us to identify a single full-stack paradigm, involving
Thermodynamic AI hardware, that could accelerate such algorithms. We contrast
Thermodynamic AI hardware with quantum computing where noise is a roadblock
rather than a resource. Thermodynamic AI hardware can be viewed as a novel form
of computing, since it uses a novel fundamental building block. We identify
stochastic bits (s-bits) and stochastic modes (s-modes) as the respective
building blocks for discrete and continuous Thermodynamic AI hardware. In
addition to these stochastic units, Thermodynamic AI hardware employs a
Maxwell's demon device that guides the system to produce non-trivial states. We
provide a few simple physical architectures for building these devices and we
develop a formalism for programming the hardware via gate sequences. We hope to
stimulate discussion around this new computing paradigm. Beyond acceleration,
we believe it will impact the design of both hardware and algorithms, while
also deepening our understanding of the connection between physics and
intelligence.
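Of the algorithm classes unified above, simulated annealing illustrates most directly how stochastic fluctuations act as a computational resource. A minimal sketch with an illustrative energy landscape and cooling schedule:

```python
import math
import random

random.seed(0)

def energy(x):
    # A double-well landscape whose deeper minimum sits near x = +1.
    return (x * x - 1.0) ** 2 - 0.3 * x

# Thermal fluctuations let the walker hop between wells while the
# temperature is high; cooling then freezes it into a minimum.
x, T = -1.0, 2.0
for _ in range(5000):
    proposal = x + random.gauss(0.0, 0.3)
    dE = energy(proposal) - energy(x)
    if dE < 0 or random.random() < math.exp(-dE / T):
        x = proposal              # Metropolis acceptance rule
    T = max(0.01, T * 0.999)      # geometric cooling schedule

print(x)  # the walker settles near one of the minima at x ≈ ±1
```

In the framework's terms, the Gaussian proposal plays the role of the stochastic unit and the temperature-dependent acceptance rule plays the role of the guiding device; on digital hardware both fluctuations must be simulated, which is the scalability limit the abstract points to.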