Differentially private partitioned variational inference
Learning a privacy-preserving model from sensitive data which are distributed
across multiple devices is an increasingly important problem. The problem is
often formulated in the federated learning context, with the aim of learning a
single global model while keeping the data distributed. Moreover, Bayesian
learning is a popular approach for modelling, since it naturally supports
reliable uncertainty estimates. However, Bayesian learning is generally
intractable even with centralised non-private data, so approximation
techniques such as variational inference are a necessity. Variational inference
has recently been extended to the non-private federated learning setting via
the partitioned variational inference algorithm. For privacy protection, the
current gold standard is differential privacy, which guarantees privacy in a
strong, mathematically well-defined sense.
In this paper, we present differentially private partitioned variational
inference, the first general framework for learning a variational approximation
to a Bayesian posterior distribution in the federated learning setting while
minimising the number of communication rounds and providing differential
privacy guarantees for data subjects.
We propose three alternative implementations in the general framework, one
based on perturbing local optimisation runs done by individual parties, and two
based on perturbing updates to the global model (one using a version of
federated averaging, the second one adding virtual parties to the protocol),
and compare their properties both theoretically and empirically.
Comment: Published in TMLR 04/2023: https://openreview.net/forum?id=55Bcghgic
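One of the three implementations perturbs updates to the global model using a version of federated averaging. As a rough illustration of what "perturbing an update" means, the sketch below applies the generic Gaussian mechanism to an average of clipped party updates. This is an assumption-laden stand-in, not the paper's protocol: the function name, the clipping rule and `noise_multiplier` are hypothetical, and privacy accounting (choosing the noise level for a target (ε, δ)) is omitted.

```python
import numpy as np

def dp_average_updates(updates, clip_norm, noise_multiplier, rng):
    """Hypothetical DP aggregation step: clip, sum, add Gaussian noise, average."""
    clipped = []
    for u in updates:
        norm = np.linalg.norm(u)
        # Scale an update down only if its L2 norm exceeds the clipping bound
        factor = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        clipped.append(u * factor)
    total = np.sum(clipped, axis=0)
    # Gaussian mechanism: one contribution changes the sum by at most clip_norm,
    # so the noise standard deviation is scaled by that L2 sensitivity
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(updates)
```

With `noise_multiplier=0.0` the function reduces to plain averaging of clipped updates, which is convenient for checking the clipping logic before turning the noise on.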
Loss minimization yields multicalibration for large neural networks
Multicalibration is a notion of fairness that aims to provide accurate
predictions across a large set of groups. Multicalibration is known to be a
different goal than loss minimization, even for simple predictors such as
linear functions. In this note, we show that for (almost all) large neural
network sizes, optimally minimizing squared error leads to multicalibration.
Our results are about representational aspects of neural networks, and not
about algorithmic or sample complexity considerations. Previous such results
were known only for predictors that were nearly Bayes-optimal and were
therefore representation independent. We emphasize that our results do not
apply to specific algorithms for optimizing neural networks, such as SGD, and
they should not be interpreted as "fairness comes for free from optimizing
neural networks".
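To make the fairness notion concrete, the sketch below measures one simple empirical variant of multicalibration error: for every (possibly overlapping) group and every prediction bin, it takes the absolute mean residual and reports the worst cell. The binning scheme and the unweighted maximum are assumptions for illustration; formal definitions differ in details such as weighting cells by their probability mass.

```python
import numpy as np

def multicalibration_error(preds, labels, groups, n_bins=10):
    """Largest |mean(label - pred)| over all (group, prediction-bin) cells.

    groups: list of boolean masks, one per (possibly overlapping) group.
    """
    preds = np.asarray(preds, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Assign each prediction in [0, 1] to one of n_bins equal-width bins
    bins = np.minimum((preds * n_bins).astype(int), n_bins - 1)
    worst = 0.0
    for mask in groups:
        mask = np.asarray(mask, dtype=bool)
        for b in range(n_bins):
            cell = mask & (bins == b)
            if cell.any():
                # Calibration gap inside this group at this prediction level
                gap = abs(np.mean(labels[cell] - preds[cell]))
                worst = max(worst, gap)
    return worst
```

A predictor is (approximately) multicalibrated with respect to the chosen groups when this value is small; the note's result concerns when squared-error minimization over large networks drives it down.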
Ambiguous Medical Image Segmentation using Diffusion Models
Collective insights from a group of experts have always proven to outperform
an individual's best diagnosis for clinical tasks. For the task of medical
image segmentation, existing research on AI-based alternatives focuses more on
developing models that can imitate the best individual rather than harnessing
the power of expert groups. In this paper, we introduce a single diffusion
model-based approach that produces multiple plausible outputs by learning a
distribution over group insights. Our proposed model generates a distribution
of segmentation masks by leveraging the inherent stochastic sampling process of
diffusion using only minimal additional learning. We demonstrate on three
different medical image modalities (CT, ultrasound, and MRI) that our model is
capable of producing several possible variants while capturing the frequencies
of their occurrences. Comprehensive results show that our proposed approach
outperforms existing state-of-the-art ambiguous segmentation networks in terms
of accuracy while preserving naturally occurring variation. We also propose a
new metric to evaluate both the diversity and the accuracy of segmentation
predictions, in line with the clinical practice of relying on collective
insights.
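The abstract does not define its new metric, so the sketch below instead shows a widely used metric for ambiguous segmentation, the squared generalized energy distance (GED) with d = 1 - IoU, to illustrate how a set of sampled masks can be compared against a set of expert annotations while rewarding both accuracy and diversity. This is explicitly not the paper's metric; the function names are hypothetical.

```python
import numpy as np

def iou_distance(a, b):
    """1 - IoU between two binary masks; defined as 0 when both masks are empty."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return 1.0 - np.logical_and(a, b).sum() / union

def generalized_energy_distance(samples, annotations):
    """Squared GED: 2 E[d(S, Y)] - E[d(S, S')] - E[d(Y, Y')] with d = 1 - IoU."""
    # Cross term: how far model samples are from expert annotations
    cross = np.mean([iou_distance(s, y) for s in samples for y in annotations])
    # Within-set terms: diversity among samples and among annotations
    within_s = np.mean([iou_distance(s, t) for s in samples for t in samples])
    within_y = np.mean([iou_distance(y, z) for y in annotations for z in annotations])
    return 2 * cross - within_s - within_y
```

Lower is better; a model that collapses to a single mask is penalized through the `within_s` term when the annotators genuinely disagree.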
In-situ crack and keyhole pore detection in laser directed energy deposition through acoustic signal and deep learning
Cracks and keyhole pores are detrimental defects in alloys produced by laser
directed energy deposition (LDED). Laser-material interaction sound may hold
information about underlying complex physical events such as crack propagation
and pore formation. However, due to the noisy environment and intricate signal
content, acoustic-based monitoring in LDED has received little attention. This
paper proposes a novel acoustic-based in-situ defect detection strategy in
LDED. The key contribution of this study is to develop an in-situ acoustic
signal denoising, feature extraction, and sound classification pipeline that
incorporates convolutional neural networks (CNN) for online defect prediction.
Microscope images are used to identify locations of the cracks and keyhole
pores within a part. The defect locations are spatiotemporally registered with
the acoustic signal. Various acoustic features corresponding to defect-free
regions, cracks, and keyhole pores are extracted and analysed in time-domain,
frequency-domain, and time-frequency representations. The CNN model is trained
to predict defect occurrences using the Mel-Frequency Cepstral Coefficients
(MFCCs) of the laser-material interaction sound. The CNN model is compared with
various classic machine learning models trained on the denoised acoustic
dataset and the raw acoustic dataset. The validation results show that the CNN
model trained on the denoised dataset outperforms the others, with the highest
overall accuracy (89%), keyhole pore prediction accuracy (93%), and AUC-ROC
score (98%). Furthermore, the trained CNN model can be deployed in an
in-house software platform for online quality monitoring. This is the first
study to use acoustic signals with deep learning for in-situ defect detection
in the LDED process.
Comment: 36 Pages, 16 Figures, accepted at journal Additive Manufacturin
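The CNN is trained on MFCCs of the process sound. The NumPy sketch below shows the standard MFCC pipeline (framing, Hann window, power spectrum, triangular mel filterbank, log, orthonormal DCT-II). Frame length, hop size, filter count and number of coefficients are assumed defaults chosen for illustration; the abstract does not state the paper's preprocessing parameters.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr, n_fft=512, hop=256, n_mels=26, n_mfcc=13):
    """MFCC sketch: frame -> Hann window -> power spectrum -> mel filterbank -> log -> DCT-II."""
    signal = np.asarray(signal, dtype=float)
    # Overlapping frames of n_fft samples every `hop` samples (assumes len(signal) >= n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft  # (n_frames, n_fft // 2 + 1)

    # Triangular filters spaced evenly on the mel scale from 0 Hz to the Nyquist frequency
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / (right - center)

    # Log mel energies (floored to avoid log(0))
    log_mel = np.log(np.maximum(power @ fbank.T, 1e-10))

    # Orthonormal DCT-II along the mel axis; keep the first n_mfcc coefficients
    k = np.arange(n_mfcc)[:, None]
    j = np.arange(n_mels)[None, :]
    basis = np.sqrt(2.0 / n_mels) * np.cos(np.pi * k * (2 * j + 1) / (2 * n_mels))
    basis[0] /= np.sqrt(2.0)
    return log_mel @ basis.T  # (n_frames, n_mfcc)
```

The resulting `(n_frames, n_mfcc)` matrix is the kind of 2-D time-by-coefficient input a CNN classifier can consume; in practice a library such as librosa would usually be used instead of a hand-rolled version.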
Bayesian networks for disease diagnosis: What are they, who has used them and how?
A Bayesian network (BN) is a probabilistic graph based on Bayes' theorem,
used to show dependencies or cause-and-effect relationships between variables.
BNs are widely applied in diagnostic processes since they allow the
incorporation of medical knowledge into the model while expressing uncertainty in
terms of probability. This systematic review presents the state of the art in
the applications of BNs in medicine in general and in the diagnosis and
prognosis of diseases in particular. Indexed articles from the last 40 years
were included. The studies generally used the typical measures of diagnostic
and prognostic accuracy: sensitivity, specificity, accuracy, precision, and the
area under the ROC curve. Overall, we found that disease diagnosis and
prognosis based on BNs can be successfully used to model complex medical
problems that require reasoning under conditions of uncertainty.
Comment: 22 pages, 5 figures, 1 table, Student PhD first pape
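As a minimal illustration of diagnostic reasoning with a BN, the sketch below assumes a hypothetical network in which a single disease node is the only parent of each observed symptom, and computes the posterior probability of disease by enumeration and Bayes' theorem. The structure and all probabilities are made up for illustration; real diagnostic BNs in the reviewed studies are typically larger and elicited from data or experts.

```python
def posterior_disease(prior, cpts, evidence):
    """P(Disease = True | evidence) for a BN where Disease is the sole parent of each symptom.

    prior: P(Disease = True)
    cpts: {symptom: (P(symptom | Disease=True), P(symptom | Disease=False))}
    evidence: {symptom: observed bool}
    """
    joint = {}
    for d, p_d in ((True, prior), (False, 1.0 - prior)):
        p = p_d
        for symptom, observed in evidence.items():
            # Pick the conditional probability row for this disease state
            p_true = cpts[symptom][0] if d else cpts[symptom][1]
            p *= p_true if observed else 1.0 - p_true
        joint[d] = p
    # Normalise the two joint probabilities (Bayes' theorem)
    return joint[True] / (joint[True] + joint[False])
```

Here the first entry of each CPT pair plays the role of sensitivity and one minus the second entry the role of specificity, linking the network to the accuracy measures the review reports.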
Offline and Online Models for Learning Pairwise Relations in Data
Pairwise relations between data points are essential for numerous machine learning algorithms. Many representation learning methods consider pairwise relations to identify the latent features and patterns in the data. This thesis investigates learning of pairwise relations from two different perspectives: offline learning and online learning.
The first part of the thesis focuses on offline learning, starting with an investigation of the performance modeling of a synchronization method in concurrent programming using a Markov chain whose state transition matrix models pairwise relations between the cores involved in a computer process. Then the thesis focuses on a particular pairwise distance measure, the minimax distance, and explores memory-efficient approaches to computing this distance by proposing a hierarchical representation of the data with a linear memory requirement with respect to the number of data points, from which the exact pairwise minimax distances can be derived in a memory-efficient manner. Then, a memory-efficient sampling method is proposed that follows the aforementioned hierarchical representation of the data and samples the data points in a way that the minimax distances between all data points are maximally preserved. Finally, the thesis proposes a practical non-parametric clustering of vehicle motion trajectories to annotate traffic scenarios based on transitive relations between trajectories in an embedded space.
The second part of the thesis takes an online learning perspective, and starts by presenting an online learning method for identifying bottlenecks in a road network by extracting the minimax path, where bottlenecks are considered as road segments with the highest cost, e.g., in the sense of travel time. Inspired by real-world road networks, the thesis assumes a stochastic traffic environment in which the road-specific probability distribution of travel time is unknown. Therefore, the parameters of the probability distribution must be learned through observations, by modeling the bottleneck identification task as a combinatorial semi-bandit problem. The proposed approach takes into account prior knowledge and follows a Bayesian approach to update the parameters. Moreover, it develops a combinatorial variant of Thompson Sampling and derives an upper bound for the corresponding Bayesian regret. Furthermore, the thesis proposes an approximate algorithm to address the respective computational intractability issue. Finally, the thesis considers contextual information of road network segments by extending the proposed model to a contextual combinatorial semi-bandit framework and investigates and develops various algorithms for this contextual combinatorial setting.
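The minimax distance between two points is the smallest achievable "largest edge" over all paths connecting them, and it equals the largest edge on their path in a minimum spanning tree. The sketch below uses that property with Kruskal's algorithm: when an MST edge of weight w merges two clusters, every cross-cluster pair gets minimax distance w. It is a plain O(n²)-memory illustration of the definition, not the thesis's linear-memory hierarchical representation; the function name is hypothetical.

```python
import numpy as np

def minimax_distances(points):
    """All-pairs minimax distances via Kruskal's algorithm on the complete Euclidean graph."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    edges = sorted(
        (float(np.linalg.norm(points[i] - points[j])), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    members = {i: [i] for i in range(n)}  # cluster id -> member indices
    cluster = list(range(n))               # point index -> cluster id
    result = np.zeros((n, n))
    for w, i, j in edges:
        ci, cj = cluster[i], cluster[j]
        if ci == cj:
            continue  # edge would close a cycle, so it is not an MST edge
        # Every pair across the two merged clusters is first connected at weight w
        for a in members[ci]:
            for b in members[cj]:
                result[a, b] = result[b, a] = w
        # Merge the smaller cluster into the larger one
        if len(members[ci]) < len(members[cj]):
            ci, cj = cj, ci
        for b in members[cj]:
            cluster[b] = ci
        members[ci].extend(members.pop(cj))
    return result
```

For points at 0, 1 and 5 on a line, the pair (0, 5) has Euclidean distance 5 but minimax distance 4, because the path through 1 never crosses a gap larger than 4.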
Advancing Model Pruning via Bi-level Optimization
The deployment constraints in practical applications necessitate the pruning
of large-scale deep learning models, i.e., promoting their weight sparsity. As
illustrated by the Lottery Ticket Hypothesis (LTH), pruning also has the
potential of improving their generalization ability. At the core of LTH,
iterative magnitude pruning (IMP) is the predominant pruning method to
successfully find 'winning tickets'. Yet, the computation cost of IMP grows
prohibitively as the targeted pruning ratio increases. To reduce the
computation overhead, various efficient 'one-shot' pruning methods have been
developed, but these schemes are usually unable to find winning tickets as good
as IMP. This raises the question of how to close the gap between pruning
accuracy and pruning efficiency. To tackle it, we pursue the algorithmic
advancement of model pruning. Specifically, we formulate the pruning problem
from a novel viewpoint, bi-level optimization (BLO). We show that the
BLO interpretation provides a technically grounded optimization basis for an
efficient implementation of the pruning-retraining learning paradigm used in
IMP. We also show that the proposed bi-level optimization-oriented pruning
method (termed BiP) is a special class of BLO problems with a bi-linear problem
structure. By leveraging such bi-linearity, we theoretically show that BiP can
be solved as easily as first-order optimization, thus inheriting its
computational efficiency. Through extensive experiments on both structured and
unstructured pruning with 5 model architectures and 4 data sets, we demonstrate
that BiP can find better winning tickets than IMP in most cases, and is
computationally as efficient as the one-shot pruning schemes, demonstrating a 2-7
times speedup over IMP for the same level of model accuracy and sparsity.
Comment: Thirty-sixth Conference on Neural Information Processing Systems
(NeurIPS 2022
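For context on the baseline that BiP is compared against, the sketch below computes one global magnitude-pruning step: binary masks that remove the smallest-magnitude fraction of weights across all layers. In IMP this step is repeated, each time rewinding the surviving weights and retraining, which is what makes it expensive at high sparsity. BiP's bi-level mask optimization is not shown; the function name and the global (rather than layer-wise) thresholding are assumptions.

```python
import numpy as np

def magnitude_prune_mask(weights, sparsity):
    """Binary masks that zero out the `sparsity` fraction of smallest-magnitude weights.

    weights: list of arrays (one per layer); pruning is global across all layers.
    """
    flat = np.concatenate([np.abs(w).ravel() for w in weights])
    k = int(round(sparsity * flat.size))  # number of weights to remove
    if k == 0:
        return [np.ones_like(w, dtype=bool) for w in weights]
    # Threshold = k-th smallest magnitude; ties at the threshold are also pruned
    threshold = np.partition(flat, k - 1)[k - 1]
    return [np.abs(w) > threshold for w in weights]
```

Applying the masks (`w * mask`) yields the sparse "ticket"; one-shot methods compute such a mask once, whereas IMP alternates pruning and retraining many times.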
Neural Architecture Search: Insights from 1000 Papers
In the past decade, advances in deep learning have resulted in breakthroughs
in a variety of areas, including computer vision, natural language
understanding, speech recognition, and reinforcement learning. Specialized,
high-performing neural architectures are crucial to the success of deep
learning in these areas. Neural architecture search (NAS), the process of
automating the design of neural architectures for a given task, is an
inevitable next step in automating machine learning and has already outpaced
the best human-designed architectures on many tasks. In the past few years,
research in NAS has been progressing rapidly, with over 1000 papers released
since 2020 (Deng and Lindauer, 2021). In this survey, we provide an organized
and comprehensive guide to neural architecture search. We give a taxonomy of
search spaces, algorithms, and speedup techniques, and we discuss resources
such as benchmarks, best practices, other surveys, and open-source libraries.
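To make the survey's vocabulary (search space, search algorithm, evaluation) concrete, the sketch below implements random search, a simple baseline commonly used when comparing NAS algorithms. The toy search space and the `evaluate` callback are hypothetical; in a real setting `evaluate` would train and validate each candidate, which is exactly the cost that speedup techniques aim to reduce.

```python
import random

# Hypothetical toy search space: an architecture is one choice per key
SEARCH_SPACE = {
    "num_layers": [2, 4, 8],
    "width": [64, 128, 256],
    "activation": ["relu", "gelu"],
}

def sample_architecture(rng):
    """Draw one architecture uniformly at random from the search space."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}

def random_search(evaluate, n_trials, seed=0):
    """Random-search NAS baseline: sample architectures, keep the best score."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(n_trials):
        arch = sample_architecture(rng)
        score = evaluate(arch)  # e.g. validation accuracy after (partial) training
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score
```

More sophisticated algorithms (evolutionary search, Bayesian optimization, reinforcement learning, one-shot weight sharing) replace the sampling rule or the evaluation step while keeping this overall loop.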
Information-Theoretic GAN Compression with Variational Energy-based Model
We propose an information-theoretic knowledge distillation approach for the
compression of generative adversarial networks, which aims to maximize the
mutual information between teacher and student networks via a variational
optimization based on an energy-based model. Because the direct computation of
the mutual information in continuous domains is intractable, our approach
instead optimizes the student network by maximizing a variational lower
bound of the mutual information. To achieve a tight lower bound, we introduce
an energy-based model relying on a deep neural network to represent a flexible
variational distribution that effectively handles high-dimensional images and
accounts for spatial dependencies between pixels. Since the proposed method is
a generic optimization algorithm, it can be conveniently incorporated into
arbitrary generative adversarial networks and even dense prediction networks,
e.g., image enhancement models. We demonstrate that the proposed algorithm
consistently achieves outstanding performance in the compression of generative
adversarial networks when combined with several existing models.
Comment: Accepted at Neurips202
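The variational lower bound referred to above is presumably of the standard Barber-Agakov form; the block below writes it out under that assumption, with T and S denoting teacher and student representations and an energy-based variational distribution. The notation (E_φ, Z_φ) is ours, not taken from the abstract.

```latex
I(T;S) = H(T) - H(T \mid S) \;\ge\; H(T) + \mathbb{E}_{p(t,s)}\big[\log q_\phi(t \mid s)\big],
\qquad
q_\phi(t \mid s) = \frac{\exp\big(-E_\phi(t,s)\big)}{Z_\phi(s)},
\qquad
Z_\phi(s) = \int \exp\big(-E_\phi(t,s)\big)\,dt .
```

The gap of the bound is \(\mathbb{E}_{p(s)}\big[\mathrm{KL}\big(p(t \mid s)\,\|\,q_\phi(t \mid s)\big)\big]\), so a more flexible energy function E_φ tightens it. Because H(T) does not depend on the student, maximizing the bound with respect to the student amounts to maximizing \(\mathbb{E}_{p(t,s)}[\log q_\phi(t \mid s)]\), where the intractable normalizer Z_φ(s) is what makes energy-based training nontrivial.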
Structured Dynamic Pricing: Optimal Regret in a Global Shrinkage Model
We consider dynamic pricing strategies in a streamed longitudinal data set-up
where the objective is to maximize, over time, the cumulative profit across a
large number of customer segments. We consider a dynamic probit model with the
consumers' preferences as well as price sensitivity varying over time. Building
on the well-known finding that consumers sharing similar characteristics act in
similar ways, we consider a global shrinkage structure, which assumes that the
consumers' preferences across the different segments can be well approximated
by a spatial autoregressive (SAR) model. In such a streamed longitudinal
set-up, we measure the performance of a dynamic pricing policy via regret,
which is the expected revenue loss compared to a clairvoyant that knows the
sequence of model parameters in advance. We propose a pricing policy based on
penalized stochastic gradient descent (PSGD) and explicitly characterize its
regret as a function of time, the temporal variability in the model parameters,
and the strength of the autocorrelation network structure spanning the
varied customer segments. Our regret analysis results not only demonstrate
asymptotic optimality of the proposed policy but also show that for policy
planning it is essential to incorporate available structural information, as
policies based on unshrunken models are highly sub-optimal in the
aforementioned set-up.
Comment: 34 pages, 5 figure
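The regret definition above can be illustrated for a single customer segment. The sketch assumes a hypothetical probit purchase model P(buy | p) = Φ(α − βp) with time-varying (α_t, β_t), lets the clairvoyant pick the revenue-maximizing price on a grid, and sums the expected revenue lost by a given price sequence. It omits the paper's multi-segment SAR shrinkage structure and its PSGD policy; all names and the grid search are assumptions.

```python
import math

def buy_prob(price, alpha, beta):
    """Probit purchase probability Phi(alpha - beta * price)."""
    return 0.5 * (1.0 + math.erf((alpha - beta * price) / math.sqrt(2.0)))

def expected_revenue(price, alpha, beta):
    return price * buy_prob(price, alpha, beta)

def clairvoyant_price(alpha, beta, grid):
    """Best price on the grid when the true parameters are known."""
    return max(grid, key=lambda p: expected_revenue(p, alpha, beta))

def regret(prices, params, grid):
    """Cumulative expected revenue lost versus a clairvoyant that knows (alpha_t, beta_t)."""
    total = 0.0
    for p, (alpha, beta) in zip(prices, params):
        best = clairvoyant_price(alpha, beta, grid)
        total += expected_revenue(best, alpha, beta) - expected_revenue(p, alpha, beta)
    return total
```

A policy that ignores drift in (α_t, β_t), or that learns each segment separately without sharing information, accumulates regret faster; the paper's analysis quantifies this in terms of time, parameter variability and network autocorrelation.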