SWAP: Sparse Entropic Wasserstein Regression for Robust Network Pruning
This study addresses the challenge of inaccurate gradients in computing the
empirical Fisher Information Matrix during neural network pruning. We introduce
SWAP, a formulation of Entropic Wasserstein regression (EWR) for pruning,
capitalizing on the geometric properties of the optimal transport problem. The
``swap'' of the commonly used linear regression with the EWR in optimization is
analytically demonstrated to offer noise mitigation effects by incorporating
neighborhood interpolation across data points with only marginal additional
computational cost. The unique strength of SWAP is its intrinsic ability to
balance noise reduction and covariance information preservation effectively.
Extensive experiments performed on various networks and datasets show
comparable performance of SWAP with state-of-the-art (SoTA) network pruning
algorithms. Our proposed method outperforms the SoTA when the network size or
the target sparsity is large; the gain is even larger in the presence of
noisy gradients, possibly from noisy data, analog memory, or adversarial
attacks. Notably, our proposed method achieves a 6% improvement in
accuracy and an 8% improvement in testing loss for MobileNetV1 with less than
one-fourth of the network parameters remaining.
Comment: Published as a conference paper at ICLR 202
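As an illustration of the entropic optimal transport problem that the abstract builds on, the following minimal sketch computes an entropy-regularized transport plan with Sinkhorn iterations. It is not the SWAP algorithm: the function name `sinkhorn_plan`, the regularization strength `eps`, the iteration count, and the toy point clouds are all assumptions made for this example.

```python
import numpy as np

def sinkhorn_plan(cost, a, b, eps=0.5, n_iters=500):
    """Entropy-regularized OT plan between histograms a and b (hypothetical helper)."""
    K = np.exp(-cost / eps)            # Gibbs kernel of the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)              # rescale to match the column marginal b
        u = a / (K @ v)                # rescale to match the row marginal a
    return u[:, None] * K * v[None, :]

# Toy data (assumed): two small point clouds with uniform weights.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))
y = rng.normal(size=(6, 2))
cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
cost = cost / cost.max()               # normalize so exp(-cost / eps) stays well scaled
a = np.full(5, 1 / 5)
b = np.full(6, 1 / 6)
plan = sinkhorn_plan(cost, a, b)
```

Each row of `plan` spreads the mass of one source point over several target points; this kind of soft coupling is one way to picture the neighborhood interpolation across data points that the abstract mentions.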
Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference
Efficient machine learning implementations optimized for inference in
hardware have wide-ranging benefits, depending on the application, from lower
inference latency to higher data throughput and reduced energy consumption. Two
popular techniques for reducing computation in neural networks are pruning,
removing insignificant synapses, and quantization, reducing the precision of
the calculations. In this work, we explore the interplay between pruning and
quantization during the training of neural networks for ultra low latency
applications targeting high energy physics use cases. Techniques developed for
this study have potential applications across many other domains. We study
various configurations of pruning during quantization-aware training, which we
term quantization-aware pruning, and the effect of techniques like
regularization, batch normalization, and different pruning schemes on
performance, computational complexity, and information content metrics. We find
that quantization-aware pruning yields more computationally efficient models
than either pruning or quantization alone for our task. Further,
quantization-aware pruning typically performs as well as or better than other
neural architecture search techniques, such as Bayesian optimization, in terms
of computational efficiency. Surprisingly, while networks with
different training configurations can have similar performance for the
benchmark application, the information content in the network can vary
significantly, affecting its generalizability.
Comment: 22 pages, 7 Figures, 1 Table
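To make the two ingredients concrete, here is a minimal sketch of one masked, quantized weight computation of the kind that could appear in the forward pass of quantization-aware training with pruning. It is not the authors' implementation: the helper names `magnitude_mask` and `fake_quantize`, the symmetric uniform quantizer, the 6-bit width, and the 75% sparsity target are assumptions chosen for illustration.

```python
import numpy as np

def magnitude_mask(w, sparsity):
    """Boolean mask that keeps the largest-magnitude weights.

    `sparsity` is the fraction of weights to remove (hypothetical helper).
    """
    k = int(round(sparsity * w.size))
    if k == 0:
        return np.ones(w.shape, dtype=bool)
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    return np.abs(w) > threshold

def fake_quantize(w, bits):
    """Symmetric uniform quantization to `bits` bits, returned as floats."""
    levels = 2 ** (bits - 1) - 1
    max_abs = np.abs(w).max()
    if max_abs == 0:
        return w.copy()
    scale = max_abs / levels
    return np.round(w / scale) * scale

# Toy weight matrix (assumed): prune first, then quantize the survivors.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
mask = magnitude_mask(w, sparsity=0.75)
w_qp = fake_quantize(w * mask, bits=6)
```

In a training loop the mask would typically be recomputed or tightened on a schedule while gradients flow through the quantizer with a straight-through estimator; both choices are outside what the abstract specifies.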
Risk-Sensitive Soft Actor-Critic for Robust Deep Reinforcement Learning under Distribution Shifts
We study the robustness of deep reinforcement learning algorithms against
distribution shifts within contextual multi-stage stochastic combinatorial
optimization problems from the operations research domain. In this context,
risk-sensitive algorithms promise to learn robust policies. While this field is
of general interest to the reinforcement learning community, most studies
to date focus on theoretical results rather than real-world performance.
With this work, we aim to bridge this gap by formally deriving a novel
risk-sensitive deep reinforcement learning algorithm while providing numerical
evidence for its efficacy. Specifically, we introduce discrete Soft
Actor-Critic for the entropic risk measure by deriving a version of the Bellman
equation for the respective Q-values. We establish a corresponding policy
improvement result and infer a practical algorithm. We introduce an environment
that represents typical contextual multi-stage stochastic combinatorial
optimization problems and perform numerical experiments to empirically validate
our algorithm's robustness against realistic distribution shifts, without
compromising performance on the training distribution. We show that our
algorithm is superior to risk-neutral Soft Actor-Critic as well as to two
benchmark approaches for robust deep reinforcement learning. Thereby, we
provide the first structured analysis of the robustness of reinforcement
learning under distribution shifts in the realm of contextual multi-stage
stochastic combinatorial optimization problems.
Comment: 11 pages, 8 figures
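For readers unfamiliar with the entropic risk measure named in the abstract, the sketch below evaluates it on sampled returns using the common reward-based convention -(1/beta) * log E[exp(-beta * R)] with beta > 0 for risk aversion. The sign convention, the function name `entropic_risk`, and the sample returns are assumptions; the paper's exact definition may differ.

```python
import numpy as np

def entropic_risk(returns, beta):
    """Entropic risk of sampled returns: -(1/beta) * log mean(exp(-beta * R)).

    Uses a log-sum-exp shift for numerical stability (hypothetical helper).
    """
    z = -beta * np.asarray(returns, dtype=float)
    m = z.max()
    log_mean_exp = m + np.log(np.mean(np.exp(z - m)))
    return -log_mean_exp / beta

# Toy returns (assumed): a risky outcome that pays 0 or 10 with equal probability.
risky = [0.0, 10.0]
risk_averse_value = entropic_risk(risky, beta=1.0)
near_neutral_value = entropic_risk(risky, beta=1e-4)
```

With a larger `beta` the value moves from the mean return toward the worst-case return, which is the mechanism by which a risk-sensitive agent trades average performance for robustness.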
LEARNING TO ACT WITH ROBUSTNESS
Reinforcement Learning (RL) is learning to act in different situations to maximize a numerical reward signal. The most common approach to formalizing RL is to use the framework of optimal control in an inadequately known Markov Decision Process (MDP). Traditional approaches to solving RL problems build on two common assumptions: i) exploration is allowed for the purpose of learning the MDP model and ii) optimizing for the expected objective is sufficient. These assumptions comfortably hold for many simulated domains like games (e.g., Atari, Go), but are not sufficient for many real-world problems. Consider for example the domain of precision medicine for personalized treatment. Adopting a medical treatment for the sole purpose of learning its impact is prohibitive. It is also not permissible to embrace a specific treatment procedure by considering only the expected outcome, ignoring the potential for worst-case undesirable effects. Therefore, applying RL to solve real-world problems brings additional challenges. In this thesis, we assume that exploration is impossible because of the sensitivity of actions in the domain. We therefore adopt a Batch RL framework, which operates on a fixed, logged dataset without interacting with the environment. We also accept the need to find solutions that work well in both average and worst-case situations; we label such solutions as robust. We consider the robust MDP (RMDP) framework for handling these challenges. RMDPs provide the foundations for quantifying the uncertainties about the model by using so-called ambiguity sets. An ambiguity set represents the set of plausible transition probabilities, which is usually constructed as a multi-dimensional confidence region. Ambiguity sets determine the trade-off between robustness and average-case performance of an RMDP. This thesis presents a novel approach to optimizing the shape of ambiguity sets constructed with the weighted L1-norm.
We derive new high-confidence sampling bounds for weighted L1 ambiguity sets and describe how to compute near-optimal weights from coarse estimates of value functions. Experimental results on a diverse set of benchmarks show that optimized ambiguity sets provide significantly tighter robustness guarantees. In addition to reshaping the ambiguity sets, it is also desirable to optimize the size and position of the sets for further improvement in performance. In this regard, this thesis presents a method for constructing ambiguity sets that can achieve less conservative solutions with the same worst-case guarantees by 1) leveraging a Bayesian prior, and 2) relaxing the requirement that the set is a confidence interval. Our theoretical analysis establishes the safety of the proposed method, and the empirical results demonstrate its practical promise. In addition to optimizing ambiguity sets for RMDPs, this thesis also proposes a new paradigm for incorporating robustness into the constrained-MDP framework. We apply robustness to both the rewards and the constrained costs, because robustness is equally (if not more) important for the constrained costs. We derive the required gradient update rules and propose a policy-gradient class of algorithms. The performance of the proposed algorithm is evaluated on several problem domains. Parallel to Robust-MDPs, a slightly different perspective on handling model uncertainties is to compute soft-robust solutions using a risk measure (e.g., Value-at-Risk or Conditional Value-at-Risk). In high-stakes domains, it is important to quantify and manage the risk that arises from inherently stochastic transitions between different states of the model. Most prior work on robust RL and risk-averse RL addresses the inherent transition uncertainty and model uncertainty independently. This thesis proposes a unified Risk-Averse Soft-Robust (RASR) framework that quantifies both model and transition uncertainties together.
We show that the RASR objective can be solved efficiently when formulated using the Entropic risk measure. We also report theoretical analysis and empirical evidence on several problem domains. The methods presented in this thesis can potentially be applied in many practical applications of artificial intelligence, such as agriculture, healthcare, and robotics. They help us to broaden our understanding of computing robust solutions for safety-critical domains. Having robust and more realistic solutions to sensitive practical problems can inspire widespread adoption of AI to solve challenging real-world problems, potentially leading toward the pinnacle of the age of automation.
Enhancing Deep Neural Networks Testing by Traversing Data Manifold
We develop DEEPTRAVERSAL, a feedback-driven framework to test DNNs.
DEEPTRAVERSAL first launches an offline phase to map media data of various
forms to manifolds. Then, in its online testing phase, DEEPTRAVERSAL traverses
the prepared manifold space to maximize DNN coverage criteria and trigger
prediction errors. In our evaluation, DNNs executing various tasks (e.g.,
classification, self-driving, machine translation) and media data of different
types (image, audio, text) were used. DEEPTRAVERSAL exhibits better performance
than prior methods with respect to popular DNN coverage criteria, and it
discovers more error-triggering inputs of higher quality. The
tested DNN models, after being repaired with the findings of DEEPTRAVERSAL, achieve
better accuracy.
Missing Data Imputation using Optimal Transport
Missing data is a crucial issue when applying machine learning algorithms to
real-world datasets. Starting from the simple assumption that two batches
extracted randomly from the same dataset should share the same distribution, we
leverage optimal transport distances to quantify that criterion and turn it
into a loss function to impute missing data values. We propose practical
methods to minimize these losses using end-to-end learning, which may or may
not exploit parametric assumptions on the underlying distributions of values. We
evaluate our methods on datasets from the UCI repository, in MCAR, MAR and MNAR
settings. These experiments show that OT-based methods match or outperform
state-of-the-art imputation methods, even for high percentages of missing
values.
Time-series Generation by Contrastive Imitation
Consider learning a generative model for time-series data. The sequential
setting poses a unique challenge: Not only should the generator capture the
conditional dynamics of (stepwise) transitions, but its open-loop rollouts
should also preserve the joint distribution of (multi-step) trajectories. On
one hand, autoregressive models trained by MLE allow learning and computing
explicit transition distributions, but suffer from compounding error during
rollouts. On the other hand, adversarial models based on GAN training alleviate
such exposure bias, but transitions are implicit and hard to assess. In this
work, we study a generative framework that seeks to combine the strengths of
both: Motivated by a moment-matching objective to mitigate compounding error,
we optimize a local (but forward-looking) transition policy, where the
reinforcement signal is provided by a global (but stepwise-decomposable) energy
model trained by contrastive estimation. At training, the two components are
learned cooperatively, avoiding the instabilities typical of adversarial
objectives. At inference, the learned policy serves as the generator for
iterative sampling, and the learned energy serves as a trajectory-level measure
for evaluating sample quality. By expressly training a policy to imitate
sequential behavior of time-series features in a dataset, this approach
embodies "generation by imitation". Theoretically, we illustrate the
correctness of this formulation and the consistency of the algorithm.
Empirically, we evaluate its ability to generate predictively useful samples
from real-world datasets, verifying that it performs at the standard of
existing benchmarks.
Deep Learning And Uncertainty Quantification: Methodologies And Applications
Uncertainty quantification is a recently emerging interdisciplinary area that leverages the power of statistical methods, machine learning models, numerical methods and data-driven approaches to provide reliable inference for quantities of interest in natural science and engineering problems. In practice, uncertainty comes from different sources: aleatoric uncertainty, where the uncertainty comes from the observations or is due to the stochastic nature of the problem; and epistemic uncertainty, where the uncertainty comes from inaccurate mathematical models, computational methods or model parametrization. To cope with these different types of uncertainty, a successful and scalable model for uncertainty quantification requires prior knowledge of the problem, careful design of mathematical models, cautious selection of computational tools, etc. The fast growth in deep learning, probabilistic methods and the large volume of data available across different research areas enables researchers to take advantage of these recent advances to propose novel methodologies for solving scientific problems where uncertainty quantification plays an important role. The objective of this dissertation is to address the existing gaps and propose new methodologies for uncertainty quantification with deep learning methods and demonstrate their power in engineering applications.
On the methodology side, we first present a generative adversarial framework to model aleatoric uncertainty in stochastic systems. Secondly, we combine the proposed generative model with recent advances in physics-informed deep learning to learn the uncertainty propagation in solutions of partial differential equations. Thirdly, we introduce a simple and effective approach for posterior uncertainty quantification for learning nonlinear operators. Fourthly, we consider inverse problems for physical systems, identifying unknown forms and parameters in dynamical systems from observed noisy data.
On the application side, we first propose an importance sampling approach for sequential decision making. Second, we propose a physics-informed neural network method to quantify the epistemic uncertainty in cardiac activation mapping and conduct active learning. Third, we present an auto-encoder based framework for data augmentation and generation for data that is expensive to obtain, such as single-cell RNA sequencing.