
    SWAP: Sparse Entropic Wasserstein Regression for Robust Network Pruning

    This study addresses the challenge of inaccurate gradients in computing the empirical Fisher Information Matrix during neural network pruning. We introduce SWAP, a formulation of Entropic Wasserstein regression (EWR) for pruning, capitalizing on the geometric properties of the optimal transport problem. The "swap" of the commonly used linear regression for EWR in the optimization is analytically shown to offer noise-mitigation effects by incorporating neighborhood interpolation across data points, at only marginal additional computational cost. The unique strength of SWAP is its intrinsic ability to balance noise reduction against the preservation of covariance information. Extensive experiments on various networks and datasets show that SWAP performs comparably to state-of-the-art (SoTA) network pruning algorithms. Our proposed method outperforms the SoTA when the network size or the target sparsity is large; the gain is even larger in the presence of noisy gradients, which may arise from noisy data, analog memory, or adversarial attacks. Notably, our proposed method achieves a 6% gain in accuracy and an 8% improvement in testing loss for MobileNetV1 with less than one-fourth of the network parameters remaining. Comment: Published as a conference paper at ICLR 202
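
    The core ingredient is entropic optimal transport: instead of matching pruned and dense layer outputs point-for-point with least squares, the loss is the transport cost under a Sinkhorn coupling, which interpolates across neighboring data points. Below is a minimal NumPy sketch of that ingredient under uniform weights; the function names, the toy magnitude mask, and all dimensions are illustrative assumptions, not the authors' SWAP solver.

```python
import numpy as np

def sinkhorn_plan(C, eps=0.1, n_iter=200):
    # Entropic-OT coupling for cost matrix C with uniform marginals.
    # eps is scaled by the mean cost so the kernel does not underflow.
    K = np.exp(-C / (eps * C.mean()))
    a = np.full(C.shape[0], 1.0 / C.shape[0])
    b = np.full(C.shape[1], 1.0 / C.shape[1])
    v = np.ones(C.shape[1])
    for _ in range(n_iter):
        u = a / (K @ v + 1e-30)
        v = b / (K.T @ u + 1e-30)
    return u[:, None] * K * v[None, :]  # transport plan

def ewr_loss(X, Y, eps=0.1):
    # Entropic Wasserstein regression loss: transport cost under the plan.
    # Mass may flow between neighboring samples, unlike least squares,
    # which pairs each sample only with itself; this averages out noise.
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return (sinkhorn_plan(C, eps) * C).sum()

# Toy comparison of dense-layer outputs against a crudely pruned layer.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
X = rng.normal(size=(64, 16))
W_pruned = W * (np.abs(W) > 0.5)  # illustrative magnitude mask
print(ewr_loss(X @ W.T, X @ W_pruned.T))
```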

    Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference

    Efficient machine learning implementations optimized for inference in hardware have wide-ranging benefits, depending on the application, from lower inference latency to higher data throughput and reduced energy consumption. Two popular techniques for reducing computation in neural networks are pruning, removing insignificant synapses, and quantization, reducing the precision of the calculations. In this work, we explore the interplay between pruning and quantization during the training of neural networks for ultra-low-latency applications targeting high energy physics use cases; the techniques developed here have potential applications across many other domains. We study various configurations of pruning during quantization-aware training, which we term quantization-aware pruning, and the effect of techniques like regularization, batch normalization, and different pruning schemes on performance, computational complexity, and information content metrics. We find that quantization-aware pruning yields more computationally efficient models than either pruning or quantization alone for our task. Further, quantization-aware pruning typically performs similarly to, or better than, other neural architecture search techniques such as Bayesian optimization in terms of computational efficiency. Surprisingly, while networks with different training configurations can perform similarly on the benchmark application, the information content in the network can vary significantly, affecting its generalizability. Comment: 22 pages, 7 figures, 1 table
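
    As a concrete illustration of combining the two techniques, the sketch below trains a linear layer under a fixed magnitude-pruning mask while fake-quantizing the surviving weights with a straight-through estimator. This is a generic PyTorch sketch under assumed layer sizes, bit width, and sparsity; it is not the paper's training schedule or hardware toolflow.

```python
import torch
import torch.nn as nn

def fake_quant(w, bits=4):
    # Straight-through estimator: round in the forward pass,
    # behave like the identity in the backward pass.
    scale = w.abs().max() / (2 ** (bits - 1) - 1) + 1e-12
    q = torch.round(w / scale).clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * scale
    return w + (q - w).detach()

class QAPLinear(nn.Module):
    """Linear layer trained with a fixed pruning mask + fake-quantized weights."""
    def __init__(self, d_in, d_out, sparsity=0.5, bits=4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.1)
        self.bits = bits
        k = int(sparsity * self.weight.numel())  # assumes sparsity > 0
        thresh = self.weight.abs().flatten().kthvalue(k).values
        self.register_buffer("mask", (self.weight.abs() > thresh).float())

    def forward(self, x):
        return x @ fake_quant(self.weight * self.mask, self.bits).t()

layer = QAPLinear(16, 8)
x = torch.randn(32, 16)
loss = layer(x).pow(2).mean()
loss.backward()  # gradients flow through the STE; masked weights get zero grad
```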

    Risk-Sensitive Soft Actor-Critic for Robust Deep Reinforcement Learning under Distribution Shifts

    We study the robustness of deep reinforcement learning algorithms against distribution shifts within contextual multi-stage stochastic combinatorial optimization problems from the operations research domain. In this context, risk-sensitive algorithms promise to learn robust policies. While this field is of general interest to the reinforcement learning community, most studies to date focus on theoretical results rather than real-world performance. With this work, we aim to bridge that gap by formally deriving a novel risk-sensitive deep reinforcement learning algorithm and providing numerical evidence for its efficacy. Specifically, we introduce discrete Soft Actor-Critic for the entropic risk measure by deriving a version of the Bellman equation for the respective Q-values. We establish a corresponding policy improvement result and infer a practical algorithm. We introduce an environment that represents typical contextual multi-stage stochastic combinatorial optimization problems and perform numerical experiments to empirically validate our algorithm's robustness against realistic distribution shifts, without compromising performance on the training distribution. We show that our algorithm is superior to risk-neutral Soft Actor-Critic as well as to two benchmark approaches for robust deep reinforcement learning. We thereby provide the first structured analysis of the robustness of reinforcement learning under distribution shifts in the realm of contextual multi-stage stochastic combinatorial optimization problems. Comment: 11 pages, 8 figures
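
    The entropic risk measure named here has a simple sampling form, rho_beta(R) = (1/beta) log E[exp(beta R)], which recovers the risk-neutral expectation as beta approaches 0 and penalizes variance for beta < 0. The self-contained illustration below is only that; it is not the paper's actor-critic derivation, and the Gaussian example is an assumption.

```python
import numpy as np

def entropic_risk(returns, beta):
    # rho_beta(R) = (1/beta) * log E[exp(beta * R)]; beta < 0 is risk-averse.
    # Computed with log-sum-exp for numerical stability.
    z = beta * np.asarray(returns, dtype=float)
    m = z.max()
    return (m + np.log(np.mean(np.exp(z - m)))) / beta

# For R ~ N(mu, sigma^2) the measure equals mu + beta * sigma^2 / 2.
samples = np.random.default_rng(0).normal(loc=1.0, scale=2.0, size=200_000)
print(entropic_risk(samples, beta=-0.5))   # ~ 1.0 - 0.5 * 4 / 2 = 0.0
print(entropic_risk(samples, beta=-1e-6))  # ~ 1.0 (near risk-neutral)
```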

    Learning to Act with Robustness

    Reinforcement Learning (RL) is learning to act in different situations to maximize a numerical reward signal. The most common approach to formalizing RL is the framework of optimal control in an inadequately known Markov Decision Process (MDP). Traditional approaches to solving RL problems build on two common assumptions: i) exploration is allowed for the purpose of learning the MDP model, and ii) optimizing for the expected objective is sufficient. These assumptions hold comfortably for many simulated domains like games (e.g. Atari, Go), but not for many real-world problems. Consider the domain of precision medicine for personalized treatment: adopting a medical treatment for the sole purpose of learning its impact is prohibitive, and it is also not permissible to embrace a specific treatment procedure by considering only the expected outcome while ignoring the potential for worst-case undesirable effects. Applying RL to real-world problems therefore raises additional challenges.

    In this thesis, we assume that exploration is impossible because of the sensitivity of actions in the domain. We therefore adopt a Batch RL framework, which operates on a fixed logged dataset without interacting with the environment. We also accept the need to find solutions that work well in both average-case and worst-case situations; we label such solutions robust. We consider the robust MDP (RMDP) framework for handling these challenges. RMDPs provide the foundations for quantifying uncertainty about the model using so-called ambiguity sets, which represent the set of plausible transition probabilities and are usually constructed as a multi-dimensional confidence region. Ambiguity sets determine the trade-off between robustness and average-case performance of an RMDP (a sketch of the worst-case step over such a set follows this abstract). This thesis presents a novel approach to optimizing the shape of ambiguity sets constructed with the weighted L1-norm. We derive new high-confidence sampling bounds for weighted L1 ambiguity sets and describe how to compute near-optimal weights from coarse estimates of value functions. Experimental results on a diverse set of benchmarks show that optimized ambiguity sets provide significantly tighter robustness guarantees.

    In addition to reshaping the ambiguity sets, it is also desirable to optimize their size and position for further performance improvement. To this end, this thesis presents a method for constructing ambiguity sets that achieves less conservative solutions with the same worst-case guarantees by 1) leveraging a Bayesian prior, and 2) relaxing the requirement that the set be a confidence interval. Our theoretical analysis establishes the safety of the proposed method, and the empirical results demonstrate its practical promise.

    This thesis also proposes a new paradigm for incorporating robustness into the constrained-MDP framework. We apply robustness to both the rewards and the constraint costs, because robustness is equally (if not more) important for the constraint costs. We derive the required gradient update rules and propose a policy-gradient-style algorithm, whose performance we evaluate on several problem domains.

    Parallel to robust MDPs, a slightly different perspective on handling model uncertainty is to compute soft-robust solutions using a risk measure (e.g. Value-at-Risk or Conditional Value-at-Risk). In high-stakes domains, it is important to quantify and manage the risk that arises from inherently stochastic transitions between states of the model. Most prior work on robust RL and risk-averse RL addresses inherent transition uncertainty and model uncertainty independently. This thesis proposes a unified Risk-Averse Soft-Robust (RASR) framework that quantifies model and transition uncertainties together. We show that the RASR objective can be solved efficiently when formulated with the entropic risk measure, and we report theoretical analysis and empirical evidence on several problem domains.

    The methods presented in this thesis can be applied in many practical applications of artificial intelligence, such as agriculture, healthcare, and robotics. They broaden our understanding of computing robust solutions in safety-critical domains. Robust, more realistic solutions to sensitive practical problems can inspire widespread adoption of AI for challenging real-world problems, potentially leading toward the pinnacle of the age of automation.
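
    For intuition about the ambiguity-set machinery, the robust Bellman backup needs the worst-case expected value over transition probabilities within an L1 ball around the nominal estimate. The sketch below implements the standard O(n log n) scheme for the plain, unweighted L1 case; the thesis's weighted sets and Bayesian construction are not reproduced, and all numbers are toy values.

```python
import numpy as np

def worst_case_value(p_hat, v, xi):
    # min_p  p . v   s.t.  ||p - p_hat||_1 <= xi,  p a distribution.
    # Move up to xi/2 probability mass onto the lowest-value state,
    # taking it from the highest-value states first.
    p = p_hat.astype(float).copy()
    k = int(np.argmin(v))
    eps = min(xi / 2.0, 1.0 - p[k])
    p[k] += eps
    for i in np.argsort(v)[::-1]:  # highest-value states first
        if i == k:
            continue
        take = min(eps, p[i])
        p[i] -= take
        eps -= take
        if eps <= 0:
            break
    return float(p @ v)

p_hat = np.array([0.2, 0.5, 0.3])          # nominal transition estimate
v = np.array([1.0, 4.0, 2.0])              # value of each next state
print(worst_case_value(p_hat, v, xi=0.4))  # 2.2 < nominal p_hat @ v = 2.8
```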

    Enhancing Deep Neural Networks Testing by Traversing Data Manifold

    We develop DEEPTRAVERSAL, a feedback-driven framework to test DNNs. DEEPTRAVERSAL first launches an offline phase to map media data of various forms to manifolds. Then, in its online testing phase, DEEPTRAVERSAL traverses the prepared manifold space to maximize DNN coverage criteria and trigger prediction errors. In our evaluation, we used DNNs performing various tasks (e.g., classification, self-driving, machine translation) and media data of different types (image, audio, text). DEEPTRAVERSAL exhibits better performance than prior methods with respect to popular DNN coverage criteria, and it discovers more, and higher-quality, error-triggering inputs. The tested DNN models, after being repaired with the findings of DEEPTRAVERSAL, achieve better accuracy.
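
    The abstract does not name its coverage criteria; for a concrete feel of coverage-guided testing, here is a hedged sketch that measures classic neuron coverage with forward hooks. The threshold, the ReLU-only layer choice, and the toy model are assumptions standing in for whichever criteria DEEPTRAVERSAL maximizes; its manifold machinery is not shown.

```python
import torch
import torch.nn as nn

def neuron_coverage(model, inputs, threshold=0.25):
    # Fraction of ReLU units that fire above `threshold` on any input --
    # one popular DNN coverage criterion.
    seen = {}
    hooks = []

    def make_hook(name):
        def hook(_module, _inp, out):
            fired = (out > threshold).any(dim=0).flatten()
            seen[name] = seen.get(name, torch.zeros_like(fired)) | fired
        return hook

    for name, module in model.named_modules():
        if isinstance(module, nn.ReLU):
            hooks.append(module.register_forward_hook(make_hook(name)))
    with torch.no_grad():
        model(inputs)
    for h in hooks:
        h.remove()
    covered = torch.cat([f.float() for f in seen.values()])
    return covered.mean().item()

net = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
print(neuron_coverage(net, torch.randn(64, 10)))
```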

    Missing Data Imputation using Optimal Transport

    Missing data is a crucial issue when applying machine learning algorithms to real-world datasets. Starting from the simple assumption that two batches extracted randomly from the same dataset should share the same distribution, we leverage optimal transport distances to quantify that criterion and turn it into a loss function for imputing missing data values. We propose practical methods to minimize these losses by end-to-end learning; the methods may or may not exploit parametric assumptions on the underlying distributions of values. We evaluate our methods on datasets from the UCI repository in the MCAR, MAR, and MNAR settings. These experiments show that OT-based methods match or outperform state-of-the-art imputation methods, even for high percentages of missing values.
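
    The batch-matching idea is easy to state in code: sample two random batches, compute a differentiable Sinkhorn cost between them, and run gradient descent directly on the missing entries. The sketch below is a minimal non-parametric variant on synthetic MCAR data; the batch size, learning rate, and regularization are assumed, and the authors' actual algorithms (including their parametric variants) differ.

```python
import torch

def sinkhorn_loss(X, Y, eps=0.1, n_iter=100):
    # Differentiable entropic-OT cost between two batches, uniform weights.
    C = torch.cdist(X, Y) ** 2
    K = torch.exp(-C / (eps * C.mean()))  # scaled to avoid underflow
    a = torch.full((X.shape[0],), 1.0 / X.shape[0])
    b = torch.full((Y.shape[0],), 1.0 / Y.shape[0])
    v = torch.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v + 1e-30)
        v = b / (K.T @ u + 1e-30)
    return ((u[:, None] * K * v[None, :]) * C).sum()

# Treat the missing entries themselves as the parameters being learned.
torch.manual_seed(0)
X_true = torch.randn(200, 5)
mask = torch.rand_like(X_true) < 0.2      # ~20% missing, MCAR
fill = torch.zeros(int(mask.sum()), requires_grad=True)
opt = torch.optim.Adam([fill], lr=0.05)
for step in range(300):
    X = X_true.masked_scatter(mask, fill)  # observed entries stay fixed
    i = torch.randperm(len(X))[:64]
    j = torch.randperm(len(X))[:64]
    loss = sinkhorn_loss(X[i], X[j])       # two random batches should match
    opt.zero_grad(); loss.backward(); opt.step()
```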

    Time-series Generation by Contrastive Imitation

    Consider learning a generative model for time-series data. The sequential setting poses a unique challenge: not only should the generator capture the conditional dynamics of (stepwise) transitions, but its open-loop rollouts should also preserve the joint distribution of (multi-step) trajectories. On one hand, autoregressive models trained by MLE allow learning and computing explicit transition distributions, but suffer from compounding error during rollouts. On the other hand, adversarial models based on GAN training alleviate such exposure bias, but their transitions are implicit and hard to assess. In this work, we study a generative framework that seeks to combine the strengths of both: motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy, where the reinforcement signal is provided by a global (but stepwise-decomposable) energy model trained by contrastive estimation. At training time, the two components are learned cooperatively, avoiding the instabilities typical of adversarial objectives. At inference time, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality. By expressly training a policy to imitate the sequential behavior of time-series features in a dataset, this approach embodies "generation by imitation". Theoretically, we illustrate the correctness of this formulation and the consistency of the algorithm. Empirically, we evaluate its ability to generate predictively useful samples from real-world datasets, verifying that it performs at the standard of existing benchmarks.
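
    One half of this recipe, training an energy model by contrastive estimation, reduces to a logistic discrimination step between dataset transitions and policy rollouts. A hedged sketch follows, with made-up shapes and random stand-in data; the paper's actual policy optimization and energy parameterization are not shown.

```python
import torch
import torch.nn as nn

# Hypothetical setup: state dim 8, so a transition (s, s') is a 16-vector.
energy = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(energy.parameters(), lr=1e-3)

def contrastive_step(real, fake):
    # Binary NCE: push energy low on dataset transitions, high on rollouts.
    # Using -energy as the logit makes "real" the positive class.
    logits = torch.cat([-energy(real), -energy(fake)]).squeeze(-1)
    labels = torch.cat([torch.ones(len(real)), torch.zeros(len(fake))])
    loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

real = torch.randn(32, 16)        # (s, s') pairs from the dataset
fake = torch.randn(32, 16) + 1.0  # (s, s') pairs from current policy rollouts
print(contrastive_step(real, fake))
```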

    Deep Learning And Uncertainty Quantification: Methodologies And Applications

    Uncertainty quantification is an emerging interdisciplinary area that leverages the power of statistical methods, machine learning models, numerical methods, and data-driven approaches to provide reliable inference for quantities of interest in natural science and engineering problems. In practice, uncertainty comes from different sources: aleatoric uncertainty, which stems from the observations or from the stochastic nature of the problem, and epistemic uncertainty, which stems from inaccurate mathematical models, computational methods, or model parametrization. To cope with these different types of uncertainty, a successful and scalable model for uncertainty quantification requires prior knowledge of the problem, careful design of mathematical models, and cautious selection of computational tools. The fast growth of deep learning and probabilistic methods, together with the large volume of data available across research areas, enables researchers to take advantage of these advances and propose novel methodologies for scientific problems where uncertainty quantification plays an important role. The objective of this dissertation is to address existing gaps and propose new methodologies for uncertainty quantification with deep learning methods, and to demonstrate their power in engineering applications.

    On the methodology side, we first present a generative adversarial framework to model aleatoric uncertainty in stochastic systems. Second, we combine the proposed generative model with recent advances in physics-informed deep learning to learn how uncertainty propagates in the solutions of partial differential equations. Third, we introduce a simple and effective approach to posterior uncertainty quantification for learning nonlinear operators. Fourth, we consider inverse problems in physical systems: identifying unknown forms and parameters of dynamical systems from observed noisy data.

    On the application side, we first propose an importance sampling approach for sequential decision making. Second, we propose a physics-informed neural network method to quantify the epistemic uncertainty in cardiac activation mapping and to conduct active learning. Third, we present an autoencoder-based framework for augmenting and generating data that is expensive to obtain, such as single-cell RNA sequencing data.
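
    The dissertation's own tools are GANs, physics-informed networks, and operator learning; as a generic stand-in illustrating the aleatoric/epistemic split it describes, the sketch below uses a small deep ensemble, whose member disagreement grows away from the training data. All architecture and training choices here are illustrative assumptions, not the dissertation's methods.

```python
import torch
import torch.nn as nn

def make_net():
    return nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))

torch.manual_seed(0)
x = torch.linspace(-1, 1, 128).unsqueeze(-1)
y = torch.sin(3 * x) + 0.1 * torch.randn_like(x)  # aleatoric noise in the data
ensemble = [make_net() for _ in range(5)]          # members differ by init
for net in ensemble:
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for _ in range(500):
        loss = (net(x) - y).pow(2).mean()
        opt.zero_grad(); loss.backward(); opt.step()

x_test = torch.linspace(-2, 2, 64).unsqueeze(-1)   # includes off-data region
with torch.no_grad():
    preds = torch.stack([net(x_test) for net in ensemble])
mean, epistemic_std = preds.mean(0), preds.std(0)  # spread grows off-data
```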