Generalized Batch Normalization: Towards Accelerating Deep Neural Networks
Utilizing recently introduced concepts from statistics and quantitative risk
management, we present a general variant of Batch Normalization (BN) that
offers accelerated convergence of neural network training compared to
conventional BN. In general, we show that mean and standard deviation are not
always the most appropriate choice for the centering and scaling procedure
within the BN transformation, particularly if ReLU follows the normalization
step. We present a Generalized Batch Normalization (GBN) transformation, which
can utilize a variety of alternative deviation measures for scaling and
statistics for centering, choices which naturally arise from the theory of
generalized deviation measures and risk theory in general. When used in
conjunction with the ReLU non-linearity, the underlying risk theory suggests
natural, arguably optimal choices for the deviation measure and statistic.
Utilizing the suggested deviation measure and statistic, we show experimentally
that training is accelerated more than with conventional BN, often with an
improved error rate as well. Overall, we propose a more flexible BN
transformation supported by a complementary theoretical framework that can
potentially guide design choices.
Comment: accepted at AAAI-19
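To make the idea concrete, here is a minimal PyTorch sketch of a GBN-style
layer. The median/mean-absolute-deviation pair is one hypothetical choice of
centering statistic and deviation measure, not necessarily the pairing the
paper derives from risk theory, and running statistics for inference are
omitted.

```python
import torch
import torch.nn as nn

class GeneralizedBatchNorm1d(nn.Module):
    """Illustrative GBN-style layer (a sketch, not the authors' code).

    Conventional BN centers by the batch mean and scales by the batch
    standard deviation; here both are pluggable. We use the batch median
    and the mean absolute deviation as one hypothetical alternative.
    """

    def __init__(self, num_features, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(num_features))
        self.beta = nn.Parameter(torch.zeros(num_features))

    def forward(self, x):  # x: (batch, num_features)
        center = x.median(dim=0).values              # centering statistic
        deviation = (x - center).abs().mean(dim=0)   # deviation measure
        x_hat = (x - center) / (deviation + self.eps)
        return self.gamma * x_hat + self.beta
```

Swapping in other statistic/deviation pairs (e.g. a quantile for centering, a
semideviation for scaling) only changes the two marked lines.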
Optimistic Robust Optimization With Applications To Machine Learning
Robust Optimization has traditionally taken a pessimistic, or worst-case,
viewpoint of uncertainty, which is motivated by a desire to find sets of optimal
policies that maintain feasibility under a variety of operating conditions. In
this paper, we explore an optimistic, or best-case, view of uncertainty and show
that it can be a fruitful approach. We show that these techniques can be used
to address a wide variety of problems. First, we apply our methods in the
context of robust linear programming, providing a method for reducing
conservatism in intuitive ways that encode economically realistic modeling
assumptions. Second, we look at problems in machine learning and find that this
approach is strongly connected to the existing literature. Specifically, we
provide a new interpretation of popular sparsity-inducing non-convex
regularization schemes. Additionally, we show that successful approaches for
dealing with outliers and noise can be interpreted as optimistic robust
optimization problems. Although many of the problems resulting from our
approach are non-convex, we find that the difference-of-convex algorithm (DCA)
and DCA-like optimization approaches can be intuitive and efficient.
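For a flavor of the DCA iteration in this setting, consider the capped-l1
penalty min(|x_i|, theta), a well-known non-convex sparsity regularizer that
splits as a difference of two convex functions. The sketch below is an
illustration under assumed choices (a least-squares data term, ISTA for the
convexified subproblem), not the paper's formulation.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def dca_capped_l1(A, b, lam=0.1, theta=0.5, outer=20, inner=200):
    """DCA sketch for: min_x 0.5*||Ax - b||^2 + lam * sum_i min(|x_i|, theta).

    DC split:  g(x) = 0.5*||Ax - b||^2 + lam*||x||_1        (convex)
               h(x) = lam * sum_i max(|x_i| - theta, 0)     (convex)
    Each outer step linearizes h at the current x and solves the resulting
    tilted lasso subproblem, here with a plain ISTA loop.
    """
    x = np.zeros(A.shape[1])
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the LS gradient
    for _ in range(outer):
        s = lam * np.sign(x) * (np.abs(x) > theta)   # subgradient of h at x
        z = x.copy()
        for _ in range(inner):             # ISTA on g(z) - <s, z>
            grad = A.T @ (A @ z - b) - s
            z = soft_threshold(z - grad / L, lam / L)
        x = z
    return x
```

Because h is linearized from below, each outer iteration (with an exactly
solved subproblem) does not increase the non-convex objective, the standard
DCA monotonicity property that makes such schemes well-behaved in practice.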
Calculating CVaR and bPOE for Common Probability Distributions With Application to Portfolio Optimization and Density Estimation
Conditional Value-at-Risk (CVaR) and Value-at-Risk (VaR), also called the
superquantile and quantile, are frequently used to characterize the tails of
probability distributions and are popular measures of risk. Buffered
Probability of Exceedance (bPOE) is a recently introduced characterization of
the tail, which is the inverse of CVaR, much like the CDF is the inverse of the
quantile. These quantities can prove very useful as the basis for a variety of
risk-averse parametric engineering approaches. Their use, however, is often
made difficult by the lack of well-known closed-form equations for calculating
these quantities for commonly used probability distributions. In this paper,
we derive formulas for the superquantile and bPOE for a variety of common
univariate probability distributions. Besides providing a useful collection
within a single reference, we use these formulas to incorporate the
superquantile and bPOE into parametric procedures. In particular, we consider
two: portfolio optimization and density estimation. First, when portfolio
returns are assumed to follow particular distribution families, we show that
finding the optimal portfolio via minimization of bPOE has advantages over
superquantile minimization. We show that, given a fixed threshold, a single
portfolio is the minimal bPOE portfolio for an entire class of distributions
simultaneously. Second, we apply our formulas to parametric density estimation
and propose the method of superquantiles (MOS), a simple variation of the
method of moments (MM) in which moments are replaced by superquantiles at
different confidence levels. With the freedom to select various combinations of
confidence levels, MOS allows the user to focus the fitting procedure on
different portions of the distribution, such as the tail when fitting
heavy-tailed asymmetric data.
Comment: Fixed typo in Proposition 5 (changed - to +) and added references
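As a concrete illustration of how such formulas are used, below is a small
Python sketch for the normal distribution. The closed-form superquantile
CVaR_alpha = mu + sigma * phi(Phi^{-1}(alpha)) / (1 - alpha) is standard; the
bPOE inversion and the two-level MOS fit are simplified stand-ins for the
paper's procedures, not its actual formulas or code.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def cvar_normal(alpha, mu=0.0, sigma=1.0):
    """Closed-form superquantile (CVaR) of N(mu, sigma^2)."""
    q = norm.ppf(alpha)                       # the alpha-quantile (VaR)
    return mu + sigma * norm.pdf(q) / (1.0 - alpha)

def bpoe_normal(x, mu=0.0, sigma=1.0):
    """bPOE at threshold x as the inverse of CVaR: 1 - alpha* where
    CVaR_{alpha*} = x. Numerical inversion, reliable for thresholds
    within a few standard deviations of the mean."""
    if x <= mu:
        return 1.0                            # bPOE is 1 at or below the mean
    a = brentq(lambda a: cvar_normal(a, mu, sigma) - x, 1e-9, 1.0 - 1e-9)
    return 1.0 - a

def empirical_superquantile(data, alpha):
    """Mean of the observations at or above the empirical alpha-quantile."""
    q = np.quantile(data, alpha)
    return data[data >= q].mean()

def mos_fit_normal(data, a1=0.5, a2=0.9):
    """Method-of-superquantiles fit of (mu, sigma): match the model's
    superquantiles to empirical ones at two confidence levels. For the
    normal family this system is linear in (mu, sigma)."""
    k = lambda a: norm.pdf(norm.ppf(a)) / (1.0 - a)
    s1 = empirical_superquantile(data, a1)
    s2 = empirical_superquantile(data, a2)
    sigma = (s2 - s1) / (k(a2) - k(a1))
    mu = s1 - sigma * k(a1)
    return mu, sigma
```

On a standard normal sample, mos_fit_normal should recover roughly (0, 1);
choosing higher confidence levels a1 and a2 shifts the fit's emphasis toward
the tail, which is the point of MOS for heavy-tailed data.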