
    Controlling Lipschitz functions

    Given any positive integers $m$ and $d$, we say that a sequence of points $(x_i)_{i\in I}$ in $\mathbb R^m$ is {\em Lipschitz-$d$-controlling} if one can select suitable values $y_i\ (i\in I)$ such that for every Lipschitz function $f:\mathbb R^m\rightarrow \mathbb R^d$ there exists $i$ with $|f(x_i)-y_i|<1$. We conjecture that for every $m\le d$, a sequence $(x_i)_{i\in I}\subset\mathbb R^m$ is $d$-controlling if and only if $$\sup_{n\in\mathbb N}\frac{|\{i\in I\,:\,|x_i|\le n\}|}{n^d}=\infty.$$ We prove that this condition is necessary and that a slightly stronger one is already sufficient for the sequence to be $d$-controlling. We also prove the conjecture for $m=1$.
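
    As a worked illustration of the counting criterion in the simplest case $m = d = 1$ (our own example, not taken from the abstract):

```latex
% Hedged illustration, not taken from the paper: how the counting criterion
% behaves for m = d = 1 on two concrete sequences in \mathbb{R}.
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
For $x_i = i$ we have $|\{i : |x_i|\le n\}| = n$, so
\[
  \sup_{n\in\mathbb N}\frac{|\{i : |x_i|\le n\}|}{n^1} = 1 < \infty ,
\]
and since the counting condition is necessary, this sequence is not
$1$-controlling. For $x_i = \sqrt{i}$ the count is $n^2$, the supremum is
infinite, and, by the $m=1$ case proved in the paper, the sequence is
$1$-controlling.
\end{document}
```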

    Deep Neural Networks with Trainable Activations and Controlled Lipschitz Constant

    We introduce a variational framework to learn the activation functions of deep neural networks. Our aim is to increase the capacity of the network while controlling an upper bound on the actual Lipschitz constant of the input-output relation. To that end, we first establish a global bound for the Lipschitz constant of neural networks. Based on the obtained bound, we then formulate a variational problem for learning activation functions. Our variational problem is infinite-dimensional and is not computationally tractable. However, we prove that there always exists a solution that has continuous and piecewise-linear (linear-spline) activations. This reduces the original problem to a finite-dimensional minimization where an $\ell_1$ penalty on the parameters of the activations favors the learning of sparse nonlinearities. We numerically compare our scheme with standard ReLU networks and their variations, PReLU and LeakyReLU, and we empirically demonstrate the practical aspects of our framework.
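
    A minimal PyTorch sketch of the idea, assuming a linear-spline activation written as a sum of shifted ReLUs with one shared spline per layer; the paper's exact parameterization, bound, and training objective may differ:

```python
# Hedged sketch (assumed parameterization, not the paper's code): a learnable
# linear-spline activation sigma(x) = a*x + b + sum_k c_k*relu(x - t_k),
# whose Lipschitz constant is at most |a| + sum_k |c_k|.  An l1 penalty on the
# coefficients c_k favors sparse nonlinearities, and the network's Lipschitz
# constant is upper-bounded by the product of layer spectral norms and
# activation slope bounds.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LinearSplineActivation(nn.Module):
    """One shared piecewise-linear (linear-spline) activation per layer."""

    def __init__(self, num_knots: int = 21, knot_range: float = 3.0):
        super().__init__()
        # Fixed, evenly spaced knots; only the slope parameters are learned.
        self.register_buffer("knots", torch.linspace(-knot_range, knot_range, num_knots))
        self.a = nn.Parameter(torch.ones(1))            # global linear slope
        self.b = nn.Parameter(torch.zeros(1))           # offset
        self.c = nn.Parameter(torch.zeros(num_knots))   # slope change at each knot

    def forward(self, x):
        return self.a * x + self.b + (self.c * F.relu(x.unsqueeze(-1) - self.knots)).sum(-1)

    def lipschitz_bound(self):
        # The slope on any interval is a plus a partial sum of the c_k.
        return self.a.abs() + self.c.abs().sum()

    def l1_penalty(self):
        return self.c.abs().sum()


class SplineNet(nn.Module):
    def __init__(self, dims=(16, 32, 32, 1)):
        super().__init__()
        self.linears = nn.ModuleList(nn.Linear(m, n) for m, n in zip(dims[:-1], dims[1:]))
        self.acts = nn.ModuleList(LinearSplineActivation() for _ in dims[1:-1])

    def forward(self, x):
        for lin, act in zip(self.linears[:-1], self.acts):
            x = act(lin(x))
        return self.linears[-1](x)

    def lipschitz_bound(self):
        # Global bound: product of spectral norms and activation slope bounds.
        bound = torch.ones(1)
        for lin in self.linears:
            bound = bound * torch.linalg.svdvals(lin.weight).max()
        for act in self.acts:
            bound = bound * act.lipschitz_bound()
        return bound


if __name__ == "__main__":
    net = SplineNet()
    out = net(torch.randn(8, 16))
    sparsity_reg = sum(act.l1_penalty() for act in net.acts)
    print(out.shape, net.lipschitz_bound().item(), sparsity_reg.item())
```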

    Concentration inequalities for random tensors

    We show how to extend several basic concentration inequalities for simple random tensors $X = x_1 \otimes \cdots \otimes x_d$, where all $x_k$ are independent random vectors in $\mathbb{R}^n$ with independent coefficients. The new results have optimal dependence on the dimension $n$ and the degree $d$. As an application, we show that random tensors are well conditioned: $(1-o(1))\,n^d$ independent copies of the simple random tensor $X \in \mathbb{R}^{n^d}$ are far from being linearly dependent with high probability. We prove this fact for any degree $d = o(\sqrt{n/\log n})$ and conjecture that it is true for any $d = O(n)$.
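
    A small numerical illustration of the conditioning statement (our own sketch, not from the paper), using Rademacher coordinates and modest $n$ and $d$:

```python
# Hedged numerical illustration: draw independent copies of the simple random
# tensor X = x_1 (x) ... (x) x_d with +/-1 coordinates, stack their
# vectorizations, and check that slightly fewer than n^d copies are far from
# being linearly dependent (smallest singular value bounded away from zero).
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 3                      # ambient dimension and tensor degree
N = int(0.9 * n**d)              # roughly (1 - o(1)) n^d copies


def simple_random_tensor(n, d, rng):
    """Vectorization of x_1 (x) ... (x) x_d with independent +/-1 coordinates."""
    vec = np.ones(1)
    for _ in range(d):
        x = rng.choice([-1.0, 1.0], size=n)
        vec = np.kron(vec, x)
    return vec


A = np.stack([simple_random_tensor(n, d, rng) for _ in range(N)])  # N x n^d
smin = np.linalg.svd(A, compute_uv=False).min()
print(f"{N} copies in dimension {n**d}: smallest singular value = {smin:.3f}")
```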

    Adaptive Momentum for Neural Network Optimization

    In this thesis, we develop a novel and efficient algorithm for optimizing neural networks, inspired by a recently proposed geodesic optimization algorithm. Our algorithm, which we call Stochastic Geodesic Optimization (SGeO), utilizes an adaptive coefficient on top of Polyak's Heavy Ball method, effectively controlling the amount of weight put on the previous update to the parameters based on the change of direction in the optimization path. Experimental results on strongly convex functions with Lipschitz gradients and on deep autoencoder benchmarks show that SGeO reaches lower errors than established first-order methods and competes well with a recent second-order method called K-FAC (Kronecker-Factored Approximate Curvature), achieving lower or similar errors. We also incorporate a Nesterov-style lookahead gradient into our algorithm (SGeO-N) and observe notable improvements. We believe that our research will open up new directions for high-dimensional neural network optimization, where combining the efficiency of first-order methods with the effectiveness of second-order methods is a promising avenue to explore.
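
    The abstract does not spell out the adaptive coefficient, so the sketch below substitutes an assumed heuristic (scaling the heavy-ball momentum by the alignment between the new descent direction and the previous update) purely to illustrate the idea; it is not the thesis's actual rule:

```python
# Hedged sketch only: a heavy-ball update whose momentum coefficient is scaled
# down when the new gradient points away from the previous update (a change of
# direction).  The adaptive rule here is an assumption for illustration.
import numpy as np


def adaptive_heavy_ball_step(params, grad, prev_update, lr=0.1, beta=0.9, eps=1e-12):
    """One heavy-ball step with a direction-dependent momentum coefficient."""
    cos = float(np.dot(-grad, prev_update) /
                (np.linalg.norm(grad) * np.linalg.norm(prev_update) + eps))
    adaptive_beta = beta * 0.5 * (1.0 + cos)   # assumed heuristic: lies in [0, beta]
    update = -lr * grad + adaptive_beta * prev_update
    return params + update, update


# Usage on a strongly convex quadratic f(x) = 0.5 * x^T A x with Lipschitz gradient.
A = np.diag([1.0, 10.0])
x = np.array([5.0, 5.0])
u = np.zeros_like(x)
for _ in range(100):
    x, u = adaptive_heavy_ball_step(x, A @ x, u, lr=0.05)
print(x)  # should be close to the minimizer at the origin
```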