Controlling Lipschitz functions
Given any positive integers $m$ and $d$, we say that a sequence of points
$(p_i)_{i \in I}$ in $\mathbb{R}^d$ is {\em Lipschitz-controlling} if one can
select suitable values $y_i \in \mathbb{R}^m$ such that for every Lipschitz function
$f : \mathbb{R}^d \to \mathbb{R}^m$ there exists $i \in I$ with $|f(p_i) - y_i| < 1$.
We conjecture that for every $m \le d$, a sequence $(p_i)_{i \in I} \subset \mathbb{R}^d$ is $m$-controlling if and only if
$$\sup_{n \in \mathbb{N}} \frac{|\{\, i \in I : |p_i| \le n \,\}|}{n^m} = \infty.$$
We prove that this condition is necessary and
a slightly stronger one is already sufficient for the sequence to be
$m$-controlling. We also prove the conjecture for $m = 1$.
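A worked illustration of the growth criterion (added here, not part of the abstract): enumerate the integer lattice $\mathbb{Z}^d$ by increasing norm. The number of lattice points of norm at most $n$ is $\Theta(n^d)$, so
$$\frac{|\{\, i \in I : |p_i| \le n \,\}|}{n^m} = \Theta(n^{d-m}),$$
which is unbounded precisely when $m < d$. Under the conjecture, such an enumeration would therefore be $m$-controlling for every $m < d$ but not for $m = d$.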
Deep Neural Networks with Trainable Activations and Controlled Lipschitz Constant
We introduce a variational framework to learn the activation functions of
deep neural networks. Our aim is to increase the capacity of the network while
controlling an upper bound on the actual Lipschitz constant of the input-output
relation. To that end, we first establish a global bound for the Lipschitz
constant of neural networks. Based on the obtained bound, we then formulate a
variational problem for learning activation functions. Our variational problem
is infinite-dimensional and is not computationally tractable. However, we prove
that there always exists a solution that has continuous and piecewise-linear
(linear-spline) activations. This reduces the original problem to a
finite-dimensional minimization where an $\ell_1$ penalty on the parameters of the
activations favors the learning of sparse nonlinearities. We numerically
compare our scheme with standard ReLU networks and their variations, PReLU and
LeakyReLU, and we empirically demonstrate the practical aspects of our
framework.
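The abstract does not spell out the global bound itself. As a minimal sketch, assuming the standard composition bound (the product of the weight matrices' spectral norms and the activations' Lipschitz constants, with ReLU-family activations being 1-Lipschitz), it can be computed as follows; `lipschitz_upper_bound` is an illustrative name chosen here, not the paper's API:

```python
import numpy as np

def lipschitz_upper_bound(weights, act_lipschitz=1.0):
    """Upper bound on the Lipschitz constant of
    x -> W_L a(W_{L-1} ... a(W_1 x) ...).

    Standard composition bound: the product of the spectral norms of
    the weight matrices and the Lipschitz constants of the activations.
    ReLU, PReLU, and LeakyReLU (slope <= 1) are all 1-Lipschitz.
    """
    bound = 1.0
    for k, W in enumerate(weights):
        bound *= np.linalg.norm(W, ord=2)   # largest singular value
        if k < len(weights) - 1:            # no activation after the last layer
            bound *= act_lipschitz
    return bound

# Example: a random three-layer network mapping R^32 -> R.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(64, 32)),
           rng.normal(size=(64, 64)),
           rng.normal(size=(1, 64))]
print(lipschitz_upper_bound(weights))
```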
Concentration inequalities for random tensors
We show how to extend several basic concentration inequalities for simple
random tensors $X = x_1 \otimes \cdots \otimes x_d$, where all $x_k$ are
independent random vectors in $\mathbb{R}^n$ with independent coefficients. The
new results have optimal dependence on the dimension $n$ and the degree $d$. As
an application, we show that random tensors are well conditioned:
$(1+o(1))\, n^d$ independent copies of the simple random tensor $X \in \mathbb{R}^{n^d}$
are far from being linearly dependent with high probability. We prove this fact
for any degree $d = o(\sqrt{n/\log n})$ and conjecture that it is true for any
$d = O(n)$.
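A small numerical illustration (an added sketch, not the paper's argument): form independent copies of the vectorized simple random tensor with $\pm 1$ coefficients and check empirically that, with fewer than $n^d$ copies, the smallest singular value of the stacked matrix stays away from zero, i.e., the copies are far from linearly dependent. The name `simple_random_tensor` is chosen here for the sketch:

```python
import numpy as np

def simple_random_tensor(n, d, rng):
    """Vectorization of x_1 (x) ... (x) x_d with independent +/-1 coefficients."""
    vec = np.ones(1)
    for _ in range(d):
        x = rng.choice([-1.0, 1.0], size=n)  # independent coefficients
        vec = np.kron(vec, x)                # Kronecker product = vectorized tensor product
    return vec                               # length n**d

# Stack m < n**d copies and inspect the smallest singular value:
# bounded away from zero means the copies are far from linear dependence.
rng = np.random.default_rng(1)
n, d = 6, 2
m = int(0.8 * n**d)
A = np.stack([simple_random_tensor(n, d, rng) for _ in range(m)])
print(np.linalg.svd(A, compute_uv=False).min())
```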
Adaptive Momentum for Neural Network Optimization
In this thesis, we develop a novel and efficient algorithm for optimizing neural networks, inspired by a recently proposed geodesic optimization algorithm. Our algorithm, which we call Stochastic Geodesic Optimization (SGeO), places an adaptive coefficient on top of Polyak's Heavy Ball method, effectively controlling the amount of weight put on the previous update to the parameters based on the change of direction in the optimization path. Experimental results on strongly convex functions with Lipschitz gradients and on deep autoencoder benchmarks show that SGeO reaches lower errors than established first-order methods, and achieves lower or similar errors compared with a recent second-order method called K-FAC (Kronecker-Factored Approximate Curvature). We also incorporate a Nesterov-style lookahead gradient into our algorithm (SGeO-N) and observe notable improvements. We believe that our research will open up new directions for high-dimensional neural network optimization, where combining the efficiency of first-order methods with the effectiveness of second-order methods is a promising avenue to explore.
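The abstract does not specify SGeO's adaptive rule; the sketch below is a hypothetical illustration of an adaptive Heavy Ball coefficient driven by the change of direction between consecutive gradients, not the thesis's actual update. The function name `adaptive_heavy_ball` and the cosine-based rule are assumptions made for the example:

```python
import numpy as np

def adaptive_heavy_ball(grad, x0, lr=0.01, beta_max=0.95, steps=200):
    """Heavy Ball with an adaptive momentum coefficient (hypothetical rule).

    The coefficient shrinks when consecutive gradients point in opposing
    directions (negative cosine) and grows when they agree, so less weight
    is put on the previous update after a change of direction.
    """
    x = np.asarray(x0, dtype=float).copy()
    v = np.zeros_like(x)
    prev = np.zeros_like(x)
    for _ in range(steps):
        g = grad(x)
        cos = 0.0
        if np.linalg.norm(prev) > 0 and np.linalg.norm(g) > 0:
            cos = prev @ g / (np.linalg.norm(prev) * np.linalg.norm(g))
        beta = beta_max * (1 + cos) / 2   # adaptive coefficient in [0, beta_max]
        v = beta * v - lr * g             # Polyak's Heavy Ball step
        x = x + v
        prev = g
    return x

# Example: a strongly convex quadratic with Lipschitz gradient.
A = np.diag([1.0, 10.0])
print(adaptive_heavy_ball(lambda x: A @ x, x0=[5.0, 5.0]))
```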