Dynamical Isometry is Achieved in Residual Networks in a Universal Way for any Activation Function
We demonstrate that in residual neural networks (ResNets) dynamical isometry
is achievable irrespective of the activation function used. We do so by
deriving, with the help of Free Probability and Random Matrix Theories, a
universal formula for the spectral density of the input-output Jacobian at
initialization, in the large network width and depth limit. The resulting
singular value spectrum depends on a single parameter, which we calculate for a
variety of popular activation functions by analyzing the signal propagation in
the artificial neural network. We corroborate our results with numerical
simulations of both random matrices and ResNets applied to the CIFAR-10
classification problem. Moreover, we study the consequence of this universal
behavior for the initial and late phases of the learning process. We conclude
by drawing attention to the simple fact that initialization acts as a
confounding factor between the choice of activation function and the rate of
learning. We propose that in ResNets this can be resolved, based on our
results, by ensuring the same level of dynamical isometry at initialization.
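The universality result concerns the singular value spectrum of the input-output Jacobian at random initialization. As an illustration of the object being studied (not the paper's derivation), the sketch below builds the Jacobian of a linear-activation ResNet with depth-scaled residual branches and checks that its singular values stay in a moderate range around one, i.e., close to isometry; the width, depth, and branch scaling are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, L = 200, 50          # width and depth (illustrative sizes)

# Input-output Jacobian of a linear-activation ResNet at random init:
# J = prod_l (I + a * W_l), with the residual-branch strength a
# scaled down with depth so the product stays well-conditioned.
a = 1.0 / np.sqrt(L)
J = np.eye(n)
for _ in range(L):
    W = rng.standard_normal((n, n)) / np.sqrt(n)  # variance-1/n entries
    J = (np.eye(n) + a * W) @ J

# Singular values of the full input-output Jacobian.
s = np.linalg.svd(J, compute_uv=False)
print(f"singular values: min={s.min():.2f}, max={s.max():.2f}")
```

With this depth scaling the spectrum concentrates near 1 rather than exploding or collapsing, which is the qualitative behavior the abstract refers to as dynamical isometry.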
A Geometric Approach of Gradient Descent Algorithms in Neural Networks
In this paper, we present an original geometric framework to analyze the
convergence properties of gradient descent trajectories in the context of
linear neural networks. Built upon a key invariance property induced by the
network structure, we propose a conjecture called \emph{overfitting conjecture}
stating that, for almost every training data, the corresponding gradient
descent trajectory converges to a global minimum, for almost every initial
condition. This would imply that, for linear neural networks of an arbitrary
number of hidden layers, the solution achieved by simple gradient descent
algorithm is equivalent to that of least square estimation. Our first result
consists in establishing, in the case of linear networks of arbitrary depth,
convergence of gradient descent trajectories to critical points of the loss
function. Our second result is the proof of the \emph{overfitting conjecture}
in the case of single-hidden-layer linear networks with an argument based on
the notion of normal hyperbolicity and under a generic property on the training
data (i.e., holding for almost every training data).Comment: Preprint. Work in progres
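The conjectured equivalence with least-squares estimation can be checked numerically on a small example. The sketch below (an illustration, not the paper's proof technique) trains a single-hidden-layer linear network on realizable data with plain gradient descent and compares the end-to-end product with the least-squares solution; all sizes, the learning rate, and the iteration count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, N = 3, 5, 50
X = rng.standard_normal((N, d))
w_true = rng.standard_normal(d)
y = X @ w_true                      # realizable regression targets

# Single-hidden-layer linear network: y_hat = X @ W1 @ w2,
# started from a small generic initialization.
W1 = 0.1 * rng.standard_normal((d, h))
w2 = 0.1 * rng.standard_normal(h)

lr = 0.01
for _ in range(20000):
    r = X @ W1 @ w2 - y                      # residuals
    grad_W1 = X.T @ np.outer(r, w2) / N      # d/dW1 of (1/2N)||r||^2
    grad_w2 = W1.T @ (X.T @ r) / N           # d/dw2 of (1/2N)||r||^2
    W1 -= lr * grad_W1
    w2 -= lr * grad_w2

w_ls = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares solution
print(np.max(np.abs(W1 @ w2 - w_ls)))        # small: product matches w_ls
```

For this generic data and initialization, the end-to-end map converges to the least-squares estimator, consistent with the conjectured behavior.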
Low-latency compression of mocap data using learned spatial decorrelation transform
Due to the growing need for human motion capture (mocap) in movies, video
games, sports, and other fields, it is highly desirable to compress mocap data
for efficient
storage and transmission. This paper presents two efficient frameworks for
compressing human mocap data with low latency. The first framework processes
the data in a frame-by-frame manner, making it ideal for mocap data
streaming and time-critical applications. The second one is clip-based and
provides a flexible tradeoff between latency and compression performance. Since
mocap data exhibits some unique spatial characteristics, we propose a very
effective transform, namely learned orthogonal transform (LOT), for reducing
the spatial redundancy. The LOT problem is formulated as minimizing the
squared error regularized by orthogonality and sparsity, and it is solved via
alternating iteration. We also adopt predictive coding and a temporal DCT for
temporal decorrelation in the frame- and clip-based frameworks, respectively.
Experimental results show that the proposed frameworks can produce higher
compression performance at lower computational cost and latency than the
state-of-the-art methods.
Comment: 15 pages, 9 figures.
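An alternating iteration over a transform and sparse coefficients can be sketched as follows. This is a simplified variant, not the paper's exact formulation: it enforces orthogonality as a hard constraint via an orthogonal Procrustes step and handles sparsity by soft-thresholding, on hypothetical data with a hypothetical regularization weight.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 16, 200
X = rng.standard_normal((d, N))   # stand-in for per-frame mocap features

# Sketch of a learned orthogonal transform:
#   min_{T, C} ||X - T C||_F^2 + lam * ||C||_1   s.t.  T^T T = I,
# solved by alternating over the coefficients C and the transform T.
T = np.eye(d)
lam = 0.1
for _ in range(20):
    # C-step: with T orthogonal the fit decouples coordinate-wise,
    # so the l1-regularized solution is soft-thresholding of T^T X.
    Z = T.T @ X
    C = np.sign(Z) * np.maximum(np.abs(Z) - lam, 0.0)
    # T-step: orthogonal Procrustes problem, T = U V^T from SVD of X C^T.
    U, _, Vt = np.linalg.svd(X @ C.T)
    T = U @ Vt

print(np.linalg.norm(X - T @ C))  # reconstruction error after fitting
```

Each step solves its subproblem exactly under the hard orthogonality constraint, so the objective is non-increasing across iterations.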
Convolutional Neural Network Architecture Study for Aerial Visual Localization
In unmanned aerial navigation, the ability to determine the aircraft's location is essential for safe flight. The Global Positioning System (GPS) is the default modern system used for geospatial location determination. GPS is extremely robust, very accurate, and has essentially solved aerial localization. Unfortunately, the signals from all Global Navigation Satellite Systems (GNSS), including GPS, can be jammed or spoofed. In response, it is essential to develop alternative systems that could supplement navigation systems in the event of a lost GNSS signal. Public and governmental satellites have provided large amounts of high-resolution satellite imagery, which could be exploited through machine learning to aid onboard navigation equipment in providing a geospatial location solution. Deep learning and Convolutional Neural Networks (CNNs) have driven significant advances in image processing. This thesis discusses the performance of CNN architectures with various hyperparameters and industry-leading model designs for visual aerial localization. The localization algorithm is trained and tested on satellite imagery of a localized area of 150 square kilometers. The three hyperparameters of focus are: initializations, optimizers, and finishing layers. The five model architectures are: MobileNet V2, Inception V3, ResNet 50, Xception, and DenseNet 201. The hyperparameter analysis demonstrates that specific initializations, optimizers, and finishing layers can have significant effects on the training of a CNN architecture for this specific task. The lessons learned from the hyperparameter analysis were implemented in the CNN comparison study. After all the models were trained for 150 epochs, they were evaluated on the test set. The Xception model with pretrained initialization outperformed all other models, with a Root Mean Squared (RMS) error of only 85 meters.
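The reported RMS error is the root of the mean squared Euclidean distance between predicted and true ground positions. A minimal sketch of this metric on synthetic coordinates (not the thesis's data) follows; the area side length and per-axis noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical predicted vs. ground-truth planar positions in meters;
# a ~150 km^2 square area has a side of roughly 12 km.
true_xy = rng.uniform(0, 12000, size=(100, 2))
pred_xy = true_xy + rng.normal(0, 60, size=(100, 2))  # 60 m noise per axis

# RMS localization error: root of the mean squared Euclidean distance.
rms = np.sqrt(np.mean(np.sum((pred_xy - true_xy) ** 2, axis=1)))
print(f"RMS error: {rms:.1f} m")  # ~ sqrt(2) * 60 m in expectation
```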
Achieving High Accuracy with PINNs via Energy Natural Gradients
We propose energy natural gradient descent, a natural gradient method with
respect to a Hessian-induced Riemannian metric as an optimization algorithm for
physics-informed neural networks (PINNs) and the deep Ritz method. As a main
motivation we show that the update direction in function space resulting from
the energy natural gradient corresponds to the Newton direction modulo an
orthogonal projection onto the model's tangent space. We demonstrate
experimentally that energy natural gradient descent yields highly accurate
solutions with errors several orders of magnitude smaller than what is obtained
when training PINNs with standard optimizers like gradient descent or Adam,
even when those are allowed significantly more computation time.
Comment: Published version.
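For a least-squares energy and a model that is linear in its parameters, a natural gradient step with respect to the Gram (metric) matrix of model derivatives coincides with a Gauss-Newton step. The toy sketch below illustrates that correspondence on a two-parameter model; it is a deliberate simplification, not the PINN setting of the paper.

```python
import numpy as np

# Toy model u_theta(x) = theta_0 * sin(x) + theta_1 * cos(x), fitted to a
# least-squares energy. Because the model is linear in theta, the energy
# natural gradient step here reduces to exact Gauss-Newton.
x = np.linspace(0, np.pi, 50)
target = 2.0 * np.sin(x) - 0.5 * np.cos(x)

theta = np.zeros(2)
J = np.stack([np.sin(x), np.cos(x)], axis=1)    # d u / d theta

for _ in range(5):
    r = J @ theta - target                      # residual
    grad = J.T @ r / len(x)                     # Euclidean gradient
    G = J.T @ J / len(x)                        # Gram (metric) matrix
    theta -= np.linalg.solve(G, grad)           # natural-gradient step

print(theta)  # ≈ [2.0, -0.5]
```

Because the target lies in the model class, a single preconditioned step already recovers the exact coefficients, whereas plain gradient descent would only approach them geometrically.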