Data Assimilation by Artificial Neural Networks for an Atmospheric General Circulation Model: Conventional Observation
This paper presents an approach that employs artificial neural networks (NN)
to emulate an ensemble Kalman filter (EnKF) as a method of data assimilation.
The assimilation methods are tested in the Simplified Parameterizations
PrimitivE-Equation Dynamics (SPEEDY) model, an atmospheric general circulation
model (AGCM), using synthetic observational data simulating the localization of
balloon soundings. For the data assimilation scheme, a supervised NN, the
multilayer perceptron (MLP-NN), is applied. The MLP-NN is able to emulate the
analysis from the local ensemble transform Kalman filter (LETKF). After the
training process, the trained MLP-NN acts as a data assimilation function.
The NN was trained with data from the first three months of 1982, 1983, and
1984. A hindcasting experiment for the 1985 data assimilation cycle using the
MLP-NN was performed with synthetic observations for January 1985. The
numerical results demonstrate the effectiveness of the NN technique for
atmospheric data assimilation. The NN analyses are very close to the LETKF
analyses: the difference in the monthly means of the absolute temperature
analyses is of order 0.02. The simulations show that the major advantage of
the MLP-NN is its computational performance, since the analyses are of similar
quality: in CPU time, an assimilation cycle with the MLP-NN is 90 times faster
than a cycle with the LETKF in this numerical experiment.
Comment: 17 pages, 16 figures, Monthly Weather Review
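The abstract includes no code, but its core idea, learning a cheap surrogate for an expensive analysis step, can be sketched. The following is a minimal illustration, not the paper's implementation: an MLP is fitted on pairs of (background state, observations) inputs and LETKF analysis targets, then used in place of the filter. All shapes, variable names, and the synthetic data are assumptions made for the sketch.

```python
# Minimal sketch (not the paper's code): train an MLP to emulate an
# existing LETKF analysis, then use it as a cheap assimilation step.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins for a training set built from past LETKF cycles:
# each sample pairs a model background and local observations (inputs)
# with the corresponding LETKF analysis (target).
n_samples, n_state, n_obs = 5000, 8, 4
background = rng.normal(size=(n_samples, n_state))    # model forecast x_b
observations = rng.normal(size=(n_samples, n_obs))    # synthetic soundings y
letkf_analysis = background + 0.1 * rng.normal(size=(n_samples, n_state))

X = np.hstack([background, observations])
mlp = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
mlp.fit(X, letkf_analysis)

# At run time the trained network replaces the expensive LETKF solve:
x_b_new = rng.normal(size=(1, n_state))
y_new = rng.normal(size=(1, n_obs))
x_a_new = mlp.predict(np.hstack([x_b_new, y_new]))    # NN analysis
```

Once trained, each analysis is a single forward pass, which is where the reported factor-of-90 CPU-time saving over the full LETKF cycle comes from.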
Random orthogonal additive filters: a solution to the vanishing/exploding gradient of deep neural networks
Since the recognition in the early nineties of the vanishing/exploding (V/E)
gradient issue plaguing the training of neural networks (NNs), significant
efforts have been made to overcome this obstacle. However, a clear solution
to the V/E issue has remained elusive. In this manuscript a new NN
architecture is proposed, designed to mathematically prevent the V/E issue
from occurring. The pursuit of approximate dynamical isometry, i.e., parameter
configurations where the singular values of the input-output Jacobian are
tightly distributed around 1, leads to the derivation of an NN architecture
that shares common traits with the popular Residual Network model. Instead of
skip connections between layers, the idea is to filter the previous
activations orthogonally and add them to the nonlinear activations of the next
layer, realising a convex combination between them. Remarkably, the
impossibility for the gradient updates to either vanish or explode is
demonstrated with analytical bounds that hold even in the infinite-depth case.
The effectiveness of this method is empirically demonstrated by training, via
backpropagation, an extremely deep multilayer perceptron of 50k layers, and an
Elman NN that learns long-term dependencies in inputs 10k time steps in the
past. Compared with other architectures specifically devised to deal with the
V/E problem, e.g. LSTMs for recurrent NNs, the proposed model is considerably
simpler yet more effective. Surprisingly, a single-layer vanilla RNN can be
enhanced to reach state-of-the-art performance while converging very quickly;
for instance, on the psMNIST task it is possible to reach test accuracy above
94% in the first epoch, and above 98% after just 10 epochs.
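As a concrete reading of the mechanism described above, here is a minimal NumPy sketch assuming the layer update takes the convex-combination form h_{l+1} = alpha * Q h_l + (1 - alpha) * tanh(W h_l + b), with Q a fixed random orthogonal filter. The names, the value of alpha, and the choice of tanh are illustrative assumptions, not the paper's exact parameterisation.

```python
# Minimal sketch of a random orthogonal additive filter layer,
# assuming the convex-combination update described in the abstract.
import numpy as np

rng = np.random.default_rng(0)
width, depth, alpha = 128, 1000, 0.9

def random_orthogonal(n):
    # QR decomposition of a Gaussian matrix yields a random orthogonal Q.
    q, _ = np.linalg.qr(rng.normal(size=(n, n)))
    return q

h = rng.normal(size=width) / np.sqrt(width)
for _ in range(depth):
    Q = random_orthogonal(width)
    W = rng.normal(size=(width, width)) / np.sqrt(width)
    b = np.zeros(width)
    # Orthogonally filtered previous activation, convexly combined with
    # the new nonlinear activation: the orthogonal branch preserves norm,
    # while tanh keeps the nonlinear branch bounded.
    h = alpha * (Q @ h) + (1 - alpha) * np.tanh(W @ h + b)

print(np.linalg.norm(h))  # remains O(1) even after 1000 layers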