Non-normal Recurrent Neural Network (nnRNN): learning long time dependencies while improving expressivity with transient dynamics
A recent strategy to circumvent the exploding and vanishing gradient problem
in RNNs, and to allow the stable propagation of signals over long time scales,
is to constrain recurrent connectivity matrices to be orthogonal or unitary.
This ensures eigenvalues with unit norm and thus stable dynamics and training.
However, this comes at the cost of reduced expressivity due to the limited
variety of orthogonal transformations. We propose a novel connectivity
structure based on the Schur decomposition and a splitting of the Schur form
into normal and non-normal parts. This makes it possible to parametrize matrices with
unit-norm eigenspectra without orthogonality constraints on their eigenbases. The
resulting architecture ensures access to a larger space of spectrally
constrained matrices, of which orthogonal matrices are a subset. This crucial
difference retains the stability advantages and training speed of orthogonal
RNNs while enhancing expressivity, especially on tasks that require
computations over ongoing input sequences.
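One way to picture the construction described in this abstract is a block upper-triangular (real Schur-like) matrix: 2x2 rotation blocks on the diagonal fix the eigenvalues on the unit circle, while entries above the blocks make the matrix non-normal. The sketch below is illustrative only. The function names, the constant `coupling` value, and the omission of an orthogonal change of basis (which would leave the eigenvalues unchanged) are assumptions, not the authors' parametrization.

```python
import cmath
import math

def schur_like_matrix(thetas, coupling):
    """Block upper-triangular matrix: 2x2 rotations on the diagonal (normal
    part) plus a constant `coupling` to the right of each block (non-normal part)."""
    n = 2 * len(thetas)
    m = [[0.0] * n for _ in range(n)]
    for b, theta in enumerate(thetas):
        i = 2 * b
        c, s = math.cos(theta), math.sin(theta)
        m[i][i], m[i][i + 1] = c, -s
        m[i + 1][i], m[i + 1][i + 1] = s, c
        for j in range(i + 2, n):
            m[i][j] = coupling
            m[i + 1][j] = coupling
    return m

def gram_minus_identity_norm(m):
    """Largest entry of |M^T M - I|; (numerically) zero when M is orthogonal."""
    n = len(m)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            g = sum(m[k][i] * m[k][j] for k in range(n))
            worst = max(worst, abs(g - (1.0 if i == j else 0.0)))
    return worst

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]
    n = len(a)
    result = 1.0 + 0.0j
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) == 0.0:
            return 0.0j
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= factor * a[col][k]
    return result

def char_poly_at(m, lam):
    """Evaluate det(M - lam * I); it vanishes when lam is an eigenvalue of M."""
    n = len(m)
    shifted = [[m[i][j] - (lam if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    return det(shifted)

thetas = [0.3, 1.1]
theta_mat = schur_like_matrix(thetas, coupling=0.5)
# exp(i*theta) lies on the unit circle and should be an eigenvalue.
residuals = [abs(char_poly_at(theta_mat, cmath.exp(1j * t))) for t in thetas]
print(residuals, gram_minus_identity_norm(theta_mat))
```

Setting `coupling=0.0` recovers an orthogonal matrix, which matches the abstract's remark that orthogonal matrices form a subset of the spectrally constrained family.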
Higher-order Quasi-Monte Carlo Training of Deep Neural Networks
We present a novel algorithmic approach and an error analysis leveraging
Quasi-Monte Carlo points for training deep neural network (DNN) surrogates of
Data-to-Observable (DtO) maps in engineering design. Our analysis reveals
higher-order consistent, deterministic choices of training points in the input
data space for deep and shallow Neural Networks with holomorphic activation
functions such as tanh. These novel training points are proved to facilitate
higher-order decay (in terms of the number of training samples) of the
underlying generalization error, with consistency error bounds that are free
from the curse of dimensionality in the input data space, provided that DNN
weights in hidden layers satisfy certain summability conditions. We present
numerical experiments for DtO maps from elliptic and parabolic PDEs with
uncertain inputs that confirm the theoretical analysis.
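To give a feel for deterministic, low-discrepancy training points, the sketch below generates a Halton sequence and uses it to average a toy observable. This is a swapped-in, first-order construction chosen for brevity. It is not the higher-order Quasi-Monte Carlo rule analysed in the abstract, and `toy_observable` is a hypothetical stand-in for a Data-to-Observable map.

```python
def radical_inverse(index, base):
    """Van der Corput radical inverse: mirror the base-`base` digits of
    `index` about the radix point."""
    inv, frac = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        inv += digit * frac
        frac /= base
    return inv

def halton_points(n, bases=(2, 3)):
    """First n Halton points in the unit cube, one coprime base per dimension."""
    return [tuple(radical_inverse(i, b) for b in bases) for i in range(1, n + 1)]

def toy_observable(x):
    """Hypothetical smooth map on the unit square; its exact mean is 0.25."""
    return x[0] * x[1]

points = halton_points(1024)
estimate = sum(toy_observable(p) for p in points) / len(points)
print(abs(estimate - 0.25))
```

In a training setting, `points` would play the role of the input samples at which the expensive map is evaluated to produce labels for the network.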
SPOCC: Scalable POssibilistic Classifier Combination -- toward robust aggregation of classifiers
We investigate a problem in which each member of a group of learners is
trained separately to solve the same classification task. Each learner has
access to a training dataset (possibly with overlap across learners) but each
trained classifier can be evaluated on a validation dataset. We propose a new
approach to aggregate the learner predictions in the possibility theory
framework. For each classifier prediction, we build a possibility distribution
assessing how likely it is that the classifier prediction is correct, using frequentist
probabilities estimated on the validation set. The possibility distributions
are aggregated using an adaptive t-norm that can accommodate dependency and
poor accuracy of the classifier predictions. We prove that the proposed
approach possesses a number of desirable classifier combination robustness
properties.
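The aggregation idea can be sketched in a few lines: turn each classifier's validation-estimated class probabilities into a possibility distribution, then merge the distributions class by class with a t-norm. The code below assumes the Dubois-Prade probability-to-possibility transform and a fixed `min` t-norm. Both are illustrative choices, not necessarily those of the paper, whose t-norm is adaptive. The probability vectors are made-up numbers.

```python
def probability_to_possibility(probs):
    """Dubois-Prade transform: pi_i is the sum of all p_j with p_j <= p_i."""
    return [sum(q for q in probs if q <= p) for p in probs]

def combine(distributions, t_norm=min):
    """Apply a two-argument t-norm class by class, then rescale so that the
    most plausible class has possibility 1."""
    combined = []
    for values in zip(*distributions):
        acc = values[0]
        for v in values[1:]:
            acc = t_norm(acc, v)
        combined.append(acc)
    top = max(combined)
    return [c / top for c in combined] if top > 0 else combined

# Hypothetical estimates of P(true class | prediction) for two classifiers.
clf_a = [0.7, 0.2, 0.1]
clf_b = [0.3, 0.6, 0.1]
pi_a = probability_to_possibility(clf_a)
pi_b = probability_to_possibility(clf_b)
merged = combine([pi_a, pi_b])
decision = max(range(len(merged)), key=merged.__getitem__)
print(pi_a, pi_b, merged, decision)
```

Passing `t_norm=lambda a, b: a * b` swaps in the product t-norm, which penalises disagreement between classifiers more strongly than `min`.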
Times series averaging from a probabilistic interpretation of time-elastic kernel
In light of regularized dynamic time warping kernels, this paper
reconsiders the concept of the time elastic centroid (TEC) for a set of time series.
From this perspective, we first show how TEC can easily be addressed as a
preimage problem. Unfortunately, this preimage problem is ill-posed and may suffer
from over-fitting, especially for long time series, and obtaining a sub-optimal
solution involves heavy computational costs. We then derive two new algorithms
based on a probabilistic interpretation of kernel alignment matrices,
expressed in terms of probability distributions over sets of alignment paths.
The first algorithm is an iterative agglomerative heuristic inspired by the
state of the art DTW barycenter averaging (DBA) algorithm proposed specifically
for the Dynamic Time Warping measure. The second proposed algorithm achieves a
classical averaging of the aligned samples but also implements an averaging of
the times of occurrence of the aligned samples. It exploits a straightforward
progressive agglomerative heuristic. An experiment comparing, on 45
time series datasets, the classification error rates obtained by first nearest
neighbor classifiers that use a single medoid or centroid estimate to
represent each category shows that: i) centroid-based approaches
significantly outperform medoid-based approaches, ii) in the considered
experiment, the two proposed algorithms outperform the state of the art DBA
algorithm, and iii) the second proposed algorithm, which implements an averaging
jointly in the sample space and along the time axis, emerges as the most
significantly robust time elastic averaging heuristic, with an interesting noise
reduction capability.
Index Terms: Time series averaging, time elastic kernel,
Dynamic Time Warping, time series clustering and classification
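For orientation, the DBA baseline mentioned in this abstract can be sketched as follows: align every series to the current centroid with DTW, then replace each centroid sample by the mean of the samples aligned to it. This is a minimal sketch of that baseline with a squared-difference local cost (an assumption), not the kernel-based algorithms the abstract proposes.

```python
def dtw_path(a, b):
    """Return (total cost, alignment path) of classic DTW between a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path

def dba_update(centroid, series_list):
    """One DBA iteration: average the samples aligned to each centroid index."""
    buckets = [[] for _ in centroid]
    for series in series_list:
        _, path = dtw_path(centroid, series)
        for ci, si in path:
            buckets[ci].append(series[si])
    return [sum(bucket) / len(bucket) for bucket in buckets]

centroid = dba_update([1.0, 2.0, 3.0], [[1.0, 2.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
print(centroid)
```

Repeating `dba_update` until the centroid stops changing gives the usual iterative scheme. Note that only sample values are averaged here, whereas the second algorithm in the abstract also averages times of occurrence.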
A variational Bayesian method for inverse problems with impulsive noise
We propose a novel numerical method for solving inverse problems subject to
impulsive noise, which may contain a large number of outliers. The
approach is of Bayesian type, and it exploits a heavy-tailed t distribution for
data noise to achieve robustness with respect to outliers. A hierarchical model
with all hyper-parameters automatically determined from the given data is
described. A variational algorithm is developed that minimizes the Kullback-Leibler
divergence between the true posterior distribution and a separable
approximation. The numerical method is illustrated on several one-
and two-dimensional linear and nonlinear inverse problems arising from heat
conduction, including estimating boundary temperature, heat flux and heat
transfer coefficient. The results show its robustness to outliers and the fast
and steady convergence of the algorithm.
Comment: 20 pages, to appear in J. Comput. Phy
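The robustness mechanism of a heavy-tailed t noise model can be seen in a much simpler setting than the paper's. Writing the t distribution as a scale mixture of Gaussians leads to iteratively reweighted averaging, in which large residuals receive small weights. The sketch below estimates a single location parameter with fixed degrees of freedom `nu` and scale `sigma`. It is an EM-style stand-in under those assumptions, not the hierarchical variational method of the abstract, which determines its hyper-parameters from the data.

```python
import statistics

def robust_location(data, nu=1.0, sigma=0.1, iterations=50):
    """Location estimate under Student-t noise via iterative reweighting."""
    mu = statistics.median(data)  # robust starting point
    for _ in range(iterations):
        # Expected precision weights of the scale mixture: they shrink
        # towards zero as the standardized residual grows.
        weights = [(nu + 1.0) / (nu + ((x - mu) / sigma) ** 2) for x in data]
        mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
    return mu

# Made-up measurements of a quantity near 1.0 with one impulsive outlier.
data = [1.0, 1.1, 0.9, 1.05, 0.95, 50.0]
plain_mean = sum(data) / len(data)
robust = robust_location(data)
print(plain_mean, robust)
```

The plain mean is dragged far from the bulk of the data by the single outlier, while the reweighted estimate stays close to it.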
Iterative Updating of Model Error for Bayesian Inversion
In computational inverse problems, it is common that a detailed and accurate
forward model is approximated by a computationally less challenging substitute.
The model reduction may be necessary to meet constraints in computing time when
optimization algorithms are used to find a single estimate, or to speed up
Markov chain Monte Carlo (MCMC) calculations in the Bayesian framework. The use
of an approximate model introduces a discrepancy, or modeling error, that may
have a detrimental effect on the solution of the ill-posed inverse problem, or
it may severely distort the estimate of the posterior distribution. In the
Bayesian paradigm, the modeling error can be considered as a random variable,
and by using an estimate of the probability distribution of the unknown, one
may estimate the probability distribution of the modeling error and incorporate
it into the inversion. We introduce an algorithm which iterates this idea to
update the distribution of the model error, leading to a sequence of posterior
distributions that are demonstrated empirically to capture the underlying truth
with increasing accuracy. Since the algorithm is not based on rejections, it
requires only limited full model evaluations.
We show analytically that, in the linear Gaussian case, the algorithm
converges geometrically fast with respect to the number of iterations. For more
general models, we introduce particle approximations of the iteratively
generated sequence of distributions; we also prove that each element of the
sequence converges in the large particle limit. We show numerically that, as in
the linear case, rapid convergence occurs with respect to the number of
iterations. Additionally, we show through computed examples that point
estimates obtained from this iterative algorithm are superior to those obtained
by neglecting the model error.
Comment: 39 pages, 9 figures
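A scalar linear Gaussian toy problem illustrates the iteration described in this abstract. The data follow y = a_full * u + e, but inversion uses a cheaper model a_cheap * u. The model error m = (a_full - a_cheap) * u is given a Gaussian distribution computed from the current estimate of the distribution of u, and the posterior is then recomputed from the prior. All names and numbers are hypothetical, and treating m as independent of u at each step is an assumption of this sketch.

```python
def posterior(y, a_model, prior_var, noise_var, err_mean=0.0, err_var=0.0):
    """Gaussian posterior of u for y = a_model*u + m + e, with u ~ N(0, prior_var),
    m ~ N(err_mean, err_var), e ~ N(0, noise_var), all taken as independent."""
    total_var = noise_var + err_var
    var = 1.0 / (1.0 / prior_var + a_model ** 2 / total_var)
    mean = var * a_model * (y - err_mean) / total_var
    return mean, var

def iterate_model_error(y, a_full, a_cheap, prior_var, noise_var, iterations=20):
    """Alternate between estimating the model-error distribution from the
    current distribution of u and recomputing the posterior."""
    gap = a_full - a_cheap
    mean, var = 0.0, prior_var  # start from the prior
    history = []
    for _ in range(iterations):
        err_mean, err_var = gap * mean, gap ** 2 * var
        mean, var = posterior(y, a_cheap, prior_var, noise_var, err_mean, err_var)
        history.append(mean)
    return mean, var, history

y, a_full, a_cheap, prior_var, noise_var = 1.0, 1.0, 0.8, 1.0, 0.01
exact_mean, _ = posterior(y, a_full, prior_var, noise_var)    # full model
naive_mean, _ = posterior(y, a_cheap, prior_var, noise_var)   # error ignored
iter_mean, _, history = iterate_model_error(y, a_full, a_cheap, prior_var, noise_var)
print(exact_mean, naive_mean, iter_mean)
```

In this toy setting the iterated posterior mean settles within a few steps and lands much closer to the full-model posterior mean than the estimate that ignores the model error.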