Search CORE

1,590 research outputs found

Non-normal Recurrent Neural Network (nnRNN): learning long time dependencies while improving expressivity with transient dynamics

Author: Bengio Yoshua
Gidel Gauthier
Goyette Kyle
Kerg Giancarlo
Lajoie Guillaume
Touzel Maximilian Puelma
Vorontsov Eugene
Publication venue
Publication date: 01/01/2019
Field of study

A recent strategy to circumvent the exploding and vanishing gradient problem in RNNs, and to allow the stable propagation of signals over long time scales, is to constrain recurrent connectivity matrices to be orthogonal or unitary. This ensures eigenvalues with unit norm and thus stable dynamics and training. However this comes at the cost of reduced expressivity due to the limited variety of orthogonal transformations. We propose a novel connectivity structure based on the Schur decomposition and a splitting of the Schur form into normal and non-normal parts. This allows to parametrize matrices with unit-norm eigenspectra without orthogonality constraints on eigenbases. The resulting architecture ensures access to a larger space of spectrally constrained matrices, of which orthogonal matrices are a subset. This crucial difference retains the stability advantages and training speed of orthogonal RNNs while enhancing expressivity, especially on tasks that require computations over ongoing input sequences

arXiv.org e-Print Archive

PolyPublie

Higher-order Quasi-Monte Carlo Training of Deep Neural Networks

Author: Longo M.
Mishra S.
Rusch T. K.
Schwab Ch.
Publication venue
Publication date: 06/09/2020
Field of study

We present a novel algorithmic approach and an error analysis leveraging Quasi-Monte Carlo points for training deep neural network (DNN) surrogates of Data-to-Observable (DtO) maps in engineering design. Our analysis reveals higher-order consistent, deterministic choices of training points in the input data space for deep and shallow Neural Networks with holomorphic activation functions such as tanh. These novel training points are proved to facilitate higher-order decay (in terms of the number of training samples) of the underlying generalization error, with consistency error bounds that are free from the curse of dimensionality in the input data space, provided that DNN weights in hidden layers satisfy certain summability conditions. We present numerical experiments for DtO maps from elliptic and parabolic PDEs with uncertain inputs that confirm the theoretical analysis

arXiv.org e-Print Archive

Repository for Publications and Research Data

SPOCC: Scalable POssibilistic Classifier Combination -- toward robust aggregation of classifiers

Author: Albardan Mahmoud
Colot Olivier
Klein John
Publication venue: 'Elsevier BV'
Publication date: 20/02/2020
Field of study

We investigate a problem in which each member of a group of learners is trained separately to solve the same classification task. Each learner has access to a training dataset (possibly with overlap across learners) but each trained classifier can be evaluated on a validation dataset. We propose a new approach to aggregate the learner predictions in the possibility theory framework. For each classifier prediction, we build a possibility distribution assessing how likely the classifier prediction is correct using frequentist probabilities estimated on the validation set. The possibility distributions are aggregated using an adaptive t-norm that can accommodate dependency and poor accuracy of the classifier predictions. We prove that the proposed approach possesses a number of desirable classifier combination robustness properties

arXiv.org e-Print Archive

Hal-Diderot

Times series averaging from a probabilistic interpretation of time-elastic kernel

Author: Marteau Pierre-François
Publication venue
Publication date: 01/05/2015
Field of study

At the light of regularized dynamic time warping kernels, this paper reconsider the concept of time elastic centroid (TEC) for a set of time series. From this perspective, we show first how TEC can easily be addressed as a preimage problem. Unfortunately this preimage problem is ill-posed, may suffer from over-fitting especially for long time series and getting a sub-optimal solution involves heavy computational costs. We then derive two new algorithms based on a probabilistic interpretation of kernel alignment matrices that expresses in terms of probabilistic distributions over sets of alignment paths. The first algorithm is an iterative agglomerative heuristics inspired from the state of the art DTW barycenter averaging (DBA) algorithm proposed specifically for the Dynamic Time Warping measure. The second proposed algorithm achieves a classical averaging of the aligned samples but also implements an averaging of the time of occurrences of the aligned samples. It exploits a straightforward progressive agglomerative heuristics. An experimentation that compares for 45 time series datasets classification error rates obtained by first near neighbors classifiers exploiting a single medoid or centroid estimate to represent each categories show that: i) centroids based approaches significantly outperform medoids based approaches, ii) on the considered experience, the two proposed algorithms outperform the state of the art DBA algorithm, and iii) the second proposed algorithm that implements an averaging jointly in the sample space and along the time axes emerges as the most significantly robust time elastic averaging heuristic with an interesting noise reduction capability. Index Terms-Time series averaging Time elastic kernel Dynamic Time Warping Time series clustering and classification

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

A variational Bayesian method for inverse problems with impulsive noise

Author: Achim
Alliney
Attias
Bangti Jin
Beck
Chan
Chappell
Choi
Clason
Clason
Colli-Franzone
Dempster
Emery
Emery
Gao
Gelman
Geweke
Hebert
Inglese
Jin
Jin
Jin
Jin
Jordan
Kaipio
Kaipio
Koutsourelakis
Lange
Ma
Marzouk
Marzouk
Osman
Sambridge
Sato
Su
Tarantola
Tipping
Wang
Wang
Publication venue: 'Elsevier BV'
Publication date: 11/09/2011
Field of study

We propose a novel numerical method for solving inverse problems subject to impulsive noises which possibly contain a large number of outliers. The approach is of Bayesian type, and it exploits a heavy-tailed t distribution for data noise to achieve robustness with respect to outliers. A hierarchical model with all hyper-parameters automatically determined from the given data is described. An algorithm of variational type by minimizing the Kullback-Leibler divergence between the true posteriori distribution and a separable approximation is developed. The numerical method is illustrated on several one- and two-dimensional linear and nonlinear inverse problems arising from heat conduction, including estimating boundary temperature, heat flux and heat transfer coefficient. The results show its robustness to outliers and the fast and steady convergence of the algorithm.Comment: 20 pages, to appear in J. Comput. Phy

arXiv.org e-Print Archive

Crossref

UCL Discovery

Iterative Updating of Model Error for Bayesian Inversion

Author: Calvetti Daniela
Dunlop Matthew M.
Somersalo Erkki
Stuart Andrew M.
Publication venue: 'IOP Publishing'
Publication date: 13/07/2017
Field of study

In computational inverse problems, it is common that a detailed and accurate forward model is approximated by a computationally less challenging substitute. The model reduction may be necessary to meet constraints in computing time when optimization algorithms are used to find a single estimate, or to speed up Markov chain Monte Carlo (MCMC) calculations in the Bayesian framework. The use of an approximate model introduces a discrepancy, or modeling error, that may have a detrimental effect on the solution of the ill-posed inverse problem, or it may severely distort the estimate of the posterior distribution. In the Bayesian paradigm, the modeling error can be considered as a random variable, and by using an estimate of the probability distribution of the unknown, one may estimate the probability distribution of the modeling error and incorporate it into the inversion. We introduce an algorithm which iterates this idea to update the distribution of the model error, leading to a sequence of posterior distributions that are demonstrated empirically to capture the underlying truth with increasing accuracy. Since the algorithm is not based on rejections, it requires only limited full model evaluations. We show analytically that, in the linear Gaussian case, the algorithm converges geometrically fast with respect to the number of iterations. For more general models, we introduce particle approximations of the iteratively generated sequence of distributions; we also prove that each element of the sequence converges in the large particle limit. We show numerically that, as in the linear case, rapid convergence occurs with respect to the number of iterations. Additionally, we show through computed examples that point estimates obtained from this iterative algorithm are superior to those obtained by neglecting the model error.Comment: 39 pages, 9 figure

arXiv.org e-Print Archive

Caltech Authors