An Approximate Bayesian Long Short-Term Memory Algorithm for Outlier Detection

Authors: Chen Chao; Lin Xiao; Terejanu Gabriel
Publication date: 03/06/2019

Long Short-Term Memory networks trained with gradient descent and back-propagation have achieved great success in various applications. However, point estimation of the network weights is prone to over-fitting and lacks important information about the uncertainty associated with the estimate. On the other hand, exact Bayesian neural network methods are intractable and inapplicable to real-world problems. In this study, we propose an approximate estimation of the weight uncertainty using the Ensemble Kalman Filter, which scales easily to a large number of weights. Furthermore, we optimize the covariance of the noise distribution in the ensemble update step using maximum likelihood estimation. To assess the proposed algorithm, we apply it to outlier detection in five real-world events retrieved from the Twitter platform.

arXiv.org e-Print Archive

Crossref
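
The first abstract (Chen, Lin, and Terejanu) estimates weight uncertainty with an Ensemble Kalman Filter instead of back-propagation. As a rough, hedged illustration of that idea, the sketch below applies a stochastic (perturbed-observation) EnKF analysis step to an ensemble of weights, one scalar observation at a time. Assumptions, none of which come from the paper: a hypothetical two-weight linear model `predict` stands in for the LSTM, the data are synthetic, and the observation-noise variance `obs_var` is fixed rather than fitted by maximum likelihood.

```python
import random
import statistics


def predict(theta, x):
    # Hypothetical stand-in for the LSTM: a two-weight linear model y = w * x + b.
    w, b = theta
    return w * x + b


def enkf_update(ensemble, x, y, obs_var, rng):
    """One stochastic (perturbed-observation) EnKF analysis step on the weights."""
    n, dim = len(ensemble), len(ensemble[0])
    preds = [predict(theta, x) for theta in ensemble]
    pred_mean = statistics.fmean(preds)
    pred_var = statistics.variance(preds)
    theta_mean = [statistics.fmean(theta[k] for theta in ensemble) for k in range(dim)]
    # Sample cross-covariance between each weight and the model output.
    cov_theta_y = [
        sum((theta[k] - theta_mean[k]) * (p - pred_mean) for theta, p in zip(ensemble, preds))
        / (n - 1)
        for k in range(dim)
    ]
    # Kalman gain for a scalar observation: Cov(theta, y_hat) / (Var(y_hat) + R).
    gain = [c / (pred_var + obs_var) for c in cov_theta_y]
    updated = []
    for theta, p in zip(ensemble, preds):
        # Each member sees its own perturbed copy of the observation.
        innovation = y + rng.gauss(0.0, obs_var ** 0.5) - p
        updated.append([theta[k] + gain[k] * innovation for k in range(dim)])
    return updated


rng = random.Random(0)
ensemble = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(100)]  # prior over (w, b)
for _ in range(200):
    x = rng.uniform(-2.0, 2.0)
    y = 2.0 * x - 1.0 + rng.gauss(0.0, 0.1)  # synthetic data generated with w = 2, b = -1
    ensemble = enkf_update(ensemble, x, y, obs_var=0.01, rng=rng)

w_mean = statistics.fmean(theta[0] for theta in ensemble)
b_mean = statistics.fmean(theta[1] for theta in ensemble)
w_std = statistics.stdev(theta[0] for theta in ensemble)
print(f"w ~ {w_mean:.3f} +/- {w_std:.3f}, b ~ {b_mean:.3f}")
```

The spread of the final ensemble (`w_std`) is what plays the role of weight uncertainty; no gradient of `predict` is ever computed.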

Ensemble Kalman filter for neural network based one-shot inversion

Authors: Guth Philipp A.; Schillings Claudia; Weissmann Simon
Publication date: 01/01/2020

We study the use of novel machine learning techniques for inverse problems. Our approach replaces the complex forward model by a neural network, which is trained simultaneously, in a one-shot sense, while the unknown parameters are estimated from data; i.e., the neural network is trained only for the unknown parameter. By establishing a link to the Bayesian approach to inverse problems, we develop an algorithmic framework that ensures the feasibility of the parameter estimate with respect to the forward model. We propose an efficient, derivative-free optimization method based on variants of ensemble Kalman inversion. Numerical experiments show that the ensemble Kalman filter for neural network based one-shot inversion is a promising direction that combines optimization and machine learning techniques for inverse problems.

arXiv.org e-Print Archive

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)
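
The second abstract (Guth, Schillings, and Weissmann) builds on ensemble Kalman inversion (EKI) as a derivative-free optimizer. The sketch below shows only the basic EKI iteration, not the paper's one-shot variants or its neural-network surrogate: each member moves by the gain C^{uG}(C^{GG} + Gamma)^{-1} applied to its own data misfit, so only forward evaluations are needed. The two-parameter forward model `forward`, the noise-free data, Gamma = gamma * I with gamma = 1e-4, and the choice to skip data perturbations are illustrative assumptions.

```python
import random


def forward(u):
    # Hypothetical nonlinear forward model G: R^2 -> R^2 (not the paper's model).
    return [u[0] + 0.1 * u[0] ** 3, u[1] + 0.5 * u[0]]


def mean(vectors):
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def cross_cov(a, b):
    """Sample cross-covariance matrix between two ensembles of equal size."""
    n, ma, mb = len(a), mean(a), mean(b)
    return [[sum((x[i] - ma[i]) * (z[j] - mb[j]) for x, z in zip(a, b)) / (n - 1)
             for j in range(len(mb))] for i in range(len(ma))]


def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def eki_step(ensemble, y, gamma):
    """One EKI step with Gamma = gamma * I and no data perturbation."""
    g = [forward(u) for u in ensemble]
    c_ug = cross_cov(ensemble, g)
    c_gg = cross_cov(g, g)
    s_inv = inv2([[c_gg[i][j] + (gamma if i == j else 0.0) for j in range(2)] for i in range(2)])
    # Gain K = C^{uG} (C^{GG} + Gamma)^{-1}, built from ensemble statistics only.
    k = [[sum(c_ug[i][r] * s_inv[r][j] for r in range(2)) for j in range(2)] for i in range(2)]
    return [[u[i] + sum(k[i][j] * (y[j] - gu[j]) for j in range(2)) for i in range(2)]
            for u, gu in zip(ensemble, g)]


rng = random.Random(1)
y_obs = forward([1.0, 2.0])  # noise-free synthetic data generated with u = (1, 2)
ensemble = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(50)]
for _ in range(20):
    ensemble = eki_step(ensemble, y_obs, gamma=1e-4)
u_hat = mean(ensemble)
print(u_hat)
```

Without perturbed data the ensemble collapses toward a single point, which is acceptable for an optimizer but means the final spread should not be read as a posterior uncertainty.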

Uncertainty Estimation of Deep Neural Networks

Author: Chen Chao
Publication venue: Scholar Commons
Publication date: 01/01/2018

Conventional neural networks trained with gradient descent and back-propagation have achieved great success in various applications. On one hand, point estimation of the network weights is prone to over-fitting and lacks important information about the uncertainty associated with the estimate. On the other hand, exact Bayesian neural network methods are intractable and inapplicable to real-world problems. Approximate methods for Bayesian neural networks are under active development, including but not limited to stochastic variational methods, Monte Carlo dropout, and expectation propagation. Although these methods scale to current large networks, they are limited by either underestimating or overestimating uncertainty. Extended Kalman filters (EKFs) and unscented Kalman filters (UKFs), which are widely used in the data assimilation community, take a different perspective on inferring the parameters. However, EKFs cannot handle strong nonlinearity, while UKFs are inapplicable to large network architectures. Ensemble Kalman filters (EnKFs) are a well-established methodology in the atmospheric and oceanic sciences, where they target extremely high-dimensional, non-Gaussian, and nonlinear state-space models. So far, little work has applied EnKFs to estimate the parameters of deep neural networks. By treating the neural network as a nonlinear function, we augment the network prediction with the parameters as new states and adapt the state-space model to update the parameters. In the first work, we describe the ensemble Kalman filter and two proposed schemes for training both fully-connected and Long Short-Term Memory (LSTM) networks, and we experiment with 10 UCI datasets and a natural language dataset on different regression tasks.
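
The state augmentation mentioned in this thesis abstract can be written, in one common formulation that is assumed here rather than taken from the thesis, as a random walk on the weights theta with the network output h(theta, x_t) appended to the state, where only the output block is observed:

```latex
\[
\begin{aligned}
\boldsymbol{\theta}_t &= \boldsymbol{\theta}_{t-1} + \boldsymbol{\omega}_t, &
\boldsymbol{\omega}_t &\sim \mathcal{N}(\mathbf{0}, \mathbf{Q}),\\
\mathbf{z}_t &= \begin{bmatrix} \boldsymbol{\theta}_t \\ h(\boldsymbol{\theta}_t, \mathbf{x}_t) \end{bmatrix}, &
\mathbf{H} &= \begin{bmatrix} \mathbf{0} & \mathbf{I} \end{bmatrix},\\
\mathbf{y}_t &= \mathbf{H}\mathbf{z}_t + \boldsymbol{\nu}_t, &
\boldsymbol{\nu}_t &\sim \mathcal{N}(\mathbf{0}, \mathbf{R}).
\end{aligned}
\]
Each ensemble member $j = 1, \dots, N$ is then updated with the stochastic EnKF analysis step
\[
\mathbf{z}_{t,j}^{a} = \mathbf{z}_{t,j}^{f}
  + \mathbf{K}_t\bigl(\mathbf{y}_t + \boldsymbol{\epsilon}_{t,j} - \mathbf{H}\mathbf{z}_{t,j}^{f}\bigr),
\qquad
\mathbf{K}_t = \mathbf{P}_t^{f}\mathbf{H}^{\top}\bigl(\mathbf{H}\mathbf{P}_t^{f}\mathbf{H}^{\top} + \mathbf{R}\bigr)^{-1},
\qquad
\boldsymbol{\epsilon}_{t,j} \sim \mathcal{N}(\mathbf{0}, \mathbf{R}),
\]
where $\mathbf{P}_t^{f}$ is the sample covariance of the forecast ensemble.
```

Because H selects only the output block, the weight rows of K reduce to Cov(theta, y_hat)[Cov(y_hat) + R]^{-1} with y_hat = h(theta, x_t), which is how the weights are updated from ensemble statistics without back-propagated gradients.
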
To further evaluate the effectiveness of the proposed training scheme, we trained a deep LSTM network with the proposed algorithm and applied it to five real-world sub-event detection tasks. By formalizing the sub-event detection task, we develop an outlier detection framework that takes advantage of the Bayesian LSTM network to capture the important and interesting moments within an event. In the last work, we propose a framework for student knowledge estimation using Bayesian networks. By constructing student models as Bayesian networks, we can infer, for a given student, the updated knowledge state for each concept. With a novel parameter estimation algorithm, the model can also indicate misconceptions associated with each question. Furthermore, we develop a predictive validation metric based on the expected data likelihood of the student model to evaluate the design of questions.

Scholar Commons - Institutional Repository of the University of South Carolina
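
For the outlier-detection use described in this thesis (and in the first abstract), one simple way to exploit an ensemble of weights is to compare each new observation with the ensemble's predictive distribution. The sketch below flags time steps whose observation lies more than `k` predictive standard deviations from the ensemble mean. The threshold rule, `k = 3`, `obs_var`, and the toy numbers are assumptions for illustration, not the thesis's exact criterion.

```python
import statistics


def flag_outliers(ensemble_preds, observations, obs_var, k=3.0):
    """Return the time indices whose observation is more than k predictive
    standard deviations away from the ensemble mean.

    ensemble_preds[t] holds the prediction of every ensemble member at step t.
    """
    flagged = []
    for t, (preds, y) in enumerate(zip(ensemble_preds, observations)):
        mu = statistics.fmean(preds)
        # Predictive variance = spread across weight samples + observation noise.
        sd = (statistics.variance(preds) + obs_var) ** 0.5
        if abs(y - mu) > k * sd:
            flagged.append(t)
    return flagged


preds = [[10.0, 11.0, 9.0], [12.0, 12.5, 11.5], [13.0, 14.0, 12.0]]
obs = [10.5, 30.0, 12.8]
print(flag_outliers(preds, obs, obs_var=1.0))  # [1]
```

Only step 1 is flagged: its observation (30.0) sits far outside the roughly 12 +/- 3.4 band implied by the ensemble, while the other two observations fall well inside theirs.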

A Matrix Ensemble Kalman Filter-based Multi-arm Neural Network to Adequately Approximate Deep Neural Networks

Authors: Ghosh Souparno; Piyush Ved; Yan Yuchen; Yin Yanbin; Zhou Yuzhen
Publication date: 19/07/2023

Deep Learners (DLs) are the state-of-the-art predictive mechanism, with applications in many fields that require complex, high-dimensional data processing. Although conventional DLs are trained via gradient descent with back-propagation, Kalman Filter (KF)-based techniques that do not need gradient computation have been developed to approximate DLs. We propose a multi-arm extension of a KF-based DL approximator that can mimic DL when the sample size is too small to train a multi-arm DL. The proposed Matrix Ensemble Kalman Filter-based multi-arm ANN (MEnKF-ANN) also performs explicit model stacking, which becomes relevant when the training sample has an unequal-size feature set. Our proposed technique can approximate Long Short-Term Memory (LSTM) networks and attach uncertainty to the predictions obtained from these LSTMs with desirable coverage. We demonstrate how MEnKF-ANN can "adequately" approximate an LSTM network trained to classify which carbohydrate substrates are digested and utilized by a microbiome sample whose genomic sequences consist of polysaccharide utilization loci (PULs) and their encoded genes.

Comment: 18 pages, 6 figures, and 6 tables

arXiv.org e-Print Archive
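
The MEnKF-ANN abstract reports predictive uncertainty "with desirable coverage". Coverage is usually checked empirically: build an interval from the ensemble for each test point and count how often the true value falls inside it. The sketch below uses central percentile intervals over the ensemble members, with linear interpolation between order statistics; this interval construction, the 90% level, and the toy data are assumptions, not the paper's procedure.

```python
def quantile(sorted_vals, q):
    """Empirical quantile with linear interpolation between order statistics."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def empirical_coverage(ensemble_preds, targets, level=0.9):
    """Fraction of targets inside the central `level` interval of their ensemble."""
    alpha = (1.0 - level) / 2.0
    hits = 0
    for preds, y in zip(ensemble_preds, targets):
        s = sorted(preds)
        if quantile(s, alpha) <= y <= quantile(s, 1.0 - alpha):
            hits += 1
    return hits / len(targets)


members = [float(v) for v in range(101)]  # toy ensemble: 101 members predicting 0..100
ensemble_preds = [members] * 4
targets = [3.0, 50.0, 97.0, 60.0]
print(empirical_coverage(ensemble_preds, targets))  # 0.5
```

Here each 90% interval is roughly [5, 95], so two of the four targets are covered. A well-calibrated ensemble should give coverage close to `level` on a large held-out set; this toy example is deliberately miscalibrated.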