
    Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks

    Effective training of deep neural networks suffers from two main issues. The first is that the parameter spaces of these models exhibit pathological curvature. Recent methods address this problem by using adaptive preconditioning for Stochastic Gradient Descent (SGD). These methods improve convergence by adapting to the local geometry of parameter space. A second issue is overfitting, which is typically addressed by early stopping. However, recent work has demonstrated that Bayesian model averaging mitigates this problem. The posterior can be sampled by using Stochastic Gradient Langevin Dynamics (SGLD). However, the rapidly changing curvature renders default SGLD methods inefficient. Here, we propose combining adaptive preconditioners with SGLD. In support of this idea, we give theoretical properties on asymptotic convergence and predictive risk. We also provide empirical results for Logistic Regression, Feedforward Neural Nets, and Convolutional Neural Nets, demonstrating that our preconditioned SGLD method gives state-of-the-art performance on these models. Comment: AAAI 201
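    To make the idea of combining an adaptive preconditioner with SGLD concrete, here is a minimal sketch in plain Python. It assumes an RMSprop-style diagonal preconditioner (a running average of squared minibatch gradients), which the abstract does not spell out, and it omits any curvature-correction term. The toy model (Gaussian data with unknown mean and a wide Gaussian prior), the function name `psgld_gaussian_mean`, and all hyperparameter values are illustrative assumptions, not the paper's setup.

```python
import math
import random


def psgld_gaussian_mean(data, n_steps=5000, batch_size=10, step_size=1e-4,
                        alpha=0.99, lam=1e-5, prior_std=10.0, seed=0):
    """Sample the posterior over theta for x ~ N(theta, 1), theta ~ N(0, prior_std^2).

    Assumed update (RMSprop-preconditioned SGLD, correction term omitted).
    """
    rng = random.Random(seed)
    n_data = len(data)
    theta, v = 0.0, 0.0
    samples = []
    for _ in range(n_steps):
        batch = rng.sample(data, batch_size)
        # Mean per-example gradient of the log-likelihood on the minibatch.
        g_bar = sum(x - theta for x in batch) / batch_size
        # Running average of squared gradients, as in RMSprop.
        v = alpha * v + (1.0 - alpha) * g_bar * g_bar
        precond = 1.0 / (lam + math.sqrt(v))
        # Minibatch estimate of the gradient of the full log-posterior.
        grad_log_post = -theta / prior_std ** 2 + n_data * g_bar
        # Langevin step: preconditioned drift plus preconditioned Gaussian noise.
        noise = rng.gauss(0.0, math.sqrt(step_size * precond))
        theta += 0.5 * step_size * precond * grad_log_post + noise
        samples.append(theta)
    return samples


# Synthetic data with true mean 2.0 (illustrative only).
data_rng = random.Random(1)
data = [data_rng.gauss(2.0, 1.0) for _ in range(100)]
kept = psgld_gaussian_mean(data)[1000:]  # discard burn-in
posterior_mean_estimate = sum(kept) / len(kept)
```

    With a wide prior, the averaged samples should sit close to the empirical mean of the data. In a deep network, `theta`, `v`, and `precond` would be per-parameter vectors rather than scalars.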

    Neural Networks: Training and Application to Nonlinear System Identification and Control

    This dissertation investigates training neural networks for system identification and classification. The research makes two main contributions:

    1. Reducing the number of hidden layer nodes using a feedforward component

    This research reduces the number of hidden layer nodes and the training time of neural networks, making them better suited to online identification and control applications, by adding a parallel feedforward component. Implementing the feedforward component with a wavelet neural network and an echo state network provides good models for nonlinear systems. The wavelet neural network with the feedforward component, together with a model predictive controller, can reliably identify and control a seismically isolated structure during an earthquake. The network model provides the predictions for model predictive control. Simulations of a 5-story seismically isolated structure with conventional lead-rubber bearings showed significant reductions of all response amplitudes for both near-field (pulse) and far-field ground motions, including reduced deformations along with a corresponding reduction in acceleration response. The controller effectively regulated the apparent stiffness at the isolation level. The approach is also applied to the online identification and control of an unmanned vehicle. Lyapunov theory is used to prove the stability of the wavelet neural network and the model predictive controller.

    2. Training neural networks using trajectory-based optimization approaches

    Training a neural network is a nonlinear, non-convex optimization problem of determining the network's weights. Traditional training algorithms can be inefficient and can get trapped in local minima. Two global optimization approaches are adapted to train neural networks and avoid the local minima problem. Lyapunov theory is used to prove the stability of the proposed methodology and its convergence in the presence of measurement errors. The first approach transforms the constraint satisfaction problem into unconstrained optimization. The constraints define a quotient gradient system (QGS) whose stable equilibrium points are local minima of the unconstrained optimization. The QGS is integrated to determine local minima, and the local minimum with the best generalization performance is chosen as the optimal solution. The second approach uses the QGS together with a projected gradient system (PGS). The PGS is a nonlinear dynamical system, defined based on the optimization problem, that searches the components of the feasible region for solutions. Lyapunov theory is used to prove the stability of the PGS and QGS and their stability in the presence of measurement noise.
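    The quotient gradient system mentioned above can be illustrated with a small sketch. It assumes the form dx/dt = -J(x)^T h(x) for constraints h(x) = 0, where J is the Jacobian of h, so that trajectories descend the unconstrained objective 0.5 * ||h(x)||^2. The abstract does not state this formula, so treat it, the forward-Euler integrator, and the toy unit-circle constraint as assumptions chosen for illustration.

```python
def integrate_qgs(h, jacobian, x0, dt=0.01, n_steps=2000):
    """Forward-Euler integration of the assumed QGS dx/dt = -J(x)^T h(x)."""
    x = list(x0)
    for _ in range(n_steps):
        residuals = h(x)   # m constraint residuals
        jac = jacobian(x)  # m rows, each of length n
        # J(x)^T h(x): gradient of 0.5 * ||h(x)||^2.
        descent = [
            sum(jac[i][k] * residuals[i] for i in range(len(residuals)))
            for k in range(len(x))
        ]
        x = [xk - dt * dk for xk, dk in zip(x, descent)]
    return x


# Hypothetical toy constraint: h(x) = x0^2 + x1^2 - 1 (stay on the unit circle).
def circle_constraint(x):
    return [x[0] ** 2 + x[1] ** 2 - 1.0]


def circle_jacobian(x):
    return [[2.0 * x[0], 2.0 * x[1]]]


x_star = integrate_qgs(circle_constraint, circle_jacobian, [2.0, 1.0])
```

    Starting from different initial points leads to different equilibria on the circle, which mirrors the idea of integrating the system from several starts and then choosing among the minima found.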

    Training issues and learning algorithms for feedforward and recurrent neural networks

    Ph.D (Doctor of Philosophy)

    Neural Networks for Collaborative Filtering

    Recommender systems are an integral part of almost all modern e-commerce companies. They contribute significantly to the overall customer satisfaction by helping the user discover new and relevant items, which consequently leads to higher sales and stronger customer retention. It is, therefore, not surprising that large e-commerce shops like Amazon or streaming platforms like Netflix and Spotify even use multiple recommender systems to further increase user engagement. Finding the most relevant items for each user is a difficult task that is critically dependent on the available user feedback information. However, most users typically interact with products only through noisy implicit feedback, such as clicks or purchases, rather than providing explicit information about their preferences, such as product ratings. This usually makes large amounts of behavioural user data necessary to infer accurate user preferences. One popular approach to make the most use of both forms of feedback is called collaborative filtering. Here, the main idea is to compare individual user behaviour with the behaviour of all known users. Although there are many different collaborative filtering techniques, matrix factorization models are among the most successful ones. In contrast, while neural networks are nowadays the state-of-the-art method for tasks such as image recognition or natural language processing, they are still not very popular for collaborative filtering tasks. Therefore, the main focus of this thesis is the derivation of multiple wide neural network architectures to mimic and extend matrix factorization models for various collaborative filtering problems and to gain insights into the connection between these models. The basics of the proposed architecture are wide and shallow feedforward neural networks, which will be established for rating prediction tasks on explicit feedback datasets. 
These networks consist of large input and output layers, which allow them to capture user and item representations similar to those of matrix factorization models. By deriving all weight updates and comparing the structure of both models, it is proven that a simplified version of the proposed network can mimic common matrix factorization models: a result that has not been shown, as far as we know, in this form before. Additionally, various extensions are thoroughly evaluated. The new findings of this evaluation can also easily be transferred to other matrix factorization models. This neural network architecture can be extended to personalized ranking tasks on implicit feedback datasets. For these problems, it is necessary to rank products according to individual preferences using only the provided implicit feedback. One of the most successful and influential approaches for personalized ranking tasks is Bayesian Personalized Ranking, which attempts to learn pairwise item rankings and can also be used in combination with matrix factorization models. It is shown how the introduction of an additional ranking layer forces the network to learn pairwise item rankings. In addition, similarities between this novel neural network architecture and a matrix factorization model trained with Bayesian Personalized Ranking are proven. To the best of our knowledge, this is the first time that these connections have been shown. The state-of-the-art performance of this network is demonstrated in a detailed evaluation. The most comprehensive feedback datasets consist of a mixture of explicit and implicit feedback information. Here, the goal is to predict whether a user will like an item, similar to rating prediction tasks, even if this user has never given any explicit feedback at all: a problem that has not yet been covered by the collaborative filtering literature. 
The network that solves this task is composed of two networks: one for the explicit and one for the implicit feedback. Additional item features are learned using the implicit feedback, which capture all information necessary to rank items. Afterwards, these features are used to improve the explicit feedback prediction. Both parts of this combined network have different optimization goals, are trained simultaneously and, therefore, influence each other. A detailed evaluation shows that this approach helps to improve the network's overall predictive performance, especially for ranking metrics.
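    One way to read the claim that a wide, shallow network can mimic matrix factorization is sketched below. It assumes a one-hot user input, a linear hidden layer whose width equals the factorization rank, and a linear output layer with one unit per item. Under those assumptions, the input-to-hidden weights play the role of user factors and the hidden-to-output weights the role of item factors. The thesis's actual architecture is not given in the abstract, so all names and numbers here are hypothetical.

```python
import math


def one_hot(index, size):
    return [1.0 if k == index else 0.0 for k in range(size)]


def linear_layer(weights, inputs):
    """Return weights @ inputs, where weights is a list of rows."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def network_predict(user, user_weights, item_weights):
    """Forward pass: one-hot user -> linear hidden layer -> one score per item."""
    hidden = linear_layer(user_weights, one_hot(user, len(user_weights[0])))
    return linear_layer(item_weights, hidden)


def mf_predict(user, item, user_factors, item_factors):
    """Classic matrix factorization score: dot product of the two factor vectors."""
    return sum(p * q for p, q in zip(user_factors[user], item_factors[item]))


# Illustrative factors: 3 users, 2 items, rank 2.
user_factors = [[0.1, 0.2], [0.3, -0.4], [0.5, 0.6]]
item_factors = [[1.0, 2.0], [-1.0, 0.5]]

# Input-to-hidden weights are the transposed user factors (rank x n_users).
scores_for_user_1 = network_predict(1, transpose(user_factors), item_factors)
```

    Each entry of `scores_for_user_1` matches the corresponding `mf_predict` value, because multiplying by a one-hot vector simply selects that user's column of weights.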

    Imaging conductivity from current density magnitude using neural networks

    Conductivity imaging represents one of the most important tasks in medical imaging. In this work we develop a neural-network-based reconstruction technique for imaging the conductivity from the magnitude of the internal current density. It is achieved by formulating the problem as a relaxed weighted least-gradient problem, and then approximating its minimizer by standard fully connected feedforward neural networks. We derive bounds on two components of the generalization error, i.e., approximation error and statistical error, explicitly in terms of properties of the neural networks (e.g., depth, total number of parameters, and the bound on the network parameters). We illustrate the performance and distinct features of the approach in several numerical experiments. Numerically, it is observed that the approach enjoys remarkable robustness with respect to the presence of data noise.