
    Input Dimension Reduction in Neural Network Training-Case Study in Transient Stability Assessment of Large Systems

    A problem in modeling large systems with artificial neural networks (ANNs) is that the input vector can become excessively large, which increases the likelihood of convergence problems in the training algorithm as well as the memory requirement and processing time. This paper addresses ANN input dimension reduction: two different methods are discussed and compared for efficiency and accuracy when applied to transient stability assessment.
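    The two reduction methods are not named in this excerpt, so the sketch below uses principal component analysis (PCA) as a standard stand-in for this kind of preprocessing: project the raw measurement vector onto a few principal components before training the network. All sizes and the toy stability labels are assumptions for illustration.

```python
# Hedged sketch, assuming PCA as the reduction step (the paper's two methods
# are not named in this excerpt). Project raw measurements onto a few
# principal components, then train the ANN on the reduced inputs.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 500))             # 500 raw system measurements
y = (X[:, :5].sum(axis=1) > 0).astype(int)   # toy stability labels

pca = PCA(n_components=20)                   # keep 20 principal components
X_red = pca.fit_transform(X)                 # 500-dim -> 20-dim inputs

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X_red, y)
print("retained variance:", pca.explained_variance_ratio_.sum())
print("training accuracy:", clf.score(X_red, y))
```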

    From feature to paradigm: deep learning in machine translation

    In recent years, deep learning algorithms have revolutionized several areas, including speech, image, and natural language processing, and the field of Machine Translation (MT) has not remained unaffected. Integration of deep learning into MT ranges from re-modeling existing features within standard statistical systems to the development of entirely new architectures. Among the different neural networks, research works use feed-forward neural networks, recurrent neural networks, and the encoder-decoder schema; these architectures are able to tackle challenges such as low-resource settings or morphological variation. This manuscript focuses on describing how these neural networks have been integrated to enhance different aspects and models of statistical MT, including language modeling, word alignment, translation, reordering, and rescoring. We then report on the new neural MT approach, together with a description of the foundational related works and recent approaches using subwords, characters, and multilingual training, among others. Finally, we include an analysis of the corresponding challenges and future work in using deep learning in MT.
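    As a concrete anchor for the encoder-decoder schema the survey refers to, here is a minimal recurrent sequence-to-sequence sketch in PyTorch. The vocabulary sizes, hidden dimension, and GRU cells are illustrative choices, not details taken from the manuscript.

```python
# Minimal sketch of the encoder-decoder schema: an encoder GRU summarizes the
# source sentence into a hidden state, and a decoder GRU generates target
# logits from it. All names and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src, tgt):
        _, h = self.encoder(self.src_emb(src))    # h: summary of the source
        dec_out, _ = self.decoder(self.tgt_emb(tgt), h)
        return self.out(dec_out)                  # logits over target vocab

model = Seq2Seq(src_vocab=8000, tgt_vocab=8000)
src = torch.randint(0, 8000, (4, 12))             # batch of source sentences
tgt = torch.randint(0, 8000, (4, 10))             # shifted target sentences
logits = model(src, tgt)                          # shape (4, 10, 8000)
```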

    Adaptive recurrent vision performs zero-shot computation scaling to unseen difficulty levels

    Humans solving algorithmic or reasoning problems typically exhibit solution times that grow with problem difficulty. Adaptive recurrent neural networks have been shown to exhibit this property for various language-processing tasks. However, little work has assessed whether such adaptive computation can also enable vision models to extrapolate beyond their training distribution's difficulty level, with prior work focusing on very simple tasks. In this study, we investigate a critical functional role of such adaptive processing with recurrent neural networks: dynamically scaling computational resources conditional on input requirements, allowing zero-shot generalization to novel difficulty levels not seen during training. We use two challenging visual reasoning tasks, PathFinder and Mazes, and combine convolutional recurrent neural networks (ConvRNNs) with a learnable halting mechanism based on Graves (2016). We explore various implementations of such adaptive ConvRNNs (AdRNNs), ranging from tying weights across layers to more sophisticated biologically inspired recurrent networks that possess lateral connections and gating. We show that 1) AdRNNs learn to dynamically halt processing early (or late) to solve easier (or harder) problems, and 2) these RNNs zero-shot generalize to more difficult problem settings not shown during training by dynamically increasing the number of recurrent iterations at test time. Our study provides modeling evidence supporting the hypothesis that recurrent processing enables the functional advantage of adaptively allocating compute resources conditional on input requirements, thereby allowing generalization to harder difficulty levels of a visual reasoning problem without training on them. Comment: 37th Conference on Neural Information Processing Systems (NeurIPS 2023).
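    The halting mechanism cited above follows Graves (2016), where each recurrent step emits a halting probability and computation stops once the accumulated probability crosses a threshold. The sketch below is a simplified illustration of that idea on a weight-tied ConvRNN, not the authors' exact AdRNN (in particular, it omits the ponder cost and the weighted state average of full adaptive computation time).

```python
# Hedged sketch of an adaptive ConvRNN with a Graves (2016)-style halting
# unit: stop iterating once every sample's cumulative halting probability
# exceeds 1 - eps. Simplified illustration, not the paper's AdRNN.
import torch
import torch.nn as nn

class AdaptiveConvRNN(nn.Module):
    def __init__(self, channels=32, max_steps=20, eps=0.01):
        super().__init__()
        self.cell = nn.Conv2d(channels * 2, channels, 3, padding=1)
        self.halt = nn.Linear(channels, 1)
        self.max_steps, self.eps = max_steps, eps

    def forward(self, x):
        h = torch.zeros_like(x)
        cum_halt = torch.zeros(x.size(0), device=x.device)
        for step in range(self.max_steps):        # weights tied across steps
            h = torch.tanh(self.cell(torch.cat([x, h], dim=1)))
            p = torch.sigmoid(self.halt(h.mean(dim=(2, 3)))).squeeze(1)
            cum_halt = cum_halt + p
            if bool((cum_halt >= 1 - self.eps).all()):  # all samples halted
                break
        return h, step + 1                        # final state, steps used

net = AdaptiveConvRNN()
h, steps = net(torch.randn(2, 32, 16, 16))        # easier inputs halt sooner
```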

    Learning a local-variable model of aromatic and conjugated systems

    A collection of new approaches to building and training neural networks, collectively referred to as deep learning, are attracting attention in theoretical chemistry. Several groups aim to replace computationally expensive ab initio quantum mechanics calculations with learned estimators. This raises questions about the representability of complex quantum chemical systems with neural networks. Can local-variable models efficiently approximate nonlocal quantum chemical features? Here, we find that convolutional architectures, those that only aggregate information locally, cannot efficiently represent aromaticity and conjugation in large systems. They cannot represent long-range nonlocality known to be important in quantum chemistry. This study uses aromatic and conjugated systems computed from molecule graphs, though reproducing quantum simulations is the ultimate goal. This task, by definition, is both computable and known to be important to chemistry. The failure of convolutional architectures on this focused task calls into question their use in modeling quantum mechanics. To remedy this heretofore unrecognized deficiency, we introduce a new architecture that propagates information back and forth in waves of nonlinear computation. This architecture is still a local-variable model, and it is both computationally and representationally efficient, processing molecules in sublinear time with far fewer parameters than convolutional networks. Wave-like propagation models aromatic and conjugated systems with high accuracy, and even models the impact of small structural changes on large molecules. This new architecture demonstrates that some nonlocal features of quantum chemistry can be efficiently represented in local-variable models.
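    A minimal way to see why back-and-forth waves beat fixed-depth local aggregation: after one left-to-right sweep and one right-to-left sweep over a chain of atoms, every atom has received information from every other atom, at cost linear in chain length with a single shared weight matrix. The update rule, feature sizes, and weight matrix below are illustrative assumptions, not the paper's architecture.

```python
# Hedged sketch of wave-like propagation on a linear chain of atoms.
import numpy as np

def wave_pass(feats, W, order, offset):
    """Sweep the chain in the given order, mixing each atom with the
    neighbour at i + offset (already updated earlier in this sweep)."""
    h = feats.copy()
    for i in order:
        j = i + offset
        prev = h[j] if 0 <= j < len(h) else np.zeros_like(h[i])
        h[i] = np.tanh(W @ np.concatenate([h[i], prev]))
    return h

n_atoms, dim = 10, 8
rng = np.random.default_rng(1)
feats = rng.normal(size=(n_atoms, dim))
W = rng.normal(size=(dim, 2 * dim)) / np.sqrt(2 * dim)

fwd = wave_pass(feats, W, range(n_atoms), -1)             # left-to-right wave
out = wave_pass(fwd, W, range(n_atoms - 1, -1, -1), +1)   # right-to-left wave
# After both sweeps every atom has seen the whole chain, with O(1) parameters.
```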

    Architecture Optimization, Training Convergence and Network Estimation Robustness of a Fully Connected Recurrent Neural Network

    Recurrent neural networks (RNNs) have developed rapidly in recent years, with applications in system identification, optimization, image processing, pattern recognition, classification, clustering, memory association, and more. In this study, an optimized RNN is proposed to model nonlinear dynamical systems. A fully connected RNN is developed first, modified from a fully forward connected neural network (FFCNN) by adding recurrent connections among its hidden neurons. In addition, a destructive structure optimization algorithm is applied, and the extended Kalman filter (EKF) is adopted as the network's training algorithm. These two algorithms work together seamlessly to generate the optimized RNN. The enhanced modeling performance of the optimized network comes from three parts: 1) its prototype, the FFCNN, has advantages over the multilayer perceptron (MLP), the most widely used network, in modeling accuracy and generalization ability; 2) the recurrent connections make the network more capable of modeling nonlinear dynamical systems; and 3) the structure optimization algorithm further improves the RNN's generalization ability and robustness. Performance studies of the proposed network focus on training convergence and robustness. For the training convergence study, the Lyapunov method is used to adapt some training parameters to guarantee convergence, while the maximum likelihood method is used to estimate other parameters to accelerate training. In addition, a robustness measure is developed that accounts for uncertainty propagation through the RNN via the unscented transform. Two case studies, the modeling of a benchmark nonlinear dynamical system and of tool wear progression in hard turning, are carried out to validate this work. The dissertation focuses on the creation of: (1) a new method to prove and guarantee the training convergence of the RNN, and (2) a new method to quantify the robustness of the RNN using uncertainty propagation analysis. With the proposed study, the RNN and related algorithms are developed to model nonlinear dynamical systems, which can benefit future modeling applications such as condition monitoring in terms of robustness and accuracy.
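    To make the EKF-as-trainer idea concrete, here is a hedged numpy sketch on a single tanh neuron rather than the dissertation's fully connected RNN: the weights are the filter state, each training pair supplies a measurement update, and the standard EKF equations move the weights. The noise covariances Q and R and the initial covariance are illustrative tuning choices.

```python
# Hedged sketch of EKF weight training on a single tanh neuron (a stand-in
# for the dissertation's RNN). State = weights; measurement = target output.
import numpy as np

rng = np.random.default_rng(0)
dim = 3
w_true = rng.normal(size=dim)
w = np.zeros(dim)                    # weight estimate (filter state)
P = np.eye(dim)                      # state covariance
Q, R = 1e-5 * np.eye(dim), 0.01     # process / measurement noise (assumed)

for _ in range(500):
    x = rng.normal(size=dim)
    y = np.tanh(w_true @ x)         # target from the "true" model
    y_hat = np.tanh(w @ x)
    H = (1 - y_hat**2) * x          # Jacobian of the output w.r.t. weights
    S = H @ P @ H + R               # innovation variance (scalar output)
    K = P @ H / S                   # Kalman gain
    w = w + K * (y - y_hat)         # measurement update of the weights
    P = P - np.outer(K, H @ P) + Q  # covariance update

print("weight error:", np.linalg.norm(w - w_true))
```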

    Neural Implicit Representations for Physical Parameter Inference from a Single Video

    Neural networks have recently been used to analyze diverse physical systems and to identify their underlying dynamics. While existing methods achieve impressive results, they are limited by their strong demand for training data and their weak generalization to out-of-distribution data. To overcome these limitations, in this work we propose to combine neural implicit representations for appearance modeling with neural ordinary differential equations (ODEs) for modeling physical phenomena, obtaining a dynamic scene representation that can be identified directly from visual observations. Our proposed model combines several unique advantages: (i) contrary to existing approaches that require large training datasets, we are able to identify physical parameters from only a single video; (ii) the use of neural implicit representations enables the processing of high-resolution videos and the synthesis of photo-realistic images; (iii) the embedded neural ODE has a known parametric form that allows for the identification of interpretable physical parameters; (iv) long-term prediction in state space becomes possible; and (v) so does the photo-realistic rendering of novel scenes with modified physical parameters. Comment: Published in the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023.
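    To illustrate the two ingredients being combined, the sketch below pairs a coordinate-based implicit network for appearance with an ODE of known parametric form (an illustrative damped oscillator, not necessarily the paper's dynamics). The physical parameters are ordinary learnable tensors, so a pixel loss could be backpropagated through the differentiable rollout to identify them; the architecture sizes are assumptions.

```python
# Hedged sketch: implicit appearance network + parametric neural ODE.
import torch
import torch.nn as nn

class ImplicitImage(nn.Module):       # maps (x, y, state) -> RGB
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(),
                                 nn.Linear(64, 3), nn.Sigmoid())
    def forward(self, coords, state):
        s = state.expand(coords.shape[0], -1)     # broadcast state per pixel
        return self.net(torch.cat([coords, s], dim=1))

class DampedOscillator(nn.Module):    # ODE with a known parametric form
    def __init__(self):
        super().__init__()
        self.omega = nn.Parameter(torch.tensor(1.0))   # interpretable
        self.gamma = nn.Parameter(torch.tensor(0.1))   # physical parameters
    def forward(self, state):         # state = (position, velocity)
        pos, vel = state
        return torch.stack([vel, -self.omega**2 * pos - self.gamma * vel])

def integrate(ode, state, dt, steps): # simple differentiable Euler rollout
    traj = [state]
    for _ in range(steps):
        state = state + dt * ode(state)
        traj.append(state)
    return torch.stack(traj)

ode, img = DampedOscillator(), ImplicitImage()
traj = integrate(ode, torch.tensor([1.0, 0.0]), dt=0.05, steps=100)
coords = torch.rand(256, 2)                       # sampled pixel locations
rgb = img(coords, traj[-1].unsqueeze(0))          # render the final state
```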

    Determination of Head Kinematics from Impact Acceleration Test Data Using Neural Networks

    This paper presents a study of feed-forward neural network (NN) systems developed to determine the head kinematics of subjects who are exposed to impact accelerations. The neural networks process accelerometer data collected during short-duration impact acceleration tests conducted at the National Biodynamics Laboratory of the University of New Orleans. During an impact acceleration experiment, the subject sits on a sled chair and a piston gives the sled impetus to travel down a track. Head data are gathered by an array of nine accelerometers, and two more accelerometers are mounted on the sled. The neural processing systems produce the history of the rotational and translational position, velocity, and acceleration of the origin of the accelerometer array mounted on the mouth. Output produced by a least-squares algorithm that uses both photographic and accelerometer raw data serves as a baseline and provides training data for the neural networks. The main advantages of the NNs are their speed and the fact that neither statistical information nor accurate modeling of the testing system is required. Results show that the neural networks provide accurate information about the kinematics of the subject even when no photographic data are used.
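    The described pipeline is a supervised regression from accelerometer channels to kinematic quantities, with the least-squares algorithm supplying training targets. The sketch below assumes 11 input channels (nine head-array plus two sled) and an illustrative 18-dimensional kinematic target; the layer sizes and the random stand-in data are placeholders, not the paper's configuration.

```python
# Hedged sketch: feed-forward regression from accelerometer channels to
# kinematic estimates, trained against least-squares baseline outputs.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(11, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 18))            # kinematic estimates

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

accel = torch.randn(512, 11)     # stand-in for measured accelerometer data
target = torch.randn(512, 18)    # stand-in for least-squares baseline output

for _ in range(200):             # supervised regression against the baseline
    opt.zero_grad()
    loss = loss_fn(net(accel), target)
    loss.backward()
    opt.step()
```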