    Enhanced Recurrent Network Training

    In this dissertation, we introduce new, more efficient methods for training recurrent neural networks (RNNs). These methods are based on a new understanding of the error surfaces of RNNs that has been developed in recent years. These error surfaces contain spurious valleys that disrupt the search for global minima. The spurious valleys are caused by instabilities in the networks, which become more pronounced with increased prediction horizons. The new methods described in this dissertation increase the prediction horizons in a principled way that enables the search algorithms to avoid the spurious valleys. The work also presents a novelty sampling method for collecting new data wisely. A clustering method determines when an RNN is extrapolating. Extrapolation occurs when the RNN operates outside the region spanned by the training set, where adequate performance cannot be guaranteed. The new method presented in this dissertation uses the clustering method to detect extrapolation and to collect the novel data. The training results are then improved by retraining the RNN on the augmented data set. Model Reference Control (MRC) is also introduced in this dissertation and is implemented on simulated and experimental magnetic levitation systems. Electrical Engineering
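    The extrapolation-detection idea above can be sketched as follows: cluster the training inputs, then flag any new input that lies far from every cluster centre as extrapolation (and hence as a candidate novel sample). This is a minimal illustration using plain k-means, not the dissertation's exact clustering method; all names and the distance threshold are assumptions.

    ```python
    import numpy as np

    def fit_centroids(X, k, iters=20, seed=0):
        """Plain k-means over the training inputs (stand-in for the clustering step)."""
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
        for _ in range(iters):
            # Assign each point to its nearest centroid, then recompute the means.
            labels = np.argmin(((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1), axis=1)
            for j in range(k):
                if np.any(labels == j):
                    centroids[j] = X[labels == j].mean(axis=0)
        return centroids

    def is_extrapolating(x, centroids, radius):
        """True when `x` lies farther than `radius` from every training cluster centre."""
        return np.sqrt(((centroids - x) ** 2).sum(-1)).min() > radius
    ```

    Inputs flagged this way would be collected as novel data and folded back into the training set before retraining.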

    Exploring Interpretable LSTM Neural Networks over Multi-Variable Data

    For recurrent neural networks trained on time series with target and exogenous variables, it is desirable to provide interpretable insights into the data in addition to accurate prediction. In this paper, we explore the structure of LSTM recurrent neural networks to learn variable-wise hidden states, with the aim of capturing different dynamics in multi-variable time series and distinguishing the contribution of each variable to the prediction. With these variable-wise hidden states, a mixture attention mechanism is proposed to model the generative process of the target. We then develop associated training methods to jointly learn network parameters and variable and temporal importance w.r.t. the prediction of the target variable. Extensive experiments on real datasets demonstrate enhanced prediction performance by capturing the dynamics of different variables. Meanwhile, we evaluate the interpretation results both qualitatively and quantitatively. The approach shows promise as an end-to-end framework for both forecasting and knowledge extraction over multi-variable data. Comment: Accepted to International Conference on Machine Learning (ICML), 201
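    The mixture attention idea can be illustrated compactly: each variable keeps its own hidden state, a softmax over per-variable scores yields importance weights, and the target prediction is the weighted mixture of per-variable predictions. This is a simplified numpy sketch under assumed shapes and parameter names, not the paper's exact formulation.

    ```python
    import numpy as np

    def mixture_attention(hidden_states, head_weights, attn_vec):
        """hidden_states: (n_vars, d) variable-wise states; head_weights: (n_vars, d)
        per-variable readout weights; attn_vec: (d,) scoring vector (names hypothetical).
        Returns the mixture prediction and the per-variable attention weights."""
        scores = hidden_states @ attn_vec                      # one relevance score per variable
        e = np.exp(scores - scores.max())
        weights = e / e.sum()                                  # softmax -> variable importance
        per_var = (hidden_states * head_weights).sum(axis=1)   # scalar prediction per variable
        return float(weights @ per_var), weights
    ```

    The returned `weights` are exactly the quantity one would inspect for interpretability: they indicate how much each exogenous variable contributed to the forecast.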

    Skeleton-Based Human Action Recognition with Global Context-Aware Attention LSTM Networks

    Human action recognition in 3D skeleton sequences has attracted a lot of research attention. Recently, Long Short-Term Memory (LSTM) networks have shown promising performance in this task due to their strengths in modeling the dependencies and dynamics in sequential data. As not all skeletal joints are informative for action recognition, and the irrelevant joints often bring noise which can degrade the performance, we need to pay more attention to the informative ones. However, the original LSTM network does not have explicit attention ability. In this paper, we propose a new class of LSTM network, Global Context-Aware Attention LSTM (GCA-LSTM), for skeleton-based action recognition. This network is capable of selectively focusing on the informative joints in each frame of each skeleton sequence by using a global context memory cell. To further improve the attention capability of our network, we also introduce a recurrent attention mechanism, with which the attention performance of the network can be enhanced progressively. Moreover, we propose a stepwise training scheme in order to train our network effectively. Our approach achieves state-of-the-art performance on five challenging benchmark datasets for skeleton-based action recognition.
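    The core mechanism, a global context memory that is refined by repeated attention passes over the joints, can be sketched as below. The bilinear scoring matrix `W` and the uniform initialisation are assumptions for illustration; the real GCA-LSTM computes the context with LSTM cells.

    ```python
    import numpy as np

    def gca_attention(joints, W, rounds=2):
        """joints: (J, d) per-joint features; W: (d, d) scoring matrix (hypothetical).
        The global context starts as the uniform mean of the joints and is refined
        over `rounds` passes, mimicking the recurrent attention refinement."""
        weights = np.full(len(joints), 1.0 / len(joints))
        for _ in range(rounds):
            context = weights @ joints          # global context vector (d,)
            scores = joints @ W @ context       # informativeness of each joint
            e = np.exp(scores - scores.max())
            weights = e / e.sum()               # sharper focus on informative joints
        return weights
    ```

    Each pass recomputes the context from the current attention weights, so the focus on informative joints sharpens progressively, which is the intuition behind the recurrent attention mechanism.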

    Hybrid AHS: A Hybrid of Kalman Filter and Deep Learning for Acoustic Howling Suppression

    Deep learning has been recently introduced for efficient acoustic howling suppression (AHS). However, the recurrent nature of howling creates a mismatch between offline training and streaming inference, limiting the quality of enhanced speech. To address this limitation, we propose a hybrid method that combines a Kalman filter with a self-attentive recurrent neural network (SARNN) to leverage their respective advantages for robust AHS. During offline training, a pre-processed signal obtained from the Kalman filter and an ideal microphone signal generated via a teacher-forced training strategy are used to train the deep neural network (DNN). During streaming inference, the DNN's parameters are fixed while its output serves as a reference signal for updating the Kalman filter. Evaluation in both offline and streaming inference scenarios using simulated and real-recorded data shows that the proposed method efficiently suppresses howling and consistently outperforms baselines. Comment: submitted to INTERSPEECH 2023. arXiv admin note: text overlap with arXiv:2302.0925
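    The streaming-inference control flow described above, Kalman pre-processing, frozen DNN enhancement, and the DNN output fed back as the filter's reference, can be sketched as a loop. `ToyKalman` and `dnn` are hypothetical stand-ins for the real components; only the feedback structure matches the abstract.

    ```python
    class ToyKalman:
        """Hypothetical stand-in for the Kalman filter: blends the incoming frame
        with the previous reference signal instead of doing a real state update."""
        def filter(self, frame, reference):
            return frame if reference is None else 0.5 * (frame + reference)

    def streaming_ahs(mic_frames, kalman, dnn):
        """Hybrid streaming loop: Kalman pre-processing, fixed-parameter DNN
        enhancement, and the DNN output fed back as the next reference signal."""
        reference, out = None, []
        for frame in mic_frames:
            pre = kalman.filter(frame, reference)   # Kalman pre-processing
            enhanced = dnn(pre)                     # frozen DNN inference
            reference = enhanced                    # reference for the next update
            out.append(enhanced)
        return out
    ```

    The key design point is that during streaming only the Kalman state adapts; the DNN stays fixed, which avoids the train/inference mismatch the abstract highlights.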

    Using Recurrent Neural Networks for Object Detection in Videos

    This thesis explores recurrent neural network based methods for object detection in video sequences. Several models for object recognition are compared using the KITTI object tracking dataset, which contains photos taken in an urban traffic environment. Metrics such as robustness to noise and object velocity prediction error are used to analyze the results. Neural networks and their training methodology are described in depth, and recent models from the literature are reviewed. Several novel convolutional neural network architectures are introduced for the problem. The VGG-19 deep neural network is enhanced with convolutive recurrent layers to make it suitable for video analysis. Additionally, a temporal coherency loss term is introduced to guide the learning process. As velocity estimation has not previously been studied in this setting, velocity estimation performance was compared against a baseline frame-by-frame object detector network. The results from the experiments show that the recurrent architectures operating on video sequences consistently outperform an object detector that only perceives one frame of video at a time. The recurrent models are more resilient to noise and produce more confident object detections, as measured by the standard deviation of the predicted bounding boxes. The recurrent models are also able to predict object velocity more accurately from video than the baseline frame-by-frame model.
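    A temporal coherency loss term of the kind mentioned above typically penalises large jumps between consecutive per-frame predictions. The squared-difference form below is one plausible instantiation, not necessarily the thesis's exact term.

    ```python
    import numpy as np

    def temporal_coherency_loss(preds):
        """preds: (T, ...) per-frame predictions, e.g. bounding-box coordinates.
        Mean squared difference between consecutive frames: zero for perfectly
        stable predictions, growing with frame-to-frame jitter."""
        diffs = np.diff(np.asarray(preds, dtype=float), axis=0)
        return float((diffs ** 2).mean())
    ```

    Added to the detection loss with a small weight, such a term encourages the recurrent detector to produce temporally smooth boxes, which is consistent with the lower bounding-box standard deviation reported above.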