Enhanced Recurrent Network Training
In this dissertation, we introduce new, more efficient methods for training recurrent neural networks (RNNs). These methods are based on a new understanding of the error surfaces of RNNs that has been developed in recent years. These error surfaces contain spurious valleys that disrupt the search for global minima. The spurious valleys are caused by instabilities in the networks, which become more pronounced with increased prediction horizons. The new methods described in this dissertation increase the prediction horizons in a principled way that enables the search algorithms to avoid the spurious valleys.
The work also presents a novelty sampling method for collecting new data wisely. A clustering method determines when an RNN is extrapolating. Extrapolation occurs when the RNN operates outside the region spanned by the training set, where adequate performance cannot be guaranteed. The method presented in this dissertation uses this clustering procedure to detect extrapolation and to collect the novel data, and retraining the RNN on the augmented data set improves the training results.
Model Reference Control (MRC) is also introduced in this dissertation and is implemented on simulated and experimental magnetic levitation systems.
Electrical Engineering
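The clustering-based extrapolation check described in the abstract can be sketched in a few lines. This is a minimal illustration, not the dissertation's actual algorithm: it assumes a plain k-means fit on the training inputs and flags a point as extrapolating when its distance to the nearest cluster center exceeds that cluster's radius; the function names and the `margin` parameter are stand-ins.

```python
import numpy as np

def fit_clusters(X, k=3, iters=20, seed=0):
    """Plain k-means on the training inputs; returns centers and per-cluster radii."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    radii = np.array([dists[labels == j, j].max() if np.any(labels == j) else 0.0
                      for j in range(k)])
    return centers, radii

def is_extrapolating(x, centers, radii, margin=1.5):
    """Flag x as novel when it lies outside the (scaled) radius of every cluster."""
    d = np.linalg.norm(centers - x, axis=1)
    j = d.argmin()
    return bool(d[j] > margin * radii[j])
```

A point flagged this way would be collected into the novel data set and used to retrain the RNN, which is the novelty-sampling loop the abstract describes.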
Exploring Interpretable LSTM Neural Networks over Multi-Variable Data
For recurrent neural networks trained on time series with target and
exogenous variables, in addition to accurate prediction, it is also desired to
provide interpretable insights into the data. In this paper, we explore the
structure of LSTM recurrent neural networks to learn variable-wise hidden
states, with the aim to capture different dynamics in multi-variable time
series and distinguish the contribution of variables to the prediction. With
these variable-wise hidden states, a mixture attention mechanism is proposed to
model the generative process of the target. Then we develop associated training
methods to jointly learn network parameters, variable and temporal importance
w.r.t. the prediction of the target variable. Extensive experiments on real
datasets demonstrate enhanced prediction performance by capturing the dynamics
of different variables. Meanwhile, we evaluate the interpretation results both
qualitatively and quantitatively. It exhibits the prospect as an end-to-end
framework for both forecasting and knowledge extraction over multi-variable
data.Comment: Accepted to International Conference on Machine Learning (ICML), 201
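The variable-wise-hidden-state idea lends itself to a small numerical sketch. This is a hypothetical simplification, not the paper's architecture: each variable's hidden state yields its own scalar prediction, and softmax-normalized scores mix them, so the mixture weights double as variable-importance estimates. All weight vectors below are illustrative stand-ins.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mixture_attention(H, w_score, w_out):
    """H: (n_vars, d) variable-wise hidden states.
    Returns the mixed prediction and the per-variable attention weights."""
    alpha = softmax(H @ w_score)      # one mixture weight per variable
    y_per_var = H @ w_out             # one scalar prediction per variable
    return float(alpha @ y_per_var), alpha
```

Because `alpha` is a probability vector over variables, reading it off after training is exactly the kind of interpretability the abstract points to.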
Skeleton-Based Human Action Recognition with Global Context-Aware Attention LSTM Networks
Human action recognition in 3D skeleton sequences has attracted a lot of
research attention. Recently, Long Short-Term Memory (LSTM) networks have shown
promising performance in this task due to their strengths in modeling the
dependencies and dynamics in sequential data. As not all skeletal joints are
informative for action recognition, and the irrelevant joints often bring noise
which can degrade the performance, we need to pay more attention to the
informative ones. However, the original LSTM network does not have explicit
attention ability. In this paper, we propose a new class of LSTM network,
Global Context-Aware Attention LSTM (GCA-LSTM), for skeleton-based action
recognition. This network is capable of selectively focusing on the informative
joints in each frame of each skeleton sequence by using a global context memory
cell. To further improve the attention capability of our network, we also
introduce a recurrent attention mechanism, with which the attention performance
of the network can be enhanced progressively. Moreover, we propose a stepwise
training scheme in order to train our network effectively. Our approach
achieves state-of-the-art performance on five challenging benchmark datasets
for skeleton-based action recognition.
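The global context attention step can be sketched numerically. A minimal illustration, not the published model: each joint's feature is scored against a global context vector, the softmax of the scores picks out informative joints, and the attended result refines the context, imitating the recurrent attention mechanism. The function name and the refinement count are assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def gca_attend(F, g, n_refine=2):
    """F: (n_joints, d) per-joint features, g: (d,) global context memory.
    Repeatedly attends over joints and refines the context (recurrent attention)."""
    for _ in range(n_refine):
        alpha = softmax(F @ g)   # informativeness score per joint
        g = alpha @ F            # refined global context
    return g, alpha
```

Each pass sharpens the weights toward the joints most consistent with the global context, which is the progressive enhancement the abstract describes.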
Hybrid AHS: A Hybrid of Kalman Filter and Deep Learning for Acoustic Howling Suppression
Deep learning has been recently introduced for efficient acoustic howling
suppression (AHS). However, the recurrent nature of howling creates a mismatch
between offline training and streaming inference, limiting the quality of
enhanced speech. To address this limitation, we propose a hybrid method that
combines a Kalman filter with a self-attentive recurrent neural network (SARNN)
to leverage their respective advantages for robust AHS. During offline
training, a pre-processed signal obtained from the Kalman filter and an ideal
microphone signal generated via a teacher-forced training strategy are used to
train the deep neural network (DNN). During streaming inference, the DNN's
parameters are fixed while its output serves as a reference signal for updating
the Kalman filter. Evaluation in both offline and streaming inference scenarios
using simulated and real-recorded data shows that the proposed method
efficiently suppresses howling and consistently outperforms baselines.
Comment: submitted to INTERSPEECH 2023. arXiv admin note: text overlap with arXiv:2302.0925
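The streaming interplay described above can be caricatured with a scalar toy model. This sketch is not the paper's system: a one-dimensional Kalman filter tracks a single feedback-path gain, `dnn_enhance` stands in for the frozen SARNN, and the gain observation derived from the DNN output is an illustrative heuristic rather than the actual update rule.

```python
import numpy as np

def kalman_step(g, P, z, q=1e-4, r=1e-2):
    """One scalar Kalman predict/update step for the feedback-gain estimate g."""
    P = P + q                  # predict: the gain may drift over time
    K = P / (P + r)            # Kalman gain
    g = g + K * (z - g)        # update with the innovation
    P = (1.0 - K) * P
    return g, P

def stream_ahs(mic, dnn_enhance):
    """Streaming inference: Kalman pre-processing -> frozen DNN -> DNN output
    fed back as the reference signal for the Kalman update."""
    g, P = 0.0, 1.0
    out = []
    for m in mic:
        ls = out[-1] if out else 0.0   # loudspeaker replays the previous output
        pre = m - g * ls               # cancel the estimated acoustic feedback
        s = dnn_enhance(pre)           # frozen DNN suppresses residual howling
        if abs(ls) > 1e-6:
            z = (m - s) / ls           # heuristic gain observation from DNN output
            g, P = kalman_step(g, P, z)
        out.append(s)
    return np.array(out), g
```

The structural point survives the simplification: the DNN's parameters never change during streaming, while its output keeps feeding the Kalman filter's update.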
Using recurrent neural networks for object detection in videos
This thesis explores recurrent neural network based methods for object detection in video sequences. Several models for object recognition are compared using the KITTI object tracking dataset, which contains photos taken in an urban traffic environment. Metrics such as robustness to noise and object velocity prediction error are used to analyze the results. Neural networks and their training methodology are described in depth, and recent models from the literature are reviewed.
Several novel convolutional neural network architectures are introduced for the problem. The VGG-19 deep neural network is enhanced with convolutional recurrent layers to make it suitable for video analysis. Additionally, a temporal coherency loss term is introduced to guide the learning process. Velocity estimation has not previously been studied in the literature, so velocity estimation performance was compared against a baseline frame-by-frame object detector neural network.
The results from the experiments show that the recurrent architectures operating on video sequences consistently outperform an object detector that only perceives one frame of video at a time. The recurrent models are more resilient to noise and produce more confident object detections, as measured by the standard deviation of the predicted bounding boxes. The recurrent models are also able to predict object velocity more accurately from video than the baseline frame-by-frame model.
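The temporal coherency term mentioned above can be written down in a few lines. A minimal sketch, assuming the term simply penalizes frame-to-frame changes in the predicted box coordinates; the exact formulation in the thesis may differ.

```python
import numpy as np

def temporal_coherency_loss(boxes):
    """boxes: (T, 4) predicted box coordinates per frame.
    Penalizes large frame-to-frame jumps, encouraging smooth tracks."""
    diffs = np.diff(boxes, axis=0)   # (T-1, 4) frame-to-frame deltas
    return float(np.mean(diffs ** 2))
```

During training, a term like this would be added to the ordinary detection loss with a weighting coefficient, so that detections that flicker between frames are penalized even when each individual frame's detection looks plausible.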