400 research outputs found
SVD-based Universal DNN Modeling for Multiple Scenarios
Abstract Speech recognition scenarios (aka tasks) differ from each other in acoustic transducers, acoustic environments, and speaking style etc. Building one acoustic model per task is one common practice in industry. However, this limits training data sharing across scenarios thus may not give highest possible accuracy. Based on the deep neural network (DNN) technique, we propose to build a universal acoustic model for all scenarios by utilizing all the data together. Two advantages are obtained: 1) leveraging more data sources to improve the recognition accuracy, 2) reducing substantially service deployment and maintenance costs. We achieve this by extending the singular value decomposition (SVD) structure of DNNs. The data from all scenarios are used to first train a single SVD-DNN model. Then a series of scenario-dependent linear square matrices are added on top of each SVD layer and updated with only scenario-related data. At the recognition time, a flag indicates different scenarios and guides the recognizer to use the scenario-dependent matrices together with the scenarioindependent matrices in the universal DNN for acoustic score evaluation. In our experiments on Microsoft Winphone/Skype/Xbox data sets, the universal DNN model is better than traditional trained isolated models, with up to 15.5% relative word error rate reduction
Transfer Learning for Speech and Language Processing
Transfer learning is a vital technique that generalizes models trained for
one setting or task to other settings or tasks. For example in speech
recognition, an acoustic model trained for one language can be used to
recognize speech in another language, with little or no re-training data.
Transfer learning is closely related to multi-task learning (cross-lingual vs.
multilingual), and is traditionally studied in the name of `model adaptation'.
Recent advance in deep learning shows that transfer learning becomes much
easier and more effective with high-level abstract features learned by deep
models, and the `transfer' can be conducted not only between data distributions
and data types, but also between model structures (e.g., shallow nets and deep
nets) or even model types (e.g., Bayesian models and neural models). This
review paper summarizes some recent prominent research towards this direction,
particularly for speech and language processing. We also report some results
from our group and highlight the potential of this very interesting research
field.Comment: 13 pages, APSIPA 201
Lifelong Deep Learning-based Control Of Robot Manipulators
This study proposes a lifelong deep learning control scheme for robotic manipulators with bounded disturbances. This scheme involves the use of an online tunable deep neural network (DNN) to approximate the unknown nonlinear dynamics of the robot. The control scheme is developed by using a singular value decomposition-based direct tracking error-driven approach, which is utilized to derive the weight update laws for the DNN. To avoid catastrophic forgetting in multi-task scenarios and to ensure lifelong learning (LL), a novel online LL scheme based on elastic weight consolidation is included in the DNN weight-tuning laws. Our results demonstrate that the resulting closed-loop system is uniformly ultimately bounded while the forgetting is reduced. To demonstrate the effectiveness of our approach, we provide simulation results comparing it with the conventional single-layer NN approach and confirm its theoretical claims. The cumulative effect of the error and control input in the multitasking system shows a 43% improvement in performance by using the proposed LL-based DNN control over recent literature
A Comparison of Image Denoising Methods
The advancement of imaging devices and countless images generated everyday
pose an increasingly high demand on image denoising, which still remains a
challenging task in terms of both effectiveness and efficiency. To improve
denoising quality, numerous denoising techniques and approaches have been
proposed in the past decades, including different transforms, regularization
terms, algebraic representations and especially advanced deep neural network
(DNN) architectures. Despite their sophistication, many methods may fail to
achieve desirable results for simultaneous noise removal and fine detail
preservation. In this paper, to investigate the applicability of existing
denoising techniques, we compare a variety of denoising methods on both
synthetic and real-world datasets for different applications. We also introduce
a new dataset for benchmarking, and the evaluations are performed from four
different perspectives including quantitative metrics, visual effects, human
ratings and computational cost. Our experiments demonstrate: (i) the
effectiveness and efficiency of representative traditional denoisers for
various denoising tasks, (ii) a simple matrix-based algorithm may be able to
produce similar results compared with its tensor counterparts, and (iii) the
notable achievements of DNN models, which exhibit impressive generalization
ability and show state-of-the-art performance on various datasets. In spite of
the progress in recent years, we discuss shortcomings and possible extensions
of existing techniques. Datasets, code and results are made publicly available
and will be continuously updated at
https://github.com/ZhaomingKong/Denoising-Comparison.Comment: In this paper, we intend to collect and compare various denoising
methods to investigate their effectiveness, efficiency, applicability and
generalization ability with both synthetic and real-world experiment
Human Activity Recognition for AI-Enabled Healthcare Using Low-Resolution Infrared Sensor Data
This paper explores the feasibility of using low-resolution infrared (LRIR) image streams for human activity recognition (HAR) with potential application in e-healthcare. Two datasets based on synchronized multichannel LRIR sensors systems are considered for a comprehensive study about optimal data acquisition. A novel noise reduction technique is proposed for alleviating the effects of horizontal and vertical periodic noise in the 2D spatiotemporal activity profiles created by vectorizing and concatenating the LRIR frames. Two main analysis strategies are explored for HAR, including (1) manual feature extraction using texture-based and orthogonal-transformation-based techniques, followed by classification using support vector machine (SVM), random forest (RF), k-nearest neighbor (k-NN), and logistic regression (LR), and (2) deep neural network (DNN) strategy based on a convolutional long short-term memory (LSTM). The proposed periodic noise reduction technique showcases an increase of up to 14.15% using different models. In addition, for the first time, the optimum number of sensors, sensor layout, and distance to subjects are studied, indicating the optimum results based on a single side sensor at a close distance. Reasonable accuracies are achieved in the case of sensor displacement and robustness in detection of multiple subjects. Furthermore, the models show suitability for data collected in different environments
Deep Transfer Learning for Automatic Speech Recognition: Towards Better Generalization
Automatic speech recognition (ASR) has recently become an important challenge
when using deep learning (DL). It requires large-scale training datasets and
high computational and storage resources. Moreover, DL techniques and machine
learning (ML) approaches in general, hypothesize that training and testing data
come from the same domain, with the same input feature space and data
distribution characteristics. This assumption, however, is not applicable in
some real-world artificial intelligence (AI) applications. Moreover, there are
situations where gathering real data is challenging, expensive, or rarely
occurring, which can not meet the data requirements of DL models. deep transfer
learning (DTL) has been introduced to overcome these issues, which helps
develop high-performing models using real datasets that are small or slightly
different but related to the training data. This paper presents a comprehensive
survey of DTL-based ASR frameworks to shed light on the latest developments and
helps academics and professionals understand current challenges. Specifically,
after presenting the DTL background, a well-designed taxonomy is adopted to
inform the state-of-the-art. A critical analysis is then conducted to identify
the limitations and advantages of each framework. Moving on, a comparative
study is introduced to highlight the current challenges before deriving
opportunities for future research
- …