
    SVD-based Universal DNN Modeling for Multiple Scenarios

    Abstract: Speech recognition scenarios (aka tasks) differ from each other in acoustic transducers, acoustic environments, speaking style, etc. Building one acoustic model per task is common practice in industry. However, this limits training data sharing across scenarios and thus may not yield the highest possible accuracy. Based on the deep neural network (DNN) technique, we propose to build a universal acoustic model for all scenarios by utilizing all the data together. Two advantages are obtained: 1) leveraging more data sources to improve recognition accuracy, and 2) substantially reducing service deployment and maintenance costs. We achieve this by extending the singular value decomposition (SVD) structure of DNNs. The data from all scenarios are first used to train a single SVD-DNN model. Then a series of scenario-dependent linear square matrices are added on top of each SVD layer and updated with only scenario-related data. At recognition time, a flag indicates the scenario and guides the recognizer to use the scenario-dependent matrices together with the scenario-independent matrices in the universal DNN for acoustic score evaluation. In our experiments on Microsoft Winphone/Skype/Xbox data sets, the universal DNN model outperforms traditionally trained isolated models, with up to 15.5% relative word error rate reduction.
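The scenario-dependent layers described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: a weight matrix is SVD-factored into a low-rank bottleneck, and a small square matrix (initialized to identity, so the universal model is unchanged before adaptation) is inserted between the factors and tuned per scenario. All names and dimensions here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared layer W (m x n) approximated by SVD as U (m x k) @ V (k x n),
# where k is the bottleneck rank kept after truncating small singular values.
m, n, k = 8, 8, 3
W = rng.standard_normal((m, n))
U_full, s, Vt = np.linalg.svd(W, full_matrices=False)
U = U_full[:, :k] * s[:k]          # fold singular values into U
V = Vt[:k, :]

def forward(x, A=None):
    """Scenario-independent pass, optionally with a k x k scenario-dependent
    square matrix A inserted between the SVD factors."""
    h = V @ x
    if A is not None:
        h = A @ h                  # scenario-dependent linear square matrix
    return U @ h

x = rng.standard_normal(n)
A_scenario = np.eye(k)             # identity before adaptation, tuned per scenario
assert np.allclose(forward(x), forward(x, A_scenario))
```

Because the inserted matrices are only k x k, per-scenario storage is small compared with duplicating the full model, which is where the deployment-cost saving comes from.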

    Transfer Learning for Speech and Language Processing

    Transfer learning is a vital technique that generalizes models trained for one setting or task to other settings or tasks. For example, in speech recognition, an acoustic model trained for one language can be used to recognize speech in another language, with little or no re-training data. Transfer learning is closely related to multi-task learning (cross-lingual vs. multilingual), and is traditionally studied under the name `model adaptation'. Recent advances in deep learning show that transfer learning becomes much easier and more effective with high-level abstract features learned by deep models, and the `transfer' can be conducted not only between data distributions and data types, but also between model structures (e.g., shallow nets and deep nets) or even model types (e.g., Bayesian models and neural models). This review paper summarizes some recent prominent research in this direction, particularly for speech and language processing. We also report some results from our group and highlight the potential of this very interesting research field.
    Comment: 13 pages, APSIPA 201
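A common form of the transfer described above is feature-based fine-tuning: the lower layers of a model trained on a data-rich source task are frozen as a feature extractor, and only the top layer is re-estimated on the small target task. The sketch below, with hypothetical sizes and a single squared-error gradient step, is only an illustration of that pattern, not any specific system from the survey.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-layer net trained on a source task; for transfer, the
# lower (feature) layer W1 is frozen and only the output layer W2 is
# re-estimated on the small target-task set.
W1 = rng.standard_normal((16, 10))   # shared high-level feature extractor
W2 = rng.standard_normal((3, 16))    # task-specific output layer

def features(x):
    return np.maximum(0.0, W1 @ x)   # frozen ReLU features

# One gradient step on the target task updates W2 only (squared error loss).
x, y = rng.standard_normal(10), rng.standard_normal(3)
h = features(x)
grad_W2 = np.outer(W2 @ h - y, h)    # d/dW2 of 0.5 * ||W2 h - y||^2
W2_adapted = W2 - 0.01 * grad_W2

# The frozen extractor is untouched; only the top layer moves.
assert np.allclose(features(x), np.maximum(0.0, W1 @ x))
```

With few target examples, updating only the small top layer is far less prone to overfitting than retraining the whole network.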

    Lifelong Deep Learning-based Control Of Robot Manipulators

    This study proposes a lifelong deep learning control scheme for robotic manipulators with bounded disturbances. The scheme uses an online tunable deep neural network (DNN) to approximate the unknown nonlinear dynamics of the robot. The control scheme is developed using a singular value decomposition-based, direct tracking error-driven approach, from which the weight update laws for the DNN are derived. To avoid catastrophic forgetting in multi-task scenarios and to ensure lifelong learning (LL), a novel online LL scheme based on elastic weight consolidation is included in the DNN weight-tuning laws. Our results demonstrate that the resulting closed-loop system is uniformly ultimately bounded while forgetting is reduced. To demonstrate the effectiveness of our approach, we provide simulation results comparing it with the conventional single-layer NN approach and confirming its theoretical claims. The cumulative effect of the error and control input in the multi-task system shows a 43% improvement in performance using the proposed LL-based DNN control over recent literature.
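The elastic weight consolidation (EWC) idea mentioned above can be illustrated in isolation. EWC adds a quadratic penalty anchoring each weight to its value after an earlier task, scaled by that weight's Fisher-information importance, so important weights resist change while unimportant ones stay free to adapt. The numbers below (anchor weights, Fisher values, the stand-in task-B gradient) are all assumed for illustration; this is not the paper's control law.

```python
import numpy as np

# EWC sketch: while learning task B, weights important to task A (large
# Fisher value F_i) are anchored to their task-A values theta_star,
# reducing catastrophic forgetting.
theta_star = np.array([1.0, -0.5, 2.0])   # weights after task A (assumed)
F = np.array([5.0, 0.1, 3.0])             # per-weight Fisher importance (assumed)
lam = 1.0                                  # consolidation strength

def ewc_penalty_grad(theta):
    """Gradient of (lam/2) * sum_i F_i * (theta_i - theta_star_i)^2."""
    return lam * F * (theta - theta_star)

theta = theta_star.copy()
task_b_grad = np.array([1.0, 1.0, 1.0])   # stand-in gradient from task B loss
for _ in range(100):
    theta -= 0.05 * (task_b_grad + ewc_penalty_grad(theta))

# Weights with large F stay close to their task-A values; small-F weights drift.
assert abs(theta[0] - theta_star[0]) < abs(theta[1] - theta_star[1])
```

At the fixed point each weight drifts by `task_b_grad / (lam * F)`, which makes the importance-weighted anchoring explicit: drift is inversely proportional to Fisher importance.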

    A Comparison of Image Denoising Methods

    The advancement of imaging devices and the countless images generated every day place an increasingly high demand on image denoising, which remains a challenging task in terms of both effectiveness and efficiency. To improve denoising quality, numerous denoising techniques and approaches have been proposed in past decades, including different transforms, regularization terms, algebraic representations and, especially, advanced deep neural network (DNN) architectures. Despite their sophistication, many methods fail to achieve desirable results for simultaneous noise removal and fine-detail preservation. In this paper, to investigate the applicability of existing denoising techniques, we compare a variety of denoising methods on both synthetic and real-world datasets for different applications. We also introduce a new dataset for benchmarking, and the evaluations are performed from four different perspectives: quantitative metrics, visual effects, human ratings and computational cost. Our experiments demonstrate: (i) the effectiveness and efficiency of representative traditional denoisers for various denoising tasks, (ii) that a simple matrix-based algorithm may produce results similar to its tensor counterparts, and (iii) the notable achievements of DNN models, which exhibit impressive generalization ability and show state-of-the-art performance on various datasets. In spite of the progress in recent years, we discuss shortcomings and possible extensions of existing techniques. Datasets, code and results are made publicly available and will be continuously updated at https://github.com/ZhaomingKong/Denoising-Comparison.
    Comment: In this paper, we intend to collect and compare various denoising methods to investigate their effectiveness, efficiency, applicability and generalization ability with both synthetic and real-world experiments
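One example of the "simple matrix-based algorithm" family the comparison covers is truncated-SVD denoising: if the clean image is approximately low rank, projecting the noisy image onto its top singular subspace removes most of the noise energy. The sketch below is a generic illustration of that idea, not the specific algorithm evaluated in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Matrix-based denoiser: keep only the top-r singular components of the
# noisy image, assuming the clean image is approximately low rank.
def svd_denoise(img, rank):
    U, s, Vt = np.linalg.svd(img, full_matrices=False)
    s[rank:] = 0.0                      # discard small singular values (mostly noise)
    return U @ np.diag(s) @ Vt

clean = np.outer(np.linspace(0, 1, 32), np.linspace(0, 1, 32))  # rank-1 "image"
noisy = clean + 0.05 * rng.standard_normal(clean.shape)
denoised = svd_denoise(noisy, rank=1)

mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
assert mse_denoised < mse_noisy        # truncation removes most of the noise
```

The rank parameter plays the usual bias-variance role: too low and fine detail is lost, too high and noise is retained, which mirrors the detail-preservation trade-off the paper discusses.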

    Human Activity Recognition for AI-Enabled Healthcare Using Low-Resolution Infrared Sensor Data

    This paper explores the feasibility of using low-resolution infrared (LRIR) image streams for human activity recognition (HAR), with potential application in e-healthcare. Two datasets based on synchronized multichannel LRIR sensor systems are considered for a comprehensive study of optimal data acquisition. A novel noise reduction technique is proposed to alleviate the effects of horizontal and vertical periodic noise in the 2D spatiotemporal activity profiles created by vectorizing and concatenating the LRIR frames. Two main analysis strategies are explored for HAR: (1) manual feature extraction using texture-based and orthogonal-transformation-based techniques, followed by classification using support vector machine (SVM), random forest (RF), k-nearest neighbor (k-NN), and logistic regression (LR) classifiers, and (2) a deep neural network (DNN) strategy based on a convolutional long short-term memory (LSTM). The proposed periodic noise reduction technique yields an improvement of up to 14.15% across different models. In addition, for the first time, the optimum number of sensors, sensor layout, and distance to subjects are studied, with the best results obtained from a single side sensor at a close distance. Reasonable accuracies are maintained under sensor displacement, and the approach is robust in detecting multiple subjects. Furthermore, the models are shown to be suitable for data collected in different environments.
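Periodic noise of the kind described above appears as isolated peaks in the 2D frequency spectrum of the spatiotemporal profile, so a standard way to suppress it is a spectral notch filter. The paper's exact filter is not specified here; the sketch below is a generic FFT-based notch approach on a synthetic profile, with all sizes and thresholds assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic 2D spatiotemporal profile: broadband content plus a horizontal
# periodic component, which becomes a pair of sharp peaks in the 2D spectrum.
profile = rng.standard_normal((64, 64)) * 0.1
profile += np.sin(2 * np.pi * 8 * np.arange(64) / 64)[None, :]

F = np.fft.fft2(profile)
mag = np.abs(F)
# Notch filter: zero out spectral bins far above the median magnitude
# (the periodic peaks), keeping the DC component untouched.
thresh = 10 * np.median(mag)
mask = mag < thresh
mask[0, 0] = True
filtered = np.real(np.fft.ifft2(F * mask))

assert filtered.std() < profile.std()   # periodic component largely removed
```

Because the periodic interference is concentrated in a handful of bins, the notch removes it while leaving the broadband activity signal almost untouched.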

    Deep Transfer Learning for Automatic Speech Recognition: Towards Better Generalization

    Automatic speech recognition (ASR) has recently become an important challenge for deep learning (DL): it requires large-scale training datasets and high computational and storage resources. Moreover, DL techniques, and machine learning (ML) approaches in general, assume that training and testing data come from the same domain, with the same input feature space and data distribution characteristics. This assumption, however, does not hold in some real-world artificial intelligence (AI) applications. Moreover, there are situations where gathering real data is challenging, expensive, or the data are rarely occurring, so the data requirements of DL models cannot be met. Deep transfer learning (DTL) has been introduced to overcome these issues; it helps develop high-performing models using real datasets that are small or slightly different from, but related to, the training data. This paper presents a comprehensive survey of DTL-based ASR frameworks to shed light on the latest developments and to help academics and professionals understand current challenges. Specifically, after presenting the DTL background, a well-designed taxonomy is adopted to organize the state of the art. A critical analysis is then conducted to identify the limitations and advantages of each framework. A comparative study is then introduced to highlight the current challenges before deriving opportunities for future research.