214 research outputs found

    Towards More Efficient DNN-Based Speech Enhancement Using Quantized Correlation Mask

    Many studies on deep learning-based speech enhancement (SE) that follow the computational auditory scene analysis approach employ the ideal binary mask or the ideal ratio mask to reconstruct the enhanced speech signal. However, many SE applications in real scenarios demand a desirable balance between denoising capability and computational cost. In this study, first, an improvement over the ideal ratio mask is proposed to attain better SE performance: an efficient adaptive correlation-based factor for adjusting the ratio mask. The proposed method exploits the correlation coefficients among the noisy speech, noise, and clean speech to redistribute the power ratio of speech and noise during the ratio mask construction phase. Second, to make the supervised SE system more computationally efficient, quantization techniques are considered to reduce the number of bits needed to represent floating-point numbers, leading to a more compact SE model. The proposed quantized correlation mask is used in conjunction with a four-layer deep neural network (DNN-QCM) comprising dropout regularization, pre-training, and noise-aware training to derive a robust, high-order mapping for enhancement and to improve generalization in unseen conditions. Results show that the quantized correlation mask outperforms the conventional ratio mask representation and the other SE algorithms used for comparison. Compared to a DNN with the ideal ratio mask as its learning target, DNN-QCM improves the short-time objective intelligibility score by approximately 6.5% and the perceptual evaluation of speech quality score by 11.0%. The quantization method reduces the neural network weights from a 32-bit to a 5-bit representation while still effectively suppressing stationary and non-stationary noise. Timing analyses also show that with the techniques incorporated in the proposed DNN-QCM system to increase its compac…
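
    The abstract does not spell out the exact correlation-based adjustment or quantization scheme, so the sketch below only illustrates the two ideas under assumed formulas: an ideal-ratio-mask-style target reweighted by a correlation coefficient, and uniform quantization of the mask values to a small number of bits. The function names and weighting are hypothetical, not the paper's definitions.

        import numpy as np

        def correlation_adjusted_ratio_mask(clean_mag, noise_mag, noisy_mag, eps=1e-8):
            """Illustrative sketch only: an ideal-ratio-mask-style target whose
            speech/noise power split is reweighted by the correlation between the
            noisy and clean magnitudes. The paper's actual adjustment factor differs."""
            s_pow, n_pow = clean_mag ** 2, noise_mag ** 2
            rho = np.corrcoef(noisy_mag.ravel(), clean_mag.ravel())[0, 1]
            rho = float(np.clip(rho, 0.0, 1.0))
            mask = (rho * s_pow) / (rho * s_pow + (1.0 - rho) * n_pow + eps)
            return np.clip(mask, 0.0, 1.0)

        def quantize_mask(mask, n_bits=5):
            """Uniform quantization of mask values in [0, 1] to 2**n_bits - 1 levels,
            shrinking the storage needed for the learning targets."""
            levels = 2 ** n_bits - 1
            return np.round(np.clip(mask, 0.0, 1.0) * levels) / levels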

    DiabDeep: Pervasive Diabetes Diagnosis based on Wearable Medical Sensors and Efficient Neural Networks

    Diabetes impacts the quality of life of millions of people. However, diabetes diagnosis is still an arduous process, given that the disease develops and gets treated outside the clinic. The emergence of wearable medical sensors (WMSs) and machine learning points to a way forward to address this challenge. WMSs enable a continuous mechanism to collect and analyze physiological signals. However, disease diagnosis based on WMS data and its effective deployment on resource-constrained edge devices remain challenging due to inefficient feature extraction and high computation cost. In this work, we propose a framework called DiabDeep that combines efficient neural networks (called DiabNNs) with WMSs for pervasive diabetes diagnosis. DiabDeep bypasses the feature extraction stage and acts directly on WMS data. It enables both (i) accurate inference on the server, e.g., a desktop, and (ii) efficient inference on an edge device, e.g., a smartphone, based on varying design goals and resource budgets. On the server, we stack sparsely connected layers to deliver high accuracy. On the edge, we use a hidden-layer long short-term memory (LSTM) based recurrent layer to cut down on computation and storage. At the core of DiabDeep lies a grow-and-prune training flow: it leverages gradient-based growth and magnitude-based pruning algorithms to learn both the weights and the connections of DiabNNs. We demonstrate the effectiveness of DiabDeep by analyzing data from 52 participants. For server (edge) side inference, we achieve a 96.3% (95.3%) accuracy in classifying diabetics against healthy individuals, and a 95.7% (94.6%) accuracy in distinguishing among type-1 diabetic, type-2 diabetic, and healthy individuals. Against conventional baselines, DiabNNs achieve higher accuracy while reducing the model size (FLOPs) by up to 454.5x (8.9x). Therefore, the system can be viewed as pervasive and efficient, yet very accurate.
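
    The grow-and-prune flow is only described at a high level here, so the following is a minimal PyTorch-style sketch of its two primitives, magnitude-based pruning and gradient-based growth, applied to a binary connection mask. The fractions, scheduling, and helper names are assumptions for illustration rather than the authors' settings.

        import torch

        def magnitude_prune(weight, mask, prune_frac):
            """Magnitude-based pruning: deactivate the smallest-magnitude active weights."""
            active_vals = weight[mask.bool()].abs()
            k = int(prune_frac * active_vals.numel())
            if k == 0:
                return mask
            threshold = torch.kthvalue(active_vals, k).values
            new_mask = mask.clone()
            new_mask[weight.abs() <= threshold] = 0.0
            return new_mask

        def gradient_grow(weight, grad, mask, grow_frac):
            """Gradient-based growth: activate dormant connections with the largest
            gradient magnitude, i.e., those whose activation would reduce the loss most."""
            dormant = (mask == 0).float()
            n_grow = int(grow_frac * dormant.sum().item())
            if n_grow == 0:
                return mask
            scores = (grad.abs() * dormant).flatten()
            top = torch.topk(scores, n_grow).indices
            new_mask = mask.clone().flatten()
            new_mask[top] = 1.0
            return new_mask.view_as(mask)

        # After each growth/pruning step, training continues with weight * mask
        # in the forward pass, so only active connections contribute.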

    Deep neural network techniques for monaural speech enhancement: state of the art analysis

    Deep neural network (DNN) techniques have become pervasive in domains such as natural language processing and computer vision, where they have achieved great success in tasks such as machine translation and image generation. Owing to this success, these data-driven techniques have also been applied in the audio domain. More specifically, DNN models have been applied to speech enhancement to achieve denoising, dereverberation, and multi-speaker separation in the monaural setting. In this paper, we review the dominant DNN techniques employed to achieve speech separation. The review covers the whole speech enhancement pipeline: feature extraction, how DNN-based tools model both global and local features of speech, and model training (supervised and unsupervised). We also review the use of pre-trained models to boost the speech enhancement process. The review is geared towards covering the dominant trends in the application of DNNs to the enhancement of speech obtained from a single speaker.
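
    As a concrete reference point for the mask-based pipeline this review surveys (features, a mask-estimating DNN, reconstruction), a minimal sketch under assumed choices (STFT magnitudes, a small feed-forward mask estimator, inverse-STFT resynthesis) might look as follows; it is not tied to any specific system covered by the review.

        import torch
        import torch.nn as nn

        class MaskEstimator(nn.Module):
            """Toy feed-forward network mapping a noisy magnitude frame to a [0, 1] mask."""
            def __init__(self, n_freq=257, hidden=512):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Linear(n_freq, hidden), nn.ReLU(),
                    nn.Linear(hidden, hidden), nn.ReLU(),
                    nn.Linear(hidden, n_freq), nn.Sigmoid(),
                )

            def forward(self, mag):                  # mag: (frames, n_freq)
                return self.net(mag)

        def enhance(noisy, model, n_fft=512, hop=128):
            """STFT -> per-frame mask estimation -> masked spectrogram -> inverse STFT."""
            window = torch.hann_window(n_fft)
            spec = torch.stft(noisy, n_fft, hop_length=hop, window=window,
                              return_complex=True)   # (n_freq, frames)
            mask = model(spec.abs().T).T             # estimate mask from magnitudes
            return torch.istft(spec * mask, n_fft, hop_length=hop,
                               window=window, length=noisy.shape[-1])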

    Deep Network Regularization with Representation Shaping

    Ph.D. dissertation, Seoul National University, Graduate School of Convergence Science and Technology, Department of Convergence Science (Digital Information Convergence), February 2019. Advisor: Rhee, Wonjong.
    The statistical characteristics of learned representations, such as correlation and representational sparsity, are known to be relevant to the performance of deep learning methods. Learning meaningful and useful data representations through regularization has also been one of the central concerns in deep learning. In this dissertation, deep network regularization using representation shaping is studied. Roughly, the following questions are answered: what are the common statistical characteristics of representations that high-performing networks share, and do these characteristics have a causal relationship with performance? To answer them, five representation regularizers are proposed: class-wise Covariance Regularizer (cw-CR), Variance Regularizer (VR), class-wise Variance Regularizer (cw-VR), Rank Regularizer (RR), and class-wise Rank Regularizer (cw-RR); a minimal sketch of a class-wise penalty of this kind appears after the table of contents below. Significant performance improvements were found with these regularizers for a variety of tasks over popular benchmark datasets. Visualization of the learned representations shows that the regularizers indeed perform distinct representation shaping. Then, across a variety of representation regularizers, several statistical characteristics of learned representations, including covariance, correlation, sparsity, dead units, and rank, are investigated. Our theoretical analysis and experimental results indicate that none of the statistical characteristics considered in this work shows a general or causal pattern for improving performance. Mutual information I(z; x) and I(z; y) are examined as well, and it is shown that regularizers can affect I(z; x) and thus indirectly influence performance. Finally, two practical ways of using representation regularizers are presented to demonstrate their usefulness: using a set of representation regularizers as a performance tuning tool, and enhancing network compression with representation regularizers.
    Table of Contents
    Chapter 1. Introduction
      1.1 Background and Motivation
      1.2 Contributions
    Chapter 2. Generalization, Regularization, and Representation in Deep Learning
      2.1 Deep Networks
      2.2 Generalization
        2.2.1 Capacity, Overfitting, and Generalization
        2.2.2 Generalization in Deep Learning
      2.3 Regularization
        2.3.1 Capacity Control and Regularization
        2.3.2 Regularization for Deep Learning
      2.4 Representation
        2.4.1 Representation Learning
        2.4.2 Representation Shaping
    Chapter 3. Representation Regularizer Design with Class Information
      3.1 Class-wise Representation Regularizers: cw-CR and cw-VR
        3.1.1 Basic Statistics of Representations
        3.1.2 cw-CR
        3.1.3 cw-VR
        3.1.4 Penalty Loss Functions and Gradients
      3.2 Experiments
        3.2.1 Image Classification Task
        3.2.2 Image Reconstruction Task
      3.3 Analysis of Representation Characteristics
        3.3.1 Visualization
        3.3.2 Quantitative Analysis
      3.4 Layer Dependency
    Chapter 4. Representation Characteristics and Their Relationship with Performance
      4.1 Representation Characteristics
      4.2 Experimental Results of Representation Regularization
      4.3 Scaling, Permutation, Covariance, and Correlation
        4.3.1 Identical Output Network (ION)
        4.3.2 Possible Extensions for ION
      4.4 Sparsity, Dead Unit, and Rank
        4.4.1 Analytical Relationship
        4.4.2 Rank Regularizer
        4.4.3 A Controlled Experiment on Data Generation Process
      4.5 Mutual Information
    Chapter 5. Practical Ways of Using Representation Regularizers
      5.1 Tuning Deep Network Performance Using Representation Regularizers
        5.1.1 Experimental Settings and Conditions
        5.1.2 Consistently Well-performing Regularizer
        5.1.3 Performance Improvement Using Regularizers as a Set
      5.2 Enhancing Network Compression Using Representation Regularizers
        5.2.1 The Need for Network Compression
        5.2.2 Three Typical Approaches for Network Compression
        5.2.3 Proposed Approaches and Experimental Results
    Chapter 6. Discussion
      6.1 Implication
        6.1.1 Usefulness of Class Information
        6.1.2 Comparison with Non-penalty Regularizers: Dropout and Batch Normalization
        6.1.3 Identical Output Network
        6.1.4 Using Representation Regularizers for Performance Tuning
        6.1.5 Benefits and Drawbacks of Different Statistical Characteristics of Representations
      6.2 Limitation
        6.2.1 Understanding the Underlying Mechanism of Representation Regularization
        6.2.2 Manipulating Representation Characteristics other than Covariance and Variance for ReLU Networks
        6.2.3 Investigating Representation Characteristics of Complicated Tasks
      6.3 Possible Future Work
        6.3.1 Interpreting Learned Representations via Visualization
        6.3.2 Designing a Regularizer Utilizing Mutual Information
        6.3.3 Applying Multiple Representation Regularizers to a Network
        6.3.4 Enhancing Deep Network Compression via Representation Manipulation
    Chapter 7. Conclusion
    Bibliography
    Appendix
      A. Principal Component Analysis of Learned Representations
      B. Proofs
    Acknowledgement
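
    The dissertation's exact penalty definitions for cw-CR and cw-VR are not reproduced in this abstract, so the following is only an illustrative, assumed sketch of a class-wise variance style penalty on a hidden layer's representations; the names and the loss weighting are hypothetical.

        import torch

        def class_wise_variance_penalty(reps, labels):
            """Illustrative class-wise variance penalty: for each class, penalize the
            variance of that class's hidden representations, pushing same-class
            examples toward similar activations."""
            penalty = reps.new_zeros(())
            classes = labels.unique()
            for c in classes:
                reps_c = reps[labels == c]            # (n_c, dim) representations of class c
                if reps_c.shape[0] > 1:
                    penalty = penalty + reps_c.var(dim=0, unbiased=False).mean()
            return penalty / classes.numel()

        # Hypothetical usage inside a training step:
        #   h = hidden_layer_output(x)                # representations at the penalized layer
        #   loss = task_loss + reg_weight * class_wise_variance_penalty(h, y)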