1,208 research outputs found

    GLOTTAL EXCITATION EXTRACTION OF VOICED SPEECH - JOINTLY PARAMETRIC AND NONPARAMETRIC APPROACHES

    The goal of this dissertation is to develop methods to recover glottal flow pulses, which contain biometric information about the speaker. The excitation information estimated from an observed speech utterance is modeled as the source of an inverse problem. Windowed linear prediction analysis and inverse filtering are first used to deconvolve the speech signal to obtain a rough estimate of the glottal flow pulses. Linear prediction and its inverse filtering can largely eliminate the vocal-tract response, which is usually modeled as an infinite impulse response filter. Some remaining vocal-tract components that reside in the estimate after inverse filtering are next removed by maximum-phase and minimum-phase decomposition, which is implemented by applying the complex cepstrum to the initial estimate of the glottal pulses. The additive and residual errors from inverse filtering can be suppressed by higher-order statistics, which is the method used to calculate cepstrum representations. Some features directly provided by the glottal source's cepstrum representation, as well as fitting parameters for the estimated pulses, are used to form feature patterns that were applied to a minimum-distance classifier to realize a speaker identification system with a very limited number of subjects.
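    The first stage described in this abstract (windowed linear prediction followed by inverse filtering) can be sketched as follows. This is a minimal illustration, assuming the autocorrelation method and a Hamming window; the function names `lpc_autocorr` and `inverse_filter` are hypothetical, and the dissertation's actual analysis settings are not given in the abstract.

```python
import numpy as np

def lpc_autocorr(frame, order):
    """Estimate predictor coefficients a[1..order] with the autocorrelation method."""
    frame = np.asarray(frame, dtype=float)
    w = frame * np.hamming(len(frame))            # windowed analysis (assumed Hamming)
    # Autocorrelation at lags 0..order (lag 0 sits at index len(w) - 1).
    r = np.correlate(w, w, mode="full")[len(w) - 1:len(w) + order]
    # Toeplitz normal equations R a = r[1:], solved directly for clarity.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

def inverse_filter(frame, a):
    """Apply A(z) = 1 - sum_k a[k] z^-k, cancelling an all-pole model 1/A(z)."""
    A = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    return np.convolve(frame, A)[:len(frame)]

# Impulse response of a known all-pole filter: x[n] = 1.3 x[n-1] - 0.6 x[n-2] + e[n]
a_true = np.array([1.3, -0.6])
h = np.zeros(32)
h[0] = 1.0
h[1] = a_true[0] * h[0]
for n in range(2, len(h)):
    h[n] = a_true[0] * h[n - 1] + a_true[1] * h[n - 2]

residual = inverse_filter(h, a_true)              # recovers the impulse excitation
a_est = lpc_autocorr(h, order=2)                  # windowing makes this approximate
```

    Inverse filtering with the true coefficients returns the excitation exactly; with estimated coefficients the residual is only a rough estimate, which is why the abstract adds a cepstrum-based refinement stage.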

    Scalable image quality assessment with 2D mel-cepstrum and machine learning approach

    Measurement of image quality is of fundamental importance to numerous image and video processing applications. Objective image quality assessment (IQA) is a two-stage process comprising the following: (a) extracting important information and discarding redundant information, and (b) pooling the detected features using appropriate weights. These two stages are not easy to tackle due to the complex nature of the human visual system (HVS). In this paper, we first investigate image features based on the two-dimensional (2D) mel-cepstrum for the purpose of IQA. It is shown that these features are effective since they can represent structural information, which is crucial for IQA. Moreover, they are also beneficial in a reduced-reference scenario where only partial reference image information is used for quality assessment. We address the second issue by exploiting machine learning. In our opinion, the well-established methodology of machine learning/pattern recognition has not been adequately used for IQA so far; we believe that it will be an effective tool for feature pooling, since the required weights/parameters can be determined in a more convincing way via training with ground truth obtained from subjective scores. This helps to overcome the limitations of the existing pooling methods, which tend to be oversimplistic and lack theoretical justification. Therefore, we propose a new metric by formulating IQA as a pattern recognition problem. Extensive experiments conducted using six publicly available image databases (3211 images in total, with diverse distortions) and one video database (with 78 video sequences) demonstrate the effectiveness and efficiency of the proposed metric in comparison with seven relevant existing metrics. © 2011 Elsevier Ltd. All rights reserved.

    Separation of Vocal and Non-Vocal Components from Audio Clip Using Correlated Repeated Mask (CRM)

    Extraction of singing voice from music is one of the ongoing research topics in the field of speech recognition and audio analysis. In particular, this topic finds many applications in the music field, such as in determining music structure, lyrics recognition, and singer recognition. Although many studies have been conducted for the separation of voice from the background, there has been less study on singing voice in particular. In this study, efforts were made to design a new methodology to improve the separation of vocal and non-vocal components in audio clips using REPET [14]. In the newly designed method, we tried to rectify the issues encountered in the REPET method, while designing an improved repeating mask which is used to extract the non-vocal component in audio. The main reason why the REPET method was preferred over previous methods for this study is its independent nature. More specifically, the majority of existing methods for the separation of singing voice from music were constructed explicitly based on one or more assumptions

    Bispectrum- and Bicoherence-Based Discriminative Features Used for Classification of Radar Targets and Atmospheric Formations

    This chapter is dedicated to bispectrum-based signal processing in surveillance radar applications. Detection, recognition, and classification of targets by surveillance radars have various applications, including security, military intelligence, battlefield purposes, boundary protection, and weather forecasting. One particularly effective discriminative feature commonly exploited in modern radar automatic target recognition (ATR) systems is the micro-Doppler (m-D) contribution extracted from the joint time-frequency (TF) distribution. However, a common drawback of this energy-based strategy is that it cannot retrieve additional information related to the frequency-coupling and phase-coupling contributions contained in the radar backscatter. Phase coupling carries additional discriminative features related to individual target properties, and a bispectrum-based strategy allows these phase-coupled data to be retrieved. The bispectrum tends to zero for stationary zero-mean additive white Gaussian noise (AWGN), which smooths AWGN in TF distributions. Hence, a bispectrum-based approach improves the extraction of robust discriminative features for ATR radar systems.
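    To make the phase-coupling idea concrete, here is a minimal sketch of a direct (FFT-based) bispectrum estimate. It is an assumed textbook-style estimator, not necessarily the one used in the chapter; the function name `bispectrum` and the test signal are hypothetical.

```python
import numpy as np

def bispectrum(segments):
    """Direct bispectrum estimate averaged over the rows of `segments`.

    B[f1, f2] = mean over segments of X[f1] * X[f2] * conj(X[f1 + f2]),
    with frequency indices taken modulo the segment length.
    """
    segments = np.atleast_2d(np.asarray(segments, dtype=float))
    n = segments.shape[1]
    X = np.fft.fft(segments, axis=1)
    idx = np.arange(n)
    f_sum = (idx[:, None] + idx[None, :]) % n     # index of f1 + f2 for every pair
    triple = X[:, :, None] * X[:, None, :] * np.conj(X[:, f_sum])
    return triple.mean(axis=0)

# Three phase-coupled tones at bins 5, 9 and 5 + 9 = 14 (illustrative values).
n = 64
t = np.arange(n)
x = (np.cos(2 * np.pi * 5 * t / n)
     + np.cos(2 * np.pi * 9 * t / n)
     + np.cos(2 * np.pi * 14 * t / n))
B = bispectrum(x)
coupled = abs(B[5, 9])      # large: all three bins carry energy with coupled phase
uncoupled = abs(B[5, 10])   # near zero: bin 10 carries no energy
```

    With several noisy segments, averaging the triple product is what drives the Gaussian-noise contribution toward zero while phase-coupled components add coherently.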

    A Study on Training Methods for Deep Learning-Based Diagnosis of Rotating Machinery with Insufficient Fault Data

    Thesis (Ph.D.), Seoul National University Graduate School, College of Engineering, Department of Mechanical and Aerospace Engineering, February 2020. 윤병동.

    Deep learning is a promising approach for fault diagnosis in mechanical applications. Deep learning techniques can process large amounts of data at once and model them into the desired diagnostic model. In industrial fields, however, large amounts of data can be acquired but little of it is useful, because fault or failure data are scarce: failure in industrial fields is usually unacceptable. To cope with this insufficient-fault-data problem when training diagnostic models for rotating machinery, this thesis proposes three research thrusts: 1) filter-envelope blocks in convolutional neural networks (CNNs), which incorporate the preprocessing steps for vibration signals (frequency filtering and envelope extraction) for a better solution and reduced effort in building the diagnostic model; 2) cepstrum editing based data augmentation (CEDA) for diagnostic datasets consisting of vibration signals from rotating machinery; and 3) selective parameter freezing (SPF) for efficient parameter transfer in transfer learning.

    The first research thrust proposes novel types of functional blocks for neural networks in order to learn features that are robust for vibration data. Conventional neural networks, including CNNs, tend to learn biased features when the training data are acquired from a small number of conditions. This can lead to poor performance under different conditions or on other similar equipment. Therefore, this research proposes two neural network blocks, a filter block and an envelope block, which can be incorporated into conventional neural networks and minimize the preprocessing steps. Each block is designed to learn a frequency filter and an envelope extraction function, respectively, in order to induce the neural network to learn more robust and generalized features from limited vibration samples.

    The second thrust presents a new data augmentation technique specialized for diagnostic vibration-signal data. Many data augmentation techniques exist for image data, with no consideration for the properties of vibration data. Conventional techniques such as flipping, rotating, or shearing are not suitable for 1-D vibration data and can harm the natural properties of the vibration signal. To augment vibration data without losing its physical properties, the proposed method generates new samples by editing the cepstrum, which is done by adjusting the cepstrum components of interest. Applying the inverse transform to the edited cepstrum yields new samples, and the resulting augmented dataset leads to higher accuracy for the diagnostic model.

    The third research thrust suggests a new parameter repurposing method for parameter transfer, which is used for transfer learning. The proposed SPF selectively freezes parameters transferred from the source network and retrains only the parameters that are unnecessary for the target domain, to reduce overfitting and preserve useful source features when the target data available for training the diagnostic model are limited. The three proposed methods can be used in a diagnostic model independently or together, to mitigate the loss of diagnostic performance caused by insufficient fault data or to achieve higher performance.

    Contents:
    Chapter 1 Introduction: 1.1 Motivation; 1.2 Research Scope and Overview; 1.3 Structure of the Thesis
    Chapter 2 Literature Review: 2.1 Deep Neural Networks; 2.2 Transfer Learning and Parameter Transfer
    Chapter 3 Description of Testbed Data: 3.1 Bearing Data I: Case Western Reserve University Data; 3.2 Bearing Data II: Accelerated Life Test Test-bed
    Chapter 4 Filter-Envelope Blocks in Neural Network for Robust Feature Learning: 4.1 Preliminary Study of Problems In Use of CNN for Vibration Signals (4.1.1 Class Confusion Problem of CNN Model to Different Conditions; 4.1.2 Benefits of Frequency Filtering and Envelope Extraction for Fault Diagnosis in Vibration Signals); 4.2 Proposed Network Block 1: Filter Block (4.2.1 Spectral Feature Learning in Neural Network; 4.2.2 FIR Band-pass Filter in Neural Network; 4.2.3 Result and Discussion); 4.3 Proposed Neural Block 2: Envelope Block (4.3.1 Max-Average Pooling Block for Envelope Extraction; 4.3.2 Adaptive Average Pooling for Learnable Envelope Extractor; 4.3.3 Result and Discussion); 4.4 Filter-Envelope Network for Fault Diagnosis (4.4.1 Combinations of Filter-Envelope Blocks for the use of Rolling Element Bearing Fault Diagnosis; 4.4.2 Summary and Discussion)
    Chapter 5 Cepstrum Editing Based Data Augmentation for Vibration Signals: 5.1 Brief Review of Data Augmentation for Deep Learning (5.1.1 Image Augmentation to Enlarge Training Dataset; 5.1.2 Data Augmentation for Vibration Signal); 5.2 Cepstrum Editing based Data Augmentation (5.2.1 Cepstrum Editing as a Signal Preprocessing; 5.2.2 Cepstrum Editing based Data Augmentation); 5.3 Results and Discussion (5.3.1 Performance validation to rolling element bearing diagnosis)
    Chapter 6 Selective Parameter Freezing for Parameter Transfer with Small Dataset: 6.1 Overall Procedure of Selective Parameter Freezing; 6.2 Determination Sensitivity of Source Network Parameters; 6.3 Case Study 1: Transfer to Different Fault Size (6.3.1 Performance by hyperparameter α; 6.3.2 Effect of the number of training samples and network size); 6.4 Case Study 2: Transfer from Artificial to Natural Fault (6.4.1 Diagnostic performance for proposed method; 6.4.2 Visualization of frozen parameters by hyperparameter α; 6.4.3 Visual inspection of feature space); 6.5 Conclusion
    Chapter 7: 7.1 Contributions and Significance
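    The cepstrum-editing idea in this abstract (adjust selected cepstrum components, then apply the inverse transform) can be sketched as below. This is a minimal illustration, assuming the real cepstrum and reuse of the original phase for resynthesis; the function name `cepstrum_edit`, the chosen quefrency bin, and the gain values are hypothetical and are not taken from the thesis.

```python
import numpy as np

def cepstrum_edit(x, quefrency_bins, gain):
    """Scale selected real-cepstrum bins and resynthesize with the original phase."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    X = np.fft.fft(x)
    log_mag = np.log(np.abs(X) + 1e-12)           # small floor avoids log(0)
    phase = np.angle(X)
    c = np.fft.ifft(log_mag).real                 # real cepstrum
    bins = np.asarray(quefrency_bins, dtype=int)
    # Edit each bin together with its mirror so the cepstrum stays symmetric
    # and the edited log-magnitude spectrum stays real.
    idx = np.unique(np.concatenate((bins % n, (-bins) % n)))
    c_edit = c.copy()
    c_edit[idx] *= gain
    new_log_mag = np.fft.fft(c_edit).real
    return np.fft.ifft(np.exp(new_log_mag + 1j * phase)).real

rng = np.random.default_rng(0)
signal = rng.standard_normal(256)                    # stand-in for a vibration record
unchanged = cepstrum_edit(signal, [10], gain=1.0)    # gain 1 reproduces the input
augmented = cepstrum_edit(signal, [10], gain=0.0)    # bin 10 removed: a new sample
```

    Varying the gain over a small range would produce a family of new samples from one record, each with the same length and phase but a modified log-magnitude spectrum.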

    Automated Accident Detection In Intersections Via Digital Audio Signal Processing

    The aim of this thesis is to design a system for automated accident detection in intersections. The input to the system is a three-second audio signal. The system can be operated in two modes: two-class and multi-class. The output of the two-class system is a label of "crash" or "non-crash". In the multi-class system, the output is the label "crash" or one of various non-crash incidents, including "pile drive", "brake", and "normal-traffic" sounds. The system has three main steps in processing the input audio signal: feature extraction, feature optimization, and classification. Five different methods of feature extraction are investigated and compared; they are based on the discrete wavelet transform, fast Fourier transform, discrete cosine transform, real cepstrum transform, and Mel frequency cepstral transform. Linear discriminant analysis (LDA) is used to optimize the features obtained in the feature extraction stage by linearly combining the features using different weights. Three types of statistical classifiers are investigated and compared: the nearest neighbor, nearest mean, and maximum likelihood methods. Data collected from Jackson, MS and Starkville, MS, and crash signals obtained from the Texas Transportation Institute crash test facility, are used to train and test the designed system. The results showed that the wavelet-based feature extraction method with LDA and a maximum likelihood classifier is the optimum design. This wavelet-based system is computationally inexpensive compared to the other methods. The system produced classification accuracies of 95% to 100% when the input signal has a signal-to-noise ratio of at least 0 decibels. These results show that the system is capable of effectively classifying a given input audio signal as "crash" or "non-crash".
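    Of the three classifiers compared in this abstract, the nearest mean method is the simplest to illustrate. The sketch below assumes Euclidean distance and uses made-up two-dimensional feature vectors; the function names and data are hypothetical and do not come from the thesis.

```python
import numpy as np

def fit_class_means(features, labels):
    """Return {label: mean feature vector} computed from the training data."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    return {str(lab): features[labels == lab].mean(axis=0) for lab in np.unique(labels)}

def predict_nearest_mean(means, x):
    """Assign x to the class whose mean is closest in Euclidean distance."""
    x = np.asarray(x, dtype=float)
    return min(means, key=lambda lab: np.linalg.norm(x - means[lab]))

# Toy training set: two features per three-second clip (illustrative values only).
train_x = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
train_y = ["non-crash", "non-crash", "crash", "crash"]
means = fit_class_means(train_x, train_y)

label_a = predict_nearest_mean(means, [5.0, 4.0])
label_b = predict_nearest_mean(means, [1.0, 0.0])
```

    In the pipeline the abstract describes, the inputs to such a classifier would be the LDA-optimized features rather than raw values.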