
    Sleep staging using contactless audio-based methods

    Sleep stage classification is essential for evaluating sleep and its disorders. Most sleep studies use contact sensors, which may interfere with natural sleep, although the potential for sleep staging from audio signals has recently been recognized. This project presents a non-contact, audio-based method for sleep staging. The objective of this work is to develop a method that can classify sleep stages from non-contact audio signals. To achieve this objective, a measurement acquisition setup is presented alongside a validation of the acquired respiratory signal and a sleep staging algorithm. Eleven subjects were measured with the proposed setup. The validation process compares the pre-processed audio signal with a reference respiratory signal, yielding good error metrics and a low deviation between the respiratory cycles obtained with the audio method and those obtained with the reference method. The sleep staging algorithm classifies sixty-second epochs into NREM or REM stages with good detection results, with REM and NREM cycle durations similar to those reported in other studies in the literature, thus validating the obtained results.
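The acquisition pipeline described above (breathing audio in, respiratory cycles out) can be sketched as follows. This is a minimal illustration assuming the audio is amplitude-modulated at the respiratory rate; the function names and the 0.1-0.7 Hz band are assumptions, not the thesis's actual processing chain:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert, find_peaks

def respiratory_signal(audio, fs, env_fs=10.0, resp_band=(0.1, 0.7)):
    """Extract a respiratory waveform from breathing audio.

    The amplitude envelope of the audio is smoothed, downsampled to
    env_fs, and band-passed to a typical respiratory-rate range
    (here 0.1-0.7 Hz, i.e. 6-42 breaths per minute).
    """
    envelope = np.abs(hilbert(audio))
    step = int(round(fs / env_fs))
    kernel = np.ones(step) / step                  # moving-average smoother
    env_lo = np.convolve(envelope, kernel, mode="same")[::step]
    b, a = butter(2, resp_band, btype="band", fs=fs / step)
    return filtfilt(b, a, env_lo)

def breaths_per_minute(resp, env_fs):
    """One peak of the filtered envelope per respiratory cycle."""
    peaks, _ = find_peaks(resp, distance=int(env_fs / 0.7))
    return len(peaks) / (len(resp) / env_fs / 60.0)

# Synthetic check: breathing-like noise amplitude-modulated at 0.25 Hz (15 bpm).
fs = 200
t = np.arange(0, 60, 1 / fs)
noise = np.random.default_rng(0).standard_normal(t.size)
audio = 0.1 * (1.0 + np.sin(2 * np.pi * 0.25 * t)) * noise
resp = respiratory_signal(audio, fs)
rate = breaths_per_minute(resp, env_fs=10.0)
```

On the synthetic signal the recovered rate lands near the 15 breaths/min used to modulate the noise, which is the kind of agreement the thesis's validation step checks against a reference respiratory sensor.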

    DEVELOPMENT AND EVALUATION OF ENVELOPE, SPECTRAL AND TIME ENHANCEMENT ALGORITHMS FOR AUDITORY NEUROPATHY

    Auditory neuropathy (AN) is a hearing disorder that reduces the ability to detect temporal cues in speech, leading to degraded speech perception. The traditional amplification and frequency-shifting techniques used in modern hearing aids are not suitable for individuals with AN because of the disorder's unique symptoms. This study proposes a method that combines speech envelope enhancement with time scaling, merging the proven benefits of each algorithm. In addition, spectral enhancement is cascaded with envelope and time enhancement to address the poor frequency discrimination in AN. The proposed speech enhancement strategy was evaluated using an AN simulator with normal-hearing listeners under varying degrees of AN severity. The results showed a significant increase in word recognition scores for combined time scaling and envelope enhancement over envelope enhancement alone. Furthermore, adding spectral enhancement further increased word recognition at profound AN severity.
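A minimal sketch of the two building blocks, envelope expansion and time scaling, is given below. It is an illustration under simplifying assumptions (a Hilbert envelope/fine-structure split and naive interpolation-based stretching), not the enhancement strategy evaluated in the study:

```python
import numpy as np
from scipy.signal import hilbert

def enhance_envelope(x, exponent=2.0):
    """Expand the dynamic range of the speech envelope (exponent > 1
    deepens the modulation), leaving the temporal fine structure as-is."""
    analytic = hilbert(x)
    env = np.abs(analytic)
    fine_structure = np.cos(np.angle(analytic))
    env_expanded = (env / env.max()) ** exponent * env.max()
    return env_expanded * fine_structure

def time_stretch(x, factor):
    """Naive uniform time-scaling by interpolation. Note this also shifts
    pitch; practical systems use pitch-preserving stretching instead."""
    t_out = np.linspace(0.0, len(x) - 1.0, int(len(x) * factor))
    return np.interp(t_out, np.arange(len(x)), x)

# A 100 Hz tone whose second half is attenuated: after enhancement the
# quiet half should be attenuated further relative to the loud half.
fs = 8000
t = np.arange(0, 1, 1 / fs)
amp = np.where(t < 0.5, 1.0, 0.5)
x = amp * np.sin(2 * np.pi * 100 * t)
y = enhance_envelope(x, exponent=2.0)
slow = time_stretch(x, 1.5)
```

Squaring the normalized envelope pushes quiet segments down and leaves peaks in place, which is one simple way to exaggerate the temporal cues that AN listeners have difficulty detecting.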

    Hilbert phase methods for glottal activity detection

    The 2π discontinuities found in the wrapped Hilbert phase of the bandpass-filtered analytic DEGG signal provide accurate candidate locations of glottal closure instants (GCIs). Pruning these GCI candidates with an automatically determined amplitude threshold, found by iteratively removing from the full signal the inlier samples within a fraction of its standard deviation until convergence, yields a detection system that is 99.6% accurate with a false-alarm rate of 0.17%. This simpler algorithm, named the Glottal Activity Detector For Laryngeal Input (GADFLI), outperforms the state-of-the-art SIGMA algorithm for GCI detection, which has a 94.2% detection rate but a 5.46% false-alarm rate. Performance metrics were computed over the entire APLAWD database, using an extensive, hand-verified markings database of 10,944 waveforms. A related proposed algorithm, QuickGCI, also makes use of Hilbert phase discontinuities and does not require a thresholding post-processing step for GCI selection; its performance is nearly as good as GADFLI's. Both proposed algorithms operate on the electroglottographic signal or the acoustic speech signal.
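The phase-discontinuity idea can be sketched as follows. This is a minimal illustration (the function name, filter order, and pass band are assumptions, not the GADFLI implementation): band-pass around the expected fundamental, take the wrapped phase of the analytic signal, and mark the points where the phase jumps back by 2π.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def gci_candidates(x, fs, f0_band=(50, 200)):
    """Candidate glottal closure instants from Hilbert-phase wraps.

    After band-passing near the fundamental, the wrapped phase of the
    analytic signal advances by 2*pi per glottal cycle; each wrap
    (a jump more negative than -pi) marks one candidate instant.
    """
    b, a = butter(2, f0_band, btype="band", fs=fs)
    y = filtfilt(b, a, x)
    phase = np.angle(hilbert(y))        # wrapped to (-pi, pi]
    jumps = np.diff(phase)
    return np.flatnonzero(jumps < -np.pi) + 1

# Synthetic DEGG-like pulse train at 100 Hz: one candidate per cycle.
fs = 8000
f0 = 100
x = np.zeros(fs)                        # one second of signal
x[::fs // f0] = 1.0
cands = gci_candidates(x, fs, f0_band=(70, 140))
```

On the synthetic train the candidates come out spaced one pitch period (80 samples) apart, one per excitation pulse; on real DEGG or speech signals these candidates would then be pruned by the amplitude threshold described above.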

    Improving the Speech Intelligibility By Cochlear Implant Users

    In this thesis, we focus on improving the intelligibility of speech for cochlear implant (CI) users. As an auditory prosthetic device, a CI can restore hearing sensation for most patients with profound hearing loss in both ears in a quiet background. However, CI users still have serious problems understanding speech in noisy and reverberant environments. Bandwidth limitation, missing temporal fine structure, and reduced spectral resolution due to a limited number of electrodes are further factors that make hearing in noisy conditions difficult for CI users, regardless of the type of noise. To mitigate these difficulties for CI listeners, we investigate several contributing factors, such as the effect of low harmonics on tone identification in natural and vocoded speech, the contribution of matched envelope dynamic range to binaural benefits, and the contribution of low-frequency harmonics to tone identification in quiet and in six-talker babble. These results revealed several promising methods for improving speech intelligibility for CI patients. In addition, we investigate the benefits of voice conversion for improving speech intelligibility for CI users, motivated by an earlier study showing that familiarity with a talker's voice can improve understanding of conversation. Research has shown that when adults are familiar with someone's voice, they process and understand what the person is saying more accurately, and even more quickly. This "familiar talker advantage" motivated us to examine its effect on CI patients using a voice conversion technique. In the present research, we propose a new method based on multi-channel voice conversion to improve the intelligibility of the transformed speech for CI patients.

    An investigation into glottal waveform based speech coding

    Coding of voiced speech by extraction of the glottal waveform has shown promise in improving the efficiency of speech coding systems. This thesis describes an investigation into the performance of such a system. The effect of reverberation on the radiation impedance at the lips is shown to be negligible under normal conditions. Also, the accuracy of the Image Method for adding artificial reverberation to anechoic speech recordings is established. A new algorithm for Glottal Closure Instant detection, Pre-emphasised Maximum Likelihood Epoch Detection (PMLED), is proposed. The algorithm is tested on natural speech and is shown to be both accurate and robust. Two techniques for glottal waveform estimation, Closed Phase Inverse Filtering (CPIF) and Iterative Adaptive Inverse Filtering (IAIF), are compared. In tandem with an LF model fitting procedure, both techniques display a high degree of accuracy. However, IAIF is found to be slightly more robust. Based on these results, a Glottal Excited Linear Predictive (GELP) coding system for voiced speech is proposed and tested. Using a differential LF parameter quantisation scheme, the system achieves speech quality similar to that of U.S. Federal Standard 1016 CELP at a lower mean bit rate while incurring no extra delay.
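The inverse-filtering step common to CPIF and IAIF can be illustrated with plain linear-prediction analysis. The following is a minimal sketch (not the thesis's CPIF/IAIF implementations), assuming autocorrelation-method LPC and a synthetic impulse-train excitation:

```python
import numpy as np
from scipy.signal import lfilter

def lpc(x, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = np.array([1.0])
    err = r[0]
    for i in range(1, order + 1):
        k = -(a @ r[1:i + 1][::-1]) / err       # reflection coefficient
        a = np.concatenate([a, [0.0]])
        a = a + k * a[::-1]
        err *= 1.0 - k * k
    return a

def inverse_filter(x, order=12):
    """Remove the all-pole (vocal-tract) envelope; the prediction
    residual serves as a rough estimate of the excitation."""
    a = lpc(x, order)
    return lfilter(a, [1.0], x), a

# Synthetic vowel-like signal: a 100 Hz pulse train through two resonances.
fs = 8000
poles = np.array([0.97 * np.exp(2j * np.pi * 500 / fs),
                  0.95 * np.exp(2j * np.pi * 1500 / fs)])
den = np.poly(np.concatenate([poles, poles.conj()])).real
excitation = np.zeros(fs // 4)
excitation[::80] = 1.0
x = lfilter([1.0], den, excitation)
residual, a_hat = inverse_filter(x, order=8)
```

Inverse filtering with the estimated coefficients largely cancels the resonances, so the residual is far lower in energy than the signal and peaky at the excitation instants; CPIF and IAIF refine this basic idea to recover the glottal waveform itself.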

    Voice source characterization for prosodic and spectral manipulation

    The objective of this dissertation is to study and develop techniques to decompose the speech signal into its two main components: voice source and vocal tract. Our main efforts are on glottal pulse analysis and characterization. We want to explore the utility of this model in different areas of speech processing: speech synthesis, voice conversion, and emotion detection, among others. Thus, we study different techniques for prosodic and spectral manipulation. One of our requirements is that the methods should be robust enough to work with the large databases typical of speech synthesis. We use a speech production model in which the glottal flow produced by the vibrating vocal folds passes through the vocal (and nasal) tract cavities and is radiated by the lips. Removing the effect of the vocal tract from the speech signal to obtain the glottal pulse is known as inverse filtering. We use a parametric model of the glottal pulse directly in the source-filter decomposition phase. To validate the accuracy of the parametrization algorithm, we designed a synthetic corpus using LF glottal parameters reported in the literature, complemented with our own results from the vowel database. The results show that our method performs satisfactorily over a wide range of glottal configurations and at different levels of SNR. Our method using the whitened residual compared favorably to the reference, achieving high quality ratings (Good-Excellent). Our fully parametrized system scored lower than the other two, ranking third, but still above the acceptance threshold (Fair-Good). Next, we proposed two methods for prosody modification, one for each of the residual representations explained above. The first method uses our full parametrization system and frame interpolation to perform the desired changes in pitch and duration. The second method uses resampling of the residual waveform and a frame selection technique to generate a new sequence of frames to be synthesized. The results showed that both methods are rated similarly (Fair-Good) and that more work is needed to reach quality levels similar to those of the reference methods. As part of this dissertation, we have studied the application of our models in three different areas: voice conversion, voice quality analysis and emotion recognition. We included our speech production model in a reference voice conversion system to evaluate the impact of our parametrization on this task. The evaluators preferred our method over the original one, rating it higher on the MOS scale. To study voice quality, we recorded a small database of isolated, sustained Spanish vowels in four different phonations (modal, rough, creaky and falsetto). Comparing the results with those reported in the literature, we found them to generally agree with previous findings; some differences existed, but they could be attributed to the difficulty of comparing voice qualities produced by different speakers. We also conducted experiments in voice quality identification, with very good results. Finally, we evaluated the performance of an automatic emotion classifier based on GMMs using glottal measures. For each emotion, we trained a specific model using different features, comparing our parametrization to a baseline system using spectral and prosodic characteristics. The test results were very satisfactory, showing a relative error reduction of more than 20% with respect to the baseline system. The accuracy of detecting the different emotions was also high, improving on previously reported results using the same database. Overall, we conclude that the glottal source parameters extracted by our algorithm have a positive impact on automatic emotion classification.
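The dissertation parametrizes the glottal pulse with the LF model. As a simpler, self-contained illustration of a parametric glottal source, here is a sketch of the classic Rosenberg pulse (not the LF model used in the work; parameter values are illustrative):

```python
import numpy as np

def rosenberg_pulse(fs, f0, open_quotient=0.6, speed_quotient=2.0):
    """One period of the Rosenberg glottal-flow pulse.

    open_quotient: fraction of the period with the glottis open.
    speed_quotient: ratio of opening-phase to closing-phase duration.
    """
    n = int(round(fs / f0))
    n_open = int(open_quotient * n)
    n_rise = int(n_open * speed_quotient / (speed_quotient + 1.0))
    n_fall = n_open - n_rise
    pulse = np.zeros(n)
    # Raised-cosine opening phase, quarter-cosine closing phase,
    # then zero flow during the closed phase.
    pulse[:n_rise] = 0.5 * (1.0 - np.cos(np.pi * np.arange(n_rise) / n_rise))
    pulse[n_rise:n_open] = np.cos(0.5 * np.pi * np.arange(n_fall) / n_fall)
    return pulse

def glottal_source(fs, f0, n_periods=5, **kwargs):
    """Concatenate identical pulses into a periodic source signal."""
    return np.tile(rosenberg_pulse(fs, f0, **kwargs), n_periods)

pulse = rosenberg_pulse(8000, 100)
source = glottal_source(8000, 100, n_periods=5)
```

Varying the open and speed quotients changes the pulse shape in a way loosely analogous to varying LF parameters, which is how a parametric source model can drive prosodic and voice-quality manipulation.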

    A study on training methods for deep-learning-based rotating machinery diagnostics with insufficient fault data

    Ph.D. dissertation, Department of Mechanical and Aerospace Engineering, Seoul National University, February 2020. Advisor: 윤병동 (Byeng D. Youn). Deep learning is a promising approach for fault diagnosis in mechanical applications. Deep learning techniques can process large amounts of data at once and model them into the desired diagnostic model. In industrial fields, however, although large volumes of data can be acquired, little of it is useful fault or failure data, because failure in industrial settings is usually unacceptable. To cope with this problem of insufficient fault data for training diagnostic models for rotating machinery, this thesis proposes three research thrusts: 1) filter-envelope blocks in convolutional neural networks (CNNs) that incorporate the preprocessing steps for vibration signals, frequency filtering and envelope extraction, for a more optimal solution and reduced effort in building the diagnostic model; 2) cepstrum-editing-based data augmentation (CEDA) for diagnostic datasets consisting of vibration signals from rotating machinery; and 3) selective parameter freezing (SPF) for efficient parameter transfer in transfer learning. The first research thrust proposes novel functional blocks for neural networks in order to learn features robust to vibration data. Conventional neural networks, including CNNs, tend to learn biased features when the training data are acquired from a small set of conditions, which can lead to unfavorable performance under different conditions or on other, similar equipment. This research therefore proposes two neural network blocks, a filter block and an envelope block, that can be incorporated into conventional neural networks and minimize the preprocessing steps. Each block is designed to learn a frequency filter and an envelope extraction function, respectively, inducing the neural network to learn more robust and generalized features from limited vibration samples. The second thrust presents a new data augmentation technique specialized for diagnostic data of vibration signals. Many data augmentation techniques exist for image data, with no consideration for the properties of vibration data. Conventional augmentation techniques such as flipping, rotating, or shearing are not appropriate for one-dimensional vibration data and can harm the natural properties of the vibration signal. To augment vibration data without losing its physical properties, the proposed method generates new samples by editing the cepstrum, adjusting the cepstrum components of interest; applying the inverse transform to the edited cepstrum yields new samples, producing an augmented dataset that leads to higher accuracy for the diagnostic model. The third research thrust suggests a new parameter repurposing method for parameter transfer in transfer learning. The proposed SPF selectively freezes parameters transferred from the source network and retrains only the parameters unnecessary for the target domain, reducing overfitting and preserving useful source features when the target data are too limited to train a diagnostic model.
    The three proposed methods can be applied to a diagnostic model independently or in combination, mitigating the loss of diagnostic performance caused by insufficient fault data or yielding higher performance.
    Contents:
    Chapter 1 Introduction
        1.1 Motivation
        1.2 Research Scope and Overview
        1.3 Structure of the Thesis
    Chapter 2 Literature Review
        2.1 Deep Neural Networks
        2.2 Transfer Learning and Parameter Transfer
    Chapter 3 Description of Testbed Data
        3.1 Bearing Data I: Case Western Reserve University Data
        3.2 Bearing Data II: Accelerated Life Test Test-bed
    Chapter 4 Filter-Envelope Blocks in Neural Network for Robust Feature Learning
        4.1 Preliminary Study of Problems in Use of CNN for Vibration Signals
            4.1.1 Class Confusion Problem of CNN Model to Different Conditions
            4.1.2 Benefits of Frequency Filtering and Envelope Extraction for Fault Diagnosis in Vibration Signals
        4.2 Proposed Network Block 1: Filter Block
            4.2.1 Spectral Feature Learning in Neural Network
            4.2.2 FIR Band-pass Filter in Neural Network
            4.2.3 Result and Discussion
        4.3 Proposed Neural Block 2: Envelope Block
            4.3.1 Max-Average Pooling Block for Envelope Extraction
            4.3.2 Adaptive Average Pooling for Learnable Envelope Extractor
            4.3.3 Result and Discussion
        4.4 Filter-Envelope Network for Fault Diagnosis
            4.4.1 Combinations of Filter-Envelope Blocks for the Use of Rolling Element Bearing Fault Diagnosis
            4.4.2 Summary and Discussion
    Chapter 5 Cepstrum Editing Based Data Augmentation for Vibration Signals
        5.1 Brief Review of Data Augmentation for Deep Learning
            5.1.1 Image Augmentation to Enlarge Training Dataset
            5.1.2 Data Augmentation for Vibration Signal
        5.2 Cepstrum Editing Based Data Augmentation
            5.2.1 Cepstrum Editing as a Signal Preprocessing
            5.2.2 Cepstrum Editing Based Data Augmentation
        5.3 Results and Discussion
            5.3.1 Performance Validation to Rolling Element Bearing Diagnosis
    Chapter 6 Selective Parameter Freezing for Parameter Transfer with Small Dataset
        6.1 Overall Procedure of Selective Parameter Freezing
        6.2 Determination Sensitivity of Source Network Parameters
        6.3 Case Study 1: Transfer to Different Fault Size
            6.3.1 Performance by Hyperparameter α
            6.3.2 Effect of the Number of Training Samples and Network Size
        6.4 Case Study 2: Transfer from Artificial to Natural Fault
            6.4.1 Diagnostic Performance for Proposed Method
            6.4.2 Visualization of Frozen Parameters by Hyperparameter α
            6.4.3 Visual Inspection of Feature Space
        6.5 Conclusion
    Chapter 7
        7.1 Contributions and Significance
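The cepstrum-editing augmentation (CEDA) described above, editing cepstrum components of interest and inverse-transforming, can be sketched in a few lines. The band indices, gain, and function name below are illustrative assumptions, not the thesis's implementation:

```python
import numpy as np

def cepstrum_augment(x, lo, hi, gain):
    """Create a new vibration sample by editing the real cepstrum.

    Cepstral coefficients in the quefrency band [lo, hi) are scaled by
    `gain` (the mirrored band as well, so the edited log-spectrum stays
    real); the signal is then rebuilt with the original phase spectrum.
    """
    n = len(x)
    X = np.fft.fft(x)
    log_mag = np.log(np.abs(X) + 1e-12)
    ceps = np.fft.ifft(log_mag).real            # real cepstrum
    edited = ceps.copy()
    edited[lo:hi] *= gain
    edited[n - hi + 1:n - lo + 1] *= gain       # keep even symmetry
    new_log_mag = np.fft.fft(edited).real
    X_new = np.exp(new_log_mag) * np.exp(1j * np.angle(X))
    return np.fft.ifft(X_new).real

# Editing with gain 1.0 is (numerically) the identity; other gains
# reshape the spectral envelope while preserving the phase, so the
# generated sample stays physically plausible as a vibration signal.
rng = np.random.default_rng(1)
x = rng.standard_normal(256)
augmented = cepstrum_augment(x, lo=1, hi=6, gain=0.5)
```

Repeating this with different bands and gains yields many plausible variants of each measured fault signal, which is the enlarged training set the second research thrust uses.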