257 research outputs found

    An LPC Excitation Model Using Wavelets

    Full text link
    This paper presents a new model of linear predictive coding (LPC) excitation using wavelets for speech signals. The LPC excitation becomes a linear combination of a set of self-similar, orthonormal, band-pass signals with time localization and constant bandwidth on a logarithmic scale. Thus, the set of coefficients in the linear combination represents the LPC excitation. The discrete wavelet transform (DWT) yields the coefficients, which have several asymmetrical and non-uniform distribution properties that are attractive for speech processing and compression. These properties include magnitude-dependent sensitivity, scale-dependent sensitivity, and limited frame length, which can be exploited for low bit-rate speech coding. We show that eliminating the 8.97% highest-magnitude coefficients degrades speech quality down to 1.49 dB SNR, while eliminating the 27.51% lowest-magnitude coefficients maintains speech quality at a level of 27.42 dB SNR. Furthermore, eliminating the 6.25% of coefficients located at a scale associated with the 175-630 Hz band severely degrades speech quality down to 4.20 dB SNR. Finally, our results show that the optimal frame length for telephony applications is 32, 64, or 128 samples.
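    As a toy illustration of the coefficient-elimination experiment this abstract describes, the sketch below transforms a synthetic frame with an orthonormal Haar DWT (a stand-in for the paper's wavelet basis), zeroes the lowest-magnitude detail coefficients, and measures the resulting SNR. The signal, basis, and thresholding fraction are illustrative assumptions, not the authors' implementation.

```python
import math

def haar_dwt(x):
    """One level of the orthonormal Haar DWT: approximation and detail bands."""
    a = [(x[i] + x[i + 1]) / math.sqrt(2) for i in range(0, len(x), 2)]
    d = [(x[i] - x[i + 1]) / math.sqrt(2) for i in range(0, len(x), 2)]
    return a, d

def haar_idwt(a, d):
    """Inverse of haar_dwt (exact for the orthonormal Haar basis)."""
    x = []
    for ai, di in zip(a, d):
        x.append((ai + di) / math.sqrt(2))
        x.append((ai - di) / math.sqrt(2))
    return x

def snr_db(signal, approx):
    num = sum(s * s for s in signal)
    den = sum((s - r) ** 2 for s, r in zip(signal, approx))
    return float("inf") if den == 0 else 10 * math.log10(num / den)

# Toy "excitation" frame of 64 samples (illustrative, not real speech).
frame = [math.sin(0.3 * n) + 0.2 * math.sin(2.1 * n) for n in range(64)]
a, d = haar_dwt(frame)

# Keep the approximation band, zero the smallest half of the detail coefficients.
threshold = sorted(abs(c) for c in d)[len(d) // 2]
d_kept = [c if abs(c) >= threshold else 0.0 for c in d]
rec = haar_idwt(a, d_kept)
lossless = haar_idwt(a, d)

print(round(snr_db(frame, rec), 2))
```

    Because the transform is orthonormal, the reconstruction error energy equals the energy of the zeroed coefficients, which is why discarding low-magnitude coefficients costs little SNR while discarding high-magnitude ones is destructive.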


    Automated Artifact Removal and Detection of Mild Cognitive Impairment from Single Channel Electroencephalography Signals for Real-Time Implementations on Wearables

    Get PDF
    Electroencephalography (EEG) is a technique for recording the asynchronous activation of neuronal firing inside the brain with non-invasive scalp electrodes. The EEG signal is well studied for evaluating cognitive state and for detecting brain disorders such as epilepsy, dementia, coma, and autism spectrum disorder (ASD). In this dissertation, the EEG signal is studied for the early detection of Mild Cognitive Impairment (MCI). MCI is the preliminary stage of dementia that may ultimately lead to Alzheimer's disease (AD) in elderly people. Our goal is to develop a minimalistic MCI detection system that could be integrated into wearable sensors. This contribution has three major aspects: 1) cleaning the EEG signal, 2) detecting MCI, and 3) predicting the severity of the MCI using data obtained from a single-channel EEG electrode. Artifacts such as eye-blink activity can corrupt EEG signals. We investigate unsupervised and effective removal of the ocular artifact (OA) from single-channel streaming raw EEG data. Wavelet transform (WT) decomposition techniques were systematically evaluated for the effectiveness of OA removal in a single-channel EEG system. The Discrete Wavelet Transform (DWT) and the Stationary Wavelet Transform (SWT) are studied with four WT basis functions: haar, coif3, sym3, and bior4.4. The performance of the artifact removal algorithm was evaluated by correlation coefficients (CC), mutual information (MI), signal-to-artifact ratio (SAR), normalized mean square error (NMSE), and time-frequency analysis. It is demonstrated that the WT can be an effective tool for unsupervised OA removal from single-channel EEG data in real-time applications. For MCI detection from the clean EEG data, we collected scalp EEG data while the subjects were stimulated with five auditory speech signals. We extracted 590 features from the Event-Related Potential (ERP) of the collected EEG signals, which included time- and spectral-domain characteristics of the response.
The top 25 features, ranked by the random forest method, were used in classification models to identify subjects with MCI. The robustness of our model was tested using leave-one-out cross-validation while training the classifiers. The best results (leave-one-out cross-validation accuracy 87.9%, sensitivity 84.8%, specificity 95%, and F score 85%) were obtained using the support vector machine (SVM) method with a radial basis function (RBF) kernel (sigma = 10, cost = 102). Similar performance was also observed with logistic regression (LR), further validating the results. Our results suggest that single-channel EEG could provide a robust biomarker for early detection of MCI. We also developed a single-channel EEG-based MCI severity monitoring algorithm that generates Montreal Cognitive Assessment (MoCA) scores from the features extracted from the EEG. We performed multi-trial and single-trial analyses for the development of the MCI severity monitoring algorithm. We studied Multivariate Regression (MR), Ensemble Regression (ER), Support Vector Regression (SVR), and Ridge Regression (RR) for the multi-trial analysis, and deep neural regression for the single-trial analysis. In the multi-trial case, the best result was obtained from the ER. In our single-trial analysis, we constructed a time-frequency image from each trial and fed it to a convolutional neural network (CNN). The performance of the regression models was evaluated by the RMSE and residual analysis. We obtained the best accuracy with the deep neural regression method.
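    The leave-one-out cross-validation protocol mentioned above can be sketched with a deliberately simple stand-in classifier: a nearest-centroid rule on a synthetic two-class feature set. The dissertation's SVM, features, and data are not reproduced here; this only shows the leave-one-out loop itself.

```python
import math

def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(len(rows[0]))]

def nearest_centroid_predict(x, cents):
    """Assign x to the label whose class centroid is closest in Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    return min(cents, key=lambda lab: dist(x, cents[lab]))

def loocv_accuracy(X, y):
    """Leave-one-out: hold out each sample once, train on the rest, score the held-out sample."""
    correct = 0
    for i in range(len(X)):
        Xtr = [x for j, x in enumerate(X) if j != i]
        ytr = [t for j, t in enumerate(y) if j != i]
        cents = {lab: centroid([x for x, t in zip(Xtr, ytr) if t == lab])
                 for lab in set(ytr)}
        if nearest_centroid_predict(X[i], cents) == y[i]:
            correct += 1
    return correct / len(X)

# Two well-separated synthetic clusters standing in for EEG-derived features.
X = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0], [1.9, 2.1], [2.0, 2.0], [2.1, 1.8]]
y = ["control", "control", "control", "MCI", "MCI", "MCI"]
print(loocv_accuracy(X, y))
```

    With only a handful of subjects per class, leave-one-out makes maximal use of the data, which is why it is a common choice for small clinical cohorts like the one described above.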

    Frame Theory for Signal Processing in Psychoacoustics

    Full text link
    This review chapter aims to strengthen the link between frame theory and signal processing tasks in psychoacoustics. On the one side, the basic concepts of frame theory are presented, and some proofs are provided to explain those concepts in detail. The goal is to show hearing scientists how this mathematical theory could be relevant for their research. In particular, we focus on frame theory in a filter bank approach, which is probably the most relevant viewpoint for audio signal processing. On the other side, basic psychoacoustic concepts are presented to stimulate mathematicians to apply their knowledge in this field.
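    A minimal numerical illustration of the frame concept the chapter builds on: the three-vector "Mercedes-Benz" frame in R^2 is a tight frame with frame bound A = 3/2, so the squared frame coefficients of any vector sum to exactly 3/2 times its energy. This is a generic textbook example, not taken from the chapter.

```python
import math

# Mercedes-Benz frame: three unit vectors in R^2, 120 degrees apart.
mb_frame = [(0.0, 1.0),
            (-math.sqrt(3) / 2, -0.5),
            (math.sqrt(3) / 2, -0.5)]

def frame_energy(x):
    """Sum of squared frame coefficients <x, f_k> over the frame."""
    return sum((x[0] * f[0] + x[1] * f[1]) ** 2 for f in mb_frame)

# For a tight frame, frame_energy(x) == A * ||x||^2 for every x; here A = B = 3/2.
for x in [(1.0, 0.0), (0.3, -0.7), (2.0, 5.0)]:
    norm_sq = x[0] ** 2 + x[1] ** 2
    print(round(frame_energy(x) / norm_sq, 6))
```

    The same energy identity, with two bounds A and B instead of one, is what a filter bank must satisfy to admit stable, perfect reconstruction, which is the bridge to audio processing that the chapter develops.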

    Advances in Robotics, Automation and Control

    Get PDF
    The book presents an excellent overview of recent developments in the different areas of robotics, automation, and control. Through its 24 chapters, this book presents topics related to control and robot design; it also introduces new mathematical tools and techniques devoted to improving system modeling and control. An important point is the use of rational agents and heuristic techniques to cope with the computational complexity required for controlling complex systems. Through this book, we also find navigation and vision algorithms, automatic handwriting comprehension, and speech recognition systems that will be included in the next generation of productive systems.

    Mathematical methods of signal processing

    Get PDF
    The aim of this project is to present in a systematic way the more relevant mathematical methods of signal processing, and to explore how they are applied to speech and image processing. After explaining the more common parts of a standard course in signal processing, we put special emphasis on two new tools that have played a significant role in signal processing in the past few years: pattern theory and wavelet theory. Finally, we use all these techniques to implement an algorithm that detects the wallpaper group of a plane mosaic, taking an image of it as input, and an algorithm that returns the phoneme sequence of a speech signal. The material in this report can be grouped into two parts. The first part, consisting of the first six chapters, deals with the theoretical foundations of signal processing. It also includes material related to plane symmetry groups. The second part, consisting of the last two chapters, is focused on the applications.

    Mathematics and Digital Signal Processing

    Get PDF
    Modern computer technology has opened up new opportunities for the development of digital signal processing methods. The applications of digital signal processing have expanded significantly and today include audio and speech processing, sonar, radar, and other sensor array processing, spectral density estimation, statistical signal processing, digital image processing, signal processing for telecommunications, control systems, biomedical engineering, and seismology, among others. This Special Issue aims at wide coverage of the problems of digital signal processing, from mathematical modeling to the implementation of problem-oriented systems. The basis of digital signal processing is digital filtering. Wavelet analysis implements multiscale signal processing and is used to solve applied problems of de-noising and compression. Processing of visual information, including image and video processing and pattern recognition, is actively used today in robotic systems and industrial process control. Improving digital signal processing circuits and developing new signal processing systems can improve the technical characteristics of many digital devices. The development of new methods of artificial intelligence, including artificial neural networks and brain-computer interfaces, opens up new prospects for the creation of smart technology. This Special Issue contains the latest technological developments in mathematics and digital signal processing. The stated results are of interest to researchers in the field of applied mathematics and developers of modern digital signal processing systems.
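    As a minimal illustration of the digital filtering the editorial calls the basis of DSP, a direct-form FIR filter can be written in a few lines. This is a generic textbook example, not drawn from any paper in the Special Issue.

```python
def fir_filter(x, h):
    """Direct-form FIR convolution: y[n] = sum_k h[k] * x[n - k]."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        y.append(acc)
    return y

# A 4-tap moving average smooths the edge of a step input.
h = [0.25, 0.25, 0.25, 0.25]
x = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
print(fir_filter(x, h))
```

    The output ramps over four samples instead of jumping, the time-domain signature of the filter's low-pass response; practical designs replace the equal taps with coefficients shaped for a desired frequency response.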

    Detecting emotions from speech using machine learning techniques

    Get PDF
    D.Phil. (Electronic Engineering)

    Determining a Wavelet Filterbank Using the Mean Best Basis Algorithm for Feature Extraction of Noisy Speech Signals

    Get PDF
    Wavelet-based filterbanks have recently been widely developed as feature extractors to replace the Mel Frequency Cepstral Coefficient (MFCC) features in automatic speech recognition systems. One of the wavelet feature filterbanks developed is the Wavelet-Packet Cepstral Coefficient (WPCC). So far, however, its development has focused only on clean speech signals. The aim of this study is therefore to design a WPCC for noisy speech signals. The Mean Best Basis (MBB) algorithm and the db44 and db45 wavelet functions are applied to obtain the WPCC filterbank design. The noisy speech signals used are recorded utterances of the Indonesian vowels a, i, u, e, é, o, and ó. The results show that two WPCC filterbank designs have been formed, one from applying the Daubechies db44 function and one from db45. Noise has no effect on the construction of either WPCC filterbank. Both filterbank designs meet the MFCC filter form standards, especially in their frequency range and frequency scale. The frequency range is between 125 Hz and 1000 Hz, with a linear scale for frequencies below 1000 Hz. Therefore, it can be concluded that these two WPCC filterbank designs can be considered for use as feature extractors for noisy speech signals.
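    The linear sub-1-kHz triangular filter shape the abstract reports can be sketched generically as follows. The filter count, FFT size, and sampling rate below are illustrative assumptions, and this is not the WPCC design itself, which is obtained by wavelet-packet best-basis selection rather than constructed directly on an FFT grid.

```python
def triangular_filterbank(n_filters, fmin, fmax, n_bins, sample_rate):
    """Triangular band-pass filters with linearly spaced centre frequencies."""
    # Linearly spaced edge frequencies (the abstract's scale is linear below 1 kHz).
    edges = [fmin + (fmax - fmin) * i / (n_filters + 1)
             for i in range(n_filters + 2)]
    bin_freq = [sample_rate / 2 * i / (n_bins - 1) for i in range(n_bins)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        filt = []
        for f in bin_freq:
            if lo <= f <= mid:          # rising edge of the triangle
                filt.append((f - lo) / (mid - lo))
            elif mid < f <= hi:         # falling edge of the triangle
                filt.append((hi - f) / (hi - mid))
            else:
                filt.append(0.0)
        bank.append(filt)
    return bank

# Six filters spanning the 125-1000 Hz range reported in the abstract.
bank = triangular_filterbank(6, 125.0, 1000.0, 257, 8000.0)
print(len(bank), max(bank[0]))
```

    Applying such a bank to a power spectrum, then taking logs and a DCT, is the standard MFCC pipeline against which the WPCC filter shapes are compared.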

    A system for video-based analysis of face motion during speech

    Get PDF
    During face-to-face interaction, facial motion conveys information at various levels. These include a person's emotional condition, position in a discourse, and, while speaking, phonetic details about the speech sounds being produced. Trivially, the measurement of face motion is a prerequisite for any further analysis of its functional characteristics or information content. It is possible to make precise measurements of locations on the face using systems that track motion by means of active or passive markers placed directly on the face. Such systems, however, have the disadvantages of requiring specialised equipment, thus restricting use outside the lab, and of being invasive in the sense that the markers have to be attached to the subject's face. To overcome these limitations, we developed a video-based system to measure face motion from standard video recordings by deforming the surface of an ellipsoidal mesh fitted to the face. The mesh is initialised manually for a reference frame and then projected onto subsequent video frames. Location changes between successive frames for each mesh node are determined adaptively within a well-defined area around each node, using a two-dimensional cross-correlation analysis on a two-dimensional wavelet transform of the frames. Position parameters are propagated in three steps, from a coarser mesh and a correspondingly higher scale of the wavelet transform to the final fine mesh and a lower scale of the wavelet transform. The sequential changes in position of the mesh nodes represent the facial motion. The method takes advantage of inherent constraints of the facial surface, which distinguishes it from more general image motion estimation methods, and it returns measurement points distributed globally over the facial surface, in contrast to feature-based methods.
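    The per-node displacement search by two-dimensional cross-correlation can be illustrated on raw pixel patches. The paper correlates wavelet-transformed frames and adapts the search area per node; this sketch uses plain intensities, a fixed patch, and an integer search window, all of which are simplifying assumptions.

```python
def best_shift(prev, curr, px, py, size, search):
    """Find the integer (dx, dy) that best matches a size x size patch of
    `prev` (top-left at px, py) inside `curr`, by normalised correlation."""
    def patch(img, x, y):
        return [img[y + j][x + i] for j in range(size) for i in range(size)]

    def score(a, b):
        ma = sum(a) / len(a); mb = sum(b) / len(b)
        num = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
        da = sum((ai - ma) ** 2 for ai in a) ** 0.5
        db = sum((bi - mb) ** 2 for bi in b) ** 0.5
        return num / (da * db) if da and db else 0.0

    ref = patch(prev, px, py)
    best, best_s = (0, 0), -2.0
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            s = score(ref, patch(curr, px + dx, py + dy))
            if s > best_s:
                best_s, best = s, (dx, dy)
    return best

# Synthetic frames: a bright 2x2 blob moves one pixel right and one pixel down.
W = H = 12
prev = [[0.0] * W for _ in range(H)]
curr = [[0.0] * W for _ in range(H)]
for j in (4, 5):
    for i in (4, 5):
        prev[j][i] = 1.0
        curr[j + 1][i + 1] = 1.0
print(best_shift(prev, curr, 3, 3, 4, 2))
```

    Running the same search at a coarse image scale first, then refining at finer scales, mirrors the paper's coarse-to-fine propagation across wavelet scales.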