8 research outputs found

    Hypernasal Speech Analysis via Empirical Mode Decomposition and the Teager-Kaiser Energy Operator

    In speech science, an important open problem is developing a clear method for detecting hypernasality in speech. For speech pathologists, hypernasality is a critical diagnostic used for judging the severity of velopharyngeal (nasal cavity/mouth separation) inadequacy in children with a cleft lip or cleft palate condition. For physicians, and particularly neurologists, these same velopharyngeal inadequacies are believed to be linked to nervous system disorders such as Alzheimer's disease and particularly Parkinson's disease. One can therefore envision the need not only to find a reliable method for detecting hypernasality, but also to quantify its level (severity). An integral component in the study of speech is the analysis of speech formants, i.e., vocal tract resonances. Traditional acoustical analysis methods based on a linear source model follow the premise that normal and hypernasal speech can be distinguished by shifts or power changes in the formant frequencies and/or the widening (or narrowing) of the formant bandwidths. Such a premise, however, has not been validated with consistency. Part of the reason is that traditional acoustical analysis methods such as one-third octave band, LPC (Linear Predictive Coding), and cepstral analysis are ill-equipped to deal with the nonlinear, non-stationary, and wideband characteristics of normal and nasal speech signals. Relatively newer DSP methods that employ group delay or energy separation overcome some of these problems, but have their own issues such as possible mode mixing, noise, and the aforementioned wideband problem. However, initial investigations into energy separation methods show promise as long as these issues can be resolved.
This thesis evaluates the success of a novel acoustical energy approach that deals with the mode mixing and wideband problems in two steps: (1) a DSP sifting algorithm known as EMD (Empirical Mode Decomposition) is first applied to decompose the voice signal into a number of IMFs (Intrinsic Mode Functions); (2) energy analysis is then performed on each IMF via the Teager-Kaiser Energy Operator. The proposed EMD energy approach is applied to voice samples taken from the American CLP Craniofacial database and is shown to produce a clear delineation between normal and nasal samples and between different levels of hypernasality.
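
The energy step described above can be illustrated with a minimal sketch of the discrete Teager-Kaiser Energy Operator, psi[n] = x[n]^2 - x[n-1]*x[n+1]. This is not the thesis's implementation: the EMD sifting step is omitted, a synthetic pure tone stands in for a single IMF, and the function name `teager_kaiser` and all parameter values are hypothetical choices made for illustration.

```python
import math


def teager_kaiser(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The first and last samples have no two-sided neighbours, so the
    output is two samples shorter than the input.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Synthetic stand-in for one IMF (assumption: a real IMF would come from EMD).
amplitude = 2.0
omega = 0.3  # digital frequency in radians per sample
tone = [amplitude * math.cos(omega * n) for n in range(200)]

energy = teager_kaiser(tone)

# For a pure tone A*cos(omega*n + phi) the operator is constant and equals
# (A * sin(omega))^2, so it tracks both amplitude and frequency.
expected = (amplitude * math.sin(omega)) ** 2
```

Because the operator responds to amplitude and frequency together, applying it to each narrowband IMF separately is one plausible reading of why the decomposition is performed first.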

    Stress and emotion recognition in natural speech in the work and family environments

    Speech stress and emotion recognition and classification technology has the potential to provide significant benefits to national and international industry and to society in general. The accuracy of automatic stress and emotion recognition relies heavily on the discriminative power of the characteristic features. This work introduced and examined a number of new linear and nonlinear feature extraction methods for automatic detection of stress and emotion in speech. The proposed linear feature extraction methods included features derived from the speech spectrograms (SS-CB/BARK/ERB-AE, SS-AF-CB/BARK/ERB-AE, SS-LGF-OFS, SS-ALGF-OFS, SS-SP-ALGF-OFS and SS-sigma-pi), wavelet packets (WP-ALGF-OFS) and the empirical mode decomposition (EMD-AER). The proposed nonlinear feature extraction methods were based on the results of recent laryngological studies and nonlinear modelling of the phonation process. The proposed nonlinear features included the area under the TEO autocorrelation envelope based on different spectral decompositions (TEO-DWT, TEO-WP, TEO-PWP-S and TEO-PWP-G), as well as features representing the spectral energy distribution of speech (AUSEES) and of the glottal waveform (AUSEEG). The proposed features were compared with features based on the classical linear model of speech production, including F0, formants, MFCC and glottal time/frequency parameters. Two classifiers, GMM and KNN, were tested for consistency. The experiments used speech under actual stress from the SUSAS database (7 speakers; 3 female and 4 male) and speech with five naturally expressed emotions (neutral, anger, anxious, dysphoric and happy) from the ORI corpora (71 speakers; 27 female and 44 male). The nonlinear features clearly outperformed all the linear features.
The classification results were consistent with the nonlinear model of the phonation process, indicating that the harmonic structure and the spectral distribution of the glottal energy provide the most important cues for stress and emotion recognition in speech. The study also investigated whether automatic emotion recognition can detect differences in emotion expression between parents of depressed adolescents and parents of non-depressed adolescents, and whether there are differences in emotion expression between mothers and fathers in general. The experimental results indicated that parents of depressed adolescents produce stronger, more exaggerated expressions of affect than parents of non-depressed adolescents, and that females in general produce expressions of affect that are easier to discriminate (more exaggerated) than those of males.
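
As a rough illustration of one of the two classifiers named above, the following is a minimal k-nearest-neighbour sketch. It is not the study's classifier or feature set: the function name `knn_predict`, the two-dimensional feature vectors, and the labels are all hypothetical toy values chosen only to show the voting mechanism.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Label a query vector by majority vote among its k nearest neighbours."""
    # Euclidean distance from the query to every training vector.
    distances = [
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    ]
    # Keep the k closest training examples.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over their labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors (real systems would use TEO- or
# spectrogram-derived features of much higher dimension).
train_features = [
    (0.10, 0.20), (0.20, 0.10), (0.15, 0.25),
    (0.90, 0.80), (0.80, 0.90), (0.85, 0.75),
]
train_labels = ["neutral"] * 3 + ["stressed"] * 3

low_query = knn_predict(train_features, train_labels, (0.2, 0.2))
high_query = knn_predict(train_features, train_labels, (0.8, 0.8))
```

With features that separate the classes well, even this simple vote is enough; the abstract's point is that classifier accuracy depends mainly on how discriminative the features are.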

    Multichannel analysis of normal and continuous adventitious respiratory sounds for the assessment of pulmonary function in respiratory diseases

    UPC Extraordinary Doctoral Award, academic year 2015-2016, Industrial Engineering field. Respiratory sounds (RS) are produced by turbulent airflows through the airways and are inhomogeneously transmitted through different media to the chest surface, where they can be recorded in a non-invasive way. Due to their mechanical nature and airflow dependence, RS are affected by respiratory diseases that alter the mechanical properties of the respiratory system. Therefore, RS provide useful clinical information about the structure and functioning of the respiratory system. Recent advances in sensors and signal processing techniques have made RS analysis a more objective and sensitive tool for measuring pulmonary function. However, RS analysis is still rarely used in clinical practice. The lack of a standard methodology for recording and processing RS has led to several different approaches to RS analysis, with some methodological issues that could limit the potential of RS analysis in clinical practice (i.e., measurements with a low number of sensors, no controlled airflows, constant airflows, or forced expiratory manoeuvres, the lack of a co-analysis of different types of RS, or the use of inaccurate techniques for processing RS signals). In this thesis, we propose a novel integrated approach to RS analysis that includes a multichannel recording of RS using a maximum of five microphones placed over the trachea and the chest surface, which allows RS to be analysed at the most commonly reported lung regions without requiring a large number of sensors. Our approach also includes a progressive respiratory manoeuvre with variable airflow, which allows RS to be analysed as a function of airflow. Dual RS analyses of both normal RS and continuous adventitious sounds (CAS) are also proposed. Normal RS are analysed through the RS intensity–airflow curves, whereas CAS are analysed through a customised Hilbert spectrum (HS), adapted to RS signal characteristics.
The proposed HS represents a step forward in the analysis of CAS. Using HS allows CAS to be fully characterised with regard to duration, mean frequency, and intensity. Further, the high temporal and frequency resolutions and the high concentration of energy of this improved version of HS allow CAS to be characterised more accurately than with the spectrogram, which has been the most widely used technique for CAS analysis. Our approach to RS analysis was put into clinical practice by launching two studies in the Pulmonary Function Testing Laboratory of the Germans Trias i Pujol University Hospital for assessing pulmonary function in patients with unilateral phrenic paralysis (UPP), and bronchodilator response (BDR) in patients with asthma. RS and airflow signals were recorded in 10 patients with UPP, 50 patients with asthma, and 20 healthy participants. The analysis of RS intensity–airflow curves proved to be a successful method to detect UPP, since we found significant differences between these curves at the posterior base of the lungs in all patients, whereas no differences were found in the healthy participants. To the best of our knowledge, this is the first study that uses a quantitative analysis of RS for assessing UPP. Regarding asthma, we found appreciable changes in the RS intensity–airflow curves and CAS features after bronchodilation in patients with negative BDR in spirometry. Therefore, we suggest that the combined analysis of RS intensity–airflow curves and CAS features (number, duration, mean frequency, and intensity) is a promising technique for assessing BDR and improving the stratification of BDR levels, particularly among patients with negative BDR in spirometry. The novel approach to RS analysis developed in this thesis provides a sensitive tool to obtain objective and complementary information about pulmonary function in a simple and non-invasive way.
Together with spirometry, this approach to RS analysis could have a direct clinical application for improving the assessment of pulmonary function in patients with respiratory diseases. Award-winning. Postprint (published version).

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The proceedings of the MAVEBA Workshop, held on a biannual basis, collect the scientific papers presented as both oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, and biomedical engineering methods for the analysis of voice signals and images as a support to clinical diagnosis and classification of vocal pathologies.

    Breathing Rate Estimation From the Electrocardiogram and Photoplethysmogram: A Review.

    Breathing rate (BR) is a key physiological parameter used in a range of clinical settings. Despite its diagnostic and prognostic value, it is still widely measured by counting breaths manually. A plethora of algorithms have been proposed to estimate BR from the electrocardiogram (ECG) and pulse oximetry (photoplethysmogram, PPG) signals. These BR algorithms provide an opportunity for automated, electronic, and unobtrusive measurement of BR in both healthcare and fitness monitoring. This paper presents a review of the literature on BR estimation from the ECG and PPG. First, the structure of BR algorithms and the mathematical techniques used at each stage are described. Second, the experimental methodologies that have been used to assess the performance of BR algorithms are reviewed, and a methodological framework for the assessment of BR algorithms is presented. Third, we outline the most pressing directions for future research, including the steps required to use BR algorithms in wearable sensors, remote video monitoring, and clinical practice.

    The use of spectral information in the development of novel techniques for speech-based cognitive load classification

    The cognitive load of a user refers to the amount of mental demand imposed on the user when performing a particular task. Estimating the cognitive load (CL) level of users is necessary in order to adjust the workload imposed on them and so improve task performance. Current speech-based CL classification systems are not adequate for commercial use due to their low performance, particularly in noisy environments. This thesis proposes several techniques to improve the performance of speech-based cognitive load classification in both clean and noisy conditions. It analyses and presents the effectiveness of speech features such as spectral centroid frequency (SCF) and spectral centroid amplitude (SCA) for CL classification. Sub-systems based on SCF and SCA features were developed and fused with the traditional Mel frequency cepstral coefficients (MFCC) based system, producing relative error rate reductions of 8.9% and 31.5%, respectively, compared with the MFCC-based system alone. The Stroop test corpus was used in these experiments. The investigation into cognitive load information in the form of spectral distribution in different subbands shows that the information carried in the low frequency subband is significantly higher than in the high frequency subband. Two different methods are proposed to utilize this finding. The first method, called the multi-band approach, uses a weighting scheme to emphasize the speech features in low frequency subbands. The cognitive load classification accuracy of this approach is shown to be higher than that of a system without weighting. The second method is to design an effective filterbank based on the spectral distribution of cognitive load information using the Kullback-Leibler distance measure. It is shown that the designed filterbank consistently provides higher classification accuracies than other existing filterbanks such as mel, Bark, and equivalent rectangular bandwidth.
A discrete cosine transform based speech enhancement technique is proposed in order to increase the robustness of the CL classification system and is found to be more suitable than the other methods investigated. The proposed method provides a 3.0% average relative error rate reduction across the seven types of noise and five levels of SNR used. In particular, it provides a maximum relative error rate reduction of 7.5% for the F16 noise (in the NOISEX-92 database) at 20 dB SNR.
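
To make the spectral centroid frequency feature mentioned above concrete, here is a minimal sketch under a common definition: the magnitude-weighted mean frequency within one subband. The thesis may use a different weighting or subband filter shape, so the definition is an assumption here, and the function name `spectral_centroid_frequency` and the sample values are hypothetical.

```python
def spectral_centroid_frequency(freqs_hz, magnitudes):
    """Magnitude-weighted mean frequency of one subband.

    freqs_hz:   centre frequency of each spectral bin in the subband
    magnitudes: non-negative spectral magnitude of each bin
    """
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("subband has no energy")
    return sum(f * m for f, m in zip(freqs_hz, magnitudes)) / total


# Hypothetical four-bin subband with a symmetric magnitude profile.
freqs = [100.0, 200.0, 300.0, 400.0]
symmetric = [1.0, 3.0, 3.0, 1.0]
centroid = spectral_centroid_frequency(freqs, symmetric)

# Shifting energy towards the low bins pulls the centroid down.
low_heavy = [4.0, 2.0, 1.0, 1.0]
low_centroid = spectral_centroid_frequency(freqs, low_heavy)
```

Unlike a cepstral coefficient, this feature reports where within the subband the energy sits, which is one way to read why it complements the MFCC-based system in the fusion described above.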

    Libro de actas. XXXV Congreso Anual de la Sociedad Española de Ingeniería Biomédica (Proceedings of the 35th Annual Congress of the Spanish Society of Biomedical Engineering)

    596 pp. CASEIB2017 is once again the national reference forum for the scientific exchange of knowledge and experience and for the promotion of R&D&I in biomedical engineering. It is a meeting point for scientists, industry professionals, biomedical engineers, and clinicians interested in the latest developments in research, education, and the industrial and clinical application of biomedical engineering. In this edition, more than 160 papers of high scientific quality will be presented in relevant areas of biomedical engineering, such as signal and image processing, biomedical instrumentation, telemedicine, modelling of biomedical systems, intelligent systems and sensors, robotics, surgical planning and simulation, biophotonics, and biomaterials. Of particular note are the sessions devoted to the competition for the José María Ferrero Corral Prize and the competition session for undergraduate Biomedical Engineering students, which aim to encourage the participation of young students and researchers.