518 research outputs found
Reviewing the connection between speech and obstructive sleep apnea
The electronic version of this article is the complete one and can be found online at: http://link.springer.com/article/10.1186/s12938-016-0138-5Background: Sleep apnea (OSA) is a common sleep disorder characterized by recurring breathing pauses during sleep caused by a blockage of the upper airway (UA). The altered UA structure or function in OSA speakers has led to hypothesize the automatic analysis of speech for OSA assessment. In this paper we critically review several
approaches using speech analysis and machine learning techniques for OSA detection, and discuss the limitations that can arise when using machine learning techniques for diagnostic applications.
Methods: A large speech database including 426 male Spanish speakers suspected to suffer OSA and derived to a sleep disorders unit was used to study the clinical validity of several proposals using machine learning techniques to predict the apnea–hypopnea index (AHI) or classify individuals according to their OSA severity. AHI describes the severity of patients’ condition. We first evaluate AHI prediction using state-of-theart speaker recognition technologies: speech spectral information is modelled using supervectors or i-vectors techniques, and AHI is predicted through support vector
regression (SVR). Using the same database we then critically review several OSA classification approaches previously proposed. The influence and possible interference of other clinical variables or characteristics available for our OSA population: age, height,
weight, body mass index, and cervical perimeter, are also studied.
Results: The poor results obtained when estimating AHI using supervectors or i-vectors followed by SVR contrast with the positive results reported by previous research.
This fact prompted us to a careful review of these approaches, also testing some reported results over our database. Several methodological limitations and deficiencies were detected that may have led to overoptimistic results.
Conclusion: The methodological deficiencies observed after critically reviewing previous research can be relevant examples of potential pitfalls when using machine learning techniques for diagnostic applications. We have found two common limitations that can explain the likelihood of false discovery in previous research: (1) the use of
prediction models derived from sources, such as speech, which are also correlated with other patient characteristics (age, height, sex,…) that act as confounding factors; and (2) overfitting of feature selection and validation methods when working with a high number of variables compared to the number of cases. We hope this study could not
only be a useful example of relevant issues when using machine learning for medical diagnosis, but it will also help in guiding further research on the connection between speech and OSA.Authors thank to Sonia Martinez Diaz for her effort in collecting the OSA database that is used in this study. This research was partly supported by the Ministry of Economy and Competitiveness of Spain and the European Union (FEDER) under project "CMC-V2", TEC2012-37585-C02
Recommended from our members
The Challenge of Spoken Language Systems: Research Directions for the Nineties
A spoken language system combines speech recognition, natural language processing and human interface technology. It functions by recognizing the person's words, interpreting the sequence of words to obtain a meaning in terms of the application, and providing an appropriate response back to the user. Potential applications of spoken language systems range from simple tasks, such as retrieving information from an existing database (traffic reports, airline schedules), to interactive problem solving tasks involving complex planning and reasoning (travel planning, traffic routing), to support for multilingual interactions. We examine eight key areas in which basic research is needed to produce spoken language systems: (1) robust speech recognition; (2) automatic training and adaptation; (3) spontaneous speech; (4) dialogue models; (5) natural language response generation; (6) speech synthesis and speech generation; (7) multilingual systems; and (8) interactive multimodal systems. In each area, we identify key research challenges, the infrastructure needed to support research, and the expected benefits. We conclude by reviewing the need for multidisciplinary research, for development of shared corpora and related resources, for computational support and far rapid communication among researchers. The successful development of this technology will increase accessibility of computers to a wide range of users, will facilitate multinational communication and trade, and will create new research specialties and jobs in this rapidly expanding area
Recommended from our members
The Challenge of Spoken Language Systems: Research Directions for the Nineties
A spoken language system combines speech recognition, natural language processing and human interface technology. It functions by recognizing the person's words, interpreting the sequence of words to obtain a meaning in terms of the application, and providing an appropriate response back to the user. Potential applications of spoken language systems range from simple tasks, such as retrieving information from an existing database (traffic reports, airline schedules), to interactive problem solving tasks involving complex planning and reasoning (travel planning, traffic routing), to support for multilingual interactions. We examine eight key areas in which basic research is needed to produce spoken language systems: (1) robust speech recognition; (2) automatic training and adaptation; (3) spontaneous speech; (4) dialogue models; (5) natural language response generation; (6) speech synthesis and speech generation; (7) multilingual systems; and (8) interactive multimodal systems. In each area, we identify key research challenges, the infrastructure needed to support research, and the expected benefits. We conclude by reviewing the need for multidisciplinary research, for development of shared corpora and related resources, for computational support and far rapid communication among researchers. The successful development of this technology will increase accessibility of computers to a wide range of users, will facilitate multinational communication and trade, and will create new research specialties and jobs in this rapidly expanding area
Bayesian nonparametric models for data exploration
Mención Internacional en el título de doctorMaking sense out of data is one of the biggest challenges of our time. With the emergence of
technologies such as the Internet, sensor networks or deep genome sequencing, a true data explosion
has been unleashed that affects all fields of science and our everyday life. Recent breakthroughs, such
as self-driven cars or champion-level Go player programs, have demonstrated the potential benefits
from exploiting data, mostly in well-defined supervised tasks. However, we have barely started to
actually explore and truly understand data.
In fact, data holds valuable information for answering most important questions for humanity:
How does aging impact our physical capabilities? What are the underlying mechanisms of cancer?
Which factors make countries wealthier than others? Most of these questions cannot be stated as
well-defined supervised problems, and might benefit enormously from multidisciplinary research
efforts involving easy-to-interpret models and rigorous data exploratory analyses. Efficient data exploration
might lead to life-changing scientific discoveries, which can later be turned into a more impactful
exploitation phase, to put forward more informed policy recommendations, decision-making
systems, medical protocols or improved models for highly accurate predictions.
This thesis proposes tailored Bayesian nonparametric (BNP) models to solve specific data exploratory
tasks across different scientific areas including sport sciences, cancer research, and economics.
We resort to BNP approaches to facilitate the discovery of unexpected hidden patterns
within data. BNP models place a prior distribution over an infinite-dimensional parameter space,
which makes them particularly useful in probabilistic models where the number of hidden parameters
is unknown a priori. Under this prior distribution, the posterior distribution of the hidden parameters
given the data will assign high probability mass to those configurations that best explain the
observations. Hence, inference over the hidden variables can be performed using standard Bayesian
inference techniques, therefore avoiding expensive model selection steps.
This thesis is application-focused and highly multidisciplinary. More precisely, we propose an
automatic grading system for sportive competitions to compare athletic performance regardless of
age, gender and environmental aspects; we develop BNP models to perform genetic association
and biomarker discovery in cancer research, either using genetic information and Electronic Health
Records or clinical trial data; finally, we present a flexible infinite latent factor model of international
trade data to understand the underlying economic structure of countries and their evolution over time.Uno de los principales desafíos de nuestro tiempo es encontrar sentido dentro de los datos. Con la aparición de tecnologías como Internet, redes de sensores, o métodos de secuenciación profunda
del genoma, una verdadera explosión digital se ha visto desencadenada, afectando todos los campos científicos, así como nuestra vida diaria. Logros recientes como pueden ser los coches auto-dirigidos o programas que ganan a los seres humanos al milenario juego del Go, han demostrado con creces los posibles beneficios que podemos obtener de la explotación de datos, mayoritariamente en tareas
supervisadas bien definidas. No obstante, apenas hemos empezado con la exploración de datos y su verdadero entendimiento.
En verdad, los datos encierran información muy valiosa para responder a muchas de las preguntas
más importantes para la humanidad: ¿Cómo afecta el envejecimiento a nuestras aptitudes físicas?
¿Cuáles son los mecanismos subyacentes del cáncer? ¿Qué factores explican la riqueza de ciertos
países frente a otros? Si bien la mayoría de estas preguntas no pueden formularse como problemas
supervisados bien definidos, éstas pueden ser abordadas mediante esfuerzos de investigación
multidisciplinar que involucren modelos fáciles de interpretar y análisis exploratorios rigurosos. Explorar los datos de manera eficiente abre potencialmente la puerta a un sinnúmero de descubrimientos
científicos en diversas áreas con impacto real en nuestras vidas, descubrimientos que a su vez pueden llevarnos a una mejor explotación de los datos, resultando en recomendaciones políticas adecuadas, sistemas precisos de toma de decisión, protocolos médicos optimizados o modelos con mejores capacidades
predictivas. Esta tesis propone modelos Bayesianos no-paramétricos (BNP) adecuados para la resolución específica de tareas explorativas de los datos en diversos ámbitos científicos incluyendo ciencias del
deporte, investigación contra el cáncer, o economía. Recurrimos a un planteamiento BNP para facilitar el descubrimiento de patrones ocultos inesperados subyacentes en los datos. Los modelos
BNP definen una distribución a priori sobre un espacio de parámetros de dimensión infinita, lo cual los hace especialmente atractivos para enfoques probabilísticos donde el número de parámetros latentes
es en principio desconocido. Bajo dicha distribución a priori, la distribución a posteriori de los parámetros ocultos dados los datos asignará mayor probabilidad a aquellas configuraciones que
mejor explican las observaciones. De esta manera, la inferencia sobre el espacio de variables ocultas puede realizarse mediante técnicas estándar de inferencia Bayesiana, evitando el proceso de selección
de modelos. Esta tesis se centra en el ámbito de las aplicaciones, y es de naturaleza multidisciplinar. En
concreto, proponemos un sistema de gradación automática para comparar el rendimiento deportivo
de atletas independientemente de su edad o género, así como de otros factores del entorno. Desarrollamos
modelos BNP para descubrir asociaciones genéticas y biomarcadores dentro de la investigación contra el cáncer, ya sea contrastando información genética con la historia clínica electrónica
de los pacientes, o utilizando datos de ensayos clínicos; finalmente, presentamos un modelo flexible
de factores latentes infinito para datos de comercio internacional, con el objetivo de entender la
estructura económica de los distintos países y su correspondiente evolución a lo largo del tiempo.Programa Oficial de Doctorado en Multimedia y ComunicacionesPresidente: Joaquín Míguez Arenas.- Secretario: Daniel Hernández Lobato.- Vocal: Cédric Archambea
Automatic speaker recognition: modelling, feature extraction and effects of clinical environment
Speaker recognition is the task of establishing identity of an individual based on his/her voice. It has a significant potential as a convenient biometric method for telephony applications and does not require sophisticated or dedicated hardware. The Speaker Recognition task is typically achieved by two-stage signal processing: training and testing. The training process calculates speaker-specific feature parameters from the speech. The features are used to generate statistical models of different speakers. In the testing phase, speech samples from unknown speakers are compared with the models and classified. Current state of the art speaker recognition systems use the Gaussian mixture model (GMM) technique in combination with the Expectation Maximization (EM) algorithm to build the speaker models. The most frequently used features are the Mel Frequency Cepstral Coefficients (MFCC). This thesis investigated areas of possible improvements in the field of speaker recognition. The identified drawbacks of the current speaker recognition systems included: slow convergence rates of the modelling techniques and feature’s sensitivity to changes due aging of speakers, use of alcohol and drugs, changing health conditions and mental state. The thesis proposed a new method of deriving the Gaussian mixture model (GMM) parameters called the EM-ITVQ algorithm. The EM-ITVQ showed a significant improvement of the equal error rates and higher convergence rates when compared to the classical GMM based on the expectation maximization (EM) method. It was demonstrated that features based on the nonlinear model of speech production (TEO based features) provided better performance compare to the conventional MFCCs features. For the first time the effect of clinical depression on the speaker verification rates was tested. It was demonstrated that the speaker verification results deteriorate if the speakers are clinically depressed. The deterioration process was demonstrated using conventional (MFCC) features. The thesis also showed that when replacing the MFCC features with features based on the nonlinear model of speech production (TEO based features), the detrimental effect of the clinical depression on speaker verification rates can be reduced
Autoregressive models for text independent speaker identification in noisy environments
The closed-set speaker identification problem is defined as the search within a set of persons for the speaker of a certain
utterance. It is reported that the Gaussian mixture model (GMM) classifier achieves very high classification accuracies (in the
range 95% - 100%) when both the training and testing utterances are recorded in sound proof studio, i.e., there is neither additive
noise nor spectral distortion to the speech signals.
However, in real life applications, speech is usually corrupted by noise and band-limitation. Moreover, there is a mismatch between the recording conditions of the training and testing environments. As a
result, the classification accuracy of GMM-based systems deteriorates significantly. In this thesis, we propose a two-step procedure for improving the speaker identification performance under noisy environment. In the first step, we introduce a new classifier: vector autoregressive Gaussian mixture (VARGM) model. Unlike the
GMM, the new classifier models correlations between successive feature vectors. We also integrate the proposed method into the
framework of the universal background model (UBM). In addition, we develop the learning procedure according to the maximum likelihood
(ML) criterion. Based on a thorough experimental evaluation, the proposed method achieves an improvement of 3 to 5% in the
identification accuracy.
In the second step, we propose a new compensation technique based on the generalized maximum likelihood (GML) decision rule. In particular, we assume a general form for the distribution of the
noise-corrupted utterances, which contains two types of parameters: clean speech-related parameters and noise-related parameters. While the clean speech related parameters are estimated during the
training phase, the noise related parameters are estimated from the corrupted speech in the testing phase. We applied the proposed
method to utterances of 50 speakers selected from the TIMIT database, artificially corrupted by convolutive and additive noise.
The signal to noise ratio (SNR) varies from 0 to 20 dB. Simulation results reveal that the proposed method achieves good robustness
against variation in the SNR. For utterances corrupted by covolutive noise, the improvement in the classification accuracy ranges from 70% for SNR = 0 dB to around 4% for SNR = 10dB, compared to the standard ML decision rule. For utterances corrupted by additive noise, the improvement in the classification accuracy ranges from 1% to 10% for SNRs ranging from 0 to 20 dB.
The proposed VARGM classifier is also applied to the speech emotion classification problem. In particular, we use the Berlin emotional speech database to validate the classification performance of the proposed VARGM classifier. The proposed technique provides a classification accuracy of 76% versus 71% for the hidden Markov model, 67% for the k-nearest neighbors, 55% for feed-forward neural networks. The model gives also better discrimination between
high-arousal emotions (joy, anger, fear), low arousal emotions (sadness, boredom), and neutral emotions than the HMM.
Another interesting application of the VARGM model is the blind equalization of multi input multiple output (MIMO) communication
channels. Based on VARGM modeling of MIMO channels, we propose a four-step equalization procedure. First, the received data vectors are fitted into a VARGM model using the expectation maximization (EM) algorithm. The constructed VARGM model is then used to filter the received data. A Baysian decision rule is then applied to
identify the transmitted symbols up to a permutation and phase ambiguities, which are finally resolved using a small training
sequence. Moreover, we propose a fast and easily implementable model order selection technique. The new equalization algorithm is
compared to the whitening method and found to provide less symbol error probability. The proposed technique is also applied to
frequency-flat slow fading channels and found to provide a more accurate estimate of the channel response than that provided by the blind de-convolution exploiting channel encoding (BDCC) method and at a higher information rate
Models and analysis of vocal emissions for biomedical applications
This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003, Firenze, Italy. The workshop is organised every two years, and aims to stimulate contacts between specialists active in research and industrial developments, in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies
- …