280 research outputs found
Analyzing Prosody with Legendre Polynomial Coefficients
This investigation demonstrates the effectiveness of Legendre polynomial coefficients representing prosodic contours within the context of two different tasks: nativeness classification and sarcasm detection. By making use of accurate representations of prosodic contours to answer fundamental linguistic questions, we contribute significantly to the body of research focused on analyzing prosody in linguistics as well as modeling prosody for machine learning tasks. Using Legendre polynomial coefficient representations of prosodic contours, we answer prosodic questions about differences in prosody between native English speakers and non-native English speakers whose first language is Mandarin. We also learn more about prosodic qualities of sarcastic speech. We additionally perform machine learning classification for both tasks, (achieving an accuracy of 72.3% for nativeness classification, and achieving 81.57% for sarcasm detection). We recommend that linguists looking to analyze prosodic contours make use of Legendre polynomial coefficients modeling; the accuracy and quality of the resulting prosodic contour representations makes them highly interpretable for linguistic analysis
Acoustic Approaches to Gender and Accent Identification
There has been considerable research on the problems of speaker and language recognition
from samples of speech. A less researched problem is that of accent recognition. Although this
is a similar problem to language identification, di�erent accents of a language exhibit more
fine-grained di�erences between classes than languages. This presents a tougher problem
for traditional classification techniques. In this thesis, we propose and evaluate a number of
techniques for gender and accent classification. These techniques are novel modifications and
extensions to state of the art algorithms, and they result in enhanced performance on gender
and accent recognition.
The first part of the thesis focuses on the problem of gender identification, and presents a
technique that gives improved performance in situations where training and test conditions are
mismatched.
The bulk of this thesis is concerned with the application of the i-Vector technique to accent
identification, which is the most successful approach to acoustic classification to have emerged
in recent years. We show that it is possible to achieve high accuracy accent identification without
reliance on transcriptions and without utilising phoneme recognition algorithms. The thesis
describes various stages in the development of i-Vector based accent classification that improve
the standard approaches usually applied for speaker or language identification, which are
insu�cient. We demonstrate that very good accent identification performance is possible with
acoustic methods by considering di�erent i-Vector projections, frontend parameters, i-Vector
configuration parameters, and an optimised fusion of the resulting i-Vector classifiers we can
obtain from the same data.
We claim to have achieved the best accent identification performance on the test corpus
for acoustic methods, with up to 90% identification rate. This performance is even better than
previously reported acoustic-phonotactic based systems on the same corpus, and is very close
to performance obtained via transcription based accent identification. Finally, we demonstrate
that the utilization of our techniques for speech recognition purposes leads to considerably
lower word error rates.
Keywords: Accent Identification, Gender Identification, Speaker Identification, Gaussian
Mixture Model, Support Vector Machine, i-Vector, Factor Analysis, Feature Extraction, British
English, Prosody, Speech Recognition
Pattern Recognition
A wealth of advanced pattern recognition algorithms are emerging from the interdiscipline between technologies of effective visual features and the human-brain cognition process. Effective visual features are made possible through the rapid developments in appropriate sensor equipments, novel filter designs, and viable information processing architectures. While the understanding of human-brain cognition process broadens the way in which the computer can perform pattern recognition tasks. The present book is intended to collect representative researches around the globe focusing on low-level vision, filter design, features and image descriptors, data mining and analysis, and biologically inspired algorithms. The 27 chapters coved in this book disclose recent advances and new ideas in promoting the techniques, technology and applications of pattern recognition
Subspace Gaussian Mixture Models for Language Identification and Dysarthric Speech Intelligibility Assessment
En esta Tesis se ha investigado la aplicación de técnicas de modelado de subespacios de mezclas de Gaussianas en dos problemas relacionados con las tecnologÃas del habla, como son la identificación automática de idioma (LID, por sus siglas en inglés) y la evaluación automática de inteligibilidad en el habla de personas con disartria. Una de las técnicas más importantes estudiadas es el análisis factorial conjunto (JFA, por sus siglas en inglés). JFA es, en esencia, un modelo de mezclas de Gaussianas en el que la media de cada componente se expresa como una suma de factores de dimensión reducida, y donde cada factor representa una contribución diferente a la señal de audio. Esta factorización nos permite compensar nuestros modelos frente a contribuciones indeseadas presentes en la señal, como la información de canal. JFA se ha investigado como clasficador y como extractor de parámetros. En esta última aproximación se modela un solo factor que representa todas las contribuciones presentes en la señal. Los puntos en este subespacio se denominan i-Vectors. AsÃ, un i-Vector es un vector de baja dimensión que representa una grabación de audio. Los i-Vectors han resultado ser muy útiles como vector de caracterÃsticas para representar señales en diferentes problemas relacionados con el aprendizaje de máquinas. En relación al problema de LID, se han investigado dos sistemas diferentes de acuerdo al tipo de información extraÃda de la señal. En el primero, la señal se parametriza en vectores acústicos con información espectral a corto plazo. En este caso, observamos mejoras de hasta un 50% con el sistema basado en i-Vectors respecto al sistema que utilizaba JFA como clasificador. Se comprobó que el subespacio de canal del modelo JFA también contenÃa información del idioma, mientras que con los i-Vectors no se descarta ningún tipo de información, y además, son útiles para mitigar diferencias entre los datos de entrenamiento y de evaluación. En la fase de clasificación, los i-Vectors de cada idioma se modelaron con una distribución Gaussiana en la que la matriz de covarianza era común para todos. Este método es simple y rápido, y no requiere de ningún post-procesado de los i-Vectors. En el segundo sistema, se introdujo el uso de información prosódica y formántica en un sistema de LID basado en i-Vectors. La precisión de éste estaba por debajo de la del sistema acústico. Sin embargo, los dos sistemas son complementarios, y se obtuvo hasta un 20% de mejora con la fusión de los dos respecto al sistema acústico solo. Tras los buenos resultados obtenidos para LID, y dado que, teóricamente, los i-Vectors capturan toda la información presente en la señal, decidimos usarlos para la evaluar de manera automática la inteligibilidad en el habla de personas con disartria. Los logopedas están muy interesados en esta tecnologÃa porque permitirÃa evaluar a sus pacientes de una manera objetiva y consistente. En este caso, los i-Vectors se obtuvieron a partir de información espectral a corto plazo de la señal, y la inteligibilidad se calculó a partir de los i-Vectors obtenidos para un conjunto de palabras dichas por el locutor evaluado. Comprobamos que los resultados eran mucho mejores si en el entrenamiento del sistema se incorporaban datos de la persona que iba a ser evaluada. No obstante, esta limitación podrÃa aliviarse utilizando una mayor cantidad de datos para entrenar el sistema.In this Thesis, we investigated how to effciently apply subspace Gaussian mixture modeling techniques onto two speech technology problems, namely automatic spoken language identification (LID) and automatic intelligibility assessment of dysarthric speech. One of the most important of such techniques in this Thesis was joint factor analysis (JFA). JFA is essentially a Gaussian mixture model where the mean of the components is expressed as a sum of low-dimension factors that represent different contributions to the speech signal. This factorization makes it possible to compensate for undesired sources of variability, like the channel. JFA was investigated as final classiffer and as feature extractor. In the latter approach, a single subspace including all sources of variability is trained, and points in this subspace are known as i-Vectors. Thus, one i-Vector is defined as a low-dimension representation of a single utterance, and they are a very powerful feature for different machine learning problems. We have investigated two different LID systems according to the type of features extracted from speech. First, we extracted acoustic features representing short-time spectral information. In this case, we observed relative improvements with i-Vectors with respect to JFA of up to 50%. We realized that the channel subspace in a JFA model also contains language information whereas i-Vectors do not discard any language information, and moreover, they help to reduce mismatches between training and testing data. For classification, we modeled the i-Vectors of each language with a Gaussian distribution with covariance matrix shared among languages. This method is simple and fast, and it worked well without any post-processing. Second, we introduced the use of prosodic and formant information with the i-Vectors system. The performance was below the acoustic system but both were found to be complementary and we obtained up to a 20% relative improvement with the fusion with respect to the acoustic system alone. Given the success in LID and the fact that i-Vectors capture all the information that is present in the data, we decided to use i-Vectors for other tasks, specifically, the assessment of speech intelligibility in speakers with different types of dysarthria. Speech therapists are very interested in this technology because it would allow them to objectively and consistently rate the intelligibility of their patients. In this case, the input features were extracted from short-term spectral information, and the intelligibility was assessed from the i-Vectors calculated from a set of words uttered by the tested speaker. We found that the performance was clearly much better if we had available data for training of the person that would use the application. We think that this limitation could be relaxed if we had larger databases for training. However, the recording process is not easy for people with disabilities, and it is difficult to obtain large datasets of dysarthric speakers open to the research community. Finally, the same system architecture for intelligibility assessment based on i-Vectors was used for predicting the accuracy that an automatic speech recognizer (ASR) system would obtain with dysarthric speakers. The only difference between both was the ground truth label set used for training. Predicting the performance response of an ASR system would increase the confidence of speech therapists in these systems and would diminish health related costs. The results were not as satisfactory as in the previous case, probably because an ASR is a complex system whose accuracy can be very difficult to be predicted only with acoustic information. Nonetheless, we think that we opened a door to an interesting research direction for the two problems
Multivariate Analysis in Management, Engineering and the Sciences
Recently statistical knowledge has become an important requirement and occupies a prominent position in the exercise of various professions. In the real world, the processes have a large volume of data and are naturally multivariate and as such, require a proper treatment. For these conditions it is difficult or practically impossible to use methods of univariate statistics. The wide application of multivariate techniques and the need to spread them more fully in the academic and the business justify the creation of this book. The objective is to demonstrate interdisciplinary applications to identify patterns, trends, association sand dependencies, in the areas of Management, Engineering and Sciences. The book is addressed to both practicing professionals and researchers in the field
Determination and modeling of spatial patterns of brown trout (salmo trutta linnaeus, 1758) in the Deva-Cares catchment: the role of connectivity and the niche
RESUMEN: El objetivo general de la Tesis Doctoral es determinar los patrones espaciales de la trucha común en la red fluvial de la cuenca del rÃo Deva-Cares (norte de España) y analizar los diferentes roles que la conectividad y el nicho tienen en la determinación de la variabilidad espacial de las poblaciones de la trucha común. Los CapÃtulos III a VI conforman el núcleo central de esta Tesis Doctoral. En el CapÃtulo III, se presenta una metodologÃa para clasificar todos los segmentos de una red fluvial como temporales o permanentes. En el CapÃtulo IV, se explora el rol que desempeñan las variables de nicho a diferentes escalas espaciales para determinar los patrones espaciales de densidad de la trucha común a nivel de red fluvial para cada una de las clases de edad analizadas (alevÃn, juvenil y adulto). También se considera el papel de la distancia hidrológica y Euclidea y la presencia de barreras impermeables para explicar los patrones de densidad espacial de la trucha común. En el capÃtulo V, se desarrolla un modelo numérico metapoblacional para estimar la distribución espacial de las densidades de trucha común de las tres clases de edad para toda la red fluvial en función de la topologÃa, la conectividad y la dinámica poblacional. En el CapÃtulo VI, se exploran las consecuencias genéticas de la alteración de la conectividad en una población nativa de trucha común en la cuenca del rÃo Deva-Cares. Tanto la conectividad espacial como las variables ambientales (i.e. conectividad-dispersión versus nicho) tienen un papel fundamental en la determinación de los patrones espaciales de la trucha común en la cuenca del rÃo Deva-Cares.ABSTRACT: The general objective of this PhD Thesis is to determine the spatial patterns of brown trout in the Deva-Cares river network (Northern Spain) and to analyze the different roles that connectivity and the niche have on determining the spatial variability of this brown trout population. Chapter III to VI conform the core of this PhD Thesis. In Chapter III, a statistically-based methodology to classify river segments as temporary or perennial is presented for a whole river network. In Chapter IV, the role that niche variables at different spatial scales are playing on determining spatial density patterns of brown trout for each age-class (young-of-the-year, juveniles and adults) at a whole river network scale was explored. The role of hydrological and Euclidean distance, and the presence of impermeable barriers on explaining brown trout spatial density patterns was also considered. In chapter V, a numerical metapopulation model was developed to estimate the spatial distribution of the brown trout densities of three age-classes to the whole river network based on topology, connectivity and population dynamics. In Chapter VI, the genetic consequences of altered connectivity on a native brown trout population in the Deva-Cares catchment was explored. Both the spatial connectivity and environmental variables (i.e. connectivity-dispersal versus niche concepts) have shown to be important on determining the spatial patterns of brown trout in the Deva-Cares catchment.Agradecer al Instituto de Hidráulica Ambiental de la Universidad de Cantabria la
confianza que tuvieron en mà para llevar a cabo la Tesis y poder formar parte de este
gran equipo de trabajo. Al Ministerio de EconomÃa y Competitividad por su
financiación a través de las Ayudas para contratos predoctorales para la formación de
doctores (BES-2013-065770) asociada al proyecto RIVERLANDS: Efectos del legado
del uso del suelo en los procesos fluviales: implicaciones para la gestión integrada de
cuenca (BIA2012-33572). Agradecer también la financiación de la estancia de
investigación a través de las Ayudas a la movilidad predoctoral para la realización de
estancias breves en centros de I+D españoles y extranjeros (EEBB-I-15-10186)
- …