42 research outputs found

    State-dependent time warping in the trended hidden Markov model

    In this paper we present an algorithm for estimating state-dependent polynomial coefficients in the nonstationary-state hidden Markov model (the trended HMM) that allows for linear time warping, or scaling, in individual model states. The need for state-dependent time warping arises because, owing to speaking-rate variation and other temporal factors in speech, the multiple state-segmented speech data sequences used to train a single set of polynomial coefficients often vary appreciably in length. The algorithm is developed within a general framework that uses auxiliary parameters which, although of no interest in themselves, provide an intermediate tool for achieving maximal accuracy in estimating the polynomial coefficients of the trended HMM. It is proved that the proposed estimation algorithm converges to a solution equivalent to the state-optimized maximum likelihood estimate. The effectiveness of the algorithm is demonstrated in experiments designed to fit a single trended HMM simultaneously to multiple sequences of speech data that are different renditions of the same word yet vary widely in sequence length. Speech recognition experiments were performed on the standard acoustic-phonetic TIMIT database. The results show that the time-warping trended HMMs improve the recognition rate by about 10 to 15% over the regular trended HMMs. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/31358/1/0000269.pd
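
The idea of fitting one polynomial trend to renditions of different lengths can be illustrated with a minimal sketch. It is not the estimation algorithm of the paper above: the abstract describes auxiliary parameters that are estimated, whereas this sketch simply assumes a fixed linear scaling that maps each segment's frame index onto [0, 1], and it fits only a first-order (linear) trend to scalar observations. The function names `fit_linear_trend` and `sum_squared_error` are hypothetical.

```python
def _time(t, length, warp):
    """Return the time value for frame t of a segment with `length` frames."""
    if not warp:
        return float(t)                      # raw frame index, no warping
    return t / (length - 1) if length > 1 else 0.0   # rescale to [0, 1]


def fit_linear_trend(segments, warp=True):
    """Least-squares fit of y = b0 + b1 * tau pooled over all segments.

    Assumption for illustration: the warp is fixed by segment length,
    not estimated as in the paper.
    """
    n = s_t = s_tt = s_y = s_ty = 0.0
    for seg in segments:
        for t, y in enumerate(seg):
            tau = _time(t, len(seg), warp)
            n += 1.0
            s_t += tau
            s_tt += tau * tau
            s_y += y
            s_ty += tau * y
    det = n * s_tt - s_t * s_t
    if det == 0.0:
        raise ValueError("need at least two distinct time values")
    b1 = (n * s_ty - s_t * s_y) / det
    b0 = (s_y - b1 * s_t) / n
    return b0, b1


def sum_squared_error(segments, b0, b1, warp=True):
    """Total squared residual of the trend over all segments."""
    total = 0.0
    for seg in segments:
        for t, y in enumerate(seg):
            total += (y - (b0 + b1 * _time(t, len(seg), warp))) ** 2
    return total


# Two renditions of the same ramp, one with 3 frames and one with 5.
short = [0.0, 0.5, 1.0]
long_ = [0.0, 0.25, 0.5, 0.75, 1.0]
warped = fit_linear_trend([short, long_], warp=True)
unwarped = fit_linear_trend([short, long_], warp=False)
print(sum_squared_error([short, long_], *warped, warp=True))
print(sum_squared_error([short, long_], *unwarped, warp=False))
```

In this toy case the warped fit passes through every point of both renditions, while the unwarped fit cannot, because the two segments traverse the same trajectory at different rates.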

    Speech Recognition Using Advanced HMM2 Features

    HMM2 is a particular hidden Markov model in which the state emission probabilities of the temporal (primary) HMM are modeled through (secondary) state-dependent frequency-based HMMs [12]. As shown in [13], a secondary HMM can also be used to extract robust ASR features. Here, we further investigate this novel approach by using a full HMM2 as a feature extractor, working in the spectral domain and extracting robust formant-like features for a standard ASR system. HMM2 performs a nonlinear, state-dependent frequency warping, and it is shown that the resulting frequency segmentation contains particularly discriminant features. To further improve the HMM2 system, we complement the initial spectral energy vectors with frequency information. Finally, adding temporal information to the HMM2 feature vector yields further improvements. These conclusions are experimentally validated on the Numbers95 database, where word error rates of 15% were obtained using only a 4-dimensional feature vector (3 formant-like parameters and one time index).

    Articulatory Information for Robust Speech Recognition

    Current Automatic Speech Recognition (ASR) systems fall well short of human speech recognition performance because they lack robustness against speech variability and noise contamination. The goal of this dissertation is to investigate these critical robustness issues, put forth different ways to address them, and finally present an ASR architecture based upon these robustness criteria. Acoustic variations adversely affect the performance of current phone-based ASR systems, in which speech is modeled as 'beads-on-a-string', where the beads are the individual phone units. While phone units are distinctive in the cognitive domain, they vary in the physical domain, and this variation arises from a combination of factors including speech style and speaking rate, a phenomenon commonly known as 'coarticulation'. Traditional ASR systems address such coarticulatory variations by using contextualized phone units such as triphones. Articulatory phonology accounts for coarticulatory variations by modeling speech as a constellation of constricting actions known as articulatory gestures. In such a framework, speech variations such as coarticulation and lenition are accounted for by gestural overlap in time and gestural reduction in space. To realize a gesture-based ASR system, articulatory gestures have to be inferred from the acoustic signal. In the initial stage of this research, a study was performed using synthetically generated speech to obtain a proof of concept that articulatory gestures can indeed be recognized from the speech signal. It was observed that using vocal tract constriction trajectories (TVs) as an intermediate representation facilitated gesture recognition from the speech signal. Presently no natural speech database contains articulatory gesture annotation; hence an automated iterative time-warping architecture is proposed that can annotate any natural speech database with articulatory gestures and TVs.
Two natural speech databases, X-ray microbeam and Aurora-2, were annotated; the former was used to train a TV estimator and the latter to train a Dynamic Bayesian Network (DBN) based ASR architecture. The DBN architecture used two sets of observations: (a) acoustic features in the form of mel-frequency cepstral coefficients (MFCCs) and (b) TVs (estimated from the acoustic speech signal). In this setup the articulatory gestures were modeled as hidden random variables, eliminating the need for explicit gesture recognition. Word recognition results using the DBN architecture indicate that articulatory representations not only help to account for coarticulatory variations but can also significantly improve the noise robustness of ASR systems.

    Evolving Clustering Algorithms And Their Application For Condition Monitoring, Diagnostics, & Prognostics

    Applications of Condition-Based Maintenance (CBM) technology require effective yet generic data-driven methods capable of carrying out diagnostics and prognostics tasks without detailed domain knowledge or human intervention. With the widespread deployment of CBM, improved system availability, operational safety, and enhanced logistics and supply chain performance could be achieved at lower cost. This dissertation focuses on the development of a Mutual Information based Recursive Gustafson-Kessel-Like (MIRGKL) clustering algorithm, which operates recursively to identify the underlying model structure and parameters from stream-type data. Inspired by the Evolving Gustafson-Kessel-like Clustering (eGKL) algorithm, we applied the notion of mutual information to the well-known Mahalanobis distance as the governing similarity measure throughout. This is also a special case of the Kullback-Leibler (KL) divergence in which between-cluster shape information (governed by the determinant and trace of the covariance matrix) is omitted, and it is only applicable to normally distributed data. In the cluster assignment and consolidation process, we proposed the use of the Chi-square statistic with provision for different probability thresholds. Owing to the symmetry and boundedness properties brought in by the mutual information formulation, we have shown with real-world data that the algorithm’s performance becomes less sensitive to the same range of probability thresholds, which makes system tuning simpler in practice. As a result, the improvement demonstrated by the proposed algorithm has implications for improving generic data-driven methods for diagnostics, prognostics, generic function approximation, and knowledge extraction from stream-type data. The work in this dissertation demonstrates MIRGKL’s effectiveness in clustering and knowledge representation and shows promising results in diagnostics and prognostics applications.
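
The cluster-assignment step mentioned in the abstract above, a Mahalanobis distance gated by a Chi-square probability threshold, can be sketched as follows. This is plain Mahalanobis gating, not MIRGKL: the mutual-information formulation and the recursive updates are not reproduced here. The sketch assumes 2-dimensional data, for which the Chi-square quantile with 2 degrees of freedom has the closed form -2 ln(1 - p). All function names are hypothetical.

```python
import math


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of a 2-D point from (mean, cov)."""
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    # The inverse of [[a, b], [c, d]] is (1 / det) * [[d, -b], [-c, a]].
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


def chi2_threshold_2d(p):
    """Chi-square quantile at probability p for 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - p)


def assign_cluster(x, clusters, p=0.95):
    """Index of the closest cluster inside the threshold, else None.

    A return value of None signals that the point fits no existing
    cluster, which in an evolving scheme would trigger a new cluster.
    """
    best, best_d2 = None, chi2_threshold_2d(p)
    for i, (mean, cov) in enumerate(clusters):
        d2 = mahalanobis_sq(x, mean, cov)
        if d2 <= best_d2:
            best, best_d2 = i, d2
    return best


# Two illustrative clusters, each given as (mean, covariance).
clusters = [
    ((0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))),
    ((10.0, 10.0), ((4.0, 0.0), (0.0, 4.0))),
]
print(assign_cluster((0.5, 0.5), clusters))   # inside the first cluster
print(assign_cluster((5.0, 5.0), clusters))   # outside both thresholds
```

Raising `p` widens the acceptance region of every cluster, so fewer points are treated as candidates for a new cluster. For data of other dimensions the quantile would have to come from a statistics library.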

    Reliability Models and Failure Detection Algorithms for Wind Turbines

    Over the past decades, the wind industry has grown very significantly in Europe, making wind generation the leading source of renewable energy production. However, considering the economic aspects, the wind sector has not yet reached the level of competitiveness needed to beat conventional power generation systems. The main costs of operating wind farms are attributed to Operation and Maintenance (O&M) activities. This is because O&M is currently based mainly on corrective or preventive actions. The use of predictive techniques could therefore significantly reduce maintenance-related costs and so improve the overall profitability of wind farm operation. Although the benefits of predictive maintenance are considered more important every day, these techniques still need to be researched and explored. Advanced reliability models and failure prediction algorithms can help operators detect component failures in wind turbines early and adapt their maintenance strategies accordingly. To date, wind turbine reliability models have been based almost exclusively on the age of the turbine. This is because they were originally developed for machines working in 'friendly' environments, for example inside industrial buildings. Wind turbines, by contrast, are exposed to highly variable environmental conditions, so classical reliability models do not reflect reality with sufficient accuracy.
It is therefore necessary to develop new reliability models capable of reproducing the failure behaviour of wind turbines and their components, taking into account the meteorological and operational conditions at their site. Failure prediction is usually carried out using data obtained from the Supervisory Control and Data Acquisition (SCADA) system or from Condition Monitoring Systems (CMS). Notably, both types of system coexist in modern wind turbines, and fusing the two data sources can significantly improve failure detection. This thesis aims to improve current O&M practices through: (1) the development of advanced data-driven reliability and failure detection models that include the environmental and operational conditions found at wind farms, and (2) the application of new failure detection algorithms that use the site's environmental and operational conditions as well as data from both SCADA and CMS systems. These two objectives have been divided into four tasks. In the first task, an exhaustive analysis was carried out of both the failures that occurred in a large set of wind turbines (large in number of turbines and in length of records) and their associated downtimes. This revealed which components fail most often as a function of turbine technology, as well as their failure modes. This information is vital for the subsequent development of reliability and maintenance models. Second, the meteorological conditions preceding failure events of the main wind turbine components were investigated. A data-driven learning framework was developed using k-means clustering and 'a priori' association rules.
This framework can handle large amounts of data and provides useful, easily visualised results. In addition, anomaly detection and pattern detection algorithms were applied to find abrupt changes and recurring patterns in the wind speed time series in the moments before failures of the main wind turbine components. In the third task, a new reliability model is proposed that directly incorporates the meteorological conditions recorded during the two months preceding the failure. The model uses two separate statistical processes: one generates the failure events as well as occasional zeros, while the other generates the structural zeros required by the computation algorithms. Possible unobserved effects (heterogeneity) in the wind farm are additionally taken into account. Sophisticated regularisation techniques are used to avoid over-fitting and multicollinearity problems. Finally, the capability of the model is verified using historical failure data and meteorological readings obtained from the wind farms' meteorological masts. In the final task, prediction algorithms were developed based on meteorological conditions and on operational and vibration data. A Bayesian network was trained to predict component failures in a wind farm, based mainly on the meteorological conditions at the site. A methodology is then introduced for fusing vibration data from the CMS with data from the SCADA system, with the aim of analysing the relationships between the two sources. These data were used to predict main shaft failures using several artificial intelligence algorithms: random forests, gradient boosting machines, generalised linear models, and artificial neural networks.
In addition, a tool named DAVE ('Distance Based Automated Vibration Evaluation') was developed for the on-line evaluation of vibration (CMS) data. The results of this thesis show that the failure behaviour of wind turbine components is strongly influenced by the meteorological conditions at the site. The data-driven learning framework is able to identify the general and specific temporal conditions preceding component failures. Furthermore, it has been shown that, with the proposed reliability models and detection algorithms, the O&M of wind turbines can be significantly improved. These reliability and failure detection models are the first to provide a realistic, site-specific representation, by considering complex combinations of environmental conditions as well as operational and operating-state indicators obtained by fusing CMS vibration data with SCADA data. This work therefore provides practical frameworks, models, and algorithms that can be applied in the field of predictive maintenance of wind turbines.

    Articulatory features for conversational speech recognition


    Compensating hyperarticulation for automatic speech recognition


    Statistical parametric speech synthesis based on sinusoidal models

    This study focuses on improving the quality of statistical speech synthesis based on sinusoidal models. Vocoders play a crucial role in the parametrisation and reconstruction process, so we first conduct an experimental comparison of a broad range of the leading vocoder types. Although our study shows that, for analysis / synthesis, sinusoidal models with complex amplitudes can generate high-quality speech compared with source-filter ones, the component sinusoids are correlated with each other, and the number of parameters is high and varies from frame to frame, which constrains their application to statistical speech synthesis. Therefore, we first propose a perceptually based dynamic sinusoidal model (PDM) to decrease and fix the number of components typically used in the standard sinusoidal model. Then, in order to apply the proposed vocoder in an HMM-based speech synthesis system (HTS), two strategies for modelling sinusoidal parameters are compared. In the first method (DIR parameterisation), features extracted from the fixed- and low-dimensional PDM are statistically modelled directly. In the second method (INT parameterisation), we convert both the static amplitude and the dynamic slope from all the harmonics of a signal, which we term the Harmonic Dynamic Model (HDM), to intermediate parameters (regularised cepstral coefficients (RDC)) for modelling. Our results show that HDM with intermediate parameters can generate quality comparable to STRAIGHT. As correlations between features in the dynamic model cannot be modelled satisfactorily by a typical HMM-based system with diagonal covariance, we have applied and tested a deep neural network (DNN) for modelling features from these two methods. To fully exploit DNN capabilities, we investigate ways to combine INT and DIR at the level of both DNN modelling and waveform generation.
For DNN training, we propose to use multi-task learning to model cepstra (from INT) and log amplitudes (from DIR) as primary and secondary tasks. We conclude from our results that sinusoidal models are indeed highly suited to statistical parametric synthesis. The proposed method outperforms the state-of-the-art STRAIGHT-based equivalent when used in conjunction with DNNs. To further improve the voice quality, phase features generated from the proposed vocoder also need to be parameterised and integrated into statistical modelling. Here, an alternative statistical model referred to as the complex-valued neural network (CVNN), which treats complex coefficients as a whole, is proposed to model complex amplitude explicitly. A complex-valued back-propagation algorithm using a logarithmic minimisation criterion that includes both amplitude and phase errors is used as the learning rule. Three parameterisation methods are studied for mapping text to acoustic features: RDC / real-valued log amplitude, complex-valued amplitude with minimum phase, and complex-valued amplitude with mixed phase. Our results show the potential of using CVNNs for modelling both real and complex-valued acoustic features. Overall, this thesis has established competitive alternative vocoders for speech parametrisation and reconstruction. The utilisation of the proposed vocoders with various acoustic models (HMM / DNN / CVNN) clearly demonstrates that it is compelling to apply them to parametric statistical speech synthesis.

    Data Driven Mobility
