96 research outputs found

    Joint Uncertainty Decoding with Unscented Transform for Noise Robust Subspace Gaussian Mixture Models

    Common noise compensation techniques use vector Taylor series (VTS) to approximate the mismatch function. Recent work shows that the approximation accuracy may be improved by sampling. One such sampling technique is the unscented transform (UT), which draws samples deterministically from the clean speech and noise models to derive the parameters of the noise-corrupted speech. This paper applies UT to noise compensation of the subspace Gaussian mixture model (SGMM). Since UT requires a relatively small number of samples for accurate estimation, it has a significantly lower computational cost than other, random, sampling techniques. However, the number of surface Gaussians in an SGMM is typically very large, making the direct application of UT to compensate individual Gaussian components computationally impractical. In this paper, we avoid this computational burden by employing UT in the framework of joint uncertainty decoding (JUD), which groups all the Gaussian components into a small number of classes that share compensation parameters. We evaluate the JUD-UT technique for an SGMM system on the Aurora 4 corpus. Experimental results indicate that UT can lead to higher accuracy than the VTS approximation if the JUD phase factor is untuned, and to similar accuracy if the phase factor is tuned empirically.
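    The deterministic sampling at the heart of the UT can be illustrated with a minimal sketch: a Gaussian is propagated through a nonlinearity by evaluating it at 2n+1 sigma points rather than by random sampling. This is a generic textbook formulation, not the paper's implementation; the function name, the scaling parameter kappa, and the elementwise example nonlinearity are assumptions.

    ```python
    import numpy as np

    def unscented_transform(mean, cov, f, kappa=0.0):
        """Propagate N(mean, cov) through a nonlinearity f using 2n+1 sigma points."""
        n = mean.shape[0]
        scale = np.linalg.cholesky((n + kappa) * cov)  # matrix square root of (n+k)*cov
        # Deterministic sigma points: the mean, plus symmetric offsets along
        # the columns of the Cholesky factor.
        points = [mean] + [mean + scale[:, i] for i in range(n)] \
                        + [mean - scale[:, i] for i in range(n)]
        w0 = kappa / (n + kappa)
        wi = 1.0 / (2.0 * (n + kappa))
        weights = np.array([w0] + [wi] * (2 * n))
        ys = np.array([f(p) for p in points])
        y_mean = weights @ ys
        diff = ys - y_mean
        y_cov = (weights[:, None] * diff).T @ diff
        return y_mean, y_cov

    # Example: a smooth log-sum nonlinearity applied elementwise, loosely in the
    # spirit of a mismatch function (purely illustrative).
    f = lambda x: np.log1p(np.exp(x))
    m, P = unscented_transform(np.zeros(2), np.eye(2), f)
    ```

    Note that only 2n+1 function evaluations are needed per Gaussian, which is the source of the computational advantage the abstract attributes to UT over random sampling.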

    A Bayesian Network View on Acoustic Model-Based Techniques for Robust Speech Recognition

    This article provides a unifying Bayesian network view of various approaches to acoustic model adaptation, missing-feature techniques, and uncertainty decoding that are well known in the literature on robust automatic speech recognition. The representatives of these classes can often be deduced from a Bayesian network that extends the conventional hidden Markov models used in speech recognition. These extensions, in turn, can in many cases be motivated by an underlying observation model that relates clean and distorted feature vectors. By converting the observation models into a Bayesian network representation, we formulate the corresponding compensation rules, leading to a unified view of known derivations as well as to new formulations for certain approaches. The generic Bayesian perspective provided in this contribution thus highlights structural differences and similarities between the analyzed approaches.

    Joint Uncertainty Decoding for Noise Robust Subspace Gaussian Mixture Models

    Abstract—Joint uncertainty decoding (JUD) is a model-based noise compensation technique for conventional Gaussian mixture model (GMM) based speech recognition systems. Unlike vector Taylor series (VTS) compensation, which operates on the individual Gaussian components in an acoustic model, JUD clusters the Gaussian components into a smaller number of classes, sharing the compensation parameters for the set of Gaussians in a given class. This significantly reduces the computational cost. In this paper, we investigate noise compensation for subspace Gaussian mixture model (SGMM) based speech recognition systems using JUD. The total number of Gaussian components in an SGMM is typically very large. Therefore, direct compensation of the individual Gaussian components, as performed by VTS, is computationally expensive. In this paper we show that JUD-based noise compensation can be successfully applied to SGMMs in a computationally efficient way. We evaluate the JUD/SGMM technique on the standard Aurora 4 corpus. Our experimental results indicate that the JUD/SGMM system achieves lower word error rates than a conventional GMM system with either VTS-based or JUD-based noise compensation. Index Terms—subspace Gaussian mixture model, vector Taylor series, joint uncertainty decoding, noise robust ASR, Aurora
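    The class-level parameter sharing that gives JUD its efficiency can be sketched as a toy likelihood computation: every Gaussian component assigned to regression class r is scored with the same compensation transform (A_r, b_r) and uncertainty bias Sigma_b,r, so the expensive per-component work of VTS is replaced by per-class work. This is a minimal generic sketch of the JUD decoding rule; the helper names and all numerical values are illustrative assumptions.

    ```python
    import numpy as np

    def gauss_logpdf(x, mu, cov):
        """Log-density of N(mu, cov) at x."""
        d = x - mu
        _, logdet = np.linalg.slogdet(cov)
        return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(cov, d))

    def jud_log_likelihood(y, mu_m, sigma_m, A_r, b_r, sigma_b_r):
        """JUD-compensated score of component m: the transform (A_r, b_r,
        Sigma_b_r) is shared by every Gaussian in regression class r."""
        y_comp = A_r @ y + b_r                  # class-level feature compensation
        _, logdet_A = np.linalg.slogdet(A_r)    # Jacobian term of the transform
        return logdet_A + gauss_logpdf(y_comp, mu_m, sigma_m + sigma_b_r)

    # Toy example: two components in one regression class reuse one transform.
    y = np.array([0.5, -1.0])
    A, b, Sb = 1.1 * np.eye(2), np.array([0.2, 0.0]), 0.3 * np.eye(2)
    scores = [jud_log_likelihood(y, mu, np.eye(2), A, b, Sb)
              for mu in (np.zeros(2), np.ones(2))]
    ```

    With an identity transform, zero bias, and zero uncertainty variance, the score reduces to the plain Gaussian log-likelihood, which is a useful sanity check on any implementation.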

    Temporally Varying Weight Regression for Speech Recognition

    Ph.D. (Doctor of Philosophy)

    Subspace Gaussian mixture models for automatic speech recognition

    In most state-of-the-art speech recognition systems, Gaussian mixture models (GMMs) are used to model the density of the emitting states in the hidden Markov models (HMMs). In a conventional system, the model parameters of each GMM are estimated directly and independently given the alignment. This results in a large number of model parameters to be estimated and, consequently, a large amount of training data is required to fit the model. In addition, the different sources of acoustic variability that impact the accuracy of a recogniser, such as pronunciation variation, accent, speaker factors and environmental noise, are only weakly modelled and factorized by adaptation techniques such as maximum likelihood linear regression (MLLR), maximum a posteriori (MAP) adaptation and vocal tract length normalisation (VTLN). In this thesis, we discuss an alternative acoustic modelling approach — the subspace Gaussian mixture model (SGMM) — which is expected to deal with these two issues better. In an SGMM, the model parameters are derived from low-dimensional model and speaker subspaces that can capture phonetic and speaker correlations. Given these subspaces, only a small number of state-dependent parameters are required to derive the corresponding GMMs. Hence, the total number of model parameters can be reduced, which allows acoustic modelling with a limited amount of training data. In addition, the SGMM-based acoustic model factorizes the phonetic and speaker factors, and within this framework other sources of acoustic variability may also be explored. In this thesis, we propose a regularised model estimation for SGMMs, which avoids overtraining in cases where the training data is sparse. We also take advantage of the structure of SGMMs to explore cross-lingual acoustic modelling for low-resource speech recognition. Here, the model subspace is estimated from out-of-domain data and ported to the target language system.
    In this case, only the state-dependent parameters need to be estimated, which relaxes the requirement on the amount of training data. To improve the robustness of SGMMs against environmental noise, we propose to apply the joint uncertainty decoding (JUD) technique, which is shown to be efficient and effective. We report experimental results on the Wall Street Journal (WSJ) database and GlobalPhone corpora to evaluate the regularisation and cross-lingual modelling of SGMMs. Noise compensation using JUD for SGMM acoustic models is evaluated on the Aurora 4 database.
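    The parameter sharing described above can be sketched numerically: in an SGMM the per-state GMM means are not stored directly but derived from globally shared subspace projections, so each state contributes only a low-dimensional state vector. The dimensions and random values below are illustrative assumptions, not taken from the thesis.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    D, S, I, J = 13, 5, 4, 3   # feature dim, subspace dim, shared Gaussians, states

    # Globally shared parameters: one mean projection M_i and one weight
    # projection vector w_i per shared Gaussian index i.
    M = rng.standard_normal((I, D, S))
    w = rng.standard_normal((I, S))

    # Each state j stores only a low-dimensional state vector v_j ...
    v = rng.standard_normal((J, S))

    # ... from which its full GMM is derived: means mu[j, i] = M_i v_j and
    # mixture weights via a softmax over the log-linear scores w_i^T v_j.
    mu = np.einsum('ids,js->jid', M, v)
    logits = v @ w.T
    weights = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    ```

    The storage saving is visible in the shapes: J states need J*S numbers instead of J*I*D means, with the D-by-S projections amortised across all states.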

    Noise-Robust Speech Recognition Using Deep Neural Network

    Ph.D. (Doctor of Philosophy)

    Noise Compensation for Subspace Gaussian Mixture Models.

    Joint uncertainty decoding (JUD) is an effective model-based noise compensation technique for conventional Gaussian mixture model (GMM) based speech recognition systems. In this paper, we apply JUD to subspace Gaussian mixture model (SGMM) based acoustic models. The total number of Gaussians in an SGMM acoustic model is usually much larger than in conventional GMMs, which limits the application of approaches that explicitly compensate each Gaussian, such as vector Taylor series (VTS). However, by clustering the Gaussian components into a number of regression classes, JUD-based noise compensation can be successfully applied to SGMM systems. We evaluate the JUD/SGMM technique on the Aurora 4 corpus, and the experimental results indicate that it is more accurate than conventional GMM-based systems using either VTS or JUD noise compensation.
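    The VTS compensation that these abstracts contrast with JUD can be illustrated in the log-spectral domain, where additive noise corrupts clean speech through the mismatch function y = log(e^x + e^n), and a first-order Taylor expansion around the clean-speech and noise means propagates Gaussian statistics through it. This is a generic single-dimension sketch under those textbook assumptions, not the papers' cepstral-domain formulation, and it ignores the channel term.

    ```python
    import numpy as np

    def mismatch(x, n):
        """Log-spectral corruption of clean speech x by additive noise n."""
        return np.logaddexp(x, n)

    def vts_compensate(mu_x, var_x, mu_n, var_n):
        """First-order VTS: propagate means and variances through the mismatch
        function, linearised at the expansion point (mu_x, mu_n)."""
        mu_y = mismatch(mu_x, mu_n)
        G = 1.0 / (1.0 + np.exp(mu_n - mu_x))   # dy/dx at the expansion point
        # dy/dn = 1 - G, so the corrupted variance mixes both sources.
        var_y = G**2 * var_x + (1.0 - G)**2 * var_n
        return mu_y, var_y

    # Clean speech well above the noise floor: the corrupted statistics stay
    # close to the clean ones.
    mu_y, var_y = vts_compensate(np.array([1.0]), np.array([0.5]),
                                 np.array([-2.0]), np.array([0.1]))
    ```

    Doing this once per Gaussian is what becomes expensive for the very large SGMM component count, and is precisely the per-component cost that JUD's regression classes amortise.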

    Computational modelling of the human heart and multiscale simulation of its electrophysiological activity aimed at the treatment of cardiac arrhythmias related to ischaemia and Infarction

    Cardiovascular diseases are the leading cause of morbidity and mortality worldwide, causing around 18 million deaths every year. Among these diseases, the most common is ischaemic heart disease, usually referred to as myocardial infarction (MI). After surviving an MI, a considerable number of patients develop life-threatening ventricular tachycardias (VT) during the chronic stage of the MI, that is, weeks, months or even years after the initial acute phase. This particular type of VT is typically sustained by reentry through slow conducting channels (CC), which are filaments of surviving myocardium that cross the non-conducting fibrotic infarct scar. When anti-arrhythmic drugs are unable to prevent recurrent VT episodes, radiofrequency ablation (RFA), a minimally invasive procedure performed by catheterization in the electrophysiology (EP) laboratory, is commonly used to permanently interrupt the electrical conduction through the CCs responsible for the VT. However, besides being invasive, risky and time-consuming, in cases of VTs related to chronic MI up to 50% of patients continue to suffer recurrent VT episodes after the RFA procedure. Therefore, there is a need to develop novel pre-procedural strategies to improve RFA planning and thereby increase this relatively low success rate.
    First, we conducted an exhaustive review of the literature on existing 3D cardiac models in order to gain a deep understanding of their main features and the methods used for their construction, with special focus on models oriented to the simulation of cardiac EP. Then, using the clinical dataset of a chronically infarcted patient with a history of infarct-related VT, we designed and implemented a number of strategies and methodologies to (1) build patient-specific 3D computational models of infarcted ventricles that can be used to perform simulations of cardiac EP at the organ level, including the infarct scar and the surrounding region known as the border zone (BZ); (2) construct 3D torso models that make it possible to compute the simulated ECG; and (3) carry out pre-procedural, personalized in-silico EP studies that aim to replicate the actual EP studies conducted in the EP laboratory prior to ablation. The goal of these methodologies is to locate the CCs in the 3D ventricular model in order to help define the optimal ablation targets for the RFA procedure. Lastly, as a proof of concept, we performed a retrospective simulation case study in which we were able to induce an infarct-related reentrant VT using different modelling configurations for the BZ. We validated our results by reproducing, with reasonable accuracy, the patient's ECG during VT, as well as in sinus rhythm from the endocardial activation maps invasively recorded via electroanatomical mapping systems in the latter case. This allowed us to find the location and analyse the features of the CC responsible for the clinical VT. Importantly, such an in-silico EP study could have been conducted prior to the RFA procedure, since our approach is based entirely on non-invasive clinical data acquired before the real intervention.
    These results confirm the feasibility of performing useful pre-procedural, personalized in-silico EP studies, as well as the potential of the proposed approach to become a helpful tool for RFA planning in cases of infarct-related reentrant VTs in the future. Nevertheless, the developed methodology requires further improvements and validation by means of simulation studies including large cohorts of patients. During the course of this doctoral thesis, the author, Alejandro Daniel López Pérez, was financially supported by the Ministerio de Economía, Industria y Competitividad of Spain through the programme Ayudas para contratos predoctorales para la formación de doctores, grant number BES-2013-064089. López Pérez, AD. (2019). Computational modelling of the human heart and multiscale simulation of its electrophysiological activity aimed at the treatment of cardiac arrhythmias related to ischaemia and Infarction [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/124973