Full Covariance Modelling for Speech Recognition
HMM-based systems for Automatic Speech Recognition typically model
the acoustic features using mixtures of multivariate Gaussians. In this
thesis, we consider the problem of learning a suitable covariance matrix
for each Gaussian. A variety of schemes have been proposed for
controlling the number of covariance parameters per Gaussian, and
studies have shown that in general, the greater the number of parameters
used in the models, the better the recognition performance. We
therefore investigate systems with full covariance Gaussians. However,
in this case, the obvious choice of parameters – given by the sample
covariance matrix – leads to matrices that are poorly conditioned and
do not generalise well to unseen test data. The problem is particularly
acute when the amount of training data is limited.
We propose two solutions to this problem: firstly, we impose the requirement
that each matrix should take the form of a Gaussian graphical
model, and introduce a method for learning the parameters and
the model structure simultaneously. Secondly, we explain how an
alternative estimator, the shrinkage estimator, is preferable to the
standard maximum likelihood estimator, and derive formulae for the
optimal shrinkage intensity within the context of a Gaussian mixture
model. We show how this relates to the use of a diagonal covariance
smoothing prior.
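The shrinkage idea described above can be made concrete with a minimal NumPy sketch (the function name, the diagonal target, and the fixed intensity `lam` are illustrative assumptions; the thesis derives the optimal intensity rather than fixing it by hand):

```python
import numpy as np

def shrinkage_covariance(X, lam):
    """Shrink the maximum likelihood covariance of X (n samples x d dims)
    towards its own diagonal, a common regularisation target.

    lam is the shrinkage intensity in [0, 1]: lam = 0 returns the sample
    covariance, lam = 1 the purely diagonal target.
    """
    S = np.cov(X, rowvar=False, bias=True)  # ML estimate (divides by n)
    target = np.diag(np.diag(S))            # diagonal smoothing target
    return (1 - lam) * S + lam * target
```

Interpolating towards a diagonal target in this way is what connects the shrinkage estimator to the diagonal covariance smoothing prior mentioned above.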
We compare the effectiveness of these techniques to standard methods
on a phone recognition task where the quantity of training data is
artificially constrained. We then investigate the performance of the
shrinkage estimator on a large-vocabulary conversational telephone
speech recognition task. Discriminative training techniques can be used to compensate for the
invalidity of the model correctness assumption underpinning maximum
likelihood estimation. On the large-vocabulary task, we use discriminative
training of the full covariance models and diagonal priors
to yield improved recognition performance.
A Modified EM Algorithm for Shrinkage Estimation in Multivariate Hidden Markov Models
Hidden Markov models are used in a wide range of applications due to their construction that
renders them mathematically tractable and allows for the use of efficient computational techniques.
Methods exist for estimating the model’s parameters, such as the EM algorithm, and
for recovering the hidden states of the underlying Markov chain, such as the Viterbi algorithm.
In applications where the dimension of the data is comparable to the sample size, the sample
covariance matrix is known to be ill-conditioned, which directly affects the maximisation step (M-
step) of the EM algorithm, where its inverse is involved in the computations. This problem might be
amplified if there are rarely visited states resulting in a small sample size for the estimation of the
corresponding parameters. Therefore, directly implementing these methods can prove
troublesome, as computational problems may occur in the estimation of the covariance
matrix and its inverse, further affecting the estimation of the one-step transition probability matrix
and the reconstruction of the hidden Markov chain.
In this paper, a modified version of the EM algorithm is studied, both theoretically and computa-
tionally, in order to obtain the shrinkage estimator of the covariance matrix during the maximisation
step. This is achieved by maximising a penalised log-likelihood function, which is also used in the
expectation step (E-step). A variant of this modified version, where the penalised log-likelihood function is only used in the maximisation step (M-step), is also studied computationally.
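A simplified numerical sketch of such an M-step update for a single hidden state follows (Python/NumPy; the function name is hypothetical, the shrinkage target is taken to be diagonal, and the intensity `lam` is treated as a given constant rather than derived from the penalised likelihood):

```python
import numpy as np

def m_step_shrunk_covariance(X, gamma, mu, lam):
    """M-step covariance update for one hidden state, replacing the
    weighted ML covariance with its shrinkage version.

    X     : (n, d) observations
    gamma : (n,) state responsibilities from the E-step
    mu    : (d,) updated state mean
    lam   : shrinkage intensity in [0, 1]
    """
    diff = X - mu
    w = gamma / gamma.sum()             # normalised responsibilities
    S = (w[:, None] * diff).T @ diff    # weighted ML covariance
    target = np.diag(np.diag(S))        # diagonal shrinkage target
    return (1 - lam) * S + lam * target
```

Even when few observations are attributed to a state, the shrunk matrix stays better conditioned than the raw weighted covariance, which is exactly the failure mode described above.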
Recent advances in directional statistics
Mainstream statistical methodology is generally applicable to data observed
in Euclidean space. There are, however, numerous contexts of considerable
scientific interest in which the natural supports for the data under
consideration are Riemannian manifolds like the unit circle, torus, sphere and
their extensions. Typically, such data can be represented using one or more
directions, and directional statistics is the branch of statistics that deals
with their analysis. In this paper we provide a review of the many recent
developments in the field since the publication of Mardia and Jupp (1999),
still the most comprehensive text on directional statistics. Many of those
developments have been stimulated by interesting applications in fields as
diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics,
image analysis, text mining, environmetrics, and machine learning. We begin by
considering developments for the exploratory analysis of directional data
before progressing to distributional models, general approaches to inference,
hypothesis testing, regression, nonparametric curve estimation, methods for
dimension reduction, classification and clustering, and the modelling of time
series, spatial and spatio-temporal data. An overview of currently available
software for analysing directional data is also provided, and potential future
developments discussed.
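A one-line example of why directional data need their own methods is the mean direction of angles on the circle (basic textbook directional statistics, not specific to this review):

```python
import numpy as np

def circular_mean(theta):
    """Mean direction of angles (in radians) on the unit circle.

    The arithmetic mean of angles is meaningless across the wrap-around
    (e.g. 350 and 10 degrees average to 180), so we average the unit
    vectors and take the angle of the resultant instead.
    """
    return np.arctan2(np.sin(theta).mean(), np.cos(theta).mean())
```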
Real-time decoding of question-and-answer speech dialogue using human cortical activity.
Natural communication often occurs in dialogue, differentially engaging auditory and sensorimotor brain regions during listening and speaking. However, previous attempts to decode speech directly from the human brain typically consider listening or speaking tasks in isolation. Here, human participants listened to questions and responded aloud with answers while we used high-density electrocorticography (ECoG) recordings to detect when they heard or said an utterance and then to decode the utterance's identity. Because certain answers were only plausible responses to certain questions, we could dynamically update the prior probabilities of each answer using the decoded question likelihoods as context. We decode produced and perceived utterances with accuracy rates as high as 61% and 76%, respectively (chance is 7% and 20%). Contextual integration of decoded question likelihoods significantly improves answer decoding. These results demonstrate real-time decoding of speech in an interactive, conversational setting, which has important implications for patients who are unable to communicate.
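The contextual-integration step can be sketched as a simple Bayesian mixture (Python/NumPy; the function and array names are hypothetical, and this compresses the real-time decoding pipeline into one line of probability arithmetic):

```python
import numpy as np

def context_answer_prior(q_likelihoods, answer_given_q):
    """Turn decoded question likelihoods into a context-aware prior over
    answers:  p(answer) = sum_q p(answer | question = q) * p(question = q).

    q_likelihoods  : (Q,) unnormalised likelihoods of each question
    answer_given_q : (Q, A) rows giving p(answer | question)
    """
    p_q = q_likelihoods / q_likelihoods.sum()  # posterior over questions
    return p_q @ answer_given_q                # mixture prior over answers
```

Answers that are implausible for the likely questions receive a small prior, which is how the decoded question context improves answer decoding.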
Speaker adaptation of an acoustic-to-articulatory inversion model using cascaded Gaussian mixture regressions
The article presents a method for adapting a GMM-based acoustic-articulatory inversion model trained on a reference speaker to another speaker. The goal is to estimate the articulatory trajectories in the geometrical space of a reference speaker from the speech audio signal of another speaker. This method is developed in the context of a system of visual biofeedback, aimed at pronunciation training. This system provides a speaker with visual information about his/her own articulation, via a 3D orofacial clone. In previous work, we proposed to use GMM-based voice conversion for speaker adaptation. Acoustic-articulatory mapping was achieved in 2 consecutive steps: 1) converting the spectral trajectories of the target speaker (i.e. the system user) into spectral trajectories of the reference speaker (voice conversion), and 2) estimating the most likely articulatory trajectories of the reference speaker from the converted spectral features (acoustic-articulatory inversion). In this work, we propose to combine these two steps into the same statistical mapping framework, by fusing multiple regressions based on trajectory GMM and the maximum likelihood estimation (MLE) criterion. The proposed technique is compared to two standard speaker adaptation techniques based respectively on MAP and MLLR.
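The core GMM-based mapping, estimating articulatory features as a conditional expectation under a joint Gaussian mixture, can be sketched as follows (Python/NumPy; variable names and the per-frame formulation are illustrative, and the trajectory-level smoothing used in the paper is omitted):

```python
import numpy as np

def gmm_regression(x, weights, mu_x, mu_y, S_xx, S_yx):
    """Conditional-expectation mapping under a joint GMM on (x, y):

        E[y | x] = sum_k w_k(x) * ( mu_y[k] + S_yx[k] S_xx[k]^-1 (x - mu_x[k]) )

    where w_k(x) is the posterior of component k given the acoustic frame x.

    weights : (K,) mixture weights
    mu_x    : (K, dx) and mu_y : (K, dy) component means
    S_xx    : (K, dx, dx) and S_yx : (K, dy, dx) covariance blocks
    """
    K = len(weights)
    log_w = np.empty(K)
    cond = np.empty((K, mu_y.shape[1]))
    for k in range(K):
        d = x - mu_x[k]
        Sinv = np.linalg.inv(S_xx[k])
        # log of weights[k] * N(x; mu_x[k], S_xx[k]), up to shared constants
        log_w[k] = (np.log(weights[k])
                    - 0.5 * np.linalg.slogdet(S_xx[k])[1]
                    - 0.5 * d @ Sinv @ d)
        cond[k] = mu_y[k] + S_yx[k] @ Sinv @ d   # componentwise regression
    w = np.exp(log_w - log_w.max())
    w /= w.sum()                                 # component posteriors w_k(x)
    return w @ cond
```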
Scanpath modeling and classification with Hidden Markov Models
How people look at visual information reveals fundamental information about them: their interests and their states of mind. Previous studies showed that the scanpath, i.e., the sequence of eye movements made by an observer exploring a visual stimulus, can be used to infer observer-related (e.g., task at hand) and stimuli-related (e.g., image semantic category) information. However, eye movements are complex signals and many of these studies rely on limited gaze descriptors and bespoke datasets. Here, we provide a turnkey method for scanpath modeling and classification. This method relies on variational hidden Markov models (HMMs) and discriminant analysis (DA). HMMs encapsulate the dynamic and individualistic dimensions of gaze behavior, allowing DA to capture systematic patterns diagnostic of a given class of observers and/or stimuli. We test our approach on two very different datasets. Firstly, we use fixations recorded while viewing 800 static natural scene images, and infer an observer-related characteristic: the task at hand. We achieve an average correct classification rate of 55.9% (chance = 33%). We show that correct classification rates positively correlate with the number of salient regions present in the stimuli. Secondly, we use eye positions recorded while viewing 15 conversational videos, and infer a stimulus-related characteristic: the presence or absence of the original soundtrack. We achieve an average correct classification rate of 81.2% (chance = 50%). HMMs allow us to integrate bottom-up, top-down, and oculomotor influences into a single model of gaze behavior. This synergistic approach between behavior and machine learning will open new avenues for simple quantification of gazing behavior. We release SMAC with HMM, a Matlab toolbox freely available to the community under an open-source license agreement.
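The likelihood-based part of such a classifier can be sketched with a plain forward algorithm (Python/NumPy; maximum likelihood over per-class HMMs is used here as a simplified stand-in for the variational HMM plus discriminant analysis pipeline described above, and all names are illustrative):

```python
import numpy as np

def log_forward(log_A, log_pi, log_B):
    """Log-likelihood of an observation sequence under an HMM.

    log_A  : (S, S) log transition matrix;  log_pi : (S,) log initial probs
    log_B  : (T, S) log emission density of observation t in state s
    """
    alpha = log_pi + log_B[0]
    for t in range(1, log_B.shape[0]):
        m = alpha.max()                                   # log-sum-exp shift
        alpha = np.log(np.exp(alpha - m) @ np.exp(log_A)) + m + log_B[t]
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())

def classify_scanpath(log_B_per_class, log_A_per_class, log_pi_per_class):
    """Assign the scanpath to the class whose HMM scores it highest."""
    scores = [log_forward(A, pi, B) for A, pi, B in
              zip(log_A_per_class, log_pi_per_class, log_B_per_class)]
    return int(np.argmax(scores))
```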
Greedy Gaussian Segmentation of Multivariate Time Series
We consider the problem of breaking a multivariate (vector) time series into
segments over which the data is well explained as independent samples from a
Gaussian distribution. We formulate this as a covariance-regularized maximum
likelihood problem, which can be reduced to a combinatorial optimization
problem of searching over the possible breakpoints, or segment boundaries. This
problem can be solved using dynamic programming, with complexity that grows
with the square of the time series length. We propose a heuristic method that
approximately solves the problem in linear time with respect to this length,
and always yields a locally optimal choice, in the sense that no change of any
one breakpoint improves the objective. Our method, which we call greedy
Gaussian segmentation (GGS), easily scales to problems with vectors of
dimension over 1000 and time series of arbitrary length. We discuss methods
that can be used to validate such a model using data, and also to automatically
choose appropriate values of the two hyperparameters in the method. Finally, we
illustrate our GGS approach on financial time series and Wikipedia text data.
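The building blocks of the method, a covariance-regularised segment log-likelihood and a greedy breakpoint search, can be sketched as follows (Python/NumPy; a single-split pass with a simple `lam * I` regulariser, standing in for the full GGS algorithm and its hyperparameter choices):

```python
import numpy as np

def segment_ll(X, lam):
    """Gaussian log-likelihood of one segment, with the empirical covariance
    regularised by lam * I (the covariance regularisation used in GGS)."""
    n, d = X.shape
    S = np.cov(X, rowvar=False, bias=True)   # ML covariance of the segment
    R = S + lam * np.eye(d)                  # regularised covariance
    tr = np.trace(np.linalg.solve(R, S))     # tr(R^-1 S)
    return -0.5 * n * (np.linalg.slogdet(R)[1] + tr + d * np.log(2 * np.pi))

def best_single_split(X, lam, min_len=2):
    """One greedy pass: the breakpoint that most improves the objective when
    splitting the series into two independent Gaussian segments."""
    n = len(X)
    base = segment_ll(X, lam)
    best_b, best_gain = None, 0.0
    for b in range(min_len, n - min_len + 1):
        gain = segment_ll(X[:b], lam) + segment_ll(X[b:], lam) - base
        if gain > best_gain:
            best_b, best_gain = b, gain
    return best_b, best_gain
```

Repeatedly applying such a split to the current segmentation, and re-adjusting existing breakpoints, is what makes the full method linear in the series length.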