
    Full Covariance Modelling for Speech Recognition

    HMM-based systems for Automatic Speech Recognition typically model the acoustic features using mixtures of multivariate Gaussians. In this thesis, we consider the problem of learning a suitable covariance matrix for each Gaussian. A variety of schemes have been proposed for controlling the number of covariance parameters per Gaussian, and studies have shown that in general, the greater the number of parameters used in the models, the better the recognition performance. We therefore investigate systems with full covariance Gaussians. However, in this case, the obvious choice of parameters, given by the sample covariance matrix, leads to matrices that are poorly conditioned and do not generalise well to unseen test data. The problem is particularly acute when the amount of training data is limited. We propose two solutions to this problem: firstly, we impose the requirement that each matrix should take the form of a Gaussian graphical model, and introduce a method for learning the parameters and the model structure simultaneously. Secondly, we explain how an alternative estimator, the shrinkage estimator, is preferable to the standard maximum likelihood estimator, and derive formulae for the optimal shrinkage intensity within the context of a Gaussian mixture model. We show how this relates to the use of a diagonal covariance smoothing prior. We compare the effectiveness of these techniques to standard methods on a phone recognition task where the quantity of training data is artificially constrained. We then investigate the performance of the shrinkage estimator on a large-vocabulary conversational telephone speech recognition task. Discriminative training techniques can be used to compensate for the invalidity of the model correctness assumption underpinning maximum likelihood estimation. On the large-vocabulary task, we use discriminative training of the full covariance models and diagonal priors to yield improved recognition performance.
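
    The shrinkage idea described above can be illustrated with a minimal sketch: the maximum likelihood (sample) covariance is blended with a diagonal target, which plays the role of the diagonal covariance smoothing prior mentioned in the abstract. The function name and the fixed intensity `lam` below are illustrative assumptions; the thesis derives an optimal, data-dependent intensity within a Gaussian mixture model.

```python
import numpy as np

def shrinkage_covariance(X, lam):
    """Shrink the sample covariance of X (n_samples x n_dims) toward its diagonal.

    lam = 0 recovers the maximum likelihood estimate; lam = 1 gives a purely
    diagonal covariance matrix.
    """
    S = np.cov(X, rowvar=False, bias=True)   # maximum likelihood (sample) covariance
    target = np.diag(np.diag(S))             # diagonal smoothing target
    return (1.0 - lam) * S + lam * target

# Example: with fewer frames than feature dimensions the sample covariance is
# singular, but the shrunk estimate is well conditioned and invertible.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 39))            # 30 frames of 39-dimensional features
S_shrunk = shrinkage_covariance(X, lam=0.3)
print(np.linalg.cond(S_shrunk))
```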

    Full covariance Gaussian mixture models evaluation on GPU


    A Modified EM Algorithm for Shrinkage Estimation in Multivariate Hidden Markov Models

    Hidden Markov models are used in a wide range of applications due to their construction, which renders them mathematically tractable and allows for the use of efficient computational techniques. There are methods for the estimation of the model's parameters, such as the EM algorithm, and for the estimation of the hidden states of the underlying Markov chain, such as the Viterbi algorithm. In applications where the dimension of the data is comparable to the sample size, the sample covariance matrix is known to be ill-conditioned, which directly affects the maximisation step (M-step) of the EM algorithm, where its inverse is involved in the computations. This problem may be amplified if there are rarely visited states, resulting in a small sample size for the estimation of the corresponding parameters. Therefore, the direct implementation of these methods can prove troublesome, as computational problems may occur in the estimation of the covariance matrix and its inverse, further affecting the estimation of the one-step transition probability matrix and the reconstruction of the hidden Markov chain. In this paper, a modified version of the EM algorithm is studied, both theoretically and computationally, in order to obtain the shrinkage estimator of the covariance matrix during the maximisation step. This is achieved by maximising a penalised log-likelihood function, which is also used in the estimation step (E-step). A variant of this modified version, where the penalised log-likelihood function is only used in the maximisation step (M-step), is also studied computationally.
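
    As a rough illustration of where the shrinkage estimator enters the algorithm, the sketch below replaces the usual covariance update in the M-step of a Gaussian-emission HMM with a convex combination of the weighted sample covariance and its diagonal. The posterior state probabilities `gamma` are assumed to come from an E-step, and the fixed intensity `lam` and function name are illustrative assumptions, not the penalised-likelihood derivation of the paper.

```python
import numpy as np

def m_step_shrinkage(X, gamma, lam):
    """Shrinkage M-step for the state covariances of a Gaussian HMM.

    X     : (T, d) observations
    gamma : (T, K) posterior state probabilities from the E-step
    lam   : shrinkage intensity in [0, 1] toward the diagonal target
    Returns per-state means (K, d) and shrunk covariances (K, d, d).
    """
    T, d = X.shape
    K = gamma.shape[1]
    Nk = gamma.sum(axis=0)                    # effective sample size per state
    means = (gamma.T @ X) / Nk[:, None]
    covs = np.empty((K, d, d))
    for k in range(K):
        Xc = X - means[k]
        S = (gamma[:, k, None] * Xc).T @ Xc / Nk[k]   # weighted ML covariance
        covs[k] = (1.0 - lam) * S + lam * np.diag(np.diag(S))
    return means, covs
```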

    Recent advances in directional statistics

    Mainstream statistical methodology is generally applicable to data observed in Euclidean space. There are, however, numerous contexts of considerable scientific interest in which the natural supports for the data under consideration are Riemannian manifolds like the unit circle, torus, sphere and their extensions. Typically, such data can be represented using one or more directions, and directional statistics is the branch of statistics that deals with their analysis. In this paper we provide a review of the many recent developments in the field since the publication of Mardia and Jupp (1999), still the most comprehensive text on directional statistics. Many of those developments have been stimulated by interesting applications in fields as diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics, image analysis, text mining, environmetrics, and machine learning. We begin by considering developments for the exploratory analysis of directional data before progressing to distributional models, general approaches to inference, hypothesis testing, regression, nonparametric curve estimation, methods for dimension reduction, classification and clustering, and the modelling of time series, spatial and spatio-temporal data. An overview of currently available software for analysing directional data is also provided, and potential future developments discussed.
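
    A small example of why Euclidean methods need adjustment for such data: on the unit circle the arithmetic mean of angles can be badly misleading, whereas the mean direction computed from unit vectors is not. This is a minimal illustration under assumed data, not an example taken from the review.

```python
import numpy as np

def circular_mean(theta):
    """Mean direction and mean resultant length of angles theta (radians)."""
    C, S = np.cos(theta).mean(), np.sin(theta).mean()
    return np.arctan2(S, C), np.hypot(C, S)

# Angles clustered around 0 but wrapped across the 0 / 2*pi boundary:
angles = np.array([0.1, 0.2, 2 * np.pi - 0.15, 2 * np.pi - 0.05])
print(np.mean(angles))          # naive Euclidean mean, misleadingly near pi
print(circular_mean(angles))    # mean direction near 0, high concentration
```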

    Speaker adaptation of an acoustic-to-articulatory inversion model using cascaded Gaussian mixture regressions

    The article presents a method for adapting a GMM-based acoustic-articulatory inversion model trained on a reference speaker to another speaker. The goal is to estimate the articulatory trajectories in the geometrical space of a reference speaker from the speech audio signal of another speaker. This method is developed in the context of a system of visual biofeedback, aimed at pronunciation training. This system provides a speaker with visual information about his/her own articulation, via a 3D orofacial clone. In previous work, we proposed to use GMM-based voice conversion for speaker adaptation. Acoustic-articulatory mapping was achieved in two consecutive steps: 1) converting the spectral trajectories of the target speaker (i.e. the system user) into spectral trajectories of the reference speaker (voice conversion), and 2) estimating the most likely articulatory trajectories of the reference speaker from the converted spectral features (acoustic-articulatory inversion). In this work, we propose to combine these two steps into the same statistical mapping framework, by fusing multiple regressions based on trajectory GMM and a maximum likelihood estimation (MLE) criterion. The proposed technique is compared to two standard speaker adaptation techniques based respectively on MAP and MLLR.
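
    For orientation, the sketch below shows plain frame-by-frame GMM-based regression: a joint GMM is fitted on stacked acoustic-articulatory vectors and the articulatory part is predicted as the minimum mean-square-error (MMSE) estimate given the acoustic part. It assumes scikit-learn's GaussianMixture with full covariances and synthetic data, and is only a simplified stand-in for the cascaded, trajectory-GMM/MLE mapping the article describes.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def gmm_regression(gmm, x, dx):
    """MMSE estimate of y given x under a joint GMM over z = [x, y].

    gmm : GaussianMixture fitted with covariance_type="full" on stacked [x, y] vectors
    x   : (dx,) acoustic feature vector
    dx  : dimension of the acoustic (input) part
    """
    mu, Sigma, w = gmm.means_, gmm.covariances_, gmm.weights_
    # Responsibility of each component for the observed x (marginalised over y).
    resp = np.array([
        w[k] * multivariate_normal.pdf(x, mean=mu[k, :dx], cov=Sigma[k, :dx, :dx])
        for k in range(gmm.n_components)
    ])
    resp /= resp.sum()
    # Mixture of component-wise conditional means E[y | x, component k].
    y_hat = np.zeros(mu.shape[1] - dx)
    for k in range(gmm.n_components):
        A = Sigma[k, dx:, :dx] @ np.linalg.inv(Sigma[k, :dx, :dx])
        y_hat += resp[k] * (mu[k, dx:] + A @ (x - mu[k, :dx]))
    return y_hat

# Synthetic stand-in for paired acoustic / articulatory training frames.
rng = np.random.default_rng(0)
acoustic = rng.standard_normal((2000, 24))
articulatory = acoustic[:, :12] @ rng.standard_normal((12, 6)) \
    + 0.1 * rng.standard_normal((2000, 6))
Z = np.hstack([acoustic, articulatory])
gmm = GaussianMixture(n_components=8, covariance_type="full", random_state=0).fit(Z)
y_hat = gmm_regression(gmm, acoustic[0], dx=24)
```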

    Scanpath modeling and classification with Hidden Markov Models

    How people look at visual information reveals fundamental information about them: their interests and their states of mind. Previous studies showed that the scanpath, i.e., the sequence of eye movements made by an observer exploring a visual stimulus, can be used to infer observer-related (e.g., task at hand) and stimulus-related (e.g., image semantic category) information. However, eye movements are complex signals and many of these studies rely on limited gaze descriptors and bespoke datasets. Here, we provide a turnkey method for scanpath modeling and classification. This method relies on variational hidden Markov models (HMMs) and discriminant analysis (DA). HMMs encapsulate the dynamic and individualistic dimensions of gaze behavior, allowing DA to capture systematic patterns diagnostic of a given class of observers and/or stimuli. We test our approach on two very different datasets. Firstly, we use fixations recorded while viewing 800 static natural scene images, and infer an observer-related characteristic: the task at hand. We achieve an average of 55.9% correct classification rate (chance = 33%). We show that correct classification rates positively correlate with the number of salient regions present in the stimuli. Secondly, we use eye positions recorded while viewing 15 conversational videos, and infer a stimulus-related characteristic: the presence or absence of the original soundtrack. We achieve an average 81.2% correct classification rate (chance = 50%). HMMs allow the integration of bottom-up, top-down, and oculomotor influences into a single model of gaze behavior. This synergistic approach between behavior and machine learning will open new avenues for simple quantification of gazing behavior. We release SMAC with HMM, a Matlab toolbox freely available to the community under an open-source license agreement.
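
    A hedged sketch of the likelihood-based variant of this idea: fit one Gaussian HMM per class of scanpaths and assign a new scanpath to the class whose model scores it highest. It assumes the hmmlearn package; the released toolbox (SMAC with HMM) is a Matlab implementation using variational HMMs and discriminant analysis, so this is only an approximation of the published approach.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

def fit_class_hmm(scanpaths, n_states=3):
    """Fit one Gaussian HMM to a list of (n_fixations, 2) arrays of fixation positions."""
    X = np.vstack(scanpaths)
    lengths = [len(s) for s in scanpaths]
    model = GaussianHMM(n_components=n_states, covariance_type="full", n_iter=50)
    model.fit(X, lengths)
    return model

def classify(scanpath, class_models):
    """Assign a scanpath to the class whose HMM gives it the highest log-likelihood."""
    scores = {label: m.score(scanpath) for label, m in class_models.items()}
    return max(scores, key=scores.get)

# Usage sketch (paths_A, paths_B, new_scanpath are hypothetical fixation sequences):
#   models = {"task_A": fit_class_hmm(paths_A), "task_B": fit_class_hmm(paths_B)}
#   predicted = classify(new_scanpath, models)
```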

    Multi-categories tool wear classification in micro-milling

    Ph.D. thesis (Doctor of Philosophy).

    Greedy Gaussian Segmentation of Multivariate Time Series

    Get PDF
    We consider the problem of breaking a multivariate (vector) time series into segments over which the data is well explained as independent samples from a Gaussian distribution. We formulate this as a covariance-regularized maximum likelihood problem, which can be reduced to a combinatorial optimization problem of searching over the possible breakpoints, or segment boundaries. This problem can be solved using dynamic programming, with complexity that grows with the square of the time series length. We propose a heuristic method that approximately solves the problem in linear time with respect to this length, and always yields a locally optimal choice, in the sense that no change of any one breakpoint improves the objective. Our method, which we call greedy Gaussian segmentation (GGS), easily scales to problems with vectors of dimension over 1000 and time series of arbitrary length. We discuss methods that can be used to validate such a model using data, and also to automatically choose appropriate values of the two hyperparameters in the method. Finally, we illustrate our GGS approach on financial time series and Wikipedia text data.
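
    As a minimal sketch of a covariance-regularized segmentation objective, the code below scores a segment under a Gaussian whose covariance is the empirical covariance plus a ridge term, and greedily searches for the single breakpoint that most improves the total score. The regularization weight `lam`, the ridge form, and the function names are illustrative assumptions; the full GGS method repeatedly adds and adjusts breakpoints until no single change improves the objective.

```python
import numpy as np

def segment_loglik(X, lam):
    """Gaussian log-likelihood of segment X (t, n) with a regularized covariance."""
    t, n = X.shape
    S = np.cov(X, rowvar=False, bias=True)    # empirical covariance of the segment
    Sigma = S + lam * np.eye(n)               # ridge-regularized, always well conditioned
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * t * (n * np.log(2 * np.pi) + logdet
                       + np.trace(np.linalg.solve(Sigma, S)))

def best_split(X, lam, min_len=5):
    """Greedy search for the single breakpoint that most improves the objective."""
    base = segment_loglik(X, lam)
    best_b, best_gain = None, 0.0
    for b in range(min_len, len(X) - min_len):
        gain = segment_loglik(X[:b], lam) + segment_loglik(X[b:], lam) - base
        if gain > best_gain:
            best_b, best_gain = b, gain
    return best_b, best_gain

# Synthetic example with a mean shift at t = 100; the split lands near it.
rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((100, 5)),
               3 + rng.standard_normal((100, 5))])
print(best_split(X, lam=0.1))
```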