Clustering hidden Markov models with variational HEM

Authors: Chan Antoni B.; Coviello Emanuele; Lanckriet Gert R. G.
Publication date: 24/10/2012

The hidden Markov model (HMM) is a widely used generative model that copes with sequential data, assuming that each observation is conditioned on the state of a hidden Markov chain. In this paper, we derive a novel algorithm to cluster HMMs based on the hierarchical EM (HEM) algorithm. The proposed algorithm i) clusters a given collection of HMMs into groups of HMMs that are similar, in terms of the distributions they represent, and ii) characterizes each group by a "cluster center", i.e., a novel HMM that is representative for the group, in a manner that is consistent with the underlying generative model of the HMM. To cope with intractable inference in the E-step, the HEM algorithm is formulated as a variational optimization problem, and efficiently solved for the HMM case by leveraging an appropriate variational approximation. The benefits of the proposed algorithm, which we call variational HEM (VHEM), are demonstrated on several tasks involving time-series data, such as hierarchical clustering of motion capture sequences, and automatic annotation and retrieval of music and of online handwriting data, showing improvements over current methods. In particular, our variational HEM algorithm effectively leverages large amounts of data when learning annotation models by using an efficient hierarchical estimation procedure, which reduces learning times and memory requirements, while improving model robustness through better regularization.
Comment: 44 page

arXiv.org e-Print Archive

Parsimonious HMMs for Offline Handwritten Chinese Text Recognition

Authors: Du Jun; Wang Wenchao; Wang Zi-Rui
Publication date: 13/08/2018

Recently, hidden Markov models (HMMs) have achieved promising results for offline handwritten Chinese text recognition. However, because the vocabulary of Chinese characters is large and each character is modeled by a uniform and fixed number of hidden states, the demands on memory and computation are high. In this study, to address this issue, we present parsimonious HMMs via state tying, which can fully utilize the similarities among different Chinese characters. A two-step algorithm with a data-driven question set is adopted to generate the tied-state pool using the likelihood measure. The proposed parsimonious HMMs, with both Gaussian mixture models (GMMs) and deep neural networks (DNNs) as the emission distributions, not only lead to a compact model but also improve the recognition accuracy through data sharing among the tied states and reduced confusion among state classes. Tested on the ICDAR-2013 competition database, in the best-configured case, the new parsimonious DNN-HMM can yield a relative character error rate (CER) reduction of 6.2%, a 25% reduction in model size, and a 60% reduction in decoding time over the conventional DNN-HMM. In the compact setting of an average 1-state HMM, our parsimonious DNN-HMM significantly outperforms the conventional DNN-HMM with a relative CER reduction of 35.5%.
Comment: Accepted by ICFHR201

arXiv.org e-Print Archive

Inference of Dynamic Regimes in the Microbiome

Authors: Holmes Susan P.; Sankaran Kris
Publication date: 30/11/2017

Many studies have been performed to characterize the dynamics and stability of the microbiome across a range of environmental contexts [Costello et al., 2012, Faust et al., 2015]. For example, it is often of interest to identify time intervals within which certain subsets of taxa have an interesting pattern of behavior. Viewed abstractly, these problems often have a flavor not just of time series modeling but also of regime detection, a problem with a rich history across a variety of applications, including speech recognition [Fox et al., 2011], finance [Lee, 2009], EEG analysis [Camilleri et al., 2014], and geophysics [Weatherley and Mora, 2002]. In spite of the parallels, regime detection methods are rarely used in microbiome analysis, most likely because references for these methods are scattered across several literatures, descriptions are inaccessible outside limited research communities, and implementations are difficult to come across. We distill the core ideas of different regime detection methods, provide example applications, and share reproducible code, making these techniques more accessible to microbiome researchers. We re-analyze data of Dethlefsen and Relman [2011], a study of the effects of antibiotics on the microbiome, using Classification and Regression Trees (CART) [Breiman et al., 1984], Hidden Markov Models (HMMs) [Rabiner and Juang, 1986], Bayesian nonparametric HMMs [Teh and Jordan, 2010, Fox et al., 2008], mixtures of Gaussian Processes (GPs) [Rasmussen and Ghahramani, 2002], switching dynamical systems [Linderman et al., 2016], and multiple changepoint detection [Fan and Mackey, 2015]. Along the way, we summarize each method, its relevance to the microbiome, and the tradeoffs associated with using it. Ultimately, our goal is to describe types of temporal or regime switching structure that can be incorporated into studies of microbiome dynamics.

arXiv.org e-Print Archive

Tech Report A Variational HEM Algorithm for Clustering Hidden Markov Models

Authors: Chan Antoni B.; Coviello Emanuele; Lanckriet Gert R. G.
Publication date: 05/09/2011

The hidden Markov model (HMM) is a generative model that treats sequential data under the assumption that each observation is conditioned on the state of a discrete hidden variable that evolves in time as a Markov chain. In this paper, we derive a novel algorithm to cluster HMMs through their probability distributions. We propose a hierarchical EM algorithm that i) clusters a given collection of HMMs into groups of HMMs that are similar, in terms of the distributions they represent, and ii) characterizes each group by a "cluster center", i.e., a novel HMM that is representative for the group. We present several empirical studies that illustrate the benefits of the proposed algorithm.
Comment: 13 pages, 1 figur

arXiv.org e-Print Archive

Statistical Modeling in Continuous Speech Recognition (CSR) (Invited Talk)

Author: Young Steve
Publication date: 10/01/2013

Automatic continuous speech recognition (CSR) is sufficiently mature that a variety of real world applications are now possible including large vocabulary transcription and interactive spoken dialogues. This paper reviews the evolution of the statistical modelling techniques which underlie current-day systems, specifically hidden Markov models (HMMs) and N-grams. Starting from a description of the speech signal and its parameterisation, the various modelling assumptions and their consequences are discussed. It then describes various techniques by which the effects of these assumptions can be mitigated. Despite the progress that has been made, the limitations of current modelling techniques are still evident. The paper therefore concludes with a brief review of some of the more fundamental modelling work now in progress.
Comment: Appears in Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI2001

arXiv.org e-Print Archive

Speech Recognition by Machine, A Review

Authors: Anusuya M. A.; Katti S. K.
Publication date: 13/01/2010

This paper presents a brief survey on Automatic Speech Recognition and discusses the major themes and advances made in the past 60 years of research, so as to provide a technological perspective and an appreciation of the fundamental progress that has been accomplished in this important area of speech communication. After years of research and development, the accuracy of automatic speech recognition remains one of the important research challenges (e.g., variations of the context, speakers, and environment). The design of a speech recognition system requires careful attention to the following issues: definition of various types of speech classes, speech representation, feature extraction techniques, speech classifiers, database, and performance evaluation. The problems that exist in ASR and the various techniques developed by researchers to solve them are presented in chronological order. The authors hope that this work will be a contribution to the area of speech recognition. The objective of this review paper is to summarize and compare some of the well-known methods used in the various stages of a speech recognition system and to identify research topics and applications which are at the forefront of this exciting and challenging field.
Comment: 25 pages IEEE format, International Journal of Computer Science and Information Security, IJCSIS December 2009, ISSN 1947 5500, http://sites.google.com/site/ijcsis

arXiv.org e-Print Archive

Unsupervised Discovery of Linguistic Structure Including Two-level Acoustic Patterns Using Three Cascaded Stages of Iterative Optimization

Authors: Chan Chun-an; Chung Cheng-Tao; Lee Lin-shan
Publication date: 07/09/2015

Techniques for unsupervised discovery of acoustic patterns are becoming increasingly attractive, because huge quantities of speech data are becoming available but manual annotations remain hard to acquire. In this paper, we propose an approach for unsupervised discovery of linguistic structure for the target spoken language given raw speech data. This linguistic structure includes two-level (subword-like and word-like) acoustic patterns, the lexicon of word-like patterns in terms of subword-like patterns, and the N-gram language model based on word-like patterns. All patterns, models, and parameters can be automatically learned from the unlabelled speech corpus. This is achieved by an initialization step followed by three cascaded stages for acoustic, linguistic, and lexical iterative optimization. The lexicon of word-like patterns defines the allowed consecutive sequences of HMMs for subword-like patterns. In each iteration, model training and decoding produce updated labels from which the lexicon and HMMs can be further updated. In this way, model parameters and decoded labels are respectively optimized in each iteration, and the knowledge about the linguistic structure is learned gradually, layer after layer. The proposed approach was tested in preliminary experiments on a corpus of Mandarin broadcast news, including a task of spoken term detection with performance compared to a parallel test using models trained in a supervised way. Results show that the proposed system not only yields reasonable performance on its own, but is also complementary to existing large vocabulary ASR systems.
Comment: Accepted by ICASSP 201

arXiv.org e-Print Archive

Survey on Incremental Approaches for Network Anomaly Detection

Authors: Bhattacharyya D. K.; Bhuyan Monowar H.; Kalita J. K.
Publication date: 19/11/2012

As the communication industry has connected distant corners of the globe using advances in network technology, intruders or attackers have also increased attacks on networking infrastructure commensurately. System administrators can attempt to prevent such attacks using intrusion detection tools and systems. There are many commercially available signature-based Intrusion Detection Systems (IDSs). However, most IDSs lack the capability to detect novel or previously unknown attacks. A special type of IDS, called an Anomaly Detection System, develops models based on normal system or network behavior, with the goal of detecting both known and unknown attacks. Anomaly detection systems face many problems, including a high rate of false alarms, the ability to work in online mode, and scalability. This paper presents a selective survey of incremental approaches for detecting anomalies in normal system or network traffic. The technological trends, open problems, and challenges of anomaly detection using incremental approaches are also discussed.
Comment: 14 pages, 1 figure, 11 tables referred journal publicatio

arXiv.org e-Print Archive

Unsupervised Discovery of Structured Acoustic Tokens with Applications to Spoken Term Detection

Authors: Chung Cheng-Tao; Lee Lin-Shan
Publication date: 28/11/2017

In this paper, we compare two paradigms for unsupervised discovery of structured acoustic tokens directly from speech corpora without any human annotation. The Multigranular Paradigm seeks to capture all available information in the corpora with multiple sets of tokens for different model granularities. The Hierarchical Paradigm attempts to jointly learn several levels of signal representations in a hierarchical structure. The two paradigms are unified within a theoretical framework in this paper. Query-by-Example Spoken Term Detection (QbE-STD) experiments on the QUESST dataset of MediaEval 2015 verify the competitiveness of the acoustic tokens. The Enhanced Relevance Score (ERS) proposed in this work improves both paradigms for the task of QbE-STD. We also list results on the ABX evaluation task of the Zero Resource Challenge 2015 for comparison of the paradigms.

arXiv.org e-Print Archive

Maximum a Posteriori Adaptation of Network Parameters in Deep Models

Authors: Chen I-Fan; Huang Zhen; Lee Chin-Hui; Siniscalchi Sabato Marco; Wu Jiadong
Publication date: 12/08/2015

We present a Bayesian approach to adapting parameters of a well-trained context-dependent, deep-neural-network, hidden Markov model (CD-DNN-HMM) to improve automatic speech recognition performance. Given an abundance of DNN parameters but only a limited amount of data, the effectiveness of the adapted DNN model can often be compromised. We formulate maximum a posteriori (MAP) adaptation of parameters of a specially designed CD-DNN-HMM with augmented linear hidden networks connected to the output tied states, or senones, and compare it to the previously proposed feature-space MAP linear regression. Experimental evidence on the 20,000-word open vocabulary Wall Street Journal task demonstrates the feasibility of the proposed framework. In supervised adaptation, the proposed MAP adaptation approach provides more than 10% relative error reduction and consistently outperforms the conventional transformation-based methods. Furthermore, we present an initial attempt to generate hierarchical priors to improve adaptation efficiency and effectiveness with limited adaptation data by exploiting similarities among senones.

arXiv.org e-Print Archive