15,227 research outputs found

    Joint morphological-lexical language modeling for processing morphologically rich languages with application to dialectal Arabic

    Get PDF
    Language modeling for an inflected language such as Arabic poses new challenges for speech recognition and machine translation due to its rich morphology. Rich morphology results in large increases in out-of-vocabulary (OOV) rate and poor language model parameter estimation in the absence of large quantities of data. In this study, we present a joint morphological-lexical language model (JMLLM) that takes advantage of Arabic morphology. JMLLM combines morphological segments with the underlying lexical items and additional available information sources with regards to morphological segments and lexical items in a single joint model. Joint representation and modeling of morphological and lexical items reduces the OOV rate and provides smooth probability estimates while keeping the predictive power of whole words. Speech recognition and machine translation experiments in dialectal-Arabic show improvements over word and morpheme based trigram language models. We also show that as the tightness of integration between different information sources increases, both speech recognition and machine translation performances improve

    Speaker segmentation and clustering

    Get PDF
    This survey focuses on two challenging speech processing topics, namely: speaker segmentation and speaker clustering. Speaker segmentation aims at finding speaker change points in an audio stream, whereas speaker clustering aims at grouping speech segments based on speaker characteristics. Model-based, metric-based, and hybrid speaker segmentation algorithms are reviewed. Concerning speaker clustering, deterministic and probabilistic algorithms are examined. A comparative assessment of the reviewed algorithms is undertaken, the algorithm advantages and disadvantages are indicated, insight to the algorithms is offered, and deductions as well as recommendations are given. Rich transcription and movie analysis are candidate applications that benefit from combined speaker segmentation and clustering. © 2007 Elsevier B.V. All rights reserved

    Prosody-Based Automatic Segmentation of Speech into Sentences and Topics

    Get PDF
    A crucial step in processing speech audio data for information extraction, topic detection, or browsing/playback is to segment the input into sentence and topic units. Speech segmentation is challenging, since the cues typically present for segmenting text (headers, paragraphs, punctuation) are absent in spoken language. We investigate the use of prosody (information gleaned from the timing and melody of speech) for these tasks. Using decision tree and hidden Markov modeling techniques, we combine prosodic cues with word-based approaches, and evaluate performance on two speech corpora, Broadcast News and Switchboard. Results show that the prosodic model alone performs on par with, or better than, word-based statistical language models -- for both true and automatically recognized words in news speech. The prosodic model achieves comparable performance with significantly less training data, and requires no hand-labeling of prosodic events. Across tasks and corpora, we obtain a significant improvement over word-only models using a probabilistic combination of prosodic and lexical information. Inspection reveals that the prosodic models capture language-independent boundary indicators described in the literature. Finally, cue usage is task and corpus dependent. For example, pause and pitch features are highly informative for segmenting news speech, whereas pause, duration and word-based cues dominate for natural conversation.Comment: 30 pages, 9 figures. To appear in Speech Communication 32(1-2), Special Issue on Accessing Information in Spoken Audio, September 200

    An Unsupervised Approach for Automatic Activity Recognition based on Hidden Markov Model Regression

    Full text link
    Using supervised machine learning approaches to recognize human activities from on-body wearable accelerometers generally requires a large amount of labelled data. When ground truth information is not available, too expensive, time consuming or difficult to collect, one has to rely on unsupervised approaches. This paper presents a new unsupervised approach for human activity recognition from raw acceleration data measured using inertial wearable sensors. The proposed method is based upon joint segmentation of multidimensional time series using a Hidden Markov Model (HMM) in a multiple regression context. The model is learned in an unsupervised framework using the Expectation-Maximization (EM) algorithm where no activity labels are needed. The proposed method takes into account the sequential appearance of the data. It is therefore adapted for the temporal acceleration data to accurately detect the activities. It allows both segmentation and classification of the human activities. Experimental results are provided to demonstrate the efficiency of the proposed approach with respect to standard supervised and unsupervised classification approache

    Nonparametric Bayesian Double Articulation Analyzer for Direct Language Acquisition from Continuous Speech Signals

    Full text link
    Human infants can discover words directly from unsegmented speech signals without any explicitly labeled data. In this paper, we develop a novel machine learning method called nonparametric Bayesian double articulation analyzer (NPB-DAA) that can directly acquire language and acoustic models from observed continuous speech signals. For this purpose, we propose an integrative generative model that combines a language model and an acoustic model into a single generative model called the "hierarchical Dirichlet process hidden language model" (HDP-HLM). The HDP-HLM is obtained by extending the hierarchical Dirichlet process hidden semi-Markov model (HDP-HSMM) proposed by Johnson et al. An inference procedure for the HDP-HLM is derived using the blocked Gibbs sampler originally proposed for the HDP-HSMM. This procedure enables the simultaneous and direct inference of language and acoustic models from continuous speech signals. Based on the HDP-HLM and its inference procedure, we developed a novel double articulation analyzer. By assuming HDP-HLM as a generative model of observed time series data, and by inferring latent variables of the model, the method can analyze latent double articulation structure, i.e., hierarchically organized latent words and phonemes, of the data in an unsupervised manner. The novel unsupervised double articulation analyzer is called NPB-DAA. The NPB-DAA can automatically estimate double articulation structure embedded in speech signals. We also carried out two evaluation experiments using synthetic data and actual human continuous speech signals representing Japanese vowel sequences. In the word acquisition and phoneme categorization tasks, the NPB-DAA outperformed a conventional double articulation analyzer (DAA) and baseline automatic speech recognition system whose acoustic model was trained in a supervised manner.Comment: 15 pages, 7 figures, Draft submitted to IEEE Transactions on Autonomous Mental Development (TAMD

    Joint segmentation of multivariate time series with hidden process regression for human activity recognition

    Full text link
    The problem of human activity recognition is central for understanding and predicting the human behavior, in particular in a prospective of assistive services to humans, such as health monitoring, well being, security, etc. There is therefore a growing need to build accurate models which can take into account the variability of the human activities over time (dynamic models) rather than static ones which can have some limitations in such a dynamic context. In this paper, the problem of activity recognition is analyzed through the segmentation of the multidimensional time series of the acceleration data measured in the 3-d space using body-worn accelerometers. The proposed model for automatic temporal segmentation is a specific statistical latent process model which assumes that the observed acceleration sequence is governed by sequence of hidden (unobserved) activities. More specifically, the proposed approach is based on a specific multiple regression model incorporating a hidden discrete logistic process which governs the switching from one activity to another over time. The model is learned in an unsupervised context by maximizing the observed-data log-likelihood via a dedicated expectation-maximization (EM) algorithm. We applied it on a real-world automatic human activity recognition problem and its performance was assessed by performing comparisons with alternative approaches, including well-known supervised static classifiers and the standard hidden Markov model (HMM). The obtained results are very encouraging and show that the proposed approach is quite competitive even it works in an entirely unsupervised way and does not requires a feature extraction preprocessing step
    • …
    corecore