
    Efficient Embedded Speech Recognition for Very Large Vocabulary Mandarin Car-Navigation Systems

    Automatic speech recognition (ASR) for a very large vocabulary of isolated words is a difficult task on a resource-limited embedded device. This paper presents a novel fast decoding algorithm for a Mandarin speech recognition system that can process hundreds of thousands of items simultaneously while maintaining high recognition accuracy. The proposed algorithm constructs a semi-tree search network based on Mandarin pronunciation rules to avoid duplicate syllable matching and redundant memory use. On top of a two-stage fixed-width beam-search baseline system, the algorithm employs a variable beam-width pruning strategy and a frame-synchronous word-level pruning strategy to significantly reduce recognition time. The algorithm is aimed at in-car navigation systems in China and was simulated on a standard PC workstation. Experimental results show that the proposed method reduces recognition time nearly 6-fold and memory size nearly 2-fold compared to the baseline system, with less than 1% accuracy degradation on a 200,000-word recognition task.
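The abstract does not spell out the pruning rules, but the general idea of frame-synchronous decoding with a variable beam width can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the score-spread heuristic for adapting the beam, and the histogram-style cap on active states are all assumptions made for the example.

```python
def beam_search_frame(hypotheses, frame_scores, base_beam, min_beam, max_active):
    """One frame of frame-synchronous decoding with a variable beam width.

    hypotheses: dict mapping state -> accumulated log-score
    frame_scores: dict mapping (src_state, dst_state) -> transition log-score
                  for the current frame
    """
    # Expand every active hypothesis by every allowed transition,
    # keeping only the best score reaching each destination state.
    expanded = {}
    for state, score in hypotheses.items():
        for (src, dst), s in frame_scores.items():
            if src == state:
                new = score + s
                if dst not in expanded or new > expanded[dst]:
                    expanded[dst] = new

    best = max(expanded.values())
    # Variable beam (illustrative heuristic): narrow the beam when one
    # hypothesis clearly dominates, keep it wide when scores are clustered.
    spread = best - min(expanded.values())
    beam = max(min_beam, base_beam - 0.5 * spread)

    pruned = {s: v for s, v in expanded.items() if v >= best - beam}
    # Cap the number of active states, as word-level pruning would.
    if len(pruned) > max_active:
        pruned = dict(sorted(pruned.items(), key=lambda kv: -kv[1])[:max_active])
    return pruned
```

Because the beam tightens when a single hypothesis dominates, fewer states survive per frame on easy segments, which is the source of the decoding speed-up the abstract describes.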

    Homogeneous Ensemble Phonotactic Language Recognition Based on SVM Supervector Reconstruction

    Acoustic spoken language recognition (SLR) and phonotactic SLR systems are currently the most widely used language recognition systems. To achieve better performance, researchers combine multiple subsystems, with results often much better than those of a single SLR system. Phonotactic SLR subsystems may vary in their acoustic feature vectors or include multiple language-specific phone recognizers and different acoustic models; these methods achieve good performance but usually at high computational cost. In this paper, a new diversification for phonotactic language recognition systems is proposed using vector space models built by support vector machine (SVM) supervector reconstruction (SSR). In this architecture, the subsystems share the same feature extraction, decoding, and N-gram counting preprocessing steps, but each models the data in a different vector space via the SSR algorithm, without significant additional computation. We term this a homogeneous ensemble phonotactic language recognition (HEPLR) system. The system integrates three different SVM supervector reconstruction algorithms: relative, functional, and perturbing SVM supervector reconstruction. All of the algorithms are combined using a linear discriminant analysis-maximum mutual information (LDA-MMI) backend to improve language recognition evaluation (LRE) accuracy. Evaluated on the National Institute of Standards and Technology (NIST) LRE 2009 task, the proposed HEPLR system achieves better performance than a baseline phone recognition-vector space modeling (PR-VSM) system with minimal extra computational cost. The HEPLR system yields equal error rates (EER) of 1.39%, 3.63%, and 14.79%, representing relative improvements of 6.06%, 10.15%, and 10.53% over the baseline system for the 30-, 10-, and 3-s test conditions, respectively.
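The shared preprocessing the subsystems reuse — mapping a decoded phone sequence to a fixed-dimension N-gram supervector — can be sketched as below. This is only the common vector-space-modeling step; the three SSR variants (relative, functional, perturbing) are not specified in the abstract and are not reproduced here. The function name and normalization choice are assumptions for illustration.

```python
from collections import Counter
from itertools import product

def ngram_supervector(phone_seq, phone_set, n=2):
    """Map a decoded phone sequence to a fixed-dimension vector of
    n-gram relative frequencies over all possible phone n-grams."""
    counts = Counter(tuple(phone_seq[i:i + n])
                     for i in range(len(phone_seq) - n + 1))
    total = sum(counts.values())
    # Enumerate every possible n-gram so the vector dimension is fixed
    # regardless of which n-grams actually occurred.
    vocab = list(product(phone_set, repeat=n))
    return [counts[g] / total for g in vocab]
```

In a PR-VSM-style system, vectors of this kind are fed to per-language SVMs; the HEPLR idea is to derive several differently reconstructed vector spaces from this single shared counting step, so the ensemble diversity comes almost for free.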

    Latent Class Model with Application to Speaker Diarization

    In this paper, we apply a latent class model (LCM) to the task of speaker diarization. LCM is similar to Patrick Kenny's variational Bayes (VB) method in that it uses soft information and avoids premature hard decisions in its iterations. In contrast to the VB method, which is based on a generative model, LCM provides a framework that admits both generative and discriminative models. The discriminative property is realized in this work through the use of i-vectors (Ivec), probabilistic linear discriminant analysis (PLDA), and a support vector machine (SVM). Systems denoted LCM-Ivec-PLDA, LCM-Ivec-SVM, and LCM-Ivec-Hybrid are introduced. In addition, three further improvements are applied to enhance performance: 1) adding neighboring windows to extract more speaker information for each short segment; 2) using a hidden Markov model to avoid overly frequent speaker change points; and 3) using agglomerative hierarchical clustering for initialization, with hard and soft priors, to overcome sensitivity to initialization. Experiments on the National Institute of Standards and Technology Rich Transcription 2009 speaker diarization database, under the single-distant-microphone condition, show that the diarization error rate (DER) of the proposed methods improves substantially relative to mainstream systems. Compared to the VB method, the relative improvements of the LCM-Ivec-PLDA, LCM-Ivec-SVM, and LCM-Ivec-Hybrid systems are 23.5%, 27.1%, and 43.0%, respectively. Experiments on our collected database, CALLHOME97, CALLHOME00, and the SRE08 short2-summed trial condition also show that the proposed LCM-Ivec-Hybrid system has the best overall performance.

    RNN Language Model with Word Clustering and Class-based Output Layer

    The recurrent neural network language model (RNNLM) has shown significant promise for statistical language modeling. In this work, a new class-based output layer method is introduced to further improve the RNNLM. In this method, word class information is incorporated into the output layer by using the Brown clustering algorithm to estimate a class-based language model. Experimental results show that the new output layer with word clustering not only clearly improves convergence but also reduces perplexity and word error rate in large vocabulary continuous speech recognition.
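The core of a class-based output layer is the factorization P(w|h) = P(c|h) · P(w|c,h), which replaces one softmax over the full vocabulary with a softmax over classes plus a softmax within the predicted word's class. The sketch below shows that factorization with numpy; the function name and weight shapes are assumptions, and the class assignment would come from Brown clustering in the paper's setting (here it is just an input list; every class is assumed non-empty).

```python
import numpy as np

def class_factored_softmax(h, W_class, class_of, W_word):
    """Class-based output layer: P(w|h) = P(class(w)|h) * P(w|class(w), h).

    h        : hidden state vector, shape (d,)
    W_class  : class scoring weights, shape (num_classes, d)
    class_of : list mapping word id -> class id (e.g. from Brown clustering)
    W_word   : word scoring weights, shape (vocab, d); each word's softmax is
               normalized only within its class, so the per-step cost drops
               from O(|V|) toward O(|C| + largest class size).
    """
    class_logits = W_class @ h
    class_probs = np.exp(class_logits - class_logits.max())
    class_probs /= class_probs.sum()

    word_logits = W_word @ h
    class_of = np.asarray(class_of)
    probs = np.empty(len(class_of))
    for c in range(W_class.shape[0]):
        idx = np.where(class_of == c)[0]
        e = np.exp(word_logits[idx] - word_logits[idx].max())
        probs[idx] = class_probs[c] * e / e.sum()
    return probs
```

Because the within-class normalizations each sum to 1 and the class probabilities sum to 1, the result is a proper distribution over the whole vocabulary.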

    Neutralization of chemokines RANTES and MIG increases virus antigen expression and spinal cord pathology during Theiler's virus infection.

    The role of chemokines during some viral infections is unpredictable because the inflammatory response regulated by these molecules can have two contrasting effects: viral immunity and immunopathologic injury to host tissues. Using Theiler's virus infection of SJL mice as a model of this type of disease, we investigated the roles of two chemokines, RANTES (regulated on activation, normal T cell expressed and secreted) and MIG (monokine induced by IFN-gamma), by treating mice with antisera that block lymphocyte migration. Control infected mice showed virus persistence, mild inflammation, and a small degree of demyelination in the white matter of the spinal cord at 6 weeks post-infection. Treatment of mice with RANTES antiserum starting at 2 weeks post-infection increased both viral antigen expression and the severity of inflammatory demyelination at 6 weeks post-infection. MIG antiserum increased the spread of virus and the proportion of spinal cord white matter with demyelination. Overall, viral antigen levels correlated strongly with the extent of pathology. At the RNA level, high virus expression was associated with low IL-2 and high IL-10 levels, and RANTES antiserum decreased the IL-2/IL-10 ratio. Our results suggest that RANTES and MIG participate in an immune response that attempts to restrict viral expression while limiting immunopathology, and that anti-chemokine treatment poses the risk of exacerbating both conditions in the long term.

    Very Long Baseline Array Imaging of Type-2 Seyferts with Double-Peaked Narrow Emission Lines: Searches for Sub-kpc Dual AGNs and Jet-Powered Outflows

    This paper presents Very Long Baseline Array (VLBA) observations of 13 double-peaked [O III] emission-line type-2 Active Galactic Nuclei (AGNs) at redshifts 0.06 < z < 0.41 (median z ~ 0.15) identified in the Sloan Digital Sky Survey. Such double-peaked emission-line objects may result from jets or outflows from the central engine, or from a dual AGN. The VLBA provides an angular resolution of <~10 pc at the distance of many of these galaxies, sufficient to resolve the radio emission from extremely close dual AGNs and to help establish the origin of the double-peaked [O III] emission lines. Of the 13 galaxies observed at 3.6 cm (8.4 GHz), we detect six at a 1-sigma sensitivity level of ~0.15 mJy/beam, two of which show clear jet structures on scales ranging from a few milliarcseconds to tens of milliarcseconds (a few pc to tens of pc at the median redshift of 0.15). We suggest that radio-loud double-peaked emission-line type-2 AGNs may be indicative of jet-produced structures, but a larger sample of double-peaked [O III] AGNs with high angular resolution radio observations will be required to confirm this suggestion. Comment: 14 pages, 7 figures; ApJ in press.

    Time–Frequency Cepstral Features and Heteroscedastic Linear Discriminant Analysis for Language Recognition

    The shifted delta cepstrum (SDC) is a widely used feature extraction method for language recognition (LRE). With a wide context from the incorporation of multiple frames, SDC outperforms traditional delta and acceleration feature vectors. However, it also introduces correlation into the concatenated feature vector, which increases redundancy and may degrade the performance of backend classifiers. In this paper, we first propose a time-frequency cepstral (TFC) feature vector, obtained by performing a temporal discrete cosine transform (DCT) on the cepstrum matrix and selecting the transformed elements in a zigzag scan order. Beyond this, we increase discriminability through heteroscedastic linear discriminant analysis (HLDA) on the full cepstrum matrix. By imposing block-diagonal matrix constraints, the large HLDA problem is reduced to several smaller HLDA problems, yielding a block-diagonal HLDA (BDHLDA) algorithm with much lower computational complexity. The BDHLDA method is finally extended to the GMM domain, using the simpler TFC features during re-estimation to greatly improve computation speed. Experiments on the NIST 2003 and 2007 LRE evaluation corpora show that TFC is more effective than SDC, and that the GMM-based BDHLDA yields a lower equal error rate (EER) and minimum average cost (Cavg) than either the TFC or SDC approach.
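The TFC construction — a temporal DCT over each cepstral trajectory followed by zigzag selection of low-order coefficients — can be sketched as follows. This is an illustrative reading of the abstract, not the paper's code: the exact zigzag convention, the matrix orientation, and the function names are assumptions, and the DCT-II is implemented directly with numpy to keep the example self-contained.

```python
import numpy as np

def zigzag_indices(rows, cols):
    """Zigzag scan order over a rows x cols matrix: anti-diagonals of
    increasing index sum, alternating traversal direction (JPEG-style)."""
    return sorted(((r, c) for r in range(rows) for c in range(cols)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[1] if (rc[0] + rc[1]) % 2 else rc[0]))

def tfc_features(cepstrum, n_keep):
    """cepstrum: (n_ceps, n_frames) matrix of cepstral trajectories.
    Apply a DCT-II along the time axis, then keep the first n_keep
    coefficients in zigzag order, favoring low quefrency and slow
    temporal modulation."""
    n_ceps, n_frames = cepstrum.shape
    n = np.arange(n_frames)
    k = n.reshape(-1, 1)
    # DCT-II basis: basis[k, n] = cos(pi * (2n + 1) * k / (2N))
    basis = np.cos(np.pi * (2 * n + 1) * k / (2 * n_frames))
    transformed = cepstrum @ basis.T          # (n_ceps, n_frames)
    order = zigzag_indices(n_ceps, n_frames)[:n_keep]
    return np.array([transformed[r, c] for r, c in order])
```

The zigzag truncation plays the same role here as in transform coding: it keeps the coefficients that carry most of the energy while discarding the highly correlated tail that the abstract identifies as the weakness of SDC-style concatenation.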