39 research outputs found

    Audio-visual fine-tuning of audio-only ASR models

    Audio-visual automatic speech recognition (AV-ASR) models are very effective at reducing word error rates on noisy speech, but require large amounts of transcribed AV training data. Recently, audio-visual self-supervised learning (SSL) approaches have been developed to reduce this dependence on transcribed AV data, but these methods are quite complex and computationally expensive. In this work, we propose replacing these expensive AV-SSL methods with a simple and fast audio-only SSL method, followed by supervised AV fine-tuning. We show that this approach is competitive with state-of-the-art (SOTA) AV-SSL methods on the LRS3-TED benchmark task (within 0.5% absolute WER), while being dramatically simpler and more efficient (12-30x faster to pre-train). Furthermore, we show that we can extend this approach to convert a SOTA audio-only ASR model into an AV model. By doing so, we match SOTA AV-SSL results, even though no AV data was used during pre-training.
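
    The abstract does not give implementation details, but the two-stage recipe it describes (audio-only SSL pre-training, then supervised audio-visual fine-tuning) could look roughly like the PyTorch sketch below; the module names, fusion scheme, and dimensions are hypothetical placeholders for illustration, not the authors' actual architecture.

```python
import torch
import torch.nn as nn

class AVFineTuneModel(nn.Module):
    """Sketch: wrap a pre-trained audio-only SSL encoder with a visual branch
    for supervised audio-visual fine-tuning. All names and dimensions are
    assumptions, not the paper's architecture."""

    def __init__(self, audio_encoder: nn.Module, a_dim=768, v_dim=512, vocab=1024):
        super().__init__()
        self.audio_encoder = audio_encoder          # pre-trained with audio-only SSL
        self.visual_encoder = nn.GRU(v_dim, v_dim, batch_first=True)
        self.fusion = nn.Linear(a_dim + v_dim, a_dim)
        self.ctc_head = nn.Linear(a_dim, vocab)     # trained on transcribed AV data

    def forward(self, audio, lip_feats):
        a = self.audio_encoder(audio)               # (B, T, a_dim), assumed output shape
        v, _ = self.visual_encoder(lip_feats)       # (B, T, v_dim), assumed time-aligned
        x = self.fusion(torch.cat([a, v], dim=-1))
        return self.ctc_head(x)                     # logits for supervised fine-tuning
```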

    A Comparison between Deep Neural Nets and Kernel Acoustic Models for Speech Recognition

    We study large-scale kernel methods for acoustic modeling and compare them to DNNs on performance metrics related to both acoustic modeling and recognition. Measured by perplexity and frame-level classification accuracy, kernel-based acoustic models are as effective as their DNN counterparts. However, on token error rates, DNN models can be significantly better. We find that this may be attributed to DNNs' unique strength in reducing both the perplexity and the entropy of the predicted posterior probabilities. Motivated by our findings, we propose a new technique, entropy regularized perplexity, for model selection. This technique can noticeably improve the recognition performance of both types of models and reduces the gap between them. While demonstrated on Broadcast News, this technique could also be applicable to other tasks.
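
    The abstract does not specify the exact form of the proposed entropy regularized perplexity criterion; the NumPy sketch below shows one plausible reading (frame-level perplexity scaled by the exponentiated mean posterior entropy), where the multiplicative combination and the weight `lam` are assumptions.

```python
import numpy as np

def entropy_regularized_perplexity(posteriors, labels, lam=1.0):
    """Illustrative model-selection score only: the paper's exact formula is
    not given in the abstract, so the combination rule and `lam` are assumed.

    posteriors: (N, C) predicted frame posteriors, rows summing to 1
    labels:     (N,)   reference frame labels (e.g. senone IDs)
    """
    eps = 1e-12
    n = len(labels)
    # Cross-entropy of the references under the model -> frame-level perplexity.
    ce = -np.mean(np.log(posteriors[np.arange(n), labels] + eps))
    perplexity = np.exp(ce)
    # Mean entropy of the predicted posterior distributions.
    entropy = -np.mean(np.sum(posteriors * np.log(posteriors + eps), axis=1))
    # Lower is better: penalizes models whose posteriors are inaccurate and diffuse.
    return perplexity * np.exp(lam * entropy)
```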

    Distributed Content Curation on the Web

    Kernel Approximation Methods for Speech Recognition

    We study the performance of kernel methods on the acoustic modeling task for automatic speech recognition, and compare their performance to deep neural networks (DNNs). To scale the kernel methods to large data sets, we use the random Fourier feature method of Rahimi and Recht (2007). We propose two novel techniques for improving the performance of kernel acoustic models. First, we propose a simple but effective feature selection method which reduces the number of random features required to attain a fixed level of performance. Second, we present a number of metrics which correlate strongly with speech recognition performance when computed on the held-out set; we attain improved performance by using these metrics to decide when to stop training. Additionally, we show that the linear bottleneck method of Sainath et al. (2013a) improves the performance of our kernel models significantly, in addition to speeding up training and making the models more compact. Leveraging these three methods, the kernel methods attain token error rates between 0.5% better and 0.1% worse than fully-connected DNNs across four speech recognition data sets, including the TIMIT and Broadcast News benchmark tasks.
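
    For reference, the random Fourier feature approximation of Rahimi and Recht (2007) that the abstract relies on can be sketched as follows for an RBF kernel; the hyperparameter values are illustrative rather than the paper's acoustic-model settings.

```python
import numpy as np

def random_fourier_features(X, n_features=1000, gamma=0.1, seed=0):
    """Feature map z(x) such that z(x) . z(y) approximates the RBF kernel
    k(x, y) = exp(-gamma * ||x - y||^2)  (Rahimi & Recht, 2007).
    Hyperparameter values here are illustrative only."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies sampled from the kernel's spectral density (a Gaussian for RBF).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    # Random cosine features; a linear (softmax) acoustic model is then trained on Z.
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)
```

    The feature selection and early-stopping metrics described in the abstract would then operate on top of such a feature map, reducing how many random features are needed for a given level of performance.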

    Mechanical and Assembly Units of Viral Capsids Identified via Quasi-Rigid Domain Decomposition

    Key steps in a viral life cycle, such as self-assembly of a protective protein container or, in some cases, subsequent maturation events, are governed by the interplay of physico-chemical mechanisms involving various spatial and temporal scales. These salient aspects of a viral life cycle are hence well described and rationalised from a mesoscopic perspective. Accordingly, various experimental and computational efforts have been directed towards identifying the fundamental building blocks that are instrumental for the mechanical response, or constitute the assembly units, of a few specific viral shells. Motivated by these earlier studies, we introduce and apply a general and efficient computational scheme for identifying the stable domains of a given viral capsid. The method is based on elastic network models and quasi-rigid domain decomposition. It is first applied to a heterogeneous set of well-characterized viruses (CCMV, MS2, STNV, STMV) for which the known mechanical or assembly domains are correctly identified. The validated method is next applied to other viral particles, such as L-A, Pariacoto and polyoma viruses, whose fundamental functional domains are still unknown or debated and for which we formulate verifiable predictions. The numerical code implementing the domain decomposition strategy is made freely available.
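
    The paper's actual decomposition algorithm is not detailed in the abstract; the sketch below only illustrates the general ingredients it names (an elastic network model of the capsid plus clustering of correlated residue fluctuations into quasi-rigid domains), with the contact cutoff, the clustering method, and the number of domains all being assumptions.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import KMeans

def quasi_rigid_domains(coords, n_domains=3, cutoff=10.0):
    """Toy decomposition: Gaussian network model + k-means on residue
    cross-correlations. This does not reproduce the paper's algorithm or
    parameters. coords: (N, 3) array of C-alpha positions."""
    d = squareform(pdist(coords))
    # Kirchhoff (connectivity) matrix of the elastic network with a distance cutoff.
    kirchhoff = -(d < cutoff).astype(float)
    np.fill_diagonal(kirchhoff, 0.0)
    np.fill_diagonal(kirchhoff, -kirchhoff.sum(axis=1))
    # Residue fluctuation covariance from the pseudo-inverse of the Kirchhoff matrix.
    cov = np.linalg.pinv(kirchhoff)
    corr = cov / np.sqrt(np.outer(np.diag(cov), np.diag(cov)))
    # Residues that fluctuate coherently cluster together into quasi-rigid domains.
    return KMeans(n_clusters=n_domains, n_init=10).fit_predict(corr)
```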