3,803 research outputs found

    Source-side context-informed hypothesis alignment for combining outputs from machine translation systems

    This paper presents a new hypothesis alignment method for combining outputs of multiple machine translation (MT) systems. Traditional hypothesis alignment algorithms such as TER, HMM and IHMM do not directly utilise the context information of the source side but rather address the alignment issues via the output data itself. In this paper, a source-side context-informed (SSCI) hypothesis alignment method is proposed to address the word alignment and word reordering issues. First, the source–target word alignment links are produced as the hidden variables by exporting source phrase spans during the translation decoding process. Second, a mapping strategy and normalisation model are employed to acquire the 1-to-1 alignment links and build the confusion network (CN). The source-side context-based method outperforms the state-of-the-art TER-based alignment model in our experiments on the WMT09 English-to-French and NIST Chinese-to-English data sets. Experimental results demonstrate that our proposed approach scores consistently among the best results across different data and language pair conditions.

    Light Gated Recurrent Units for Speech Recognition

    A field that has directly benefited from the recent advances in deep learning is Automatic Speech Recognition (ASR). Despite the great achievements of the past decades, however, a natural and robust human-machine speech interaction still appears to be out of reach, especially in challenging environments characterized by significant noise and reverberation. To improve robustness, modern speech recognizers often employ acoustic models based on Recurrent Neural Networks (RNNs), which are naturally able to exploit large time contexts and long-term speech modulations. It is thus of great interest to continue the study of proper techniques for improving the effectiveness of RNNs in processing speech signals. In this paper, we revise one of the most popular RNN models, namely Gated Recurrent Units (GRUs), and propose a simplified architecture that turned out to be very effective for ASR. The contribution of this work is two-fold: First, we analyze the role played by the reset gate, showing that a significant redundancy with the update gate occurs. As a result, we propose to remove the former from the GRU design, leading to a more efficient and compact single-gate model. Second, we propose to replace hyperbolic tangent with ReLU activations. This variation couples well with batch normalization and could help the model learn long-term dependencies without numerical issues. Results show that the proposed architecture, called Light GRU (Li-GRU), not only reduces the per-epoch training time by more than 30% over a standard GRU, but also consistently improves the recognition accuracy across different tasks, input features, noisy conditions, as well as across different ASR paradigms, ranging from standard DNN-HMM speech recognizers to end-to-end CTC models. Comment: Copyright 2018 IEE
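The two changes described in this abstract (dropping the reset gate and using ReLU for the candidate state) can be illustrated with a minimal single-step sketch in plain Python. It is an illustration under assumptions, not the authors' implementation: the function and parameter names (`li_gru_step`, `init_params`, `Wz`, `Uz`, `Wh`, `Uh`) are hypothetical, the state update uses the usual GRU interpolation between the previous state and the candidate, and the batch normalization mentioned in the abstract is omitted for brevity.

```python
import math
import random


def sigmoid(v):
    """Logistic function used for the update gate."""
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def init_params(input_dim, hidden_dim, seed=0):
    """Small random weights; names and initialisation are assumptions."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "Wz": mat(hidden_dim, input_dim), "Uz": mat(hidden_dim, hidden_dim),
        "Wh": mat(hidden_dim, input_dim), "Uh": mat(hidden_dim, hidden_dim),
    }


def li_gru_step(x_t, h_prev, p):
    """One time step of a single-gate recurrent cell with a ReLU candidate.

    There is no reset gate: the candidate state sees h_prev directly.
    Batch normalization of the feed-forward terms is left out of this sketch.
    """
    az = [a + b for a, b in zip(matvec(p["Wz"], x_t), matvec(p["Uz"], h_prev))]
    ah = [a + b for a, b in zip(matvec(p["Wh"], x_t), matvec(p["Uh"], h_prev))]
    z = [sigmoid(a) for a in az]          # update gate (the only gate)
    cand = [max(0.0, a) for a in ah]      # ReLU replaces tanh
    # Interpolate between the previous state and the candidate state.
    return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h_prev, cand)]


def run(xs, hidden_dim, p):
    """Run the cell over a sequence, starting from a zero state."""
    h = [0.0] * hidden_dim
    for x in xs:
        h = li_gru_step(x, h, p)
    return h
```

Calling `run([[1.0, 0.5, -0.2]] * 5, 4, init_params(3, 4))` returns a hidden state of length 4. Because the candidate is passed through ReLU and the initial state is zero, every component of the state stays non-negative in this sketch.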

    An HMM--ELLAM scheme on generic polygonal meshes for miscible incompressible flows in porous media

    We design a numerical approximation of a system of partial differential equations modelling the miscible displacement of a fluid by another in a porous medium. The advective part of the system is discretised using a characteristic method, and the diffusive parts by a finite volume method. The scheme is applicable on generic (possibly non-conforming) meshes as encountered in applications. The main features of our work are: the reconstruction of a Darcy velocity from the discrete pressure fluxes that enjoys a local consistency property; an analysis of implementation issues faced when tracking distorted cells via the characteristic method; and a new treatment of cells near the injection well that accounts better for the conservativity of the injected fluid.

    Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

    This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text. The system is composed of a recurrent sequence-to-sequence feature prediction network that maps character embeddings to mel-scale spectrograms, followed by a modified WaveNet model acting as a vocoder to synthesize time-domain waveforms from those spectrograms. Our model achieves a mean opinion score (MOS) of 4.53, comparable to a MOS of 4.58 for professionally recorded speech. To validate our design choices, we present ablation studies of key components of our system and evaluate the impact of using mel spectrograms as the input to WaveNet instead of linguistic, duration, and $F_0$ features. We further demonstrate that using a compact acoustic intermediate representation enables significant simplification of the WaveNet architecture. Comment: Accepted to ICASSP 201

    Speaker segmentation and clustering

    This survey focuses on two challenging speech processing topics, namely speaker segmentation and speaker clustering. Speaker segmentation aims at finding speaker change points in an audio stream, whereas speaker clustering aims at grouping speech segments based on speaker characteristics. Model-based, metric-based, and hybrid speaker segmentation algorithms are reviewed. Concerning speaker clustering, deterministic and probabilistic algorithms are examined. A comparative assessment of the reviewed algorithms is undertaken, the algorithm advantages and disadvantages are indicated, insight into the algorithms is offered, and deductions as well as recommendations are given. Rich transcription and movie analysis are candidate applications that benefit from combined speaker segmentation and clustering. © 2007 Elsevier B.V. All rights reserved.

    Time-step coupling for hybrid simulations of multiscale flows

    A new method is presented for the exploitation of time-scale separation in hybrid continuum-molecular models of multiscale flows. Our method is a generalisation of existing approaches, and is evaluated in terms of computational efficiency and physical/numerical error. Comparison with existing schemes demonstrates comparable, or much improved, physical accuracy, at comparable, or far greater, efficiency (in terms of the number of time-step operations required to cover the same physical time). A leapfrog coupling is proposed between the ‘macro’ and ‘micro’ components of the hybrid model and demonstrates potential for improved numerical accuracy over a standard simultaneous approach. A general algorithm for a coupled time step is presented. Three test cases are considered where the degree of time-scale separation naturally varies during the course of the simulation: first, the step response of a second-order system composed of two linearly-coupled ODEs; second, a micro-jet actuator combining a kinetic treatment in a small flow region where rarefaction is important with a simple ODE enforcing mass conservation in a much larger spatial region; finally, the transient start-up flow of a journal bearing with a cylindrical rarefied gas layer. Our new time-stepping method consistently performs as well as or better than existing schemes. This superior overall performance is due to an adaptability inherent in the method, which allows the most-desirable aspects of existing schemes to be applied only in the appropriate conditions.