3,803 research outputs found
Source-side context-informed hypothesis alignment for combining outputs from machine translation systems
This paper presents a new hypothesis alignment method for combining outputs of multiple machine translation (MT) systems. Traditional hypothesis alignment algorithms such as TER, HMM and IHMM do not directly utilise source-side context information; instead, they address the alignment problem using the output data alone. In this paper, a source-side context-informed (SSCI) hypothesis alignment method is proposed to address word alignment and word reordering. First, source–target word alignment links are produced as hidden variables by exporting source phrase spans during the translation decoding process. Second, a mapping strategy and a normalisation model are employed to obtain 1-to-1 alignment links and build the confusion network (CN). In our experiments on the WMT09 English-to-French and NIST Chinese-to-English data sets, the source-side context-based method outperforms the state-of-the-art TER-based alignment model. Experimental results demonstrate that our proposed approach consistently scores among the best results across different data and language-pair conditions.
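The core idea, aligning hypothesis words to backbone words through the source positions they translate rather than through surface similarity, can be sketched as follows. This is a minimal illustration under stated assumptions, not the SSCI method itself: each word is assumed to carry a single source position (standing in for the source phrase spans exported by the decoder), the first backbone slot per source position is kept to force a 1-to-1 mapping, and the normalisation model is replaced by plain vote counting. The names `build_confusion_network` and `consensus` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sketch: every hypothesis is a list of (target_word, source_position)
# pairs; source_position stands in for the source span exported during decoding.


def build_confusion_network(backbone, hypotheses):
    """Return one column per backbone word; each column maps word -> vote count."""
    columns = [defaultdict(int) for _ in backbone]
    slot_of_source = {}
    for slot, (word, src) in enumerate(backbone):
        columns[slot][word] += 1
        slot_of_source.setdefault(src, slot)  # first slot wins: forces 1-to-1 links
    for hyp in hypotheses:
        used = set()
        for word, src in hyp:
            slot = slot_of_source.get(src)
            if slot is None or slot in used:
                continue  # unaligned or already-filled slot: dropped in this sketch
            columns[slot][word] += 1
            used.add(slot)
        # Backbone slots this hypothesis did not cover get an empty (epsilon) arc.
        for slot in range(len(backbone)):
            if slot not in used:
                columns[slot][""] += 1
    return [dict(col) for col in columns]


def consensus(columns):
    """Pick the highest-voted word per column (ties go to the backbone word)."""
    best = (max(col.items(), key=lambda kv: kv[1])[0] for col in columns)
    return [w for w in best if w]  # skip epsilon arcs


# Source "das blaue Haus": positions 0, 1, 2.
backbone = [("the", 0), ("blue", 1), ("house", 2)]
hypotheses = [
    [("the", 0), ("house", 2), ("blue", 1)],  # reordered output, still aligned correctly
    [("a", 0), ("blue", 1), ("home", 2)],
]
network = build_confusion_network(backbone, hypotheses)
print(consensus(network))  # ['the', 'blue', 'house']
```

Because alignment goes through source positions, the reordered hypothesis "the house blue" still votes for the right columns, which a purely surface-based edit-distance alignment could get wrong.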
Light Gated Recurrent Units for Speech Recognition
A field that has directly benefited from the recent advances in deep learning
is Automatic Speech Recognition (ASR). Despite the great achievements of the
past decades, however, a natural and robust human-machine speech interaction
still appears to be out of reach, especially in challenging environments
characterized by significant noise and reverberation. To improve robustness,
modern speech recognizers often employ acoustic models based on Recurrent
Neural Networks (RNNs), which are naturally able to exploit large time contexts
and long-term speech modulations. It is thus of great interest to continue the
study of proper techniques for improving the effectiveness of RNNs in
processing speech signals.
In this paper, we revise one of the most popular RNN models, namely Gated
Recurrent Units (GRUs), and propose a simplified architecture that turned out
to be very effective for ASR. The contribution of this work is two-fold: First,
we analyze the role played by the reset gate, showing that a significant
redundancy with the update gate occurs. As a result, we propose to remove the
former from the GRU design, leading to a more efficient and compact single-gate
model. Second, we propose to replace hyperbolic tangent with ReLU activations.
This variation couples well with batch normalization and could help the model
learn long-term dependencies without numerical issues.
Results show that the proposed architecture, called Light GRU (Li-GRU), not
only reduces the per-epoch training time by more than 30% over a standard GRU,
but also consistently improves the recognition accuracy across different tasks,
input features, noisy conditions, as well as across different ASR paradigms,
ranging from standard DNN-HMM speech recognizers to end-to-end CTC models.
Comment: Copyright 2018 IEEE
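The two modifications described above (removing the reset gate and using a ReLU candidate state) can be sketched as a single recurrent step in plain Python. This is a minimal sketch assuming the usual GRU-style interpolation h_t = z_t * h_{t-1} + (1 - z_t) * h̃_t; batch normalization, biases and training are omitted, and the names `li_gru_step`, `Wz`, `Uz`, `Wh`, `Uh` are hypothetical.

```python
import math


def matvec(m, v):
    """Dense matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def li_gru_step(x, h_prev, Wz, Uz, Wh, Uh):
    """One Li-GRU-style step: a single update gate z and a ReLU candidate state.
    Batch normalization of the feed-forward terms (Wz x, Wh x) is omitted here."""
    z = [sigmoid(a + b) for a, b in zip(matvec(Wz, x), matvec(Uz, h_prev))]
    # No reset gate: the candidate is computed from the full previous state.
    cand = [max(0.0, a + b) for a, b in zip(matvec(Wh, x), matvec(Uh, h_prev))]
    # Update gate interpolates between the previous state and the candidate.
    return [zi * hp + (1.0 - zi) * ci for zi, hp, ci in zip(z, h_prev, cand)]


zeros_in = [[0.0], [0.0]]  # 2 hidden units, 1 input feature
zeros_rec = [[0.0, 0.0], [0.0, 0.0]]
h = li_gru_step([1.0], [1.0, 2.0], zeros_in, zeros_rec, zeros_in, zeros_rec)
print(h)  # [0.5, 1.0]: z = 0.5 and the ReLU candidate is 0
```

Compared with a standard GRU step, the sketch needs one gate instead of two, which is where the reduction in per-step computation comes from; the unbounded ReLU output is the reason the abstract pairs it with batch normalization.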
An HMM–ELLAM scheme on generic polygonal meshes for miscible incompressible flows in porous media
We design a numerical approximation of a system of partial differential
equations modelling the miscible displacement of one fluid by another in a porous
medium. The advective part of the system is discretised using a characteristic
method, and the diffusive parts by a finite volume method. The scheme is
applicable on generic (possibly non-conforming) meshes as encountered in
applications. The main features of our work are: the reconstruction, from the
discrete pressure fluxes, of a Darcy velocity that enjoys a local consistency
property; an analysis of the implementation issues faced when tracking distorted
cells via the characteristic method; and a new treatment of cells near the
injection well that better accounts for the conservativity of the injected
fluid.
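The characteristic treatment of the advective part can be illustrated, in a much simplified setting, by a semi-Lagrangian step on a uniform periodic 1-D grid with constant velocity: each cell centre is traced back along its characteristic and the concentration is interpolated at the foot. This sketch rests on strong assumptions and is not the paper's HMM–ELLAM scheme: it ignores diffusion, polygonal meshes, the Darcy-velocity reconstruction and the ELLAM test-function formulation, and `characteristic_advection_step` is a hypothetical name.

```python
import math


def characteristic_advection_step(c, velocity, dt, dx):
    """Advance cell values c on a periodic uniform 1-D grid by tracing each
    cell centre back along its characteristic and interpolating linearly."""
    n = len(c)
    shift = velocity * dt / dx  # displacement over one step, in cells
    out = []
    for i in range(n):
        foot = i - shift  # foot of the characteristic through cell i
        j = math.floor(foot)
        theta = foot - j  # fractional position between cells j and j + 1
        out.append((1.0 - theta) * c[j % n] + theta * c[(j + 1) % n])
    return out


c0 = [0.0, 0.0, 1.0, 0.0, 0.0]
c1 = characteristic_advection_step(c0, velocity=0.5, dt=1.0, dx=1.0)
print(c1)  # [0.0, 0.0, 0.5, 0.5, 0.0]
```

With a constant velocity every cell value is redistributed with weights (1 - theta) and theta, so the total amount of solute is preserved up to rounding; on distorted cells and with a spatially varying Darcy velocity that property is no longer automatic, which is the kind of issue the abstract's treatment of cells near the injection well addresses.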
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
This paper describes Tacotron 2, a neural network architecture for speech
synthesis directly from text. The system is composed of a recurrent
sequence-to-sequence feature prediction network that maps character embeddings
to mel-scale spectrograms, followed by a modified WaveNet model acting as a
vocoder to synthesize time-domain waveforms from those spectrograms. Our model
achieves a mean opinion score (MOS) comparable to that of professionally
recorded speech. To validate our design choices, we present
ablation studies of key components of our system and evaluate the impact of
using mel spectrograms as the input to WaveNet instead of linguistic, duration,
and F0 features. We further demonstrate that using a compact acoustic
intermediate representation enables significant simplification of the WaveNet
architecture.
Comment: Accepted to ICASSP 2018
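The mel-scale spectrogram targets mentioned above are built from a bank of triangular filters whose edges are equally spaced on the mel scale. The sketch below computes those band edges and a filter weight. It assumes the HTK-style formula m = 2595 log10(1 + f/700); other mel variants exist, the abstract does not state which filterbank settings Tacotron 2 uses, and the parameter values in the usage line are illustrative only.

```python
import math


def hz_to_mel(f_hz):
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(n_mels, f_min, f_max):
    """n_mels + 2 frequencies (Hz) equally spaced on the mel scale; each consecutive
    triple gives the lower edge, centre and upper edge of one triangular filter."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + k * step) for k in range(n_mels + 2)]


def triangular_weight(f, lower, centre, upper):
    """Weight of frequency f (Hz) in the triangular filter (lower, centre, upper)."""
    if lower < f <= centre:
        return (f - lower) / (centre - lower)
    if centre < f < upper:
        return (upper - f) / (upper - centre)
    return 0.0


edges = mel_band_edges(10, 0.0, 8000.0)  # illustrative: 10 filters up to 8 kHz
widths = [b - a for a, b in zip(edges, edges[1:])]
print([round(w) for w in widths])  # bands widen with frequency
```

Summing each frame's power spectrum under these triangles yields one mel-spectrogram frame; the widening bands give the low-frequency region more resolution, which is one reason such a compact representation is a convenient intermediate between the two networks.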
Speaker segmentation and clustering
This survey focuses on two challenging speech processing topics, namely speaker segmentation and speaker clustering. Speaker segmentation aims at finding speaker change points in an audio stream, whereas speaker clustering aims at grouping speech segments based on speaker characteristics. Model-based, metric-based, and hybrid speaker segmentation algorithms are reviewed. Concerning speaker clustering, deterministic and probabilistic algorithms are examined. A comparative assessment of the reviewed algorithms is undertaken, their advantages and disadvantages are indicated, insight into the algorithms is offered, and conclusions as well as recommendations are given. Rich transcription and movie analysis are candidate applications that benefit from combined speaker segmentation and clustering. © 2007 Elsevier B.V. All rights reserved.
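A widely used criterion for deciding whether a speaker change occurs inside a window is the Bayesian Information Criterion (BIC): the window is modelled once by a single Gaussian and once by two Gaussians split at a candidate point, and a positive ΔBIC favours the split. The sketch below is a simplified illustration assuming 1-D features and univariate Gaussians (real systems use multivariate features such as MFCCs with full covariances); `delta_bic`, `best_change_point` and the `penalty_weight` parameter are hypothetical names, not taken from the survey.

```python
import math


def _log_var(xs):
    """Log of the maximum-likelihood variance, floored to avoid log(0)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return math.log(max(var, 1e-12))


def delta_bic(xs, k, penalty_weight=1.0, d=1):
    """ΔBIC for splitting xs at index k; positive values favour a change point."""
    n = len(xs)
    # Extra parameters of the two-Gaussian model: d means + d(d+1)/2 covariances.
    penalty = penalty_weight * 0.5 * (d + d * (d + 1) / 2) * math.log(n)
    return (0.5 * n * _log_var(xs)
            - 0.5 * k * _log_var(xs[:k])
            - 0.5 * (n - k) * _log_var(xs[k:])
            - penalty)


def best_change_point(xs, min_len=5, penalty_weight=1.0):
    """Return (index, ΔBIC) of the best split, or None if no ΔBIC is positive."""
    scores = [(k, delta_bic(xs, k, penalty_weight))
              for k in range(min_len, len(xs) - min_len + 1)]
    k, score = max(scores, key=lambda item: item[1])
    return (k, score) if score > 0 else None


# Toy stream: "speaker A" features near 0.5, then "speaker B" near 5.5.
stream = [0.0, 1.0] * 25 + [5.0, 6.0] * 25
print(best_change_point(stream)[0])  # 50
```

The `penalty_weight` (often written λ) trades missed changes against false alarms; in a full segmenter the test is applied over a sliding or growing window rather than to a whole stream at once.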
Time-step coupling for hybrid simulations of multiscale flows
A new method is presented for exploiting time-scale separation in hybrid continuum-molecular models of multiscale flows. Our method is a generalisation of existing approaches and is evaluated in terms of computational efficiency and physical/numerical error. Comparison with existing schemes demonstrates comparable, or much improved, physical accuracy at comparable, or far greater, efficiency (in terms of the number of time-step operations required to cover the same physical time). A leapfrog coupling between the ‘macro’ and ‘micro’ components of the hybrid model is proposed and shows potential for improved numerical accuracy over a standard simultaneous approach. A general algorithm for a coupled time step is presented. Three test cases are considered in which the degree of time-scale separation naturally varies during the course of the simulation: first, the step response of a second-order system composed of two linearly coupled ODEs; second, a micro-jet actuator that combines a kinetic treatment in a small flow region where rarefaction is important with a simple ODE enforcing mass conservation in a much larger spatial region; and finally, the transient start-up flow of a journal bearing with a cylindrical rarefied gas layer. Our new time-stepping method consistently performs as well as or better than existing schemes. This superior overall performance is due to an adaptability inherent in the method, which allows the most desirable aspects of existing schemes to be applied only under the appropriate conditions.
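The difference between simultaneous and staggered information exchange can be seen on a toy version of the first test case: a slow ‘macro’ ODE coupled to a fast ‘micro’ ODE that is sub-cycled with a smaller step. This is a hypothetical sketch, not the paper's algorithm: the equations, parameters and the `simulate` function are assumptions, and the "sequential" option (macro uses the freshly computed micro state) only approximates the spirit of the proposed leapfrog coupling.

```python
def simulate(coupling, t_end=20.0, dt=0.05, n_micro=20, T=1.0, eps=0.01, k=1.0, u=1.0):
    """Toy two-ODE step response (hypothetical parameters):
        macro: dx/dt = (y - x) / T
        micro: dy/dt = (u - y + k * (x - y)) / eps   (fast, since eps << T)
    Each macro step of size dt sub-cycles the micro model n_micro times.
    coupling is "simultaneous" or "sequential"."""
    x, y = 0.0, 0.0
    h = dt / n_micro  # micro time step exploiting time-scale separation
    for _ in range(round(t_end / dt)):
        x_old, y_old = x, y
        # Micro sub-cycling with the macro state frozen at the start of the step.
        for _ in range(n_micro):
            y += h * (u - y + k * (x_old - y)) / eps
        # Simultaneous: macro sees the old micro state.
        # Sequential: macro sees the micro state just computed.
        y_for_macro = y_old if coupling == "simultaneous" else y
        x += dt * (y_for_macro - x) / T
    return x, y


for scheme in ("simultaneous", "sequential"):
    print(scheme, simulate(scheme))  # both settle near the steady state x = y = 1
```

Both variants reach the same steady state, but during the transient the sequential exchange lets the macro model respond one step earlier; how much that matters depends on the degree of time-scale separation, which is what motivates an adaptive choice between schemes.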
- …