Search CORE

3,803 research outputs found

Source-side context-informed hypothesis alignment for combining outputs from machine translation systems

Author: Du Jinhua
Ma Yanjun
Way Andy
Publication date: 01/01/2009

This paper presents a new hypothesis alignment method for combining the outputs of multiple machine translation (MT) systems. Traditional hypothesis alignment algorithms such as TER, HMM and IHMM do not directly utilise source-side context information; instead, they address alignment using the output data alone. In this paper, a source-side context-informed (SSCI) hypothesis alignment method is proposed to address word alignment and word reordering. First, source–target word alignment links are produced as hidden variables by exporting source phrase spans during the translation decoding process. Second, a mapping strategy and a normalisation model are employed to obtain 1-to-1 alignment links and build the confusion network (CN). The source-side context-based method outperforms the state-of-the-art TER-based alignment model in our experiments on the WMT09 English-to-French and NIST Chinese-to-English data sets. Experimental results demonstrate that our proposed approach scores consistently among the best results across different data and language-pair conditions.
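The abstract does not spell out the mapping strategy, so the following is only a hypothetical sketch of the general idea: if the decoder exports, for every target word, the set of source positions it was translated from, a hypothesis word can be linked to a backbone word through the source positions they share. The greedy 1-to-1 heuristic, its tie-breaking order and the function name `align_via_source` are our own assumptions, and the paper's normalisation model is not represented at all.

```python
from typing import List, Optional, Set, Tuple

# A token is (word, set of source positions it was translated from).
Token = Tuple[str, Set[int]]


def align_via_source(backbone: List[Token], hyp: List[Token]) -> List[Optional[int]]:
    """Link each hypothesis word to at most one backbone word (1-to-1), or None."""
    candidates = []
    for i, (w_h, src_h) in enumerate(hyp):
        for j, (w_b, src_b) in enumerate(backbone):
            overlap = len(src_h & src_b)
            if overlap:
                # Hypothetical ranking: more shared source positions first, then an
                # identical surface form, then the smaller target-position gap.
                candidates.append((-overlap, 0 if w_h == w_b else 1, abs(i - j), i, j))
    candidates.sort()

    alignment: List[Optional[int]] = [None] * len(hyp)
    used = set()
    for *_, i, j in candidates:
        # Greedily accept the best-ranked link whose two ends are both still free.
        if alignment[i] is None and j not in used:
            alignment[i] = j
            used.add(j)
    return alignment


# Usage: source "la voiture rouge" (positions 0, 1, 2) translated by two systems.
backbone = [("the", {0}), ("red", {2}), ("car", {1})]
hyp = [("a", {0}), ("car", {1}), ("that", {2}), ("is", {2}), ("red", {2})]
alignment = align_via_source(backbone, hyp)  # [0, 2, None, None, 1]
```

The reordering ("red car" versus "car ... red") is resolved through the shared source positions rather than by surface edit distance; the unlinked words "that" and "is" would typically be handled as insertions when the confusion network is built.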

CiteSeerX

Irish Universities

DCU Online Research Access Service

Light Gated Recurrent Units for Speech Recognition

Author: Bengio Yoshua
Brakel Philemon
Omologo Maurizio
Ravanelli Mirco
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 26/03/2018

A field that has directly benefited from the recent advances in deep learning is Automatic Speech Recognition (ASR). Despite the great achievements of the past decades, however, natural and robust human-machine speech interaction still appears to be out of reach, especially in challenging environments characterized by significant noise and reverberation. To improve robustness, modern speech recognizers often employ acoustic models based on Recurrent Neural Networks (RNNs), which are naturally able to exploit large time contexts and long-term speech modulations. It is thus of great interest to continue studying techniques for improving the effectiveness of RNNs in processing speech signals. In this paper, we revise one of the most popular RNN models, namely Gated Recurrent Units (GRUs), and propose a simplified architecture that turned out to be very effective for ASR. The contribution of this work is two-fold. First, we analyze the role played by the reset gate, showing that it is largely redundant with the update gate. As a result, we propose to remove the reset gate from the GRU design, leading to a more efficient and compact single-gate model. Second, we propose to replace hyperbolic tangent activations with ReLU activations. This variation couples well with batch normalization and could help the model learn long-term dependencies without numerical issues. Results show that the proposed architecture, called Light GRU (Li-GRU), not only reduces the per-epoch training time by more than 30% over a standard GRU, but also consistently improves the recognition accuracy across different tasks, input features and noisy conditions, as well as across different ASR paradigms, ranging from standard DNN-HMM speech recognizers to end-to-end CTC models. Comment: Copyright 2018 IEE
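As a rough illustration of the two changes, the sketch below runs one Li-GRU layer in plain Python. It assumes the update equations as we read them in the full paper (they are not stated in this abstract): z_t = σ(BN(W_z x_t) + U_z h_{t−1}), h̃_t = ReLU(BN(W_h x_t) + U_h h_{t−1}) and h_t = z_t ⊙ h_{t−1} + (1 − z_t) ⊙ h̃_t. Biases and the learnable scale and shift of batch normalization are omitted, and all helper names are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Matrix-vector product with plain lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(v: float) -> float:
    """Numerically safe logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def batch_norm(batch: List[Vector], eps: float = 1e-5) -> List[Vector]:
    """Normalise each feature across the batch (no learnable scale or shift)."""
    n = len(batch)
    cols = list(zip(*batch))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n + eps) for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in batch]


def li_gru(xs: List[List[Vector]], Wz: Matrix, Uz: Matrix, Wh: Matrix, Uh: Matrix) -> List[Vector]:
    """Run one Li-GRU layer over xs[t][b] (time step t, batch item b); return final states."""
    h = [[0.0] * len(Wz) for _ in xs[0]]
    for x_t in xs:
        # Batch normalization is applied to the feed-forward terms only.
        az = batch_norm([matvec(Wz, x) for x in x_t])
        ah = batch_norm([matvec(Wh, x) for x in x_t])
        new_h = []
        for b, h_prev in enumerate(h):
            # Single update gate: there is no reset gate.
            z = [sigmoid(a + r) for a, r in zip(az[b], matvec(Uz, h_prev))]
            # Candidate state uses ReLU instead of tanh.
            cand = [max(0.0, a + r) for a, r in zip(ah[b], matvec(Uh, h_prev))]
            new_h.append([zi * hp + (1.0 - zi) * c for zi, hp, c in zip(z, h_prev, cand)])
        h = new_h
    return h


def rand_matrix(rows: int, cols: int) -> Matrix:
    return [[random.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


# Usage: 5 time steps, batch of 2, 3 input features, 4 hidden units.
random.seed(0)
Wz, Wh = rand_matrix(4, 3), rand_matrix(4, 3)
Uz, Uh = rand_matrix(4, 4), rand_matrix(4, 4)
xs = [rand_matrix(2, 3) for _ in range(5)]
h = li_gru(xs, Wz, Uz, Wh, Uh)
```

Because the state starts at zero, the ReLU candidate is non-negative and z lies in (0, 1), every hidden value stays non-negative. Compared with a standard GRU, one of the three (W, U) weight pairs disappears along with the reset gate.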

arXiv.org e-Print Archive

Crossref

Archivio della ricerca - Fondazione Bruno Kessler

An HMM–ELLAM scheme on generic polygonal meshes for miscible incompressible flows in porous media

Author: Cheng Hanz Martin
Droniou Jerome
Publication date: 22/08/2018

We design a numerical approximation of a system of partial differential equations modelling the miscible displacement of one fluid by another in a porous medium. The advective part of the system is discretised using a characteristic method, and the diffusive parts by a finite volume method. The scheme is applicable on generic (possibly non-conforming) meshes as encountered in applications. The main features of our work are: the reconstruction, from the discrete pressure fluxes, of a Darcy velocity that enjoys a local consistency property; an analysis of the implementation issues faced when tracking distorted cells via the characteristic method; and a new treatment of cells near the injection well that better accounts for the conservativity of the injected fluid.
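To illustrate only the tracking idea behind a characteristic method, the sketch below advances the 1-D advection equation u_t + v u_x = 0, with constant velocity v on a uniform periodic grid, by following each characteristic back to its foot x_i − vΔt and interpolating linearly there. This is a plain semi-Lagrangian update, not the paper's scheme: ELLAM is designed to be mass-conservative on general meshes, whereas interpolation-based tracking like this is not conservative in general (it conserves the discrete sum here only because the velocity is constant and the grid uniform and periodic). The function name is hypothetical.

```python
import math
from typing import List


def characteristic_step(u: List[float], v: float, dt: float, dx: float) -> List[float]:
    """One backward-characteristic step for u_t + v u_x = 0 on a periodic uniform grid."""
    n = len(u)
    out = []
    for i in range(n):
        # Foot of the characteristic through node i, measured in grid cells.
        s = (i * dx - v * dt) / dx
        j = math.floor(s)
        frac = s - j
        # Linear interpolation between the two nodes surrounding the foot (periodic wrap).
        out.append((1.0 - frac) * u[j % n] + frac * u[(j + 1) % n])
    return out


# Usage: a ramp advected by exactly two cells, then by a fraction of a cell.
u0 = [float(i) for i in range(10)]
shifted = characteristic_step(u0, v=1.0, dt=2.0, dx=1.0)
partial = characteristic_step(u0, v=0.3, dt=1.0, dx=1.0)
```

When vΔt is an exact multiple of Δx the step reduces to an exact shift of the data, which makes a convenient sanity check.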

arXiv.org e-Print Archive

Monash University Research Portal

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

Author: Agiomyrgiannakis Yannis
Chen Zhifeng
Jaitly Navdeep
Pang Ruoming
Saurous Rif A.
Schuster Mike
Shen Jonathan
Skerry-Ryan RJ
Wang Yuxuan
Weiss Ron J.
Wu Yonghui
Yang Zongheng
Zhang Yu
Publication date: 15/02/2018

This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text. The system is composed of a recurrent sequence-to-sequence feature prediction network that maps character embeddings to mel-scale spectrograms, followed by a modified WaveNet model acting as a vocoder to synthesize time-domain waveforms from those spectrograms. Our model achieves a mean opinion score (MOS) of 4.53, comparable to a MOS of 4.58 for professionally recorded speech. To validate our design choices, we present ablation studies of key components of our system and evaluate the impact of using mel spectrograms as the input to WaveNet instead of linguistic, duration, and $F_0$ features. We further demonstrate that using a compact acoustic intermediate representation enables significant simplification of the WaveNet architecture. Comment: Accepted to ICASSP 201
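Mel spectrograms like those described above are obtained by pooling a linear-frequency power spectrum with triangular filters spaced evenly on the mel scale. The sketch below builds such a filterbank using the HTK-style formula m = 2595 log10(1 + f/700); the formula variant, the function names and the parameter values in the usage lines (80 filters, 2048-point FFT, 24 kHz sampling rate, 125 Hz to 7.6 kHz) are illustrative assumptions, not details taken from this abstract.

```python
import math
from typing import List


def hz_to_mel(f: float) -> float:
    """HTK-style mel scale (one of several variants in common use)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels: int, n_fft: int, sr: float, fmin: float, fmax: float) -> List[List[float]]:
    """Triangular filters over the FFT bins, with edges spaced evenly in mel."""
    m_lo, m_hi = hz_to_mel(fmin), hz_to_mel(fmax)
    # n_mels + 2 edge frequencies: filter k spans edges[k]..edges[k + 2], peaking at edges[k + 1].
    edges = [mel_to_hz(m_lo + k * (m_hi - m_lo) / (n_mels + 1)) for k in range(n_mels + 2)]
    bin_hz = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for k in range(n_mels):
        left, centre, right = edges[k], edges[k + 1], edges[k + 2]
        row = []
        for f in bin_hz:
            if left <= f <= centre:
                row.append((f - left) / (centre - left))
            elif centre < f <= right:
                row.append((right - f) / (right - centre))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def apply_filterbank(bank: List[List[float]], power: List[float]) -> List[float]:
    """One mel-spectrogram frame: weighted sums of one power-spectrum frame."""
    return [sum(w * p for w, p in zip(row, power)) for row in bank]


# Usage with illustrative parameters and a flat power spectrum.
bank = mel_filterbank(n_mels=80, n_fft=2048, sr=24000.0, fmin=125.0, fmax=7600.0)
frame = apply_filterbank(bank, [1.0] * (2048 // 2 + 1))
```

Applying the bank to every STFT frame (and usually taking a log) yields the compact intermediate representation that the vocoder is conditioned on; filters get wider in hertz as frequency rises because they are equally spaced in mel.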

arXiv.org e-Print Archive

Crossref

Speaker segmentation and clustering

Author: Constantine Kotropoulos
Margarita Kotti
Vassiliki Moschou
Publication venue: 'Elsevier BV'
Publication date: 01/01/2008

This survey focuses on two challenging speech processing topics, namely speaker segmentation and speaker clustering. Speaker segmentation aims at finding speaker change points in an audio stream, whereas speaker clustering aims at grouping speech segments based on speaker characteristics. Model-based, metric-based, and hybrid speaker segmentation algorithms are reviewed. Concerning speaker clustering, deterministic and probabilistic algorithms are examined. A comparative assessment of the reviewed algorithms is undertaken, the advantages and disadvantages of each algorithm are indicated, insight into the algorithms is offered, and conclusions as well as recommendations are given. Rich transcription and movie analysis are candidate applications that benefit from combined speaker segmentation and clustering. © 2007 Elsevier B.V. All rights reserved.
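One widely used criterion for detecting a speaker change point is ΔBIC: a window of N feature vectors is split at a candidate point t, the data are modelled either by one Gaussian or by two, and a change is declared where ΔBIC(t) = (N/2) log|Σ| − (N₁/2) log|Σ₁| − (N₂/2) log|Σ₂| − λP is positive, with P = ½(d + ½d(d+1)) log N. The sketch below uses one-dimensional features so that |Σ| reduces to a variance; the feature choice, λ = 1, the minimum segment length and the function names are assumptions for illustration, not the survey's formulation.

```python
import math
import random
from typing import List, Optional


def _var(xs: List[float]) -> float:
    """Maximum-likelihood variance (divides by N)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def delta_bic(xs: List[float], t: int, lam: float = 1.0) -> float:
    """ΔBIC for splitting 1-D features xs at index t; a positive value favours a change."""
    n, n1, n2 = len(xs), t, len(xs) - t
    d = 1  # feature dimension in this toy example
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * math.log(n)
    return (0.5 * n * math.log(_var(xs))
            - 0.5 * n1 * math.log(_var(xs[:t]))
            - 0.5 * n2 * math.log(_var(xs[t:]))
            - lam * penalty)


def find_change(xs: List[float], min_seg: int = 10, lam: float = 1.0) -> Optional[int]:
    """Return the split index with the largest positive ΔBIC, or None if there is none."""
    best_t, best = None, 0.0
    for t in range(min_seg, len(xs) - min_seg + 1):
        score = delta_bic(xs, t, lam)
        if score > best:
            best_t, best = t, score
    return best_t


# Usage: a synthetic stream whose mean jumps at index 100.
random.seed(1)
stream = [random.gauss(0.0, 1.0) for _ in range(100)] + [random.gauss(5.0, 1.0) for _ in range(100)]
change = find_change(stream)
```

In practice the test is run on multidimensional features such as MFCCs inside a sliding window, and λ is tuned on development data; the minimum segment length keeps the variance estimates away from degenerate, very short segments.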

CiteSeerX

Crossref

Spiral - Imperial College Digital Repository

Time-step coupling for hybrid simulations of multiscale flows

Author: Borg Matthew Karl
Duque-Daza Carlos A.
Lockerby Duncan A.
Reese Jason
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013

A new method is presented for exploiting time-scale separation in hybrid continuum-molecular models of multiscale flows. Our method is a generalisation of existing approaches, and is evaluated in terms of computational efficiency and physical/numerical error. Comparison with existing schemes demonstrates comparable, or much improved, physical accuracy, at comparable, or far greater, efficiency (in terms of the number of time-step operations required to cover the same physical time). A leapfrog coupling is proposed between the ‘macro’ and ‘micro’ components of the hybrid model and demonstrates potential for improved numerical accuracy over a standard simultaneous approach. A general algorithm for a coupled time step is presented. Three test cases are considered in which the degree of time-scale separation naturally varies during the course of the simulation: first, the step response of a second-order system composed of two linearly coupled ODEs; second, a micro-jet actuator combining a kinetic treatment in a small flow region where rarefaction is important with a simple ODE enforcing mass conservation in a much larger spatial region; and finally, the transient start-up flow of a journal bearing with a cylindrical rarefied gas layer. Our new time-stepping method consistently performs as well as or better than existing schemes. This superior overall performance is due to an adaptability inherent in the method, which allows the most desirable aspects of existing schemes to be applied only in the appropriate conditions.
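The basic idea of exploiting time-scale separation can be illustrated with a toy linear system of our own choosing: a 'macro' variable x with dx/dt = −x + y, driven by a fast 'micro' variable y with dy/dt = (1 − x − y)/ε, started from rest so that the constant 1 acts as a step input. Because y relaxes on the short time scale ε, the micro model need only be run for part of each macro step. The sketch below compares resolving the micro model over every macro step with running it for 20% of each step; it is not the authors' generalised scheme or their leapfrog coupling, and all names and parameter values are hypothetical.

```python
from typing import Tuple


def micro_relax(x: float, y: float, eps: float, dt_micro: float, n_steps: int) -> float:
    """Explicit-Euler 'micro' model: y relaxes towards 1 - x on the time scale eps."""
    for _ in range(n_steps):
        y += dt_micro * (1.0 - x - y) / eps
    return y


def coupled_run(t_end: float, dt_macro: float, eps: float, micro_fraction: float) -> Tuple[float, int]:
    """Macro model dx/dt = -x + y, with y supplied by the micro model.

    micro_fraction = 1.0 resolves the micro model over every macro step;
    micro_fraction < 1.0 runs it for only part of each step, relying on
    time-scale separation (y relaxes long before the macro step ends).
    Returns the final x and the total number of micro time steps taken.
    """
    dt_micro = eps / 10.0
    n_micro = max(1, round(micro_fraction * dt_macro / dt_micro))
    x, y, micro_steps = 0.0, 0.0, 0
    for _ in range(round(t_end / dt_macro)):
        y = micro_relax(x, y, eps, dt_micro, n_micro)  # micro first, with x frozen
        micro_steps += n_micro
        x += dt_macro * (-x + y)                         # then the macro update
    return x, micro_steps


# Usage: same physical time, full versus partial micro resolution.
full_x, full_steps = coupled_run(5.0, 0.05, 1e-3, micro_fraction=1.0)
part_x, part_steps = coupled_run(5.0, 0.05, 1e-3, micro_fraction=0.2)
```

Both runs approach the slow-manifold steady state x = 0.5, but the second uses a fifth of the micro time steps. The saving is only safe while the time scales stay well separated, which is why a scheme that adapts as the degree of separation varies is attractive.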

Crossref

University of Strathclyde Institutional Repository

Edinburgh Research Explorer

Warwick Research Archives Portal Repository