
29 research outputs found

A Phrase-Level Machine Translation Approach For Disfluency Detection Using Weighted Finite State Transducers

Author: Gao Yuqing
Maskey Sameer R.
Zhou Bowen
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2006

We propose a novel algorithm to detect disfluency in speech by reformulating the problem as phrase-level statistical machine translation using weighted finite state transducers. We approach the task as translation of noisy speech to clean speech. We simplify our translation framework such that it does not require fertility and alignment models. We tested our model on the Switchboard disfluency-annotated corpus. Using an optimized decoder that is developed for phrase-based translation at IBM, we are able to detect repeats, repairs and filled pauses for more than a thousand sentences in less than a second with encouraging results. Index Terms: disfluency detection, machine translation, speech-to-speech translation

CiteSeerX

Columbia University Academic Commons

Using Learned Conditional Distributions as Edit Distance

Author: Oncina Jose
Sebban Marc
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 17/08/2006


HAL-UJM

Spectral learning of transducers over continuous sequences

Author: Quattoni Ariadna Julieta
Recasens Adria
Publication date: 01/01/2013

In this paper we present a spectral algorithm for learning weighted finite state transducers (WFSTs) over paired input-output sequences, where the input is continuous and the output discrete. WFSTs are an important tool for modeling paired input-output sequences and have numerous applications in real-world problems. Recently, Balle et al. (2011) proposed a spectral method for learning WFSTs that overcomes some of the well-known limitations of gradient-based or EM optimizations, which can be computationally expensive and suffer from local optima issues. Their algorithm can model distributions where both inputs and outputs are sequences from a discrete alphabet. However, many real-world problems require modeling paired sequences where the inputs are not discrete but continuous sequences. Modelling continuous sequences with spectral methods has been studied in the context of HMMs (Song et al., 2010), where a spectral algorithm for this case was derived. In this paper we follow that line of work and propose a spectral learning algorithm for modelling paired input-output sequences where the inputs are continuous and the outputs are discrete. Our approach is based on generalizing the class of weighted finite state transducers over discrete input-output sequences to a class where transitions are linear combinations of elementary transitions and the weights of these linear combinations are determined by dynamic features of the continuous input sequence. At its core, the algorithm is simple and scalable to large data sets. We present experiments on a real task that validate the effectiveness of the proposed approach.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

UPCommons. Portal del coneixement obert de la UPC

A Discriminative Model of Stochastic Edit Distance in the form of a Conditional Transducer

Author: A. Dempster
E. Vidal
E.S. Ristad
F. Thollard
G. Bouchard
R. Durbin
R.A. Wagner
R.C. Carrasco
Publication venue: HAL CCSD
Publication date: 01/01/2006

Pages 240-252. Many real-world applications such as spell-checking or DNA analysis use the Levenshtein edit distance to compute similarities between strings. In practice, the costs of the primitive edit operations (insertion, deletion and substitution of symbols) are generally hand-tuned. In this paper, we propose an algorithm to learn these costs. The underlying model is a probabilistic transducer, computed by using grammatical inference techniques, that allows us to learn both the structure and the probabilities of the model. Beyond the fact that the learned transducers are neither deterministic nor stochastic in the standard terminology, they are conditional, and thus independent of the distributions of the input strings. Finally, we show through experiments that our method allows us to design cost functions that depend on the string context where the edit operations are used. In other words, we obtain a kind of context-sensitive edit distance.

HAL-UJM

Crossref

Local String Transduction as Sequence Labeling

Author: Carreras Xavier
Cohen Shay
Narayan Shashi
Ribeiro Joana
Publication date: 01/08/2018

We show that the general problem of string transduction can be reduced to the problem of sequence labeling. While character deletions and insertions are allowed in string transduction, they do not exist in sequence labeling. We show how to overcome this difference. Our approach can be used with any sequence labeling algorithm, and it works best for problems in which string transduction imposes a strong notion of locality (no long-range dependencies). We experiment with spelling correction for social media, OCR correction, and morphological inflection, and we see that it behaves better than seq2seq models and yields state-of-the-art results in several cases.

Edinburgh Research Explorer

Digital.CSIC

DeepProbLog: neural probabilistic logic programming

Author: De Raedt L.
Demeester Thomas
Dumancic S.
Kimmig A.
Manhaeve R.
Publication date: 01/01/2018

Ghent University Academic Bibliography