Search CORE


Pushdown automata in statistical machine translation

Author: Adrià de Gispert
Aho Alfred V.
Bar-Hillel Y.
Bill Byrne
Blackwood Graeme
Brants Thorsten
Chang Yin-Wen
Chelba Ciprian
Cyril Allauzen
Dyer Chris
Gonzalo Iglesias
Hopkins M.
Huang Liang
Koo Terry
Kumar Shankar
Ljolje Andrej
Michael Riley
Mohri Mehryar
Nederhof Mark-Jan
Roark Brian
Rush Alexander M.
Stolcke Andreas
Wu Dekai
Zens Richard
Publication venue: Computational Linguistics
Publication date: 01/01/2013

This article describes the use of pushdown automata (PDA) in the context of statistical machine translation and alignment under a synchronous context-free grammar. We use PDAs to compactly represent the space of candidate translations generated by the grammar when applied to an input sentence. General-purpose PDA algorithms for replacement, composition, shortest path, and expansion are presented. We describe HiPDT, a hierarchical phrase-based decoder using the PDA representation and these algorithms. We contrast the complexity of this decoder with that of a decoder based on a finite-state automaton representation, showing that PDAs provide a more suitable framework for achieving exact decoding with larger synchronous context-free grammars and smaller language models. We assess this experimentally on a large-scale Chinese-to-English alignment and translation task. In translation, we propose a two-pass decoding strategy involving a weaker language model in the first pass to address the results of the PDA complexity analysis. We study in depth the experimental conditions and tradeoffs in which HiPDT can achieve state-of-the-art performance for large-scale SMT.

CiteSeerX

Crossref

Apollo (Cambridge)

DNA ANALYSIS USING GRAMMATICAL INFERENCE

Author: Cook Cory
Publication venue: SJSU ScholarWorks
Publication date: 14/06/2016

An accurate language definition capable of distinguishing between coding and non-coding DNA has important applications and analytical significance in the field of computational biology. The method proposed here uses positive-sample grammatical inference and statistical information to infer languages for coding DNA. An algorithm is proposed for searching for an optimal subset of input sequences for the inference of regular grammars by optimizing a relevant accuracy metric. The algorithm does not guarantee finding the optimal subset; however, testing shows improvements in accuracy and performance over the base algorithm. Testing also shows that the languages inferred for components of DNA are consistently accurate. Using the proposed algorithm, languages are inferred for coding DNA with an average conditional probability above 80%. This reveals that languages for components of DNA can be inferred and are useful independently of the process that created them. These languages can then be analyzed or used for other tasks in computational biology. To illustrate potential applications of regular grammars for DNA components, an inferred language for exon sequences is applied as a post-processing step to hidden Markov model exon prediction, reducing the number of wrong exons detected and significantly improving the specificity of the model.

SJSU ScholarWorks

Practical experiments with regular approximation of context-free languages

Author: Nederhof Mark-Jan
Publication date: 25/10/1999

Several methods are discussed that construct a finite automaton given a context-free grammar, including both methods that lead to subsets and those that lead to supersets of the original context-free language. Some of these methods of regular approximation are new, and some others are presented here in a more refined form with respect to existing literature. Practical experiments with the different methods of regular approximation are performed for spoken-language input: hypotheses from a speech recognizer are filtered through a finite automaton. Comment: 28 pages. To appear in Computational Linguistics 26(1), March 200
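As a toy illustration of the filtering step only, here is a minimal sketch assuming a small hand-written DFA; it is not one of the approximation constructions from the paper, and the names `DFA`, `filter_hypotheses`, and the example language are hypothetical. The context-free language { a^n b^n : n >= 0 } is approximated from above by the regular language a* b*, and whitespace-tokenized hypotheses are kept only if the automaton accepts them.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Tuple


@dataclass
class DFA:
    """Deterministic finite automaton with a partial transition table.

    A missing (state, token) entry means the input is rejected.
    """
    start: int
    accepting: FrozenSet[int]
    delta: Dict[Tuple[int, str], int]

    def accepts(self, tokens: Iterable[str]) -> bool:
        state = self.start
        for tok in tokens:
            nxt = self.delta.get((state, tok))
            if nxt is None:
                return False  # no transition for this token: reject
            state = nxt
        return state in self.accepting


def filter_hypotheses(dfa: DFA, hypotheses: List[str]) -> List[str]:
    """Keep only the whitespace-tokenized hypotheses that the DFA accepts."""
    return [h for h in hypotheses if dfa.accepts(h.split())]


# Hypothetical example: { a^n b^n : n >= 0 } approximated by the
# regular superset a* b* (state 0 reads a's, state 1 reads b's).
approx = DFA(
    start=0,
    accepting=frozenset({0, 1}),
    delta={(0, "a"): 0, (0, "b"): 1, (1, "b"): 1},
)
hyps = ["a a b b", "a b a", "a a b", "b a"]
print(filter_hypotheses(approx, hyps))  # ['a a b b', 'a a b']
```

Note that "a a b" survives the filter even though it is not in a^n b^n: a superset approximation never rejects a valid hypothesis but may let some invalid ones through, whereas a subset approximation makes the opposite trade.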

arXiv.org e-Print Archive

CiteSeerX

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Boolean Circuit Complexity of Regular Languages

Author: Valdats Maris
Publication venue: 'Open Publishing Association'
Publication date: 21/05/2014

In this paper we define a new descriptional complexity measure for Deterministic Finite Automata, BC-complexity, as an alternative to the state complexity. We prove that for two DFAs with the same number of states, BC-complexity can differ exponentially. In some cases, minimization of a DFA can lead to an exponential increase in BC-complexity; on the other hand, the BC-complexity of DFAs with a large state space that are obtained by some standard constructions (determinization of NFAs, language operations) is reasonably small. But our main result is the analogue of the "Shannon effect" for finite automata: almost all DFAs with a fixed number of states have BC-complexity close to the maximum. Comment: In Proceedings AFL 2014, arXiv:1405.527

arXiv.org e-Print Archive

Directory of Open Access Journals

Finite-state Strategies in Delay Games (full version)

Author: Winter Sarah
Zimmermann Martin
Publication date: 01/01/2017

What is a finite-state strategy in a delay game? We answer this surprisingly non-trivial question by presenting a very general framework for removing delay: finite-state strategies exist for all winning conditions for which the resulting delay-free game admits a finite-state strategy. The framework is applicable to games whose winning condition is recognized by an automaton with an acceptance condition that satisfies a certain aggregation property. Our framework also yields upper bounds on the complexity of determining the winner of such delay games and upper bounds on the necessary lookahead to win the game. In particular, we cover all previous results of that kind as special cases of our uniform approach.

arXiv.org e-Print Archive

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Repositório do INPA

Automata theory in nominal sets

Author: Bojańczyk Mikołaj
Klin Bartek
Lasota Sławomir
Publication venue: 'Logical Methods in Computer Science e.V.'
Publication date: 14/08/2014

We study languages over infinite alphabets equipped with some structure that can be tested by recognizing automata. We develop a framework for studying such alphabets and the ensuing automata theory, where the key role is played by an automorphism group of the alphabet. In the process, we generalize nominal sets due to Gabbay and Pitts.

arXiv.org e-Print Archive

CiteSeerX

Episciences.org

Exact and Approximate Determinization of Discounted-Sum Automata

Author: Boker Udi
Henzinger Thomas A.
Publication venue: 'Logical Methods in Computer Science e.V.'
Publication date: 01/01/2014

A discounted-sum automaton (NDA) is a nondeterministic finite automaton with edge weights, valuing a run by the discounted sum of visited edge weights. More precisely, the weight in the $i$-th position of the run is divided by $\lambda^i$, where the discount factor $\lambda$ is a fixed rational number greater than 1. The value of a word is the minimal value of the automaton runs on it. Discounted summation is a common and useful measuring scheme, especially for infinite sequences, reflecting the assumption that earlier weights are more important than later weights. Unfortunately, determinization of NDAs, which is often essential in formal verification, is, in general, not possible. We provide positive news, showing that every NDA with an integral discount factor is determinizable. We complete the picture by proving that the integers characterize exactly the discount factors that guarantee determinizability: for every nonintegral rational discount factor $\lambda$, there is a nondeterminizable $\lambda$-NDA. We also prove that the class of NDAs with integral discount factors enjoys closure under the algebraic operations min, max, addition, and subtraction, which is not the case for general NDAs nor for deterministic NDAs. For general NDAs, we look into approximate determinization, which is always possible as the influence of a word's suffix decays. We show that the naive approach, of unfolding the automaton computations up to a sufficient level, is doubly exponential in the discount factor. We provide an alternative construction for approximate determinization, which is singly exponential in the discount factor, in the precision, and in the number of states. We also prove matching lower bounds, showing that the exponential dependency on each of these three parameters cannot be avoided. All our results hold equally for automata over finite words and for automata over infinite words.
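The value of a word under an NDA can be made concrete with a minimal sketch that evaluates the definition above directly; it does not implement determinization. It assumes finite words, positions counted from 0 (so the first weight is divided by $\lambda^0 = 1$), and every state treated as accepting; the function `nda_value` and the toy automaton are hypothetical. Because the discount depends only on the position, all runs that reach the same position share the same divisor, so a forward dynamic program that keeps the cheapest run prefix per state yields the minimum over all runs without enumerating them. Exact rational arithmetic via `fractions.Fraction` matches the paper's rational discount factors.

```python
from fractions import Fraction
from typing import Dict, List, Optional, Tuple

# Transition relation: (source_state, letter) -> list of (target_state, weight).
Transitions = Dict[Tuple[str, str], List[Tuple[str, Fraction]]]


def nda_value(word: str, initial: str, delta: Transitions,
              lam: Fraction) -> Optional[Fraction]:
    """Minimal discounted-sum value over all runs of the NDA on `word`.

    Assumes positions start at 0, so the weight taken at position i is
    divided by lam**i, and that every state is accepting.
    Returns None if the automaton has no run on `word`.
    """
    # best[q] = minimal discounted value of a run on the prefix read so far
    # that ends in state q.
    best: Dict[str, Fraction] = {initial: Fraction(0)}
    for i, letter in enumerate(word):
        discount = lam ** i
        nxt: Dict[str, Fraction] = {}
        for q, value in best.items():
            for target, weight in delta.get((q, letter), []):
                cand = value + weight / discount
                if target not in nxt or cand < nxt[target]:
                    nxt[target] = cand
        best = nxt
        if not best:
            return None  # every run got stuck
    return min(best.values())


# Hypothetical toy NDA over the single letter "a" with discount factor 2.
delta: Transitions = {
    ("q0", "a"): [("q0", Fraction(1)), ("q1", Fraction(3))],
    ("q1", "a"): [("q1", Fraction(0))],
}
print(nda_value("aa", "q0", delta, Fraction(2)))  # 3/2
```

On "aa" the three runs have values 1 + 1/2 = 3/2, 1 + 3/2 = 5/2, and 3 + 0/2 = 3, so the minimum is 3/2; with an integral factor such as 2 every intermediate value stays a dyadic rational, whereas a nonintegral factor produces denominators with other prime powers.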

arXiv.org e-Print Archive

Episciences.org

IST Austria: PubRep (Institute of Science and Technology)