Hyper-Minimization for Deterministic Weighted Tree Automata
Hyper-minimization is a state reduction technique that allows a finite change
in the semantics. The theory for hyper-minimization of deterministic weighted
tree automata is provided. The presence of weights slightly complicates the
situation in comparison to the unweighted case. In addition, the first
hyper-minimization algorithm for deterministic weighted tree automata, weighted
over commutative semifields, is provided together with some implementation
remarks that enable an efficient implementation. In fact, the same run-time O(m
log n) as in the unweighted case is obtained, where m is the size of the
deterministic weighted tree automaton and n is its number of states.
Comment: In Proceedings AFL 2014, arXiv:1405.527
Random Generation of Nondeterministic Finite-State Tree Automata
Algorithms for (nondeterministic) finite-state tree automata (FTAs) are often
tested on random FTAs, in which all internal transitions are equiprobable. The
run-time results obtained in this manner are usually overly optimistic as most
such generated random FTAs are trivial in the sense that the number of states
of an equivalent minimal deterministic FTA is extremely small. It is
demonstrated that nontrivial random FTAs are obtained only for a narrow band of
transition probabilities. Moreover, an analytic analysis yields a formula to
approximate the transition probability that yields the most complex random
FTAs, which should be used in experiments.
Comment: In Proceedings TTATT 2013, arXiv:1311.5058. Andreas Maletti and
Daniel Quernheim were financially supported by the German Research Foundation
(DFG) grant MA/4959/1-
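The effect described in this abstract can be explored with a minimal sketch. It is not the authors' generator or their formula: the function names `random_fta` and `determinize`, the parameter `p`, and the example alphabet are all assumptions made for illustration. Each possible transition is included independently with probability `p`, and the number of reachable subset states after bottom-up determinization serves as a rough proxy for complexity (the abstract refers to the minimal deterministic FTA, which this sketch does not compute).

```python
import itertools
import random


def random_fta(n_states, alphabet, p, rng):
    """Draw a random nondeterministic FTA (hypothetical helper).

    alphabet maps each symbol to its rank. Every candidate transition
    symbol(q1, ..., qk) -> q is kept independently with probability p.
    """
    states = range(n_states)
    transitions = set()
    for symbol, rank in alphabet.items():
        for children in itertools.product(states, repeat=rank):
            for target in states:
                if rng.random() < p:
                    transitions.add((symbol, children, target))
    return transitions


def determinize(transitions, alphabet):
    """Bottom-up subset construction; returns the reachable subset states.

    The empty subset acts as a sink state and is counted as well.
    No minimization is performed.
    """
    delta = {}
    for symbol, children, target in transitions:
        delta.setdefault((symbol, children), set()).add(target)

    reachable = set()
    changed = True
    while changed:
        changed = False
        current = list(reachable)  # snapshot for this round
        for symbol, rank in alphabet.items():
            for subsets in itertools.product(current, repeat=rank):
                result = set()
                # Union the targets over all choices of child states.
                for children in itertools.product(*subsets):
                    result |= delta.get((symbol, children), set())
                result = frozenset(result)
                if result not in reachable:
                    reachable.add(result)
                    changed = True
    return reachable


if __name__ == "__main__":
    rng = random.Random(0)
    alphabet = {"a": 0, "b": 0, "f": 2}  # assumed example alphabet
    for p in (0.0, 0.1, 0.3, 0.5, 1.0):
        fta = random_fta(4, alphabet, p, rng)
        print(p, len(determinize(fta, alphabet)))
```

At both extremes the result is trivial: with `p = 0.0` only the empty sink subset is reachable, and with `p = 1.0` every tree reaches the full state set, so any nontrivial behaviour must come from intermediate values of `p`.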
Bimorphism Machine Translation
The field of statistical machine translation has made tremendous progress due to the rise of statistical methods, making it possible to obtain a translation system automatically from a bilingual collection of text. Some approaches do not even need any kind of linguistic annotation, and can infer translation rules from raw, unannotated data. However, most state-of-the-art systems do linguistic structure little justice, and moreover many approaches that have been put forward use ad-hoc formalisms and algorithms. This inevitably leads to duplication of effort, and a separation between theoretical researchers and practitioners.
In order to remedy the lack of motivation and rigor, the contributions of this dissertation are threefold:
1. After laying out the historical background and context, as well as the mathematical and linguistic foundations, a rigorous algebraic model of machine translation is put forward. We use regular tree grammars and bimorphisms as the backbone, introducing a modular architecture that allows different input and output formalisms.
2. The challenges of implementing this bimorphism-based model in a machine translation toolkit are then described, explaining in detail the algorithms used for the core components.
3. Finally, experiments where the toolkit is applied to real-world data and used for diagnostic purposes are described. We discuss how we use exact decoding to reason about search errors and model errors in a popular machine translation toolkit, and we compare output formalisms of different generative capacity.
Large-scale Exact Decoding: The IMS-TTT submission to WMT14
We present the IMS-TTT submission to WMT14, an experimental statistical tree-to-tree machine translation system based on the multi bottom-up tree transducer, including rule extraction, tuning and decoding. Thanks to input parse forests and a "no pruning" strategy during decoding, the obtained translations are competitive. The drawbacks are a restricted coverage of 70% on test data, in part due to exact input parse tree matching, and a relatively high runtime. Advantages include easy redecoding with a different weight vector, since the full translation forests can be stored after the first decoding pass.