
    Hyper-Minimization for Deterministic Weighted Tree Automata

    Hyper-minimization is a state reduction technique that allows a finite change in the semantics. The theory for hyper-minimization of deterministic weighted tree automata is provided. The presence of weights slightly complicates the situation in comparison to the unweighted case. In addition, the first hyper-minimization algorithm for deterministic weighted tree automata, weighted over commutative semifields, is provided together with some implementation remarks that enable an efficient implementation. In fact, the same run-time O(m log n) as in the unweighted case is obtained, where m is the size of the deterministic weighted tree automaton and n is its number of states.
    Comment: In Proceedings AFL 2014, arXiv:1405.527

    Random Generation of Nondeterministic Finite-State Tree Automata

    Algorithms for (nondeterministic) finite-state tree automata (FTAs) are often tested on random FTAs, in which all internal transitions are equiprobable. The run-time results obtained in this manner are usually overly optimistic, as most random FTAs generated this way are trivial in the sense that the number of states of an equivalent minimal deterministic FTA is extremely small. It is demonstrated that nontrivial random FTAs are obtained only for a narrow band of transition probabilities. Moreover, an analytic analysis yields a formula to approximate the transition probability that yields the most complex random FTAs, which should be used in experiments.
    Comment: In Proceedings TTATT 2013, arXiv:1311.5058. Andreas Maletti and Daniel Quernheim were financially supported by the German Research Foundation (DFG) grant MA/4959/1-
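
The effect described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the alphabet (one nullary symbol `a` and one binary symbol `f`), the sampling model (each possible transition is included independently with probability `p`), and the function names `random_fta` and `determinized_size`. The sketch counts the reachable states of the bottom-up subset construction, which only bounds the size of the minimal deterministic FTA from above; no minimization is performed and final states are ignored.

```python
import random
from itertools import product


def random_fta(n_states, p, rng):
    """Sample a random FTA over a nullary symbol 'a' and a binary symbol 'f'.

    Assumption: every possible transition is kept independently with
    probability p. Returns (leaf, binary), where leaf holds the states q
    with a -> q and binary holds triples (q1, q2, q) with f(q1, q2) -> q.
    """
    states = range(n_states)
    leaf = {q for q in states if rng.random() < p}
    binary = {
        (q1, q2, q)
        for q1, q2, q in product(states, repeat=3)
        if rng.random() < p
    }
    return leaf, binary


def determinized_size(leaf, binary):
    """Count reachable subset-states of the bottom-up subset construction."""
    reachable = {frozenset(leaf)}
    changed = True
    while changed:
        changed = False
        # Combine every pair of known subset-states under the symbol 'f'.
        for s1, s2 in product(list(reachable), repeat=2):
            target = frozenset(
                q for (q1, q2, q) in binary if q1 in s1 and q2 in s2
            )
            if target not in reachable:
                reachable.add(target)
                changed = True
    return len(reachable)


if __name__ == "__main__":
    rng = random.Random(0)
    for p in (0.0, 0.05, 0.1, 0.2, 0.5, 1.0):
        sizes = [determinized_size(*random_fta(6, p, rng)) for _ in range(20)]
        print(f"p={p:.2f}  mean reachable subset-states: {sum(sizes) / len(sizes):.2f}")
```

At both extremes (`p = 0` and `p = 1`) the construction collapses to a single subset-state, which matches the abstract's observation that only a limited range of transition probabilities produces nontrivial automata.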

    Bimorphism Machine Translation

    The field of machine translation has made tremendous progress due to the rise of statistical methods, making it possible to obtain a translation system automatically from a bilingual collection of text. Some approaches do not even need any kind of linguistic annotation, and can infer translation rules from raw, unannotated data. However, most state-of-the-art systems do linguistic structure little justice, and moreover many approaches that have been put forward use ad-hoc formalisms and algorithms. This inevitably leads to duplication of effort, and a separation between theoretical researchers and practitioners. In order to remedy the lack of motivation and rigor, the contributions of this dissertation are threefold: 1. After laying out the historical background and context, as well as the mathematical and linguistic foundations, a rigorous algebraic model of machine translation is put forward. We use regular tree grammars and bimorphisms as the backbone, introducing a modular architecture that allows different input and output formalisms. 2. The challenges of implementing this bimorphism-based model in a machine translation toolkit are then described, explaining in detail the algorithms used for the core components. 3. Finally, experiments in which the toolkit is applied to real-world data and used for diagnostic purposes are described. We discuss how we use exact decoding to reason about search errors and model errors in a popular machine translation toolkit, and we compare output formalisms of different generative capacity.

    Large-scale Exact Decoding: The IMS-TTT submission to WMT14

    We present the IMS-TTT submission to WMT14, an experimental statistical tree-to-tree machine translation system based on the multi bottom-up tree transducer, including rule extraction, tuning and decoding. Thanks to input parse forests and a "no pruning" strategy during decoding, the obtained translations are competitive. The drawbacks are a restricted coverage of 70% on test data, in part due to exact input parse tree matching, and a relatively high runtime. Advantages include easy redecoding with a different weight vector, since the full translation forests can be stored after the first decoding pass.