
    Random Generation of Nondeterministic Finite-State Tree Automata

    Algorithms for (nondeterministic) finite-state tree automata (FTAs) are often tested on random FTAs in which all internal transitions are equiprobable. The run-time results obtained in this manner are usually overly optimistic, as most random FTAs generated this way are trivial in the sense that the number of states of an equivalent minimal deterministic FTA is extremely small. It is demonstrated that nontrivial random FTAs are obtained only within a narrow band of transition probabilities. Moreover, an analytical derivation yields a formula approximating the transition probability that produces the most complex random FTAs, which should be used in experiments.
    Comment: In Proceedings TTATT 2013, arXiv:1311.5058. Andreas Maletti and Daniel Quernheim were financially supported by the German Research Foundation (DFG) grant MA/4959/1-
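    As a rough illustration of the sampling scheme described above, the following Python sketch generates a random FTA by keeping each candidate transition σ(q1, …, qk) → q independently with a density parameter p. The ranked-alphabet encoding, the uniform choice of a single final state, and the output format are assumptions made for this sketch, not the paper's exact experimental setup.

```python
import itertools
import random

def random_fta(num_states, ranked_alphabet, p, seed=None):
    # Each candidate transition sigma(q_1, ..., q_k) -> q is kept
    # independently with probability p (the transition density).
    # `ranked_alphabet` maps symbol names to arities, e.g. {'a': 0, 'f': 2}.
    rng = random.Random(seed)
    states = list(range(num_states))
    transitions = set()
    for symbol, arity in ranked_alphabet.items():
        for args in itertools.product(states, repeat=arity):
            for target in states:
                if rng.random() < p:
                    transitions.add((symbol, args, target))
    # Choosing a single final state uniformly at random is an arbitrary
    # convention for this sketch.
    final_states = {rng.choice(states)}
    return states, transitions, final_states

# Example: 5 states over a nullary symbol 'a' and a binary symbol 'f'.
states, transitions, finals = random_fta(5, {'a': 0, 'f': 2}, p=0.1, seed=42)
print(len(transitions), 'transitions; final states:', finals)
```

    The abstract's observation is that sweeping p over its range produces trivial automata almost everywhere; only a narrow band of densities yields hard instances.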

    Statistical language models within the algebra of weighted rational languages

    Statistical language models are an important tool in natural language processing. They represent prior knowledge about a language; this knowledge is usually gained from a set of samples called a corpus. In this paper, we present a novel way of creating N-gram language models using weighted finite automata. The construction of these models is formalised within the algebra underlying weighted finite automata and expressed in terms of weighted rational languages and transductions. Besides the algebra, we make use of five special constant weighted transductions that rely only on the alphabet and the model parameter N. In addition, we discuss efficient implementations of these transductions in terms of virtual constructions.
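    As a concrete, much simplified illustration of n-gram models as weighted automata (not the paper's algebraic construction from the five constant transductions), the sketch below builds a bigram model over the tropical semiring, with arc weights given by negative log maximum-likelihood estimates. The corpus format and the absence of smoothing are simplifying assumptions.

```python
import math
from collections import defaultdict

def bigram_wfsa(corpus):
    # States are histories (the previous word); an arc (h, w) carries
    # weight -log P(w | h), a maximum-likelihood estimate from `corpus`,
    # a list of token lists. No smoothing is applied in this sketch.
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        tokens = ['<s>'] + sentence + ['</s>']
        for h, w in zip(tokens, tokens[1:]):
            counts[h][w] += 1
    arcs = {}  # (state, word) -> (next state, weight)
    for h, successors in counts.items():
        total = sum(successors.values())
        for w, c in successors.items():
            arcs[(h, w)] = (w, -math.log(c / total))
    return arcs

def score(arcs, sentence):
    # Total path weight (negative log probability) of a sentence.
    state, weight = '<s>', 0.0
    for w in sentence + ['</s>']:
        next_state, arc_weight = arcs[(state, w)]
        weight += arc_weight
        state = next_state
    return weight

corpus = [['the', 'cat', 'sat'], ['the', 'dog', 'sat']]
arcs = bigram_wfsa(corpus)
print(score(arcs, ['the', 'cat', 'sat']))  # -log(1 * 0.5 * 1 * 1) ~ 0.693
```

    The paper's point is that such models need not be built ad hoc: the same object falls out of compositions of a few constant weighted transductions, which also admit lazy (virtual) implementations.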

    The Acquisition of Programming Skills from Textbooks

    We present a computer model for the acquisition of programming languages from textbooks. Starting from a verbal description of the notational conventions used to describe the syntactic form of programming commands, a meta grammar is generated that parses concrete command descriptions and builds up grammar rules for those commands. These rules are realized as definite clause grammar rules that capture the syntax of these commands. They can be used to parse and generate syntactically correct examples of a command. However, to solve real programming problems, the semantics of a command and of its parameters must also be acquired. This is accomplished by parsing the natural-language explanations given in the text and augmenting the definite clause command grammars with semantic structures.
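    The rules in the original system are Prolog definite clause grammars; as a loose illustration of the first step only (turning a notational description into a usable rule), the Python sketch below converts a textbook-style command description into a pattern that accepts concrete instances of the command. The angle-bracket placeholder convention is assumed purely for illustration.

```python
import re

def rule_from_description(desc):
    # Turn a textbook-style command description such as 'PRINT <expression>'
    # into a pattern accepting concrete instances of the command.
    # Placeholders in angle brackets match any single argument;
    # everything else must appear literally.
    parts = []
    for token in desc.split():
        if token.startswith('<') and token.endswith('>'):
            parts.append(r'(\S+)')   # parameter slot
        else:
            parts.append(re.escape(token))
    return re.compile(r'\s+'.join(parts) + r'$')

rule = rule_from_description('PRINT <expression>')
print(bool(rule.match('PRINT x+1')))   # True: a well-formed command
print(bool(rule.match('PRINT')))       # False: missing parameter
```

    The model described in the abstract goes further: the acquired rules also generate examples, and they are later augmented with semantic structures extracted from the textbook's explanatory prose.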

    Weaving the Semantic Web: Extracting and Representing the Content of Pathology Reports

    Schlangen D, Hanneforth T, Stede M. Weaving the Semantic Web: Extracting and Representing the Content of Pathology Reports. In: Proceedings of the GLDV Conference 2005 (GLDV05). Bonn, Germany; 2005.

    Pushing for weighted tree automata

    A weight normalization procedure, commonly called pushing, is introduced for weighted tree automata (wta) over commutative semifields. The normalization preserves the recognized weighted tree language even for nondeterministic wta, but it is most useful for bottom-up deterministic wta, where it can be used for minimization and equivalence testing. In both applications, a careful selection of the weights to be redistributed, followed by normalization, allows a reduction of the general problem to the corresponding problem for bottom-up deterministic unweighted tree automata. This approach was already used successfully by Mohri and Eisner for the minimization of deterministic weighted string automata. Moreover, the new equivalence test for two wta $M$ and $M'$ runs in time $\mathcal{O}((\lvert M \rvert + \lvert M' \rvert) \cdot \log(\lvert Q \rvert + \lvert Q' \rvert))$, where $Q$ and $Q'$ are the state sets of $M$ and $M'$, respectively, which improves on the previously best run-time $\mathcal{O}(\lvert M \rvert \cdot \lvert M' \rvert)$.
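    For intuition, the string-automaton pushing of Mohri and Eisner referred to above can be sketched compactly. The toy Python version below works over the real semifield on an acyclic deterministic automaton; its data layout and potential function are illustrative assumptions, not the paper's wta construction.

```python
from functools import lru_cache

def push_weights(arcs, final, start):
    # Weight pushing for a deterministic weighted string automaton over the
    # real semifield (weights multiply along a path). `arcs` maps
    # (state, symbol) -> (next state, weight); `final` maps accepting states
    # to final weights. The potential d(q) is the total weight of all paths
    # from q to a final state (the automaton is assumed acyclic, so the
    # recursion terminates). Pushing replaces w(q,a,q') by
    # d(q)^{-1} * w(q,a,q') * d(q') and rescales final weights, leaving every
    # accepted string's weight unchanged.
    out = {}
    for (q, a), (q2, w) in arcs.items():
        out.setdefault(q, []).append((a, q2, w))

    @lru_cache(maxsize=None)
    def potential(q):
        total = final.get(q, 0.0)
        for a, q2, w in out.get(q, []):
            total += w * potential(q2)
        return total

    pushed_arcs = {
        (q, a): (q2, w * potential(q2) / potential(q))
        for (q, a), (q2, w) in arcs.items()
    }
    pushed_final = {q: fw / potential(q) for q, fw in final.items()}
    return pushed_arcs, pushed_final, potential(start)

# Example: after pushing, the outgoing weights of each non-final state sum
# to 1, and the total mass moves to the start weight.
arcs = {(0, 'a'): (1, 0.2), (0, 'b'): (2, 0.6), (1, 'c'): (2, 0.5)}
final = {2: 1.0}
pushed, pfinal, init = push_weights(arcs, final, start=0)
print(pushed, pfinal, init)
```

    The abstract's contribution is lifting this idea to trees: after a careful choice of the weights to push, the weighted problems reduce to their unweighted bottom-up deterministic counterparts.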