
    Building Domain Specific Languages for Voice Recognition Applications

    This paper presents a method of implementing voice recognition for the control of software applications. The proposed solution transforms a subset of natural language into commands recognized by the application, using a formal language defined by means of a context-free grammar. The paper closes by presenting how voice recognition and voice synthesis for the Romanian language can be integrated into Windows applications.
    Keywords: voice recognition, formal languages, context-free grammars, text to speech
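
    As a minimal illustration of the approach above, the Python sketch below defines a toy two-word command grammar and maps recognized utterances to application actions. The grammar and all names are hypothetical stand-ins; the paper's Romanian-language grammar and its Windows integration are not reproduced here.

```python
# Toy command grammar (hypothetical), in the spirit of mapping a
# subset of natural language onto application commands via a CFG:
#   command -> verb object
#   verb    -> "open" | "close" | "save"
#   object  -> "file" | "window" | "document"
VERBS = {"open", "close", "save"}
OBJECTS = {"file", "window", "document"}

def parse_command(utterance: str):
    """Parse a recognized utterance into an (action, target) pair,
    or return None if it lies outside the command language."""
    tokens = utterance.lower().split()
    if len(tokens) == 2 and tokens[0] in VERBS and tokens[1] in OBJECTS:
        return (tokens[0], tokens[1])
    return None

print(parse_command("open file"))    # ('open', 'file')
print(parse_command("paint house"))  # None -> not a valid command
```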

    Certified Context-Free Parsing: A formalisation of Valiant's Algorithm in Agda

    Valiant (1975) developed an algorithm for the recognition of context-free languages. As of today, it remains the algorithm with the best asymptotic complexity for this purpose. In this paper, we present an algebraic specification, implementation, and proof of correctness of a generalisation of Valiant's algorithm. The generalisation can be used for recognition, parsing, or generic calculation of the transitive closure of upper triangular matrices. The proof is certified by the Agda proof assistant. The certification is representative of state-of-the-art methods for specification and proofs in proof assistants based on type theory. As such, this paper can also be read as a tutorial for the Agda system.
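
    The link between recognition and transitive closure can be sketched as follows: a CYK-style recognizer fills an upper triangular matrix whose entries are sets of nonterminals, and that matrix is exactly the structure whose closure Valiant's algorithm computes. Below is a plain cubic-time Python sketch with a hypothetical grammar; Valiant's subcubic divide-and-conquer and the Agda formalisation itself are not shown.

```python
# CYK recognition as the closure of an upper triangular matrix whose
# entries are sets of nonterminals. Hypothetical CNF grammar:
#   S -> A B,  A -> 'a',  B -> 'b'
unary = {"a": {"A"}, "b": {"B"}}
binary = {("A", "B"): {"S"}}

def recognize(word: str, start: str = "S") -> bool:
    n = len(word)
    # M[i][j] holds the nonterminals deriving word[i:j].
    M = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, c in enumerate(word):
        M[i][i + 1] |= unary.get(c, set())
    # Closure step: combine adjacent spans, shortest spans first.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for x in M[i][k]:
                    for y in M[k][j]:
                        M[i][j] |= binary.get((x, y), set())
    return start in M[0][n]

print(recognize("ab"))  # True
print(recognize("ba"))  # False
```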

    Improved genetic algorithm for the context-free grammatical inference

    Inductive learning of formal languages, often called grammatical inference, is an active area in machine learning and computational learning theory. By learning a language we understand finding the grammar of the language when some positive examples (words from the language) and negative examples (words that are not in the language) are given. Learning mechanisms use the natural language learning model: people master the language used by their environment through the analysis of positive and negative examples. The problem of inferring context-free languages (CFLs) has both theoretical and practical motivations. Practical applications include pattern recognition (for example, finding DTDs or XML schemas for XML documents) and speech recognition (the ability to infer context-free grammars for natural languages would enable speech recognition to modify its internal grammar on the fly). There have been several attempts to find effective learning methods for context-free languages (for example [1,2,3,4,5]). In particular, Y. Sakakibara [3] introduced an interesting method of finding a context-free grammar in Chomsky normal form with a minimal set of nonterminals. He used a tabular representation, similar to the parse table of the CYK algorithm, together with genetic algorithms. In this paper we present several adjustments to the algorithm suggested by Sakakibara. The adjustments concern mainly the genetic algorithms used and are as follows:
    – we introduce a method of creating the initial population which makes use of characteristic features of context-free grammars,
    – new genetic operations are used (mutation with a path added, a 'die' process, a 'war/disease' process),
    – a different definition of the fitness function is given,
    – an effective compression of the structure of an individual in the population is suggested.
    These changes speed up the process of grammar generation and, what is more, allow richer grammars to be inferred than those considered in [3].
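
    One way to picture the CYK-based fitness idea is the sketch below: a candidate grammar in Chomsky normal form is scored by how many positive examples it accepts and how many negative examples it rejects. The scoring rule is a hypothetical stand-in for the paper's fitness function; the genetic operators themselves are not reproduced.

```python
# Sketch of a CYK-based fitness score for grammatical inference
# (hypothetical scoring rule). Any CNF recognizer, such as the CYK
# sketch shown earlier in this listing, can be passed as `recognize`.
def fitness(grammar, positives, negatives, recognize):
    """Score a candidate CNF grammar: +1 per accepted positive example,
    +1 per rejected negative example, both normalized to [0, 1]."""
    accepted = sum(recognize(grammar, w) for w in positives)
    rejected = sum(not recognize(grammar, w) for w in negatives)
    return accepted / len(positives) + rejected / len(negatives)
```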

    A Note on the Complexity of Restricted Attribute-Value Grammars

    The recognition problem for attribute-value grammars (AVGs) was shown to be undecidable by Johnson in 1988. Therefore, the general form of AVGs is of no practical use. In this paper we study a very restricted form of AVG for which the recognition problem is decidable (though still NP-complete), the R-AVG. We show that the R-AVG formalism captures all of the context-free languages and more, and introduce a variation on the so-called 'off-line parsability constraint', the 'honest parsability constraint', which lets different types of R-AVG coincide precisely with well-known time complexity classes.
    Comment: 18 pages, also available by (1) anonymous ftp at ftp://ftp.fwi.uva.nl/pub/theory/illc/researchReports/CT-95-02.ps.gz ; (2) WWW from http://www.fwi.uva.nl/~mtrautwe
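
    The core operation behind attribute-value grammars is unification of feature structures; the Python sketch below shows it over nested dicts. This illustrates AVGs generally, not the R-AVG restrictions or the parsability constraints of the paper, and reentrancy is omitted.

```python
# Unification of two attribute-value (feature) structures, the basic
# operation in AVG parsing. A sketch only: structures are nested dicts
# with atomic leaves, and reentrancy is not modelled.
def unify(f, g):
    """Return the unification of two feature structures, or None on clash."""
    if not isinstance(f, dict) or not isinstance(g, dict):
        return f if f == g else None      # atomic values must match
    out = dict(f)
    for key, val in g.items():
        if key in out:
            sub = unify(out[key], val)
            if sub is None:
                return None               # feature clash
            out[key] = sub
        else:
            out[key] = val
    return out

print(unify({"cat": "NP", "agr": {"num": "sg"}},
            {"agr": {"num": "sg", "per": 3}}))
# {'cat': 'NP', 'agr': {'num': 'sg', 'per': 3}}
print(unify({"agr": {"num": "sg"}}, {"agr": {"num": "pl"}}))  # None
```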

    The Surprising Computational Power of Nondeterministic Stack RNNs

    Traditional recurrent neural networks (RNNs) have a fixed, finite number of memory cells. In theory (assuming bounded range and precision), this limits their formal language recognition power to regular languages, and in practice, RNNs have been shown to be unable to learn many context-free languages (CFLs). In order to expand the class of languages RNNs recognize, prior work has augmented RNNs with a nondeterministic stack data structure, putting them on par with pushdown automata and increasing their language recognition power to CFLs. Nondeterminism is needed for recognizing all CFLs (not just deterministic CFLs), but in this paper, we show that nondeterminism and the neural controller interact to produce two more unexpected abilities. First, the nondeterministic stack RNN can recognize not only CFLs, but also many non-context-free languages. Second, it can recognize languages with much larger alphabet sizes than one might expect given the size of its stack alphabet. Finally, to increase the information capacity in the stack and allow it to solve more complicated tasks with large alphabet sizes, we propose a new version of the nondeterministic stack that simulates stacks of vectors rather than discrete symbols. We demonstrate perplexity improvements with this new model on the Penn Treebank language modeling benchmark.
    Comment: 20 pages, 7 figures. Submitted to ICLR 202
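
    The symbolic counterpart of the paper's nondeterministic stack can be sketched by running all stack configurations of a pushdown automaton in parallel, as below for the toy CFL {aⁿbⁿ}. The automaton is a hypothetical illustration; the differentiable vector stack of the model is not reproduced.

```python
# Nondeterministic pushdown recognition by tracking every reachable
# (mode, stack) configuration in parallel -- the symbolic analogue of
# keeping all stack alternatives alive, here for the CFL {a^n b^n}.
def accepts_anbn(word: str) -> bool:
    configs = {("push", ())}  # stacks are tuples so the set can hold them
    for c in word:
        nxt = set()
        for mode, stack in configs:
            if c == "a" and mode == "push":
                nxt.add(("push", stack + ("A",)))
            if c == "b" and stack and stack[-1] == "A":
                nxt.add(("pop", stack[:-1]))
        configs = nxt
    # Accept if any surviving configuration has emptied its stack.
    return any(not stack for _, stack in configs)

print(accepts_anbn("aabb"))  # True
print(accepts_anbn("aab"))   # False
```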

    Theory of ω-languages. II: A study of various models of ω-type generation and recognition

    ω-languages are sets consisting of ω-length strings; ω-automata are recognition devices for ω-languages. In a previous paper the basic notions of ω-grammars, ω-context-free languages (ω-CFL's), and ω-pushdown automata (ω-PDA's) were first defined and studied. In this paper various modes of ω-type generation are introduced and the effect of certain restrictions on the derivations in ω-grammars is investigated. Several distinct models of recognition in ω-PDA's are considered, giving rise to a hierarchy of subfamilies of the ω-CFL's. The relations among these subfamilies are established and characterizations for each family are derived. Non-leftmost derivations in ω-CFG's are studied and it is shown that leftmost generation in ω-CFG's is strictly more powerful than non-leftmost generation.

    Simulation of Two-Way Pushdown Automata Revisited

    The linear-time simulation of 2-way deterministic pushdown automata (2DPDA) by the Cook and Jones constructions is revisited. Following the semantics-based approach by Jones, an interpreter is given which, when extended with random-access memory, performs a linear-time simulation of 2DPDA. The recursive interpreter works without the dump list of the original constructions, which makes Cook's insight into linear-time simulation of exponential-time automata more intuitive and the complexity argument clearer. The simulation is then extended to 2-way nondeterministic pushdown automata (2NPDA) to provide cubic-time recognition of context-free languages. The time required to run the final construction depends on the degree of nondeterminism. The key mechanism that enables the polynomial-time simulations is the sharing of computations by memoization.
    Comment: In Proceedings Festschrift for Dave Schmidt, arXiv:1309.455
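
    The key mechanism named above, sharing of computations by memoization, can be shown in miniature: the recursive recognizer below for a toy grammar would take exponential time naively, but caching each (i, j) subproblem makes the search polynomial. This illustrates only the mechanism, not the 2DPDA/2NPDA construction itself.

```python
from functools import lru_cache

# Toy grammar: S -> S S | '(' S ')' | ''  (balanced parentheses).
def balanced(word: str) -> bool:
    @lru_cache(maxsize=None)               # share each (i, j) subproblem
    def derives(i: int, j: int) -> bool:
        if i == j:
            return True                    # S -> ''
        if word[i] == "(" and word[j - 1] == ")" and derives(i + 1, j - 1):
            return True                    # S -> '(' S ')'
        # S -> S S: try every split point. Without the cache this search
        # is exponential; with it, each span is solved at most once.
        return any(derives(i, k) and derives(k, j) for k in range(i + 1, j))
    return derives(0, len(word))

print(balanced("(()())"))  # True
print(balanced("(()"))     # False
```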

    Improving Scene Text Recognition for Character-Level Long-Tailed Distribution

    Despite recent remarkable improvements in scene text recognition (STR), the majority of studies have focused mainly on the English language, which includes only a small number of characters. However, STR models show a large performance degradation on languages with a large number of characters (e.g., Chinese and Korean), especially on characters that rarely appear due to the long-tailed character distribution in such languages. To address this issue, we conducted an empirical analysis using synthetic datasets with different character-level distributions (e.g., balanced and long-tailed). While substantially increasing the number of tail-class examples without considering context helps the model correctly recognize individual characters, training with such a synthetic dataset interferes with the model's learning of contextual information (i.e., relations among characters), which is also important for predicting whole words. Based on this motivation, we propose a novel Context-Aware and Free Experts Network (CAFE-Net) using two experts: 1) a context-aware expert learns contextual representations, trained on a long-tailed dataset composed of common words used in everyday life, and 2) a context-free expert focuses on correctly predicting individual characters, utilizing a dataset with a balanced number of characters. By training the two experts to focus on learning contextual and visual representations, respectively, we propose a novel confidence ensemble method to compensate for the limitations of each expert. Through experiments, we demonstrate that CAFE-Net improves STR performance on languages containing a large number of characters. Moreover, we show that CAFE-Net is easily applicable to various STR models.
    Comment: 17 pages
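
    As a sketch of what a per-character confidence ensemble over two experts might look like, the snippet below keeps, at each character position, the prediction of whichever expert is more confident. The combination rule is hypothetical and not necessarily the one used by CAFE-Net.

```python
import numpy as np

def confidence_ensemble(p_aware: np.ndarray, p_free: np.ndarray) -> np.ndarray:
    """Per character position, keep the distribution of the more
    confident expert (hypothetical rule, not CAFE-Net's exact method).

    Both arrays have shape (seq_len, num_classes) with rows summing
    to 1; confidence here is simply the max probability in a row.
    """
    take_aware = (p_aware.max(axis=1) >= p_free.max(axis=1))[:, None]
    return np.where(take_aware, p_aware, p_free)
```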