Building Domain Specific Languages for Voice Recognition Applications
This paper presents a method of implementing voice recognition for the control of software applications. The proposed solution transforms a subset of natural language into commands recognized by the application, using a formal language defined by means of a context-free grammar. The paper concludes by showing how voice recognition and voice synthesis for the Romanian language can be integrated into Windows applications.
Keywords: voice recognition, formal languages, context-free grammars, text to speech
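For illustration, a minimal sketch of the general technique (not the paper's actual grammar, which targets Romanian): a small context-free command language parsed by recursive descent, mapping a recognized utterance to an application command. All names and productions below are hypothetical.

    # Hypothetical command DSL, defined by the context-free grammar
    #   command -> VERB object
    #   object  -> ADJ* NOUN
    VERBS = {"open", "close", "save"}
    ADJS = {"new", "current"}
    NOUNS = {"file", "window", "document"}

    def parse_command(tokens):
        """Map a recognized utterance (a token list) to an application command."""
        if not tokens or tokens[0] not in VERBS:
            raise ValueError("expected a verb")
        verb, rest = tokens[0], list(tokens[1:])
        adjectives = []
        while rest and rest[0] in ADJS:
            adjectives.append(rest.pop(0))
        if len(rest) != 1 or rest[0] not in NOUNS:
            raise ValueError("expected a noun phrase")
        return (verb, adjectives, rest[0])

    print(parse_command(["open", "new", "file"]))  # ('open', ['new'], 'file')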
Certified Context-Free Parsing: A formalisation of Valiant's Algorithm in Agda
Valiant (1975) developed an algorithm for recognition of context-free languages. As of today, it remains the algorithm with the best asymptotic complexity for this purpose. In this paper, we present an algebraic specification, implementation, and proof of correctness of a generalisation of Valiant's algorithm. The generalisation can be used for recognition, parsing, or generic calculation of the transitive closure of upper triangular matrices. The proof is certified by the Agda proof assistant. The certification is representative of state-of-the-art methods for specification and proofs in proof assistants based on type theory. As such, this paper can be read as a tutorial for the Agda system.
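As background, a minimal Python sketch (not the Agda formalisation) of the idea the abstract describes: recognition as closure of an upper triangular matrix whose entries are sets of nonterminals, with "multiplication" given by the binary rules of a grammar in Chomsky normal form. The sketch computes the closure by a naive cubic fixpoint; Valiant's contribution is a subcubic divide-and-conquer computation of the same closure.

    def recognize(word, unary, binary, start="S"):
        # unary:  terminal -> set of nonterminals   (rules A -> a)
        # binary: (B, C)   -> set of nonterminals   (rules A -> B C)
        n = len(word)
        M = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
        for i, a in enumerate(word):          # superdiagonal from rules A -> a
            M[i][i + 1] = set(unary.get(a, ()))
        for span in range(2, n + 1):          # close the upper triangle
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):     # M[i][j] |= M[i][k] * M[k][j]
                    for B in M[i][k]:
                        for C in M[k][j]:
                            M[i][j] |= binary.get((B, C), set())
        return start in M[0][n]

    # Balanced brackets in CNF: S -> A B | S S | A S1,  S1 -> S B
    unary = {"(": {"A"}, ")": {"B"}}
    binary = {("A", "B"): {"S"}, ("S", "S"): {"S"},
              ("A", "S1"): {"S"}, ("S", "B"): {"S1"}}
    print(recognize("(())", unary, binary))   # True
    print(recognize("(()", unary, binary))    # False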
Improved genetic algorithm for the context-free grammatical inference
Inductive learning of formal languages, often called grammatical inference, is an active area in machine learning and computational learning theory. By learning a language we understand finding the grammar of the language when some positive examples (words from the language) and negative examples (words not in the language) are given. Learning mechanisms use the natural-language learning model: people master a language, used by their environment, by the analysis of positive and negative examples. The problem of inferring context-free languages (CFLs) has both theoretical and practical motivations. Practical applications include pattern recognition (for example, finding DTDs or XML schemas for XML documents) and speech recognition (the ability to infer context-free grammars for natural languages would enable speech recognition to modify its internal grammar on the fly). There have been several attempts to find effective learning methods for context-free languages (for example [1,2,3,4,5]). In particular, Y. Sakakibara [3] introduced an interesting method of finding a context-free grammar in Chomsky normal form with a minimal set of nonterminals. He used a tabular representation similar to the parse table of the CYK algorithm, in combination with genetic algorithms. In this paper we present several adjustments to the algorithm suggested by Sakakibara. The adjustments concern mainly the genetic algorithms used and are as follows:
– we introduce a method of creating the initial population which makes use of characteristic features of context-free grammars,
– new genetic operations are used (mutation with a path added, a 'die' process, a 'war/disease' process),
– a different definition of the fitness function is given,
– an effective compression of the structure of an individual in the population is suggested.
These changes speed up the process of grammar generation and, what is more, allow richer grammars to be inferred than those considered in [3].
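The paper's specific operators and fitness are its own; purely as an illustration of the Sakakibara-style setup it builds on, here is a hedged sketch of a CYK-based fitness and a point mutation for candidate grammars in Chomsky normal form (the recognize function is the matrix-closure sketch above; all names are hypothetical):

    import random

    def fitness(grammar, positives, negatives, recognize):
        # Fraction of examples classified correctly by the candidate grammar:
        # accepted positives plus rejected negatives. Illustrative only; the
        # authors use a different definition of the fitness function.
        unary, binary = grammar
        hits = sum(bool(recognize(w, unary, binary)) for w in positives)
        hits += sum(not recognize(w, unary, binary) for w in negatives)
        return hits / (len(positives) + len(negatives))

    def mutate(grammar, nonterminals, rate=0.1):
        # Point mutation: rewrite the right-hand side of one binary rule.
        unary, binary = grammar
        binary = dict(binary)
        if binary and random.random() < rate:
            old = random.choice(list(binary))
            new = (random.choice(nonterminals), random.choice(nonterminals))
            binary[new] = binary.pop(old)
        return (unary, binary)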
A Note on the Complexity of Restricted Attribute-Value Grammars
The recognition problem for attribute-value grammars (AVGs) was shown to be undecidable by Johnson in 1988. Therefore, the general form of AVGs is of no practical use. In this paper we study a very restricted form of AVG, for which the recognition problem is decidable (though still NP-complete), the R-AVG. We show that the R-AVG formalism captures all of the context-free languages and more, and introduce a variation on the so-called `off-line parsability constraint', the `honest parsability constraint', which lets different types of R-AVG coincide precisely with well-known time complexity classes.
Comment: 18 pages, also available by (1) anonymous ftp at ftp://ftp.fwi.uva.nl/pub/theory/illc/researchReports/CT-95-02.ps.gz ; (2) WWW from http://www.fwi.uva.nl/~mtrautwe
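As background for readers unfamiliar with the formalism: an attribute-value grammar pairs a context-free backbone with attribute-value (feature) constraints, and its core operation is unification of feature structures. A minimal sketch follows, using plain nested dicts (real AVG feature structures are reentrant graphs); the R-AVG restrictions studied in the paper bound how such structures may grow along a derivation.

    def unify(f, g):
        """Least feature structure subsuming both f and g, or None on clash."""
        if isinstance(f, dict) and isinstance(g, dict):
            out = dict(f)
            for key, val in g.items():
                if key in out:
                    sub = unify(out[key], val)
                    if sub is None:
                        return None            # feature clash
                    out[key] = sub
                else:
                    out[key] = val
            return out
        return f if f == g else None           # atomic values must match

    print(unify({"agr": {"num": "sg"}}, {"agr": {"per": 3}}))
    # {'agr': {'num': 'sg', 'per': 3}}
    print(unify({"agr": {"num": "sg"}}, {"agr": {"num": "pl"}}))  # None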
The Surprising Computational Power of Nondeterministic Stack RNNs
Traditional recurrent neural networks (RNNs) have a fixed, finite number of
memory cells. In theory (assuming bounded range and precision), this limits
their formal language recognition power to regular languages, and in practice,
RNNs have been shown to be unable to learn many context-free languages (CFLs).
In order to expand the class of languages RNNs recognize, prior work has
augmented RNNs with a nondeterministic stack data structure, putting them on
par with pushdown automata and increasing their language recognition power to
CFLs. Nondeterminism is needed for recognizing all CFLs (not just deterministic
CFLs), but in this paper, we show that nondeterminism and the neural controller
interact to produce two more unexpected abilities. First, the nondeterministic
stack RNN can recognize not only CFLs, but also many non-context-free
languages. Second, it can recognize languages with much larger alphabet sizes
than one might expect given the size of its stack alphabet. Finally, to
increase the information capacity in the stack and allow it to solve more
complicated tasks with large alphabet sizes, we propose a new version of the
nondeterministic stack that simulates stacks of vectors rather than discrete
symbols. We demonstrate perplexity improvements with this new model on the Penn
Treebank language modeling benchmark.
Comment: 20 pages, 7 figures. Submitted to ICLR 202
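As background on why nondeterminism matters here, a small sketch (unrelated to the paper's neural architecture): a breadth-first simulation of a nondeterministic PDA for even-length palindromes over {a, b}, a context-free language recognized by no deterministic PDA, because the midpoint must be guessed.

    def accepts_palindrome(w):
        # A configuration is (mode, stack): mode 0 = pushing, 1 = matching.
        if not w:
            return True                          # the empty word is a palindrome
        configs = {(0, ())}
        for ch in w:
            nxt = set()
            for mode, stack in configs:
                if mode == 0:
                    nxt.add((0, stack + (ch,)))  # keep pushing
                if stack and stack[-1] == ch:
                    nxt.add((1, stack[:-1]))     # guess the midpoint / keep matching
            configs = nxt
        return any(mode == 1 and not stack for mode, stack in configs)

    print(accepts_palindrome("abba"), accepts_palindrome("abab"))  # True False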
Theory of ω-languages. II: A study of various models of ω-type generation and recognition
ω-languages are sets consisting of ω-length strings; ω-automata are recognition devices for ω-languages. In a previous paper the basic notions of ω-grammars, ω-context-free languages (ω-CFL's), and ω-pushdown automata (ω-PDA's) were first defined and studied. In this paper various modes of ω-type generation are introduced and the effect of certain restrictions on the derivations in ω-grammars is investigated. Several distinct models of recognition in ω-PDA's are considered, giving rise to a hierarchy of subfamilies of the ω-CFL's. The relations among these subfamilies are established and characterizations for each family are derived. Non-leftmost derivations in ω-CFG's are studied and it is shown that leftmost generation in ω-CFG's is strictly more powerful than non-leftmost generation.
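For readers new to ω-recognition, a hedged background sketch, much simpler than the ω-PDA models of the paper: an ultimately periodic ω-word u·v^ω is finitely representable, and Büchi acceptance (some accepting state visited infinitely often) becomes checkable by running the prefix and then iterating the loop until a state repeats. Shown for a deterministic finite ω-automaton; all names are illustrative.

    def buechi_accepts(delta, q0, accepting, u, v):
        q = q0
        for a in u:                       # consume the finite prefix u
            q = delta[q, a]
        seen, hits = {}, []               # hits: accepting flags inside the loop
        while q not in seen:              # iterate v until a state repeats
            seen[q] = len(hits)
            for a in v:
                q = delta[q, a]
                hits.append(q in accepting)
        return any(hits[seen[q]:])        # accepting state on the cycle?

    # Accepts (ab)^ω iff state 1 recurs: here it does.
    delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0}
    print(buechi_accepts(delta, 0, {1}, "", "ab"))  # True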
Simulation of Two-Way Pushdown Automata Revisited
The linear-time simulation of 2-way deterministic pushdown automata (2DPDA)
by the Cook and Jones constructions is revisited. Following the semantics-based
approach by Jones, an interpreter is given which, when extended with
random-access memory, performs a linear-time simulation of 2DPDA. The recursive
interpreter works without the dump list of the original constructions, which
makes Cook's insight into linear-time simulation of exponential-time automata
more intuitive and the complexity argument clearer. The simulation is then
extended to 2-way nondeterministic pushdown automata (2NPDA) to provide for a
cubic-time recognition of context-free languages. The time required to run the
final construction depends on the degree of nondeterminism. The key mechanism
that enables the polynomial-time simulations is the sharing of computations by
memoization.
Comment: In Proceedings Festschrift for Dave Schmidt, arXiv:1309.455
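The construction itself is best read in the paper; purely to illustrate the key mechanism the abstract names, sharing by memoization, here is the same effect in a simpler setting (not the paper's interpreter): a top-down CFG recognizer whose naive backtracking is exponential, made polynomial by caching subproblems, much as Cook's construction shares the computation following each surface configuration of a 2DPDA.

    from functools import lru_cache

    def make_recognizer(unary, binary, word):
        # unary:  terminal -> set of nonterminals  (rules A -> a)
        # binary: (B, C)   -> set of nonterminals  (rules A -> B C)
        @lru_cache(maxsize=None)          # the sharing: each (sym, i, j) once
        def derives(sym, i, j):
            """Does nonterminal sym derive word[i:j]?"""
            if j == i + 1:
                return sym in unary.get(word[i], ())
            return any(derives(B, i, k) and derives(C, k, j)
                       for (B, C), heads in binary.items() if sym in heads
                       for k in range(i + 1, j))
        return derives

    unary = {"(": {"A"}, ")": {"B"}}
    binary = {("A", "B"): {"S"}, ("S", "S"): {"S"},
              ("A", "S1"): {"S"}, ("S", "B"): {"S1"}}
    word = "(())()"
    print(make_recognizer(unary, binary, word)("S", 0, len(word)))  # True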
Improving Scene Text Recognition for Character-Level Long-Tailed Distribution
Despite recent remarkable improvements in scene text recognition (STR), the majority of studies have focused mainly on the English language, which includes only a small number of characters. However, STR models show a large performance degradation on languages with a large number of characters (e.g., Chinese and Korean), especially on characters that rarely appear due to the long-tailed distribution of characters in such languages. To address this issue, we conducted an empirical analysis using synthetic datasets with different character-level distributions (e.g., balanced and long-tailed distributions). While increasing the number of tail-class examples without considering context helps the model to correctly recognize characters individually, training with such a synthetic dataset interferes with the model's learning of contextual information (i.e., the relations among characters), which is also important for predicting the whole word. Based on this motivation, we propose a novel Context-Aware and Free Experts Network (CAFE-Net) using two experts: 1) a context-aware expert that learns contextual representations, trained on a long-tailed dataset composed of common words used in everyday life, and 2) a context-free expert that focuses on correctly predicting individual characters, trained on a dataset with a balanced number of characters. By training the two experts to focus on learning contextual and visual representations, respectively, we propose a novel confidence ensemble method to compensate for the limitations of each expert. Through experiments, we demonstrate that CAFE-Net improves STR performance on languages containing a large number of characters. Moreover, we show that CAFE-Net is easily applicable to various STR models.
Comment: 17 page
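As a sketch of what a confidence ensemble over two experts can look like (a generic rule; the exact combination used by CAFE-Net may differ): combine per-character class distributions by trusting whichever expert is more confident at each position.

    import numpy as np

    def ensemble(p_context_aware, p_context_free):
        """Each input: (seq_len, num_classes) rows of per-character probabilities."""
        out = []
        for pa, pf in zip(p_context_aware, p_context_free):
            # Trust whichever expert is more confident at this position.
            out.append(pa if pa.max() >= pf.max() else pf)
        return np.stack(out)

    pa = np.array([[0.7, 0.2, 0.1], [0.4, 0.4, 0.2]])  # context-aware expert
    pf = np.array([[0.5, 0.3, 0.2], [0.1, 0.8, 0.1]])  # context-free expert
    print(ensemble(pa, pf).argmax(axis=1))             # per-character labels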