342 research outputs found

    Parsing Unary Boolean Grammars Using Online Convolution

    Get PDF
    In contrast to context-free grammars, the extension of these grammars by explicit conjunction, the so-called conjunctive grammars can generate (quite complicated) non-regular languages over a single-letter alphabet (DLT 2007). Given these expressibility results, we study the parsability of Boolean grammars, an extension of context-free grammars by conjunction and negation, over a unary alphabet and show that they can be parsed in time O(|G| log^2(n) M(n)) where M(n) is the time to multiply two n-bit integers. This multiplication algorithm is transformed into a convolution algorithm which in turn is converted to an online convolution algorithm which is used for the parsing

    Numbers and Languages

    Get PDF
    The thesis presents results obtained during the authors PhD-studies. First systems of language equations of a simple form consisting of just two equations are proved to be computationally universal. These are systems over unary alphabet, that are seen as systems of equations over natural numbers. The systems contain only an equation X+A=B and an equation X+X+C=X+X+D, where A, B, C and D are eventually periodic constants. It is proved that for every recursive set S there exists natural numbers p and d, and eventually periodic sets A, B, C and D such that a number n is in S if and only if np+d is in the unique solution of the abovementioned system of two equations, so all recursive sets can be represented in an encoded form. It is also proved that all recursive sets cannot be represented as they are, so the encoding is really needed. Furthermore, it is proved that the family of languages generated by Boolean grammars is closed under injective gsm-mappings and inverse gsm-mappings. The arguments apply also for the families of unambiguous Boolean languages, conjunctive languages and unambiguous languages. Finally, characterizations for morphisims preserving subfamilies of context-free languages are presented. It is shown that the families of deterministic and LL context-free languages are closed under codes if and only if they are of bounded deciphering delay. These families are also closed under non-codes, if they map every letter into a submonoid generated by a single word. The family of unambiguous context-free languages is closed under all codes and under the same non-codes as the families of deterministic and LL context-free languages.Siirretty Doriast

    Constant-Delay Enumeration for SLP-Compressed Documents

    Get PDF

    Toward a theory of input-driven locally parsable languages

    Get PDF
    If a context-free language enjoys the local parsability property then, no matter how the source string is segmented, each segment can be parsed independently, and an efficient parallel parsing algorithm becomes possible. The new class of locally chain parsable languages (LCPLs), included in the deterministic context-free language family, is here defined by means of the chain-driven automaton and characterized by decidable properties of grammar derivations. Such automaton decides whether to reduce or not a substring in a way purely driven by the terminal characters, thus extending the well-known concept of input-driven (ID) alias visibly pushdown machines. The LCPL family extends and improves the practically relevant Floyd's operator-precedence (OP) languages which are known to strictly include the ID languages, and for which a parallel-parser generator exists

    Ambiguity Detection Methods for Context-Free Grammars

    Get PDF
    The Meta-Environment enables the creation of grammars using the SDF formalism. From these grammars an SGLR parser can be generated. One of the advantages of these parsers is that they can handle the entire class of context-free grammars (CFGs). The grammar developer does not have to squeeze his grammar into a specific subclass of CFGs that is deterministically parsable. Instead, he can now design his grammar to best describe the structure of his language. The downside of allowing the entire class of CFGs is the danger of ambiguities. An ambiguous grammar prevents some sentences from having a unique meaning, depending on the semantics of the used language. It is best to remove all ambiguities from a grammar before it is used. Unfortunately, the detection of ambiguities in a grammar is an undecidable problem. For a recursive grammar the number of possibilities that have to be checked might be infinite. Various ambiguity detection methods (ADMs) exist, but none can always correctly identify the (un)ambiguity of a grammar. They all try to attack the problem from different angles, which results in different characteristics like termination, accuracy and performance. The goal of this project was to find out which method has the best practical usability. In particu

    Using Contextual Representations to Efficiently Learn Context-Free Languages

    No full text
    International audienceWe present a polynomial update time algorithm for the inductive inference of a large class of context-free languages using the paradigm of positive data and a membership oracle. We achieve this result by moving to a novel representation, called Contextual Binary Feature Grammars (CBFGs), which are capable of representing richly structured context-free languages as well as some context sensitive languages. These representations explicitly model the lattice structure of the distribution of a set of substrings and can be inferred using a generalisation of distributional learning. This formalism is an attempt to bridge the gap between simple learnable classes and the sorts of highly expressive representations necessary for linguistic representation: it allows the learnability of a large class of context-free languages, that includes all regular languages and those context-free languages that satisfy two simple constraints. The formalism and the algorithm seem well suited to natural language and in particular to the modeling of first language acquisition. Preliminary experimental results confirm the effectiveness of this approach
    • …
    corecore