3,856 research outputs found

    Parallel on-line parsing in constant time per word

    Get PDF
    An on-line parser processes each word as soon as it is typed by the user, without waiting for the end of the sentence. Thus, in an interactive system, a sentence will be parsed almost immediately after the last word has been presented.\ud \ud The complexity of an on-line parser is determined by the resources needed for the analysis of a single word, as it is assumed that previous words have been processed already. Sequential parsing algorithms like CYK or Earley need O(n2) time for the nth word. A parallel implementation in O(n) time on O(n) processors is straightforward. In this paper a novel parallel on-line parser is presented that needs O(1) time on O(n2) processors

    Context-Free Path Querying with Structural Representation of Result

    Full text link
    Graph data model and graph databases are very popular in various areas such as bioinformatics, semantic web, and social networks. One specific problem in the area is a path querying with constraints formulated in terms of formal grammars. The query in this approach is written as grammar, and paths querying is graph parsing with respect to given grammar. There are several solutions to it, but how to provide structural representation of query result which is practical for answer processing and debugging is still an open problem. In this paper we propose a graph parsing technique which allows one to build such representation with respect to given grammar in polynomial time and space for arbitrary context-free grammar and graph. Proposed algorithm is based on generalized LL parsing algorithm, while previous solutions are based mostly on CYK or Earley algorithms, which reduces time complexity in some cases.Comment: Evaluation extende

    Pattern matching in compilers

    Get PDF
    In this thesis we develop tools for effective and flexible pattern matching. We introduce a new pattern matching system called amethyst. Amethyst is not only a generator of parsers of programming languages, but can also serve as an alternative to tools for matching regular expressions. Our framework also produces dynamic parsers. Its intended use is in the context of IDE (accurate syntax highlighting and error detection on the fly). Amethyst offers pattern matching of general data structures. This makes it a useful tool for implementing compiler optimizations such as constant folding, instruction scheduling, and dataflow analysis in general. The parsers produced are essentially top-down parsers. Linear time complexity is obtained by introducing the novel notion of structured grammars and regularized regular expressions. Amethyst uses techniques known from compiler optimizations to produce effective parsers.Comment: master thesi

    Context-Free Path Querying by Matrix Multiplication

    Full text link
    Graph data models are widely used in many areas, for example, bioinformatics, graph databases. In these areas, it is often required to process queries for large graphs. Some of the most common graph queries are navigational queries. The result of query evaluation is a set of implicit relations between nodes of the graph, i.e. paths in the graph. A natural way to specify these relations is by specifying paths using formal grammars over the alphabet of edge labels. An answer to a context-free path query in this approach is usually a set of triples (A, m, n) such that there is a path from the node m to the node n, whose labeling is derived from a non-terminal A of the given context-free grammar. This type of queries is evaluated using the relational query semantics. Another example of path query semantics is the single-path query semantics which requires presenting a single path from the node m to the node n, whose labeling is derived from a non-terminal A for all triples (A, m, n) evaluated using the relational query semantics. There is a number of algorithms for query evaluation which use these semantics but all of them perform poorly on large graphs. One of the most common technique for efficient big data processing is the use of a graphics processing unit (GPU) to perform computations, but these algorithms do not allow to use this technique efficiently. In this paper, we show how the context-free path query evaluation using these query semantics can be reduced to the calculation of the matrix transitive closure. Also, we propose an algorithm for context-free path query evaluation which uses relational query semantics and is based on matrix operations that make it possible to speed up computations by using a GPU.Comment: 9 pages, 11 figures, 2 table

    Multiple Context-Free Tree Grammars: Lexicalization and Characterization

    Get PDF
    Multiple (simple) context-free tree grammars are investigated, where "simple" means "linear and nondeleting". Every multiple context-free tree grammar that is finitely ambiguous can be lexicalized; i.e., it can be transformed into an equivalent one (generating the same tree language) in which each rule of the grammar contains a lexical symbol. Due to this transformation, the rank of the nonterminals increases at most by 1, and the multiplicity (or fan-out) of the grammar increases at most by the maximal rank of the lexical symbols; in particular, the multiplicity does not increase when all lexical symbols have rank 0. Multiple context-free tree grammars have the same tree generating power as multi-component tree adjoining grammars (provided the latter can use a root-marker). Moreover, every multi-component tree adjoining grammar that is finitely ambiguous can be lexicalized. Multiple context-free tree grammars have the same string generating power as multiple context-free (string) grammars and polynomial time parsing algorithms. A tree language can be generated by a multiple context-free tree grammar if and only if it is the image of a regular tree language under a deterministic finite-copying macro tree transducer. Multiple context-free tree grammars can be used as a synchronous translation device.Comment: 78 pages, 13 figure
    corecore