3,856 research outputs found
Parallel on-line parsing in constant time per word
An on-line parser processes each word as soon as it is typed by the user, without waiting for the end of the sentence. Thus, in an interactive system, a sentence will be parsed almost immediately after the last word has been presented.\ud
\ud
The complexity of an on-line parser is determined by the resources needed for the analysis of a single word, as it is assumed that previous words have been processed already. Sequential parsing algorithms like CYK or Earley need O(n2) time for the nth word. A parallel implementation in O(n) time on O(n) processors is straightforward. In this paper a novel parallel on-line parser is presented that needs O(1) time on O(n2) processors
Context-Free Path Querying with Structural Representation of Result
Graph data model and graph databases are very popular in various areas such
as bioinformatics, semantic web, and social networks. One specific problem in
the area is a path querying with constraints formulated in terms of formal
grammars. The query in this approach is written as grammar, and paths querying
is graph parsing with respect to given grammar. There are several solutions to
it, but how to provide structural representation of query result which is
practical for answer processing and debugging is still an open problem. In this
paper we propose a graph parsing technique which allows one to build such
representation with respect to given grammar in polynomial time and space for
arbitrary context-free grammar and graph. Proposed algorithm is based on
generalized LL parsing algorithm, while previous solutions are based mostly on
CYK or Earley algorithms, which reduces time complexity in some cases.Comment: Evaluation extende
Pattern matching in compilers
In this thesis we develop tools for effective and flexible pattern matching.
We introduce a new pattern matching system called amethyst. Amethyst is not
only a generator of parsers of programming languages, but can also serve as an
alternative to tools for matching regular expressions.
Our framework also produces dynamic parsers. Its intended use is in the
context of IDE (accurate syntax highlighting and error detection on the fly).
Amethyst offers pattern matching of general data structures. This makes it a
useful tool for implementing compiler optimizations such as constant folding,
instruction scheduling, and dataflow analysis in general.
The parsers produced are essentially top-down parsers. Linear time complexity
is obtained by introducing the novel notion of structured grammars and
regularized regular expressions. Amethyst uses techniques known from compiler
optimizations to produce effective parsers.Comment: master thesi
Context-Free Path Querying by Matrix Multiplication
Graph data models are widely used in many areas, for example, bioinformatics,
graph databases. In these areas, it is often required to process queries for
large graphs. Some of the most common graph queries are navigational queries.
The result of query evaluation is a set of implicit relations between nodes of
the graph, i.e. paths in the graph. A natural way to specify these relations is
by specifying paths using formal grammars over the alphabet of edge labels. An
answer to a context-free path query in this approach is usually a set of
triples (A, m, n) such that there is a path from the node m to the node n,
whose labeling is derived from a non-terminal A of the given context-free
grammar. This type of queries is evaluated using the relational query
semantics. Another example of path query semantics is the single-path query
semantics which requires presenting a single path from the node m to the node
n, whose labeling is derived from a non-terminal A for all triples (A, m, n)
evaluated using the relational query semantics. There is a number of algorithms
for query evaluation which use these semantics but all of them perform poorly
on large graphs. One of the most common technique for efficient big data
processing is the use of a graphics processing unit (GPU) to perform
computations, but these algorithms do not allow to use this technique
efficiently. In this paper, we show how the context-free path query evaluation
using these query semantics can be reduced to the calculation of the matrix
transitive closure. Also, we propose an algorithm for context-free path query
evaluation which uses relational query semantics and is based on matrix
operations that make it possible to speed up computations by using a GPU.Comment: 9 pages, 11 figures, 2 table
Multiple Context-Free Tree Grammars: Lexicalization and Characterization
Multiple (simple) context-free tree grammars are investigated, where "simple"
means "linear and nondeleting". Every multiple context-free tree grammar that
is finitely ambiguous can be lexicalized; i.e., it can be transformed into an
equivalent one (generating the same tree language) in which each rule of the
grammar contains a lexical symbol. Due to this transformation, the rank of the
nonterminals increases at most by 1, and the multiplicity (or fan-out) of the
grammar increases at most by the maximal rank of the lexical symbols; in
particular, the multiplicity does not increase when all lexical symbols have
rank 0. Multiple context-free tree grammars have the same tree generating power
as multi-component tree adjoining grammars (provided the latter can use a
root-marker). Moreover, every multi-component tree adjoining grammar that is
finitely ambiguous can be lexicalized. Multiple context-free tree grammars have
the same string generating power as multiple context-free (string) grammars and
polynomial time parsing algorithms. A tree language can be generated by a
multiple context-free tree grammar if and only if it is the image of a regular
tree language under a deterministic finite-copying macro tree transducer.
Multiple context-free tree grammars can be used as a synchronous translation
device.Comment: 78 pages, 13 figure
- …