3,582 research outputs found
Context-Free Path Querying with Structural Representation of Result
Graph data model and graph databases are very popular in various areas such
as bioinformatics, semantic web, and social networks. One specific problem in
the area is a path querying with constraints formulated in terms of formal
grammars. The query in this approach is written as grammar, and paths querying
is graph parsing with respect to given grammar. There are several solutions to
it, but how to provide structural representation of query result which is
practical for answer processing and debugging is still an open problem. In this
paper we propose a graph parsing technique which allows one to build such
representation with respect to given grammar in polynomial time and space for
arbitrary context-free grammar and graph. Proposed algorithm is based on
generalized LL parsing algorithm, while previous solutions are based mostly on
CYK or Earley algorithms, which reduces time complexity in some cases.Comment: Evaluation extende
Data-Oriented Language Processing. An Overview
During the last few years, a new approach to language processing has started
to emerge, which has become known under various labels such as "data-oriented
parsing", "corpus-based interpretation", and "tree-bank grammar" (cf. van den
Berg et al. 1994; Bod 1992-96; Bod et al. 1996a/b; Bonnema 1996; Charniak
1996a/b; Goodman 1996; Kaplan 1996; Rajman 1995a/b; Scha 1990-92; Sekine &
Grishman 1995; Sima'an et al. 1994; Sima'an 1995-96; Tugwell 1995). This
approach, which we will call "data-oriented processing" or "DOP", embodies the
assumption that human language perception and production works with
representations of concrete past language experiences, rather than with
abstract linguistic rules. The models that instantiate this approach therefore
maintain large corpora of linguistic representations of previously occurring
utterances. When processing a new input utterance, analyses of this utterance
are constructed by combining fragments from the corpus; the
occurrence-frequencies of the fragments are used to estimate which analysis is
the most probable one.
In this paper we give an in-depth discussion of a data-oriented processing
model which employs a corpus of labelled phrase-structure trees. Then we review
some other models that instantiate the DOP approach. Many of these models also
employ labelled phrase-structure trees, but use different criteria for
extracting fragments from the corpus or employ different disambiguation
strategies (Bod 1996b; Charniak 1996a/b; Goodman 1996; Rajman 1995a/b; Sekine &
Grishman 1995; Sima'an 1995-96); other models use richer formalisms for their
corpus annotations (van den Berg et al. 1994; Bod et al., 1996a/b; Bonnema
1996; Kaplan 1996; Tugwell 1995).Comment: 34 pages, Postscrip
- …