Data-Oriented Language Processing. An Overview
During the last few years, a new approach to language processing has started
to emerge, which has become known under various labels such as "data-oriented
parsing", "corpus-based interpretation", and "tree-bank grammar" (cf. van den
Berg et al. 1994; Bod 1992-96; Bod et al. 1996a/b; Bonnema 1996; Charniak
1996a/b; Goodman 1996; Kaplan 1996; Rajman 1995a/b; Scha 1990-92; Sekine &
Grishman 1995; Sima'an et al. 1994; Sima'an 1995-96; Tugwell 1995). This
approach, which we will call "data-oriented processing" or "DOP", embodies the
assumption that human language perception and production work with
representations of concrete past language experiences, rather than with
abstract linguistic rules. The models that instantiate this approach therefore
maintain large corpora of linguistic representations of previously occurring
utterances. When processing a new input utterance, analyses of this utterance
are constructed by combining fragments from the corpus; the
occurrence-frequencies of the fragments are used to estimate which analysis is
the most probable one.
In this paper we give an in-depth discussion of a data-oriented processing
model which employs a corpus of labelled phrase-structure trees. Then we review
some other models that instantiate the DOP approach. Many of these models also
employ labelled phrase-structure trees, but use different criteria for
extracting fragments from the corpus or employ different disambiguation
strategies (Bod 1996b; Charniak 1996a/b; Goodman 1996; Rajman 1995a/b; Sekine &
Grishman 1995; Sima'an 1995-96); other models use richer formalisms for their
corpus annotations (van den Berg et al. 1994; Bod et al., 1996a/b; Bonnema
1996; Kaplan 1996; Tugwell 1995).
Comment: 34 pages, Postscrip
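The abstract above says that analyses are built by combining corpus fragments and that fragment occurrence-frequencies are used to pick the most probable analysis. The following is a minimal sketch of one way this could be scored, not the authors' model. It assumes that a fragment's probability is its relative frequency among fragments with the same root label, that a derivation's probability is the product of its fragment probabilities, and that an analysis sums over its derivations. The function names and the toy fragment counts are hypothetical.

```python
from collections import Counter
from math import prod


def fragment_probabilities(fragment_counts):
    """Relative frequency of each fragment among fragments sharing its root.

    fragment_counts maps (root_label, fragment_id) -> corpus count.
    (Assumed estimator; the abstract only says frequencies are used.)
    """
    root_totals = Counter()
    for (root, _), count in fragment_counts.items():
        root_totals[root] += count
    return {key: count / root_totals[key[0]]
            for key, count in fragment_counts.items()}


def derivation_probability(derivation, probs):
    """Product of the probabilities of the fragments combined in one derivation."""
    return prod(probs[fragment] for fragment in derivation)


def analysis_probability(derivations, probs):
    """Sum over all derivations that yield the same analysis."""
    return sum(derivation_probability(d, probs) for d in derivations)


def most_probable_analysis(candidates, probs):
    """candidates maps an analysis name -> list of its derivations."""
    return max(candidates,
               key=lambda name: analysis_probability(candidates[name], probs))


# Hypothetical toy corpus counts, for illustration only.
counts = {
    ("S", "S(NP VP)"): 3,
    ("S", "S(NP(she) VP)"): 1,
    ("NP", "NP(she)"): 2,
    ("NP", "NP(he)"): 2,
    ("VP", "VP(sleeps)"): 4,
}
probs = fragment_probabilities(counts)
candidates = {
    "A": [
        [("S", "S(NP VP)"), ("NP", "NP(she)"), ("VP", "VP(sleeps)")],
        [("S", "S(NP(she) VP)"), ("VP", "VP(sleeps)")],
    ],
    "B": [
        [("S", "S(NP VP)"), ("NP", "NP(he)"), ("VP", "VP(sleeps)")],
    ],
}
best = most_probable_analysis(candidates, probs)
```

Analysis "A" can be derived in two ways, from small fragments or from one larger fragment, and both derivations contribute to its score.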
Faster Algorithms for the Maximum Common Subtree Isomorphism Problem
The maximum common subtree isomorphism problem asks for the largest possible
isomorphism between subtrees of two given input trees. This problem is a
natural restriction of the maximum common subgraph problem, which is NP-hard
in general graphs. Restricting the input to trees makes polynomial-time
algorithms possible and is of fundamental importance for approaches on more
general graph classes. Several variants of this problem in trees have been
intensively studied. We consider the general case, where trees are neither
rooted nor ordered and the isomorphism is maximum w.r.t. a weight function on
the mapped vertices and edges. For trees of order $n$ and maximum degree
$\Delta$ our algorithm achieves a running time of $O(n^2 \Delta)$ by
exploiting the structure of the matching instances arising as subproblems. Thus
our algorithm outperforms the best previously known approaches. No faster
algorithm is possible for trees of bounded degree, and for trees of unbounded
degree we show that a further reduction of the running time would directly
improve the best known approach to the assignment problem. Combining a
polynomial-delay algorithm for the enumeration of all maximum common subtree
isomorphisms with central ideas of our new algorithm leads to an improvement of
its running time from $O(n^6 + Tn^2)$ to $O(n^3 + Tn\Delta)$,
where $n$ is the order of the larger tree, $T$ is the number of different
solutions, and $\Delta$ is the minimum of the maximum degrees of the input
trees. Our theoretical results are supplemented by an experimental evaluation
on synthetic and real-world instances.
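The abstract above refers to matching instances that arise as subproblems. The sketch below illustrates that idea on a deliberately simplified variant and is not the paper's algorithm. It assumes rooted, unordered trees, unit vertex weights, and that the two roots are mapped to each other. For each pair of vertices, the children are paired by a maximum weight bipartite matching, solved here by brute force. All function names are hypothetical.

```python
from functools import lru_cache
from itertools import permutations


def max_matching_weight(weights):
    """Brute-force maximum weight bipartite matching.

    weights[i][j] >= 0 is the gain of pairing left item i with right item j.
    Exponential time; only meant for the small child sets of a toy example.
    """
    if not weights or not weights[0]:
        return 0
    rows, cols = len(weights), len(weights[0])
    if rows > cols:
        # Transpose so that every row can be assigned a distinct column.
        weights = [list(column) for column in zip(*weights)]
        rows, cols = cols, rows
    best = 0
    for perm in permutations(range(cols), rows):
        best = max(best, sum(weights[i][perm[i]] for i in range(rows)))
    return best


def rooted_common_subtree_size(children_a, root_a, children_b, root_b):
    """Size of a largest common subtree that maps root_a to root_b.

    children_a / children_b map a vertex to a tuple of its children;
    leaves may be omitted from the dictionaries.
    """
    @lru_cache(maxsize=None)
    def size(u, v):
        kids_u = children_a.get(u, ())
        kids_v = children_b.get(v, ())
        # Matching instance: pair children of u with children of v.
        weights = [[size(x, y) for y in kids_v] for x in kids_u]
        return 1 + max_matching_weight(weights)

    return size(root_a, root_b)


# Hypothetical toy trees, for illustration only.
tree_a = {0: (1, 2), 1: (3,)}
tree_b = {"a": ("b",), "b": ("c", "d")}
common = rooted_common_subtree_size(tree_a, 0, tree_b, "a")
```

A practical implementation would replace the brute-force matching with a polynomial-time assignment solver. The unrooted, weighted case treated in the paper needs further machinery that this sketch omits.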
Stochastic Continuous Time Neurite Branching Models with Tree and Segment Dependent Rates
In this paper we introduce a continuous time stochastic neurite branching
model closely related to the discrete time stochastic BES-model. The discrete
time BES-model underlies current attempts to simulate cortical development,
but is difficult to analyze. The new continuous time formulation facilitates
analytical treatment, thus allowing us to examine the structure of the model
more closely. We derive explicit expressions for the time dependent
probabilities p(\gamma, t) for finding a tree \gamma at time t, valid for
arbitrary continuous time branching models with tree and segment dependent
branching rates. We show, for the specific case of the continuous time
BES-model, that as expected from our model formulation, the sums needed to
evaluate expectation values of functions of the terminal segment number
\mu(f(n),t) do not depend on the distribution of the total branching
probability over the terminal segments. In addition, we derive a system of
differential equations for the probabilities p(n,t) of finding n terminal
segments at time t. For the continuous BES-model, this system of differential
equations gives direct numerical access to functions only depending on the
number of terminal segments, and we use this to evaluate the development of the
mean and standard deviation of the number of terminal segments at a time t. For
comparison we discuss two cases where mean and variance of the number of
terminal segments are exactly solvable. Then we discuss the numerical
evaluation of the S-dependence of the solutions for the continuous time
BES-model. The numerical results show clearly that higher S values, i.e. values
such that more proximal terminal segments have higher branching rates than more
distal terminal segments, lead to more symmetrical trees as measured by three
tree symmetry indicators.
Comment: 41 pages, 2 figures, revised structure and text improvement
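The abstract above mentions a system of differential equations for the probabilities p(n,t) of finding n terminal segments at time t. The sketch below shows how such a system could be integrated numerically; it is not the paper's derivation. It assumes each branching event adds exactly one terminal segment and that the total branching rate depends only on n through a user-supplied function `rate(n)`, giving dp(n,t)/dt = rate(n-1) p(n-1,t) - rate(n) p(n,t). The truncation at `n_max`, the fixed-step Runge-Kutta integrator, and all names are illustrative choices.

```python
import math


def master_rhs(p, rate):
    """Right-hand side dp[n]/dt for n = 1..n_max (p[0] is unused padding)."""
    n_max = len(p) - 1
    dp = [0.0] * len(p)
    for n in range(1, n_max + 1):
        # The top state absorbs so that total probability is conserved.
        outflow = rate(n) * p[n] if n < n_max else 0.0
        inflow = rate(n - 1) * p[n - 1] if n > 1 else 0.0
        dp[n] = inflow - outflow
    return dp


def rk4_step(p, rate, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = master_rhs(p, rate)
    k2 = master_rhs([x + 0.5 * dt * k for x, k in zip(p, k1)], rate)
    k3 = master_rhs([x + 0.5 * dt * k for x, k in zip(p, k2)], rate)
    k4 = master_rhs([x + dt * k for x, k in zip(p, k3)], rate)
    return [x + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for x, a, b, c, d in zip(p, k1, k2, k3, k4)]


def terminal_segment_distribution(rate, t_end, n_max=60, steps=1000):
    """Integrate p(n,t) from a single terminal segment at t = 0 to t_end."""
    p = [0.0] * (n_max + 1)
    p[1] = 1.0
    dt = t_end / steps
    for _ in range(steps):
        p = rk4_step(p, rate, dt)
    return p


def mean_and_std(p):
    """Mean and standard deviation of the number of terminal segments."""
    mean = sum(n * q for n, q in enumerate(p))
    variance = sum((n - mean) ** 2 * q for n, q in enumerate(p))
    return mean, math.sqrt(variance)


# Illustrative check with a linear rate (a standard Yule process),
# chosen only because its mean exp(b*t) is known in closed form.
p_final = terminal_segment_distribution(lambda n: 1.0 * n, t_end=1.0)
mean_n, std_n = mean_and_std(p_final)
```

The linear rate in the usage lines is not taken from the paper. A BES-type rate would be supplied through the same `rate` argument once its dependence on n is specified.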