
Data-Oriented Language Processing. An Overview

Authors: Rens Bod, Remko Scha
Publication date: 01/01/1996

During the last few years, a new approach to language processing has started to emerge, which has become known under various labels such as "data-oriented parsing", "corpus-based interpretation", and "tree-bank grammar" (cf. van den Berg et al. 1994; Bod 1992-96; Bod et al. 1996a/b; Bonnema 1996; Charniak 1996a/b; Goodman 1996; Kaplan 1996; Rajman 1995a/b; Scha 1990-92; Sekine & Grishman 1995; Sima'an et al. 1994; Sima'an 1995-96; Tugwell 1995). This approach, which we will call "data-oriented processing" or "DOP", embodies the assumption that human language perception and production work with representations of concrete past language experiences, rather than with abstract linguistic rules. The models that instantiate this approach therefore maintain large corpora of linguistic representations of previously occurring utterances. When processing a new input utterance, analyses of this utterance are constructed by combining fragments from the corpus; the occurrence-frequencies of the fragments are used to estimate which analysis is the most probable one. In this paper we give an in-depth discussion of a data-oriented processing model which employs a corpus of labelled phrase-structure trees. Then we review some other models that instantiate the DOP approach. Many of these models also employ labelled phrase-structure trees, but use different criteria for extracting fragments from the corpus or employ different disambiguation strategies (Bod 1996b; Charniak 1996a/b; Goodman 1996; Rajman 1995a/b; Sekine & Grishman 1995; Sima'an 1995-96); other models use richer formalisms for their corpus annotations (van den Berg et al. 1994; Bod et al. 1996a/b; Bonnema 1996; Kaplan 1996; Tugwell 1995).

Comment: 34 pages, Postscript
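The frequency-based scoring described in this abstract can be illustrated with a small sketch. It is not taken from the paper: the fragment inventory, the counts, and the function names are hypothetical, and normalising each fragment's count by the total count of fragments with the same root label is an assumed (though common) estimator. The sketch scores one derivation, that is, one way of combining corpus fragments into an analysis, as the product of its fragments' relative frequencies.

```python
from collections import Counter
from math import prod

# Hypothetical toy counts: (root label, fragment) -> occurrences in a corpus.
# These numbers are invented for illustration only.
fragment_counts = Counter({
    ("S", "S -> NP VP"): 4,
    ("S", "S -> NP (VP -> V NP)"): 1,
    ("NP", "NP -> she"): 3,
    ("NP", "NP -> the dog"): 2,
    ("VP", "VP -> V NP"): 2,
    ("VP", "VP -> sleeps"): 3,
})


def fragment_probability(root, fragment):
    """Relative frequency of a fragment among fragments with the same root.

    Assumption: normalisation is per root label.
    """
    total = sum(c for (r, _), c in fragment_counts.items() if r == root)
    return fragment_counts[(root, fragment)] / total


def derivation_probability(derivation):
    """Score a derivation as the product of its fragment probabilities."""
    return prod(fragment_probability(r, f) for r, f in derivation)


# One derivation of "she sleeps" built from three fragments.
derivation = [
    ("S", "S -> NP VP"),
    ("NP", "NP -> she"),
    ("VP", "VP -> sleeps"),
]
score = derivation_probability(derivation)  # (4/5) * (3/5) * (3/5)
```

Under these assumptions, competing analyses of the same utterance would be ranked by comparing such scores; how fragments are extracted and how scores are aggregated is exactly where the models surveyed in the paper differ.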

arXiv.org e-Print Archive

CiteSeerX

Faster Algorithms for the Maximum Common Subtree Isomorphism Problem

Authors: Andre Droschinsky, Nils M. Kriege, Petra Mutzel
Publication date: 01/01/2016

The maximum common subtree isomorphism problem asks for the largest possible isomorphism between subtrees of two given input trees. This problem is a natural restriction of the maximum common subgraph problem, which is NP-hard in general graphs. Confining to trees renders polynomial time algorithms possible and is of fundamental importance for approaches on more general graph classes. Various variants of this problem in trees have been intensively studied. We consider the general case, where trees are neither rooted nor ordered and the isomorphism is maximum w.r.t. a weight function on the mapped vertices and edges. For trees of order $n$ and maximum degree $\Delta$ our algorithm achieves a running time of $\mathcal{O}(n^2\Delta)$ by exploiting the structure of the matching instances arising as subproblems. Thus our algorithm outperforms the best previously known approaches. No faster algorithm is possible for trees of bounded degree and for trees of unbounded degree we show that a further reduction of the running time would directly improve the best known approach to the assignment problem. Combining a polynomial-delay algorithm for the enumeration of all maximum common subtree isomorphisms with central ideas of our new algorithm leads to an improvement of its running time from $\mathcal{O}(n^6+Tn^2)$ to $\mathcal{O}(n^3+Tn\Delta)$, where $n$ is the order of the larger tree, $T$ is the number of different solutions, and $\Delta$ is the minimum of the maximum degrees of the input trees. Our theoretical results are supplemented by an experimental evaluation on synthetic and real-world instances.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Stochastic Continuous Time Neurite Branching Models with Tree and Segment Dependent Rates

Author: Ronald A.J. van Elburg
Publication venue: 'Elsevier BV'
Publication date: 10/01/2011

In this paper we introduce a continuous time stochastic neurite branching model closely related to the discrete time stochastic BES-model. The discrete time BES-model is underlying current attempts to simulate cortical development, but is difficult to analyze. The new continuous time formulation facilitates analytical treatment thus allowing us to examine the structure of the model more closely. We derive explicit expressions for the time dependent probabilities $p(\gamma, t)$ for finding a tree $\gamma$ at time $t$, valid for arbitrary continuous time branching models with tree and segment dependent branching rates. We show, for the specific case of the continuous time BES-model, that as expected from our model formulation, the sums needed to evaluate expectation values of functions of the terminal segment number $\mu(f(n),t)$ do not depend on the distribution of the total branching probability over the terminal segments. In addition, we derive a system of differential equations for the probabilities $p(n,t)$ of finding $n$ terminal segments at time $t$. For the continuous BES-model, this system of differential equations gives direct numerical access to functions only depending on the number of terminal segments, and we use this to evaluate the development of the mean and standard deviation of the number of terminal segments at a time $t$. For comparison we discuss two cases where mean and variance of the number of terminal segments are exactly solvable. Then we discuss the numerical evaluation of the S-dependence of the solutions for the continuous time BES-model. The numerical results show clearly that higher S values, i.e. values such that more proximal terminal segments have higher branching rates than more distal terminal segments, lead to more symmetrical trees as measured by three tree symmetry indicators.

Comment: 41 pages, 2 figures, revised structure and text improvement
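The idea of obtaining the mean and standard deviation of the terminal segment number from a system of differential equations for p(n,t) can be sketched numerically. The code below is an illustration under stated assumptions, not the paper's equations: it assumes that each branching event adds exactly one terminal segment, so that dp_n/dt = r(n-1) p_{n-1} - r(n) p_n, where the total branching rate `total_rate(n)` is a hypothetical user-supplied function. The constant rate used in the usage lines is a stand-in chosen because that case is exactly solvable (the number of added segments is Poisson distributed); it is not the BES-model rate.

```python
def evolve_terminal_segment_distribution(total_rate, n_max, t_end, dt=1e-3):
    """Forward-Euler integration of dp_n/dt = r(n-1) p_{n-1} - r(n) p_n.

    total_rate(n) is an assumed total branching rate of a tree with n
    terminal segments. Returns a list p with p[i] = probability of i + 1
    terminal segments at time t_end, starting from a single segment.
    """
    rates = [total_rate(n) for n in range(1, n_max + 1)]
    rates[-1] = 0.0  # truncation: the last state only absorbs probability
    p = [0.0] * n_max
    p[0] = 1.0  # the tree starts with one terminal segment
    for _ in range(int(round(t_end / dt))):
        flow = [r * q for r, q in zip(rates, p)]  # probability leaving each state
        p = [
            q + dt * ((flow[i - 1] if i > 0 else 0.0) - flow[i])
            for i, q in enumerate(p)
        ]
    return p


def mean_and_std(p):
    """Mean and standard deviation of the number of terminal segments."""
    mean = sum((i + 1) * q for i, q in enumerate(p))
    variance = sum((i + 1 - mean) ** 2 * q for i, q in enumerate(p))
    return mean, variance ** 0.5


# Stand-in example: constant total rate b = 1, so the mean should be 1 + b*t.
p = evolve_terminal_segment_distribution(lambda n: 1.0, n_max=60, t_end=2.0)
mean, std = mean_and_std(p)
```

Replacing the lambda with a rate that depends on n gives the same kind of direct numerical access to the mean and standard deviation; quantities that depend on the tree shape rather than only on n are outside the scope of this sketch.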

arXiv.org e-Print Archive

Proceedings - University of Groningen

Dissertations of the University of Groningen