6,932 research outputs found

    Data-Oriented Language Processing. An Overview

    Full text link
    During the last few years, a new approach to language processing has started to emerge, which has become known under various labels such as "data-oriented parsing", "corpus-based interpretation", and "tree-bank grammar" (cf. van den Berg et al. 1994; Bod 1992-96; Bod et al. 1996a/b; Bonnema 1996; Charniak 1996a/b; Goodman 1996; Kaplan 1996; Rajman 1995a/b; Scha 1990-92; Sekine & Grishman 1995; Sima'an et al. 1994; Sima'an 1995-96; Tugwell 1995). This approach, which we will call "data-oriented processing" or "DOP", embodies the assumption that human language perception and production works with representations of concrete past language experiences, rather than with abstract linguistic rules. The models that instantiate this approach therefore maintain large corpora of linguistic representations of previously occurring utterances. When processing a new input utterance, analyses of this utterance are constructed by combining fragments from the corpus; the occurrence-frequencies of the fragments are used to estimate which analysis is the most probable one. In this paper we give an in-depth discussion of a data-oriented processing model which employs a corpus of labelled phrase-structure trees. Then we review some other models that instantiate the DOP approach. Many of these models also employ labelled phrase-structure trees, but use different criteria for extracting fragments from the corpus or employ different disambiguation strategies (Bod 1996b; Charniak 1996a/b; Goodman 1996; Rajman 1995a/b; Sekine & Grishman 1995; Sima'an 1995-96); other models use richer formalisms for their corpus annotations (van den Berg et al. 1994; Bod et al., 1996a/b; Bonnema 1996; Kaplan 1996; Tugwell 1995).Comment: 34 pages, Postscrip

    Faster Algorithms for the Maximum Common Subtree Isomorphism Problem

    Get PDF
    The maximum common subtree isomorphism problem asks for the largest possible isomorphism between subtrees of two given input trees. This problem is a natural restriction of the maximum common subgraph problem, which is NP{\sf NP}-hard in general graphs. Confining to trees renders polynomial time algorithms possible and is of fundamental importance for approaches on more general graph classes. Various variants of this problem in trees have been intensively studied. We consider the general case, where trees are neither rooted nor ordered and the isomorphism is maximum w.r.t. a weight function on the mapped vertices and edges. For trees of order nn and maximum degree Δ\Delta our algorithm achieves a running time of O(n2Δ)\mathcal{O}(n^2\Delta) by exploiting the structure of the matching instances arising as subproblems. Thus our algorithm outperforms the best previously known approaches. No faster algorithm is possible for trees of bounded degree and for trees of unbounded degree we show that a further reduction of the running time would directly improve the best known approach to the assignment problem. Combining a polynomial-delay algorithm for the enumeration of all maximum common subtree isomorphisms with central ideas of our new algorithm leads to an improvement of its running time from O(n6+Tn2)\mathcal{O}(n^6+Tn^2) to O(n3+TnΔ)\mathcal{O}(n^3+Tn\Delta), where nn is the order of the larger tree, TT is the number of different solutions, and Δ\Delta is the minimum of the maximum degrees of the input trees. Our theoretical results are supplemented by an experimental evaluation on synthetic and real-world instances

    Stochastic Continuous Time Neurite Branching Models with Tree and Segment Dependent Rates

    Full text link
    In this paper we introduce a continuous time stochastic neurite branching model closely related to the discrete time stochastic BES-model. The discrete time BES-model is underlying current attempts to simulate cortical development, but is difficult to analyze. The new continuous time formulation facilitates analytical treatment thus allowing us to examine the structure of the model more closely. We derive explicit expressions for the time dependent probabilities p(\gamma, t) for finding a tree \gamma at time t, valid for arbitrary continuous time branching models with tree and segment dependent branching rates. We show, for the specific case of the continuous time BES-model, that as expected from our model formulation, the sums needed to evaluate expectation values of functions of the terminal segment number \mu(f(n),t) do not depend on the distribution of the total branching probability over the terminal segments. In addition, we derive a system of differential equations for the probabilities p(n,t) of finding n terminal segments at time t. For the continuous BES-model, this system of differential equations gives direct numerical access to functions only depending on the number of terminal segments, and we use this to evaluate the development of the mean and standard deviation of the number of terminal segments at a time t. For comparison we discuss two cases where mean and variance of the number of terminal segments are exactly solvable. Then we discuss the numerical evaluation of the S-dependence of the solutions for the continuous time BES-model. The numerical results show clearly that higher S values, i.e. values such that more proximal terminal segments have higher branching rates than more distal terminal segments, lead to more symmetrical trees as measured by three tree symmetry indicators.Comment: 41 pages, 2 figures, revised structure and text improvement
    • …
    corecore