10,972 research outputs found

    Data-Oriented Language Processing. An Overview

    During the last few years, a new approach to language processing has started to emerge, which has become known under various labels such as "data-oriented parsing", "corpus-based interpretation", and "tree-bank grammar" (cf. van den Berg et al. 1994; Bod 1992-96; Bod et al. 1996a/b; Bonnema 1996; Charniak 1996a/b; Goodman 1996; Kaplan 1996; Rajman 1995a/b; Scha 1990-92; Sekine & Grishman 1995; Sima'an et al. 1994; Sima'an 1995-96; Tugwell 1995). This approach, which we will call "data-oriented processing" or "DOP", embodies the assumption that human language perception and production work with representations of concrete past language experiences, rather than with abstract linguistic rules. The models that instantiate this approach therefore maintain large corpora of linguistic representations of previously occurring utterances. When processing a new input utterance, analyses of this utterance are constructed by combining fragments from the corpus; the occurrence-frequencies of the fragments are used to estimate which analysis is the most probable one. In this paper we give an in-depth discussion of a data-oriented processing model which employs a corpus of labelled phrase-structure trees. Then we review some other models that instantiate the DOP approach. Many of these models also employ labelled phrase-structure trees, but use different criteria for extracting fragments from the corpus or employ different disambiguation strategies (Bod 1996b; Charniak 1996a/b; Goodman 1996; Rajman 1995a/b; Sekine & Grishman 1995; Sima'an 1995-96); other models use richer formalisms for their corpus annotations (van den Berg et al. 1994; Bod et al. 1996a/b; Bonnema 1996; Kaplan 1996; Tugwell 1995). Comment: 34 pages, Postscrip

    Formal Properties of XML Grammars and Languages

    XML documents are described by a document type definition (DTD). An XML-grammar is a formal grammar that captures the syntactic features of a DTD. We investigate properties of this family of grammars. We show that every XML-language basically has a unique XML-grammar. We give two characterizations of languages generated by XML-grammars: one is set-theoretic, the other is by a kind of saturation property. We investigate decidability problems and prove that some properties that are undecidable for general context-free languages become decidable for XML-languages. We also characterize those XML-grammars that generate regular XML-languages. Comment: 24 page

    Interprocedural Reachability for Flat Integer Programs

    We study programs with integer data, procedure calls and arbitrary call graphs. We show that, whenever the guards and updates are given by octagonal relations, the reachability problem along control flow paths within some language w1* ... wd* over program statements is decidable in NEXPTIME. To achieve this upper bound, we combine a program transformation into the same class of programs but without procedures, with an NP-completeness result for the reachability problem of procedure-less programs. Besides the program, the expression w1* ... wd* is also mapped onto an expression of a similar form but this time over the transformed program statements. Several arguments involving context-free grammars and their generative process enable us to give tight bounds on the size of the resulting expression. The currently existing gap between NP-hard and NEXPTIME can be closed to NP-complete when a certain parameter of the analysis is assumed to be constant. Comment: 38 pages, 1 figur

    Grammar-based Representation and Identification of Dynamical Systems

    In this paper we propose a novel approach to identify dynamical systems. The method estimates the model structure and the parameters of the model simultaneously, automating the critical decisions involved in identification such as model structure and complexity selection. In order to solve the combined model structure and model parameter estimation problem, a new representation of dynamical systems is proposed. The proposed representation is based on Tree Adjoining Grammar, a formalism that was developed from linguistic considerations. Using the proposed representation, the identification problem can be interpreted as a multi-objective optimization problem, and we propose an Evolutionary Algorithm-based approach to solve the problem. A benchmark example is used to demonstrate the proposed approach. The results were found to be comparable to those obtained by state-of-the-art non-linear system identification methods, without making use of knowledge of the system description. Comment: Submitted to European Control Conference (ECC) 201

    A Characterization of ET0L and EDT0L Languages

    There exists a PT0L language $L_0$ such that the following holds. A language $L$ is an ET0L language if and only if there exists a mapping $T$ induced by an a-NGSM (nondeterministic generalized sequential machine with accepting states) such that $L = T(L_0)$. There exists an infinite collection of EPDT0L languages $D_{mn} \subseteq \Sigma_{mn}^\star$ ($n \geq m \geq 1$) such that the family EDT0L is characterized in the following way. A language $L$ is an EDT0L language if and only if there exist $n \geq m \geq 1$, a homomorphism $h$ and a regular language $R \subseteq \Sigma_{mn}^\star$ such that $L = h(D_{mn} \cap R)$.

    Principles and Implementation of Deductive Parsing

    We present a system for generating parsers based directly on the metaphor of parsing as deduction. Parsing algorithms can be represented directly as deduction systems, and a single deduction engine can interpret such deduction systems so as to implement the corresponding parser. The method generalizes easily to parsers for augmented phrase structure formalisms, such as definite-clause grammars and other logic grammar formalisms, and has been used for rapid prototyping of parsing algorithms for a variety of formalisms including variants of tree-adjoining grammars, categorial grammars, and lexicalized context-free grammars. Comment: 69 pages, includes full Prolog cod

    Leveraging Semantic Web Service Descriptions for Validation by Automated Functional Testing

    Recent years have seen the utilisation of Semantic Web Service descriptions for automating a wide range of service-related activities, with a primary focus on service discovery, composition, execution and mediation. An important area which so far has received less attention is service validation, whereby advertised services are proven to conform to required behavioural specifications. This paper proposes a method for validation of service-oriented systems through automated functional testing. The method leverages ontology-based and rule-based descriptions of service inputs, outputs, preconditions and effects (IOPE) for constructing a stateful EFSM specification. The specification is subsequently utilised for functional testing and validation using the proven Stream X-machine (SXM) testing methodology. Complete functional test sets are generated automatically at an abstract level and are then applied to concrete Web services, using test drivers created from the Web service descriptions. The testing method comes with completeness guarantees and provides a strong method for validating the behaviour of Web services.

    The Computational Complexity of Symbolic Dynamics at the Onset of Chaos

    In a variety of studies of dynamical systems, the edge of order and chaos has been singled out as a region of complexity. It was suggested by Wolfram, on the basis of qualitative behaviour of cellular automata, that the computational basis for modelling this region is the Universal Turing Machine. In this paper, following a suggestion of Crutchfield, we try to show that the Turing machine model may often be too powerful as a computational model to describe the boundary of order and chaos. In particular we study the region of the first accumulation of period doubling in unimodal and bimodal maps of the interval, from the point of view of language theory. We show that in relation to the "extended" Chomsky hierarchy, the relevant computational model in the unimodal case is the nested stack automaton or the related indexed languages, while the bimodal case is modeled by the linear bounded automaton or the related context-sensitive languages. Comment: 1 reference corrected, 1 reference added, minor changes in body of manuscrip