
Data-Oriented Language Processing. An Overview

Author: Bod Rens
Scha Remko
Publication date: 01/01/1996

During the last few years, a new approach to language processing has started to emerge, which has become known under various labels such as "data-oriented parsing", "corpus-based interpretation", and "tree-bank grammar" (cf. van den Berg et al. 1994; Bod 1992-96; Bod et al. 1996a/b; Bonnema 1996; Charniak 1996a/b; Goodman 1996; Kaplan 1996; Rajman 1995a/b; Scha 1990-92; Sekine & Grishman 1995; Sima'an et al. 1994; Sima'an 1995-96; Tugwell 1995). This approach, which we will call "data-oriented processing" or "DOP", embodies the assumption that human language perception and production work with representations of concrete past language experiences, rather than with abstract linguistic rules. The models that instantiate this approach therefore maintain large corpora of linguistic representations of previously occurring utterances. When processing a new input utterance, analyses of this utterance are constructed by combining fragments from the corpus; the occurrence-frequencies of the fragments are used to estimate which analysis is the most probable one. In this paper we give an in-depth discussion of a data-oriented processing model which employs a corpus of labelled phrase-structure trees. Then we review some other models that instantiate the DOP approach. Many of these models also employ labelled phrase-structure trees, but use different criteria for extracting fragments from the corpus or employ different disambiguation strategies (Bod 1996b; Charniak 1996a/b; Goodman 1996; Rajman 1995a/b; Sekine & Grishman 1995; Sima'an 1995-96); other models use richer formalisms for their corpus annotations (van den Berg et al. 1994; Bod et al. 1996a/b; Bonnema 1996; Kaplan 1996; Tugwell 1995). Comment: 34 pages, Postscript
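The frequency-based estimate mentioned in this abstract can be illustrated with a minimal sketch. It assumes a relative-frequency scheme: a fragment's probability is its count divided by the total count of fragments with the same root label, a derivation's probability is the product of its fragment probabilities, and an analysis sums over its derivations. The fragment strings, counts, and function names are hypothetical, chosen to mimic a one-tree corpus for "she sleeps"; they are not taken from the paper.

```python
from collections import Counter
from fractions import Fraction
from math import prod

# Hypothetical fragment counts from a one-tree corpus: (root, fragment) -> count.
fragment_counts = Counter({
    ("S", "S(NP VP)"): 1,
    ("S", "S(NP(she) VP)"): 1,
    ("S", "S(NP VP(sleeps))"): 1,
    ("S", "S(NP(she) VP(sleeps))"): 1,
    ("NP", "NP(she)"): 1,
    ("VP", "VP(sleeps)"): 1,
})

def fragment_probability(root, fragment):
    """Relative frequency of a fragment among fragments with the same root."""
    same_root_total = sum(c for (r, _), c in fragment_counts.items() if r == root)
    return Fraction(fragment_counts[(root, fragment)], same_root_total)

def derivation_probability(derivation):
    """Product of the probabilities of the fragments that were combined."""
    return prod(fragment_probability(r, f) for r, f in derivation)

def analysis_probability(derivations):
    """Sum over all derivations that yield the same analysis."""
    return sum(derivation_probability(d) for d in derivations)

# Four ways to build the same tree for "she sleeps" from the fragments above.
derivations = [
    [("S", "S(NP VP)"), ("NP", "NP(she)"), ("VP", "VP(sleeps)")],
    [("S", "S(NP(she) VP)"), ("VP", "VP(sleeps)")],
    [("S", "S(NP VP(sleeps))"), ("NP", "NP(she)")],
    [("S", "S(NP(she) VP(sleeps))")],
]
```

Exact fractions are used so that the toy numbers can be checked by hand; a realistic corpus would need fragment extraction and an efficient search over derivations, which this sketch leaves out.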

arXiv.org e-Print Archive

CiteSeerX

Formal Properties of XML Grammars and Languages

Author: Berstel Jean
Boasson Luc
Publication date: 01/01/2000

XML documents are described by a document type definition (DTD). An XML-grammar is a formal grammar that captures the syntactic features of a DTD. We investigate properties of this family of grammars. We show that every XML-language basically has a unique XML-grammar. We give two characterizations of languages generated by XML-grammars: one is set-theoretic, the other is by a kind of saturation property. We investigate decidability problems and prove that some properties that are undecidable for general context-free languages become decidable for XML-languages. We also characterize those XML-grammars that generate regular XML-languages. Comment: 24 pages

arXiv.org e-Print Archive

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Interprocedural Reachability for Flat Integer Programs

Author: A Miné
D Kroening
G Godoy
H Hojjat
L Bozzelli
M Bozga
M Luker
M Minsky
P Ganty
PA Abdulla
PZ Revesz
R Alur
S Bardin
S Demri
S Ginsburg
Publication date: 11/06/2015

We study programs with integer data, procedure calls and arbitrary call graphs. We show that, whenever the guards and updates are given by octagonal relations, the reachability problem along control flow paths within some language w1* ... wd* over program statements is decidable in NEXPTIME. To achieve this upper bound, we combine a program transformation into the same class of programs but without procedures, with an NP-completeness result for the reachability problem of procedure-less programs. Besides the program, the expression w1* ... wd* is also mapped onto an expression of a similar form but this time over the transformed program statements. Several arguments involving context-free grammars and their generative process enable us to give tight bounds on the size of the resulting expression. The currently existing gap between NP-hard and NEXPTIME can be closed to NP-complete when a certain parameter of the analysis is assumed to be constant. Comment: 38 pages, 1 figure
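To make the path restriction w1* ... wd* concrete, here is a minimal sketch that checks whether a sequence of program statements lies in such a language. It is only an illustration of the language shape, not the paper's decision procedure. The function name, the statement names, and the encoding are assumptions: each statement is a string without ";", and each word is a non-empty list of statements.

```python
import re

def in_bounded_language(path, words):
    """Return True if `path` is in w1* w2* ... wd* for the given `words`.

    Assumes statement names contain no ';' and every word is non-empty.
    Each statement is encoded as its name followed by ';' so that matches
    always align with whole statements.
    """
    pattern = "".join(
        "(?:" + "".join(re.escape(stmt + ";") for stmt in word) + ")*"
        for word in words
    )
    text = "".join(stmt + ";" for stmt in path)
    return re.fullmatch(pattern, text) is not None

# Hypothetical statements: a loop on `inc`, then a loop on `call; ret`.
words = [["inc"], ["call", "ret"]]
in_order = in_bounded_language(["inc", "inc", "call", "ret"], words)
out_of_order = in_bounded_language(["call", "ret", "inc"], words)
```

With these inputs, `in_order` is true and `out_of_order` is false, because the words must be iterated in the fixed order w1, then w2.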

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

Grammar-based Representation and Identification of Dynamical Systems

Author: gorn
khandelwal
koza
madár
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 26/11/2018

In this paper we propose a novel approach to identify dynamical systems. The method estimates the model structure and the parameters of the model simultaneously, automating the critical decisions involved in identification such as model structure and complexity selection. In order to solve the combined model structure and model parameter estimation problem, a new representation of dynamical systems is proposed. The proposed representation is based on Tree Adjoining Grammar, a formalism that was developed from linguistic considerations. Using the proposed representation, the identification problem can be interpreted as a multi-objective optimization problem, and we propose an Evolutionary Algorithm-based approach to solve the problem. A benchmark example is used to demonstrate the proposed approach. The results were found to be comparable to those obtained by state-of-the-art non-linear system identification methods, without making use of knowledge of the system description. Comment: Submitted to European Control Conference (ECC) 201

arXiv.org e-Print Archive

Crossref

Repository TU/e

Pure OAI Repository

A Characterization of ET0L and EDT0L Languages

Author: Asveld Peter R.J.
Publication venue: Department of Applied Mathematics, University of Twente
Publication date: 01/01/1976

There exists a PT0L language $L_0$ such that the following holds. A language $L$ is an ET0L language if and only if there exists a mapping $T$ induced by an a-NGSM (nondeterministic generalized sequential machine with accepting states) such that $L = T(L_0)$. There exists an infinite collection of EPDT0L languages $D_{mn} \subseteq \Sigma_{mn}^\star$ ($n \geq m \geq 1$) such that the family EDT0L is characterized in the following way. A language $L$ is an EDT0L language if and only if there exists $n \geq m \geq 1$, a homomorphism $h$ and a regular language $R \subseteq \Sigma_{mn}^\star$ such that $L = h(D_{mn} \cap R)$.

University of Twente Research Information

Principles and Implementation of Deductive Parsing

Author: Pereira Fernando C. N.
Schabes Yves
Shieber Stuart M.
Publication date: 01/01/1994

We present a system for generating parsers based directly on the metaphor of parsing as deduction. Parsing algorithms can be represented directly as deduction systems, and a single deduction engine can interpret such deduction systems so as to implement the corresponding parser. The method generalizes easily to parsers for augmented phrase structure formalisms, such as definite-clause grammars and other logic grammar formalisms, and has been used for rapid prototyping of parsing algorithms for a variety of formalisms including variants of tree-adjoining grammars, categorial grammars, and lexicalized context-free grammars. Comment: 69 pages, includes full Prolog code
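The idea of parsing as deduction can be sketched with a small agenda-driven recognizer. This is an illustrative Python sketch, not the paper's Prolog system: it hard-codes one CKY-style deduction system over items (A, i, j), meaning "category A spans positions i to j", rather than interpreting arbitrary deduction systems. The function name, grammar encoding, and toy grammar are assumptions made for the example.

```python
def recognize(words, lexical, binary, start="S"):
    """Agenda-driven deduction over CKY-style items (category, i, j).

    Axioms:     (A, i, i+1)                 if A -> words[i] is in `lexical`
    Inference:  (B, i, k), (C, k, j) |- (A, i, j)   if A -> B C is in `binary`
    Goal:       (start, 0, len(words))
    """
    # Seed the agenda with the axioms licensed by the input words.
    agenda = [(cat, i, i + 1)
              for i, word in enumerate(words)
              for cat in lexical.get(word, ())]
    chart = set()
    while agenda:
        item = agenda.pop()
        if item in chart:
            continue  # already proved; do not process twice
        chart.add(item)
        x, i, j = item
        for (b, k, l) in list(chart):
            if k == j:  # new item is the left antecedent
                for a in binary.get((x, b), ()):
                    agenda.append((a, i, l))
            if l == i:  # new item is the right antecedent
                for a in binary.get((b, x), ()):
                    agenda.append((a, k, j))
    return (start, 0, len(words)) in chart

# Hypothetical toy grammar in Chomsky normal form.
lexical = {"she": ["NP"], "sees": ["V"], "stars": ["NP"]}
binary = {("V", "NP"): ["VP"], ("NP", "VP"): ["S"]}

accepted = recognize(["she", "sees", "stars"], lexical, binary)
rejected = recognize(["sees", "she", "stars"], lexical, binary)
```

Every pair of adjacent items is combined when the later of the two leaves the agenda, so each consequence of the inference rule is eventually derived; swapping in a different item form and rule set would give a different parsing algorithm on the same loop.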

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

Harvard University - DASH

Leveraging Semantic Web Service Descriptions for Validation by Automated Functional Testing

Author: A. Bertolino
C. Keum
D. Dranidis
D. Kourtesis
D. Martin
F. Ipate
G.D. Plotkin
M. Holcombe
R. Heckel
S. Eilenberg
T.S. Chow
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Recent years have seen the utilisation of Semantic Web Service descriptions for automating a wide range of service-related activities, with a primary focus on service discovery, composition, execution and mediation. An important area which so far has received less attention is service validation, whereby advertised services are proven to conform to required behavioural specifications. This paper proposes a method for validation of service-oriented systems through automated functional testing. The method leverages ontology-based and rule-based descriptions of service inputs, outputs, preconditions and effects (IOPE) for constructing a stateful EFSM specification. The specification is subsequently utilised for functional testing and validation using the proven Stream X-machine (SXM) testing methodology. Complete functional test sets are generated automatically at an abstract level and are then applied to concrete Web services, using test drivers created from the Web service descriptions. The testing method comes with completeness guarantees and provides a strong method for validating the behaviour of Web services.

Crossref

White Rose Research Online

The Computational Complexity of Symbolic Dynamics at the Onset of Chaos

Author: A. Dhar
B. Derrida
G. Chaitin
J. Crutchfield
J. Milnor
J.E. Hopcroft
L. Blum
M.J. Feigenbaum
P. Cariani
P. Collet
Porus Lakdawala
R. Badii
R.S. Mackay
S. Eilenberg
S. Wolfram
T. Hayashi
W. Ogden
Publication venue: 'American Physical Society (APS)'
Publication date: 17/08/1995

In a variety of studies of dynamical systems, the edge of order and chaos has been singled out as a region of complexity. It was suggested by Wolfram, on the basis of qualitative behaviour of cellular automata, that the computational basis for modelling this region is the Universal Turing Machine. In this paper, following a suggestion of Crutchfield, we try to show that the Turing machine model may often be too powerful as a computational model to describe the boundary of order and chaos. In particular we study the region of the first accumulation of period doubling in unimodal and bimodal maps of the interval, from the point of view of language theory. We show that in relation to the "extended" Chomsky hierarchy, the relevant computational model in the unimodal case is the nested stack automaton or the related indexed languages, while the bimodal case is modeled by the linear bounded automaton or the related context-sensitive languages. Comment: 1 reference corrected, 1 reference added, minor changes in body of manuscript

arXiv.org e-Print Archive

Crossref