Data-Oriented Language Processing. An Overview
During the last few years, a new approach to language processing has started
to emerge, which has become known under various labels such as "data-oriented
parsing", "corpus-based interpretation", and "tree-bank grammar" (cf. van den
Berg et al. 1994; Bod 1992-96; Bod et al. 1996a/b; Bonnema 1996; Charniak
1996a/b; Goodman 1996; Kaplan 1996; Rajman 1995a/b; Scha 1990-92; Sekine &
Grishman 1995; Sima'an et al. 1994; Sima'an 1995-96; Tugwell 1995). This
approach, which we will call "data-oriented processing" or "DOP", embodies the
assumption that human language perception and production work with
representations of concrete past language experiences, rather than with
abstract linguistic rules. The models that instantiate this approach therefore
maintain large corpora of linguistic representations of previously occurring
utterances. When processing a new input utterance, analyses of this utterance
are constructed by combining fragments from the corpus; the
occurrence-frequencies of the fragments are used to estimate which analysis is
the most probable one.
In this paper we give an in-depth discussion of a data-oriented processing
model which employs a corpus of labelled phrase-structure trees. Then we review
some other models that instantiate the DOP approach. Many of these models also
employ labelled phrase-structure trees, but use different criteria for
extracting fragments from the corpus or employ different disambiguation
strategies (Bod 1996b; Charniak 1996a/b; Goodman 1996; Rajman 1995a/b; Sekine &
Grishman 1995; Sima'an 1995-96); other models use richer formalisms for their
corpus annotations (van den Berg et al. 1994; Bod et al. 1996a/b; Bonnema
1996; Kaplan 1996; Tugwell 1995).
Comment: 34 pages, Postscript
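The fragment-and-frequency idea in the abstract above can be illustrated with a minimal sketch. The abstract does not say how frequencies become probabilities, so the estimator here (a fragment's count divided by the count of all fragments sharing its root label) is an assumption, as are the tuple encoding of trees, the function names, and the two-tree toy corpus.

```python
from collections import Counter
from itertools import product
from math import prod

# A tree is a tuple (label, child, ...); a word is a plain string.
# A cut-off nonterminal inside a fragment is the 1-tuple (label,).

def fragments_at(node):
    """All fragments rooted at this node: each nonterminal child is either
    cut off (kept as a bare label) or expanded by one of its own fragments."""
    label, *children = node
    options = []
    for child in children:
        if isinstance(child, str):
            options.append([child])  # words are always kept
        else:
            options.append([(child[0],)] + fragments_at(child))
    return [(label, *combo) for combo in product(*options)]

def all_fragments(tree):
    """Fragments rooted at every nonterminal node of the tree."""
    result = fragments_at(tree)
    for child in tree[1:]:
        if not isinstance(child, str):
            result.extend(all_fragments(child))
    return result

def fragment_probabilities(corpus):
    """Assumed estimator: relative frequency among fragments with the same root."""
    counts = Counter(f for tree in corpus for f in all_fragments(tree))
    root_totals = Counter()
    for frag, n in counts.items():
        root_totals[frag[0]] += n
    return {frag: n / root_totals[frag[0]] for frag, n in counts.items()}

def derivation_probability(derivation, probs):
    """Probability of one derivation: product of its fragment probabilities."""
    return prod(probs[f] for f in derivation)

# Hypothetical toy corpus of two labelled phrase-structure trees.
corpus = [
    ("S", ("NP", "Mary"), ("VP", ("V", "sleeps"))),
    ("S", ("NP", "John"), ("VP", ("V", "sleeps"))),
]
probs = fragment_probabilities(corpus)

# One way to build "Mary sleeps" by combining three corpus fragments.
derivation = [
    ("S", ("NP",), ("VP",)),
    ("NP", "Mary"),
    ("VP", ("V", "sleeps")),
]
p = derivation_probability(derivation, probs)
```

An analysis can usually be built by several different derivations; a full model would sum their probabilities and compare the totals across competing analyses, which this sketch leaves out.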
Formal Properties of XML Grammars and Languages
XML documents are described by a document type definition (DTD). An
XML-grammar is a formal grammar that captures the syntactic features of a DTD.
We investigate properties of this family of grammars. We show that every
XML-language basically has a unique XML-grammar. We give two characterizations
of languages generated by XML-grammars, one is set-theoretic, the other is by a
kind of saturation property. We investigate decidability problems and prove
that some properties that are undecidable for general context-free languages
become decidable for XML-languages. We also characterize those XML-grammars
that generate regular XML-languages.
Comment: 24 pages
Interprocedural Reachability for Flat Integer Programs
We study programs with integer data, procedure calls and arbitrary call
graphs. We show that, whenever the guards and updates are given by octagonal
relations, the reachability problem along control flow paths within some
language w1* ... wd* over program statements is decidable in NEXPTIME. To
achieve this upper bound, we combine a program transformation into the same
class of programs but without procedures, with an NP-completeness result for
the reachability problem of procedure-less programs. Besides the program, the
expression w1* ... wd* is also mapped onto an expression of a similar form but
this time over the transformed program statements. Several arguments involving
context-free grammars and their generative process enable us to give tight
bounds on the size of the resulting expression. The currently existing gap
between NP-hard and NEXPTIME can be closed to NP-complete when a certain
parameter of the analysis is assumed to be constant.
Comment: 38 pages, 1 figure
Grammar-based Representation and Identification of Dynamical Systems
In this paper we propose a novel approach to identify dynamical systems. The
method estimates the model structure and the parameters of the model
simultaneously, automating the critical decisions involved in identification
such as model structure and complexity selection. In order to solve the
combined model structure and model parameter estimation problem, a new
representation of dynamical systems is proposed. The proposed representation is
based on Tree Adjoining Grammar, a formalism that was developed from linguistic
considerations. Using the proposed representation, the identification problem
can be interpreted as a multi-objective optimization problem and we propose an
Evolutionary Algorithm-based approach to solve the problem. A benchmark example
is used to demonstrate the proposed approach. The results were found to be
comparable to those obtained by state-of-the-art non-linear system
identification methods, without making use of knowledge of the system
description.
Comment: Submitted to European Control Conference (ECC) 201
A Characterization of ET0L and EDT0L Languages
There exists a PT0L language such that the following holds. A language is an ET0L language if and only if there exists a mapping induced by an a-NGSM (nondeterministic generalized sequential machine with accepting states) such that . There exists an infinite collection of EPDT0L languages () such that the family EDT0L is characterized in the following way. A language is an EDT0L language if and only if there exists , a homomorphism and a regular language such that .
Principles and Implementation of Deductive Parsing
We present a system for generating parsers based directly on the metaphor of
parsing as deduction. Parsing algorithms can be represented directly as
deduction systems, and a single deduction engine can interpret such deduction
systems so as to implement the corresponding parser. The method generalizes
easily to parsers for augmented phrase structure formalisms, such as
definite-clause grammars and other logic grammar formalisms, and has been used
for rapid prototyping of parsing algorithms for a variety of formalisms
including variants of tree-adjoining grammars, categorial grammars, and
lexicalized context-free grammars.
Comment: 69 pages, includes full Prolog code
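The "parsing as deduction" idea in the abstract above can be illustrated with a small sketch: a generic agenda-driven engine that knows nothing about parsing, plus a CKY-style deduction system that supplies axioms, an inference rule, and a goal item. The paper's own system is in Prolog; this Python version, its function names, and the toy grammar are all assumptions made for illustration.

```python
def deduce(axioms, infer):
    """Generic deduction engine: exhaust the agenda, storing proved items in a chart."""
    chart = set()
    agenda = list(axioms)
    while agenda:
        item = agenda.pop()
        if item in chart:
            continue  # already proved
        chart.add(item)
        # New consequences of this item together with everything proved so far.
        agenda.extend(infer(item, chart))
    return chart

def cky_system(words, lexical, binary, start="S"):
    """CKY recognition as a deduction system over items (category, i, j).

    Axioms:    (A, i, i+1)             for each word w_i with A -> w_i
    Inference: (B, i, j), (C, j, k)    yield (A, i, k) when A -> B C
    Goal:      (start, 0, n)
    """
    axioms = [(a, i, i + 1)
              for i, w in enumerate(words)
              for a in lexical.get(w, ())]

    def infer(item, chart):
        x, i, j = item
        consequences = []
        for (y, k, l) in chart:
            if j == k:  # item is the left premise
                consequences += [(a, i, l) for a in binary.get((x, y), ())]
            if l == i:  # item is the right premise
                consequences += [(a, k, j) for a in binary.get((y, x), ())]
        return consequences

    return axioms, infer, (start, 0, len(words))

def recognize(words, lexical, binary):
    axioms, infer, goal = cky_system(words, lexical, binary)
    return goal in deduce(axioms, infer)

# Hypothetical toy grammar in Chomsky normal form.
lexical = {"she": ["NP"], "fish": ["NP"], "eats": ["V"]}
binary = {("NP", "VP"): ["S"], ("V", "NP"): ["VP"]}

accepted = recognize("she eats fish".split(), lexical, binary)
rejected = recognize("she eats".split(), lexical, binary)
```

Only `cky_system` is specific to the parsing algorithm. Swapping in a different set of axioms and inference rules would reuse `deduce` unchanged, which is the separation the abstract describes.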
Leveraging Semantic Web Service Descriptions for Validation by Automated Functional Testing
Recent years have seen the utilisation of Semantic Web Service descriptions for automating a wide range of service-related activities, with a primary focus on service discovery, composition, execution and mediation. An important area which so far has received less attention is service validation, whereby advertised services are proven to conform to required behavioural specifications. This paper proposes a method for validation of service-oriented systems through automated functional testing. The method leverages ontology-based and rule-based descriptions of service inputs, outputs, preconditions and effects (IOPE) for constructing a stateful EFSM specification. The specification is subsequently utilised for functional testing and validation using the proven Stream X-machine (SXM) testing methodology. Complete functional test sets are generated automatically at an abstract level and are then applied to concrete Web services, using test drivers created from the Web service descriptions. The testing method comes with completeness guarantees and provides a strong method for validating the behaviour of Web services.
The Computational Complexity of Symbolic Dynamics at the Onset of Chaos
In a variety of studies of dynamical systems, the edge of order and chaos has
been singled out as a region of complexity. It was suggested by Wolfram, on the
basis of qualitative behaviour of cellular automata, that the computational
basis for modelling this region is the Universal Turing Machine. In this paper,
following a suggestion of Crutchfield, we try to show that the Turing machine
model may often be too powerful as a computational model to describe the
boundary of order and chaos. In particular we study the region of the first
accumulation of period doubling in unimodal and bimodal maps of the interval,
from the point of view of language theory. We show that in relation to the
"extended" Chomsky hierarchy, the relevant computational model in the
unimodal case is the nested stack automaton or the related indexed languages,
while the bimodal case is modeled by the linear bounded automaton or the
related context-sensitive languages.
Comment: 1 reference corrected, 1 reference added, minor changes in body of
manuscript