Data-Oriented Language Processing. An Overview
During the last few years, a new approach to language processing has started
to emerge, which has become known under various labels such as "data-oriented
parsing", "corpus-based interpretation", and "tree-bank grammar" (cf. van den
Berg et al. 1994; Bod 1992-96; Bod et al. 1996a/b; Bonnema 1996; Charniak
1996a/b; Goodman 1996; Kaplan 1996; Rajman 1995a/b; Scha 1990-92; Sekine &
Grishman 1995; Sima'an et al. 1994; Sima'an 1995-96; Tugwell 1995). This
approach, which we will call "data-oriented processing" or "DOP", embodies the
assumption that human language perception and production work with
representations of concrete past language experiences, rather than with
abstract linguistic rules. The models that instantiate this approach therefore
maintain large corpora of linguistic representations of previously occurring
utterances. When processing a new input utterance, analyses of this utterance
are constructed by combining fragments from the corpus; the
occurrence-frequencies of the fragments are used to estimate which analysis is
the most probable one.
In this paper we give an in-depth discussion of a data-oriented processing
model which employs a corpus of labelled phrase-structure trees. Then we review
some other models that instantiate the DOP approach. Many of these models also
employ labelled phrase-structure trees, but use different criteria for
extracting fragments from the corpus or employ different disambiguation
strategies (Bod 1996b; Charniak 1996a/b; Goodman 1996; Rajman 1995a/b; Sekine &
Grishman 1995; Sima'an 1995-96); other models use richer formalisms for their
corpus annotations (van den Berg et al. 1994; Bod et al., 1996a/b; Bonnema
1996; Kaplan 1996; Tugwell 1995). Comment: 34 pages, Postscrip
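The abstract above says that analyses are built by combining corpus fragments and that fragment frequencies are used to estimate the most probable analysis, but it does not spell out the estimator. The sketch below assumes a plain relative-frequency scheme of the kind commonly associated with tree-based DOP: a fragment's probability is its count divided by the total count of fragments with the same root label, a derivation's probability is the product of its fragment probabilities, and an analysis's probability is the sum over its derivations. The fragment strings, counts, and function names are hypothetical and chosen only for illustration.

```python
from collections import Counter
from fractions import Fraction
from math import prod

# Hypothetical toy counts: (root label, fragment) -> occurrences in a corpus.
# Fragments are written as bracketed strings purely for readability.
fragment_counts = Counter({
    ("S", "S(NP VP)"): 4,
    ("S", "S(NP(she) VP)"): 1,
    ("NP", "NP(she)"): 3,
    ("NP", "NP(Mary)"): 1,
    ("VP", "VP(sleeps)"): 2,
    ("VP", "VP(runs)"): 2,
})


def fragment_probability(root, fragment):
    """Relative frequency of a fragment among all fragments sharing its root."""
    total = sum(c for (r, _), c in fragment_counts.items() if r == root)
    return Fraction(fragment_counts[(root, fragment)], total)


def derivation_probability(derivation):
    """Product of the probabilities of the fragments used in one derivation."""
    return prod(fragment_probability(r, f) for r, f in derivation)


def analysis_probability(derivations):
    """Sum over all derivations that yield the same analysis (tree)."""
    return sum(derivation_probability(d) for d in derivations)


# Two different derivations of the same tree S(NP(she) VP(sleeps)).
d1 = [("S", "S(NP VP)"), ("NP", "NP(she)"), ("VP", "VP(sleeps)")]
d2 = [("S", "S(NP(she) VP)"), ("VP", "VP(sleeps)")]

p_tree = analysis_probability([d1, d2])
print(p_tree)  # 2/5
```

Exact fractions are used so the arithmetic can be checked by hand: the first derivation contributes 4/5 · 3/4 · 2/4 = 3/10 and the second 1/5 · 2/4 = 1/10. In a full system the candidate analyses of an input would be ranked by this quantity; enumerating derivations explicitly, as done here, is only practical for toy examples.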
Joint Video and Text Parsing for Understanding Events and Answering Queries
We propose a framework for parsing video and text jointly for understanding
events and answering user queries. Our framework produces a parse graph that
represents the compositional structures of spatial information (objects and
scenes), temporal information (actions and events) and causal information
(causalities between events and fluents) in the video and text. The knowledge
representation of our framework is based on a spatial-temporal-causal And-Or
graph (S/T/C-AOG), which jointly models possible hierarchical compositions of
objects, scenes and events as well as their interactions and mutual contexts,
and specifies the prior probabilistic distribution of the parse graphs. We
present a probabilistic generative model for joint parsing that captures the
relations between the input video/text, their corresponding parse graphs and
the joint parse graph. Based on the probabilistic model, we propose a joint
parsing system consisting of three modules: video parsing, text parsing and
joint inference. Video parsing and text parsing produce two parse graphs from
the input video and text respectively. The joint inference module produces a
joint parse graph by performing matching, deduction and revision on the video
and text parse graphs. The proposed framework has the following objectives:
first, we aim at deep semantic parsing of video and text that goes beyond the
traditional bag-of-words approaches; second, we perform parsing and reasoning
across the spatial, temporal and causal dimensions based on the joint S/T/C-AOG
representation; third, we show that deep joint parsing facilitates subsequent
applications such as generating narrative text descriptions and answering
queries in the forms of who, what, when, where and why. We empirically
evaluated our system based on comparison against ground truth as well as
accuracy of query answering and obtained satisfactory results.
An improved parser for data-oriented lexical-functional analysis
We present an LFG-DOP parser which uses fragments from LFG-annotated
sentences to parse new sentences. Experiments with the Verbmobil and Homecentre
corpora show that (1) Viterbi n-best search performs about 100 times faster
than Monte Carlo search while both achieve the same accuracy; (2) the DOP
hypothesis which states that parse accuracy increases with increasing fragment
size is confirmed for LFG-DOP; (3) LFG-DOP's relative frequency estimator
performs worse than a discounted frequency estimator; and (4) LFG-DOP
significantly outperforms Tree-DOP if evaluated on tree structures only. Comment: 8 page
Iterated learning and grounding: from holistic to compositional languages
This paper presents a new computational model for studying the origins and evolution of compositional languages grounded through the interaction between agents and their environment. The model is based on previous work on adaptive grounding of lexicons and the iterated learning model. Although the model is still in a developmental phase, the first results show that a compositional language can emerge in which the structure reflects regularities present in the population's environment.
Treo: Textual Syntax for Reo Connectors
Reo is an interaction-centric model of concurrency for compositional
specification of communication and coordination protocols. Formal verification
tools exist to ensure correctness and compliance of protocols specified in Reo,
which can readily be (re)used in different applications, or composed into more
complex protocols. Recent benchmarks show that compiling such high-level Reo
specifications produces executable code that can compete with or even beat the
performance of hand-crafted programs written in languages such as C or Java
using conventional concurrency constructs.
The original declarative graphical syntax of Reo does not support intuitive
constructs for parameter passing, iteration, recursion, or conditional
specification. This shortcoming hinders Reo's uptake in large-scale practical
applications. Although a number of Reo-inspired syntax alternatives have
appeared in the past, none of them follows the primary design principles of
Reo: a) declarative specification; b) all channel types and their sorts are
user-defined; and c) channels compose via shared nodes. In this paper, we offer
a textual syntax for Reo that respects these principles and supports flexible
parameter passing, iteration, recursion, and conditional specification. In
on-going work, we use this textual syntax to compile Reo into target languages
such as Java, Promela, and Maude. Comment: In Proceedings MeTRiD 2018, arXiv:1806.0933
GF-DOP: grammatical feature data-oriented parsing
This paper proposes an extension of Tree-DOP which approximates the LFG-DOP model. GF-DOP combines the robustness of the DOP model with some of the linguistic competence of LFG. LFG c-structure trees are augmented with LFG functional information, with the aim of (i) generating more informative parses than Tree-DOP; (ii) improving overall parse ranking by modelling grammatical features; and (iii) avoiding the inconsistent probability models of LFG-DOP. In a number of experiments on the HomeCentre corpus, we report on which (groups of) features most heavily influence parse quality, both positively and negatively.