On TAG and Multicomponent TAG Parsing

Author: Boullier Pierre
Publication venue: HAL CCSD
Publication date: 01/01/1999

The notion of mild context-sensitivity is an attempt to express the formal power needed to define the syntax of natural languages. However, not all incarnations of mildly context-sensitive formalisms are equivalent. On the one hand, near the bottom of the hierarchy, we find tree adjoining grammars and, on the other hand, near the top of the hierarchy, we find multicomponent tree adjoining grammars. This paper proposes a polynomial parse time method for these two tree rewriting formalisms. This method uses range concatenation grammars as a high-level intermediate definition formalism, and yields several algorithms. Range concatenation grammar is a syntactic formalism which is both powerful, insofar as it extends linear context-free rewriting systems, and efficient, insofar as its sentences can be parsed in polynomial time. We show that any unrestricted tree adjoining grammar can be transformed into an equivalent range concatenation grammar which can be parsed in ${\cal O}(n^6)$ time and, moreover, that if the input tree adjoining grammar has some restricted form, its parse time decreases to ${\cal O}(n^5)$. We generalize one of these algorithms in order to process multicomponent tree adjoining grammars. We show some upper bounds on their parse times, and we introduce a hierarchy of restricted forms which can be parsed more efficiently. Our approach aims both at giving new insight into the multicomponent adjunction mechanism and at providing a practical implementation scheme.
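To make the idea of a range concatenation grammar more concrete, here is a minimal sketch of a hand-written recognizer for one textbook-style positive RCG, the three clauses S(XYZ) → A(X, Y, Z), A(aX, bY, cZ) → A(X, Y, Z) and A(ε, ε, ε) → ε, which together define the non-context-free language a^n b^n c^n. The grammar, the function name `recognize` and the encoding of ranges as index pairs are illustrative assumptions; this is not the transformation or the parser described in the abstract above.

```python
from functools import lru_cache


def recognize(w: str) -> bool:
    """Recognize { a^n b^n c^n | n >= 0 } with a hand-coded toy RCG.

    Predicate arguments are ranges (i, j) of the input, meaning w[i:j].
    """
    n = len(w)

    @lru_cache(maxsize=None)
    def pred_a(x, y, z):
        (i1, j1), (i2, j2), (i3, j3) = x, y, z
        # Clause A(eps, eps, eps) -> eps: all three ranges are empty.
        if i1 == j1 and i2 == j2 and i3 == j3:
            return True
        # Clause A(aX, bY, cZ) -> A(X, Y, Z): strip one symbol per range.
        if (i1 < j1 and w[i1] == "a"
                and i2 < j2 and w[i2] == "b"
                and i3 < j3 and w[i3] == "c"):
            return pred_a((i1 + 1, j1), (i2 + 1, j2), (i3 + 1, j3))
        return False

    # Clause S(XYZ) -> A(X, Y, Z): try every way of cutting the whole
    # input into three adjacent ranges X, Y, Z.
    return any(
        pred_a((0, i), (i, j), (j, n))
        for i in range(n + 1)
        for j in range(i, n + 1)
    )


print(recognize("aabbcc"))
print(recognize("aabbc"))
```

Because predicates are evaluated on ranges of a fixed input rather than on strings, the number of distinct predicate instances is bounded by a polynomial in the input length, which is the intuition behind polynomial-time RCG parsing.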

INRIA a CCSD electronic archive server

Yet Another ${\cal O}(n^6)$ Recognition Algorithm for Mildly Context-Sensitive Languages

Author: Boullier Pierre
Publication venue: HAL CCSD
Publication date: 01/01/1995

Vijay-Shanker and Weir have shown in \cite{VSW94} that Tree Adjoining Grammars and Combinatory Categorial Grammars can be transformed into equivalent Linear Indexed Grammars (LIGs) which can be recognized in ${\cal O}(n^6)$ time using a Cocke-Kasami-Younger style algorithm. This paper exhibits another recognition algorithm for LIGs, with the same upper-bound complexity, but whose average case behaves much better. This algorithm works in two steps: first a general context-free parsing algorithm (using the underlying context-free grammar) builds a shared parse forest, and second, the LIG properties are checked on this forest. This check is based upon the composition of simple relations and does not require any computation of symbol stacks.

INRIA a CCSD electronic archive server

Dynamic grammars and semantic analysis

Author: Boullier Pierre
Publication venue: HAL CCSD
Publication date: 01/01/1994

Projet CHLOE. We define a dynamic grammar as a device which may generate an unbounded set of context-free grammars, each grammar being produced, while parsing a source text, by the recognition of some construct. It is shown that dynamic grammars have the formal power of Turing machines. For a given source text, a dynamic grammar, when non-ambiguous, may be seen as a sequence of usual context-free grammars specialized by this source text: an initial grammar is modified, little by little, while the program is parsed, and is used to continue the parsing process. An experimental system which implements a non-ambiguous dynamic parser is sketched, and applications of this system to the resolution of some semantic analysis problems are shown. Some of these examples are non-trivial (overloading resolution, derived types, polymorphism, ...) and indicate that this method may partly compete with other well-known techniques used in type-checking.
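The following minimal sketch illustrates the general idea of a grammar that changes while the source text is parsed. It assumes a hypothetical toy language with two statement forms, `type NAME ;` and `TYPENAME NAME ;`, in which recognizing a `type` declaration adds a new alternative to the TypeName nonterminal. The language, the function name `parse` and the whitespace-separated token format are assumptions made for illustration; this is not the experimental system described in the abstract.

```python
def parse(source: str) -> bool:
    """Accept a program in which every type is declared before it is used."""
    tokens = source.split()
    # Current alternatives of the TypeName nonterminal. The initial
    # grammar only contains the production TypeName -> "int".
    type_names = {"int"}
    pos = 0
    while pos < len(tokens):
        # Every statement has the shape: HEAD IDENT ";"
        if pos + 3 > len(tokens) or tokens[pos + 2] != ";":
            return False
        head, ident = tokens[pos], tokens[pos + 1]
        if not ident.isidentifier():
            return False
        if head == "type":
            # Recognizing this construct modifies the grammar: the
            # production TypeName -> ident is added for the rest of the parse.
            type_names.add(ident)
        elif head not in type_names:
            # The grammar in force at this point has no such TypeName.
            return False
        pos += 3
    return True


print(parse("int x ; type meters ; meters d ;"))
print(parse("meters d ; type meters ;"))
```

The second call is rejected because, at the point where `meters d ;` is read, the grammar in force does not yet contain a production for `meters`. This is a simple instance of a declare-before-use check expressed as a change of grammar rather than as a separate semantic pass.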

INRIA a CCSD electronic archive server

Another Facet of LIG Parsing (extended version)

Author: Boullier Pierre
Publication venue: HAL CCSD
Publication date: 01/01/1996

In this paper we present a new parsing algorithm for linear indexed grammars (LIGs) in the same spirit as the one described in (Vijay-Shanker and Weir, 1993) for tree adjoining grammars. For a LIG $L$ and an input string $x$ of length $n$, we build a non-ambiguous context-free grammar whose sentences are all (and exclusively) valid derivation sequences in $L$ which lead to $x$. We show that this grammar can be built in ${\cal O}(n^6)$ time and that individual parses can be extracted in time linear in the size of the extracted parse tree. Though this ${\cal O}(n^6)$ upper bound does not improve over previous results, the average case behaves much better. Moreover, practical parsing times can be decreased by some statically performed computations.

INRIA a CCSD electronic archive server

A Cubic Time Extension of Context-Free Grammars

Author: Boullier Pierre
Publication venue: HAL CCSD
Publication date: 01/01/1999

Context-free grammars and cubic time parsing are so closely linked in people's minds that parsing any extension of context-free grammars is often assumed to require extra time. This is not true: this paper presents a generalization of context-free grammars which keeps a cubic time complexity. This extension, which defines a sub-class of context-sensitive languages, has both a theoretical and a practical interest. The class of languages defined by these grammars is closed under both intersection and complementation (in fact it is the class containing the intersection and the complementation of context-free languages). On the other hand, these grammars can be considered as being mildly context-sensitive and can therefore be used in natural language processing.
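As a small illustration of why intersection need not cost more than cubic time, the sketch below runs a standard CKY recognizer on two context-free grammars in Chomsky normal form and accepts a word only if both accept it. The two grammars, for a^n b^n c^m and a^m b^n c^n with n, m ≥ 1, and all names in the code are assumptions chosen for illustration. Their intersection is the non-context-free language a^n b^n c^n. This is not the formalism or the algorithm of the paper.

```python
def cky(word, terminal_rules, binary_rules, start="S"):
    """Standard CKY recognition for a grammar in Chomsky normal form."""
    n = len(word)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that derive word[i:j].
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, ch in enumerate(word):
        table[i][i + 1] = {lhs for lhs, t in terminal_rules if t == ch}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for lhs, left, right in binary_rules:
                    if left in table[i][k] and right in table[k][j]:
                        table[i][j].add(lhs)
    return start in table[0][n]


# G1: a^n b^n c^m (n, m >= 1)
G1_TERM = [("A", "a"), ("B", "b"), ("C", "c"), ("Cs", "c")]
G1_BIN = [("S", "X", "Cs"), ("X", "A", "B"), ("X", "A", "Y"),
          ("Y", "X", "B"), ("Cs", "C", "Cs")]

# G2: a^m b^n c^n (n, m >= 1)
G2_TERM = [("A", "a"), ("As", "a"), ("B", "b"), ("C", "c")]
G2_BIN = [("S", "As", "Z"), ("Z", "B", "C"), ("Z", "B", "W"),
          ("W", "Z", "C"), ("As", "A", "As")]


def in_intersection(word):
    # Two cubic-time passes, so the combined check is still cubic.
    return cky(word, G1_TERM, G1_BIN) and cky(word, G2_TERM, G2_BIN)


print(in_intersection("aabbcc"))
print(in_intersection("aabbc"))
```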

INRIA a CCSD electronic archive server

Proposal for a Natural Language Processing Syntactic Backbone

Author: Boullier Pierre
Publication venue: HAL CCSD
Publication date: 01/01/1998

The purpose of this paper is to present a grammatical formalism that extends context-free grammars and aims at being a convincing challenger as a syntactic base for various tasks, especially in natural language processing. These grammars are powerful, since they strictly include mildly context-sensitive languages, while staying computationally tractable, since sentences are parsed in polynomial time. Moreover, this formalism allows a form of modularity which may lead to the design of libraries of reusable generic grammatical components. Finally, it can act as a syntactic backbone upon which decorations from other domains (say, feature structures) can be grafted.

INRIA a CCSD electronic archive server

Multi-Component Tree Insertion Grammars

Author: Boullier Pierre
Sagot Benoît
Publication venue: HAL CCSD
Publication date: 01/01/2009

In this paper we introduce a new mildly context-sensitive formalism called Multi-Component Tree Insertion Grammar. This formalism is a generalization of Tree Insertion Grammars in the same sense that Multi-Component Tree Adjoining Grammars are a generalization of Tree Adjoining Grammars. We show that this class of grammatical formalisms is equivalent to Multi-Component Tree Adjoining Grammars, and that it also defines a hierarchy of languages whose supplementary formal power between two increasing levels is more gently delivered than the one given by Multi-Component Tree Adjoining Grammars. We show that Multi-Component Tree Insertion Grammars and simple Range Concatenation Grammars are equivalent and we show how to transform a grammar of one type into an equivalent grammar of the other type. Such a transformation gives a method to build efficient parsers for Multi-Component Tree Insertion Languages.

INRIA a CCSD electronic archive server

Hal-Diderot

Efficient LFG parsing: SxLfg

Author: Boullier Pierre
Sagot Benoît
Publication venue: HAL CCSD
Publication date: 01/01/2005

In this paper, we introduce a new parser, called SxLfg, based on the Lexical-Functional Grammars formalism (LFG). We describe the underlying context-free parser and how functional structures are efficiently computed on top of the CFG shared forest thanks to computation sharing, lazy evaluation, and compact data representation. We then present various error recovery techniques we implemented in order to build a robust parser. Finally, we offer concrete results when SxLfg is used with an existing grammar for French. We show that our parser is both efficient and robust, although the grammar is very ambiguous.

INRIA a CCSD electronic archive server

SxPipe 2: architecture pour le traitement pré-syntaxique de corpus bruts

Author: Boullier Pierre
Sagot Benoît
Publication venue: 'Associacio catalana de Salut Laboral'
Publication date: 01/01/2008

This article presents SxPipe 2, a modular and configurable pipeline whose role is to apply a cascade of shallow processing steps to raw corpora. These steps are a necessary preliminary to possible syntactic parsing, and can also be used to prepare other tasks. Developed for French but also for other languages, SxPipe 2 includes, among other things, several modules for named-entity recognition in raw text, a sentence and token segmenter, a spelling corrector and compound-word recognizer, and an original architecture for context-free pattern recognition, used by various specialized grammars (numbers, impersonal constructions, etc.). We present the theoretical foundations of the different modules, their implementation for French and, for some of them, a quantitative evaluation.

INRIA a CCSD electronic archive server

Hal-Diderot

Parsing Directed Acyclic Graphs with Range Concatenation Grammars

Author: Boullier Pierre
Sagot Benoît
Publication venue: HAL CCSD
Publication date: 01/01/2009

Range Concatenation Grammars (RCGs) are a syntactic formalism which possesses many attractive properties. It is more powerful than Linear Context-Free Rewriting Systems, though this power is not reached to the detriment of efficiency since its sentences can always be parsed in polynomial time. If the input, instead of a string, is a Directed Acyclic Graph (DAG), only simple RCGs can still be parsed in polynomial time. For non-linear RCGs, this polynomial parsing time cannot be guaranteed anymore. In this paper, we show how the standard parsing algorithm can be adapted for parsing DAGs with RCGs, both in the linear (simple) and in the non-linear case.

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot