8,409 research outputs found

Tabular Parsing

Author: Nederhof Mark-Jan
Satta Giorgio
Publication date: 01/01/2004

This is a tutorial on tabular parsing, on the basis of tabulation of nondeterministic push-down automata. Discussed are Earley's algorithm, the Cocke-Kasami-Younger algorithm, tabular LR parsing, the construction of parse trees, and further issues. Comment: 21 pages, 14 figures

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Padova

Probabilistic Parsing Strategies

Author: Nederhof Mark-Jan
Satta Giorgio
Publication date: 01/01/2002

We present new results on the relation between purely symbolic context-free parsing strategies and their probabilistic counterparts. Such parsing strategies are seen as constructions of push-down devices from grammars. We show that preservation of probability distribution is possible under two conditions, viz. the correct-prefix property and the property of strong predictiveness. These results generalize existing results in the literature that were obtained by considering parsing strategies in isolation. From our general results we also derive negative results on so-called generalized LR parsing. Comment: 36 pages, 1 figure

arXiv.org e-Print Archive

CiteSeerX

Crossref

Archivio istituzionale della ricerca - Università di Padova

A Variant of Earley Parsing

Author: Nederhof Mark-Jan
Satta Giorgio
Publication date: 01/01/1997

The Earley algorithm is a widely used parsing method in natural language processing applications. We introduce a variant of Earley parsing that is based on a "delayed" recognition of constituents. This allows us to start the recognition of a constituent only in cases in which all of its subconstituents have been found within the input string. This is particularly advantageous in several cases in which partial analysis of a constituent cannot be completed and in general in all cases of productions sharing some suffix of their right-hand sides (even for different left-hand side nonterminals). Although the two algorithms result in the same asymptotic time and space complexity, from a practical perspective our algorithm improves the time and space requirements of the original method, as shown by reported experimental results. Comment: 12 pages, 1 Postscript figure, uses psfig.tex and llncs.st
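For orientation, here is a minimal sketch of the standard Earley recognizer that the abstract takes as its baseline. It does not implement the "delayed" variant. The function name `earley_recognize`, the dictionary encoding of the grammar, and the toy arithmetic grammar are assumptions made for this illustration, and the sketch assumes a grammar without empty right-hand sides.

```python
def earley_recognize(words, grammar, start="S"):
    """Standard Earley recognizer (illustrative sketch, not the delayed variant).

    grammar maps each nonterminal to a list of right-hand sides (tuples of
    symbols); any symbol that is not a key of grammar is a terminal.
    Assumption: no right-hand side is empty.
    """
    n = len(words)
    # chart[i] holds items (lhs, rhs, dot, origin) ending at input position i
    chart = [set() for _ in range(n + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))

    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:
                    # Predictor: expand the nonterminal after the dot
                    for prod in grammar[sym]:
                        item = (sym, prod, 0, i)
                        if item not in chart[i]:
                            chart[i].add(item)
                            agenda.append(item)
                elif i < n and words[i] == sym:
                    # Scanner: move the dot over a matching terminal
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Completer: advance items that were waiting for lhs
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        item = (l2, r2, d2 + 1, o2)
                        if item not in chart[i]:
                            chart[i].add(item)
                            agenda.append(item)

    return any(l == start and d == len(r) and o == 0
               for l, r, d, o in chart[n])


# Hypothetical toy grammar for sums and products over the token "n"
TOY_GRAMMAR = {
    "S": [("S", "+", "M"), ("M",)],
    "M": [("M", "*", "T"), ("T",)],
    "T": [("n",)],
}

print(earley_recognize("n + n * n".split(), TOY_GRAMMAR))
print(earley_recognize("n +".split(), TOY_GRAMMAR))
```

In this baseline the predictor creates items before any part of a constituent has been seen, which is the behavior the delayed variant described above is meant to avoid.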

arXiv.org e-Print Archive

CiteSeerX

University of Groningen Digital Archive

Archivio istituzionale della ricerca - Università di Padova

Efficient Tabular LR Parsing

Author: Nederhof Mark-Jan
Satta Giorgio
Publication date: 01/01/1996

We give a new treatment of tabular LR parsing, which is an alternative to Tomita's generalized LR algorithm. The advantage is twofold. Firstly, our treatment is conceptually more attractive because it uses simpler concepts, such as grammar transformations and standard tabulation techniques also known as chart parsing. Secondly, the static and dynamic complexity of parsing, both in space and time, is significantly reduced. Comment: 8 pages, uses aclap.st

arXiv.org e-Print Archive

CiteSeerX

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Archivio istituzionale della ricerca - Università di Padova

Dissertations of the University of Groningen

Splittability of bilexical context-free grammars is undecidable

Author: Nederhof Mark-Jan
Satta Giorgio
Publication venue: 'MIT Press - Journals'
Publication date: 01/12/2011

Bilexical context-free grammars (2-LCFGs) have proved to be accurate models for statistical natural language parsing. Existing dynamic programming algorithms used to parse sentences under these models have running time of O(|w|^4), where w is the input string. A 2-LCFG is splittable if the left arguments of a lexical head are always independent of the right arguments, and vice versa. When a 2-LCFG is splittable, parsing time can be asymptotically improved to O(|w|^3). Testing this property is therefore of central interest to parsing efficiency. In this article, however, we show the negative result that splittability of 2-LCFGs is undecidable. Publisher PDF. Peer reviewed

CiteSeerX

Crossref

Archivio istituzionale della ricerca - Università di Padova

University of St. Andrews - Pure

St Andrews Research Repository

Computation of distances for regular and context-free probabilistic languages

Author: Nederhof Mark-Jan
Satta Giorgio
Publication date: 01/01/2008

Several mathematical distances between probabilistic languages have been investigated in the literature, motivated by applications in language modeling, computational biology, syntactic pattern matching and machine learning. In most cases, only pairs of probabilistic regular languages were considered. In this paper we extend the previous results to pairs of languages generated by a probabilistic context-free grammar and a probabilistic finite automaton. Postprint. Peer reviewed

Elsevier - Publisher Connector

Crossref

Archivio istituzionale della ricerca - Università di Padova

University of St. Andrews - Pure

St Andrews Research Repository

Probabilistic parsing

Author: Nederhof Mark-Jan
Satta Giorgio
Publication venue: Springer
Publication date: 06/01/2011

Postprint

St Andrews Research Repository

Parsing with CYK over Distributed Representations

Author: Cristini Giordano
Satta Giorgio
Zanzotto Fabio Massimo
Publication date: 17/04/2019

Syntactic parsing is a key task in natural language processing. This task has been dominated by symbolic, grammar-based parsers. Neural networks, with their distributed representations, are challenging these methods. In this article we show that existing symbolic parsing algorithms can cross the border and be entirely formulated over distributed representations. To this end we introduce a version of the traditional Cocke-Younger-Kasami (CYK) algorithm, called D-CYK, which is entirely defined over distributed representations. Our D-CYK uses matrix multiplication on real number matrices of size independent of the length of the input string. These operations are compatible with traditional neural networks. Experiments show that our D-CYK approximates the original CYK algorithm. By showing that CYK can be entirely performed on distributed representations, we open the way to the definition of recurrent layers of CYK-informed neural networks. Comment: The algorithm has been greatly improved. Experiments have been redesigned
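As a point of reference for the symbolic algorithm that D-CYK is said to approximate, the following is a minimal sketch of a traditional CYK recognizer for a grammar in Chomsky normal form. It is not the D-CYK method and uses no distributed representations. The function name `cyk_recognize`, the two rule dictionaries, and the toy grammar for strings of the form a^n b^n are assumptions made for this illustration.

```python
from itertools import product


def cyk_recognize(words, unary, binary, start="S"):
    """Traditional CYK recognizer (illustrative sketch, symbolic only).

    unary maps a terminal to the set of nonterminals A with a rule A -> terminal.
    binary maps a pair (B, C) to the set of nonterminals A with a rule A -> B C.
    """
    n = len(words)
    if n == 0:
        return False  # a CNF grammar without an empty rule derives no empty string

    # chart[i][j] is the set of nonterminals deriving words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = set(unary.get(word, ()))

    # Fill in spans of increasing length from all possible split points
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for b, c in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= binary.get((b, c), set())

    return start in chart[0][n]


# Hypothetical toy grammar: S -> A B | A T, T -> S B, A -> a, B -> b
UNARY = {"a": {"A"}, "b": {"B"}}
BINARY = {("A", "B"): {"S"}, ("A", "T"): {"S"}, ("S", "B"): {"T"}}

print(cyk_recognize(list("aabb"), UNARY, BINARY))
print(cyk_recognize(list("abab"), UNARY, BINARY))
```

The chart here holds sets of nonterminals and has one cell per span of the input, so it grows with the input length. The abstract states that D-CYK instead works with real-valued matrices whose size is independent of that length.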

arXiv.org e-Print Archive

ART

Archivio istituzionale della ricerca - Università di Padova