100 research outputs found

    Efficient parsing with linear context-free rewriting systems


    Synchronous Context-Free Grammars and Optimal Linear Parsing Strategies

    Synchronous Context-Free Grammars (SCFGs), also known as syntax-directed translation schemata, differ from context-free grammars in that they do not have a binary normal form. In general, parsing with SCFGs takes space and time polynomial in the length of the input strings, but the degree of the polynomial depends on the permutations of the SCFG rules. We consider linear parsing strategies, which add one nonterminal at a time. We show that for a given input permutation, the problems of finding the linear parsing strategy with the minimum space complexity and with the minimum time complexity are both NP-hard.
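    The cost model behind this result can be made concrete with a small sketch. Below is a minimal Python illustration (all helper names are mine, and the cost measure is a deliberate simplification of the paper's boundary-counting measures): a rule's permutation maps each source-side nonterminal to a target-side position, a linear strategy is an order in which nonterminals are added, and the cost of a step grows with the number of span boundaries the partial item must track on each side. Brute-forcing the best order is exponential, consistent with the NP-hardness result.

```python
from itertools import permutations

def blocks(positions):
    """Number of maximal runs of consecutive integers in a set of positions."""
    ps = sorted(positions)
    return sum(1 for i, p in enumerate(ps) if i == 0 or p != ps[i - 1] + 1)

def strategy_cost(perm, order):
    """Max number of span endpoints tracked while adding nonterminals in `order`.

    perm[i] is the target-side position of source-side nonterminal i.
    Each intermediate item covers a set S of source positions and perm(S)
    of target positions; each contiguous block contributes two boundaries.
    """
    covered = set()
    worst = 0
    for nt in order:
        covered.add(nt)
        src = blocks(covered)
        tgt = blocks({perm[i] for i in covered})
        worst = max(worst, 2 * (src + tgt))
    return worst

def best_linear_strategy(perm):
    """Brute force over all linear strategies (exponential; NP-hard in general)."""
    k = len(perm)
    return min(permutations(range(k)), key=lambda o: strategy_cost(perm, o))

if __name__ == "__main__":
    perm = [2, 0, 3, 1]  # a small interleaving permutation
    order = best_linear_strategy(perm)
    print(order, strategy_cost(perm, order))
```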

    Parsing Linear Context-Free Rewriting Systems with Fast Matrix Multiplication

    We describe a matrix multiplication recognition algorithm for a subset of binary linear context-free rewriting systems (LCFRS) with running time O(n^{ωd}), where M(m) = O(m^ω) is the running time for m × m matrix multiplication and d is the "contact rank" of the LCFRS, i.e., the maximal number of combination and non-combination points that appear in the grammar rules. We also show that this algorithm can be used as a subroutine to obtain a recognition algorithm for general binary LCFRS with running time O(n^{ωd + 1}). The currently best known ω is smaller than 2.38. Our result provides another proof of the best known result for parsing mildly context-sensitive formalisms such as combinatory categorial grammars, head grammars, linear indexed grammars, and tree adjoining grammars, which can be parsed in time O(n^{4.76}). It also shows that inversion transduction grammars can be parsed in time O(n^{5.76}). In addition, binary LCFRS subsumes many other formalisms and types of grammars, for some of which we also improve the asymptotic complexity of parsing.
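    The key idea is easiest to see in the plain context-free special case, which the LCFRS algorithm generalizes by indexing matrices with tuples of span endpoints (the contact rank d). The sketch below is a simplification of that special case, not the paper's algorithm; chart_B and chart_C are hypothetical chart slices. It shows how one combination step for a rule A -> B C reduces to a boolean matrix product, which is where the n^ω factor comes from:

```python
import numpy as np

def combine(chart_B, chart_C):
    """One combination step A -> B C via boolean matrix multiplication.

    chart_X[i, j] == True iff X derives the input span (i, j).
    A derives (i, j) iff some split k has B over (i, k) and C over (k, j),
    which is exactly the (i, j) entry of the boolean product chart_B @ chart_C.
    Fast matrix multiplication makes this step O(n^omega) rather than O(n^3).
    """
    return (chart_B.astype(int) @ chart_C.astype(int)) > 0

if __name__ == "__main__":
    n = 5  # positions 0..n; spans are pairs (i, j) with i < j
    B = np.zeros((n + 1, n + 1), dtype=bool)
    C = np.zeros((n + 1, n + 1), dtype=bool)
    B[0, 2] = True   # B derives span (0, 2)
    C[2, 5] = True   # C derives span (2, 5)
    A = combine(B, C)
    print(A[0, 5])   # True: A -> B C derives span (0, 5)
```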

    Data-Oriented Parsing with discontinuous constituents and function tags

    Statistical parsers are effective but are typically limited to producing projective dependencies or constituents. On the other hand, linguistically rich parsers recognize non-local relations and analyze both form and function phenomena but rely on extensive manual grammar development. We combine advantages of the two by building a statistical parser that produces richer analyses. We investigate new techniques to implement treebank-based parsers that allow for discontinuous constituents. We present two systems. One system is based on a string-rewriting Linear Context-Free Rewriting System (LCFRS), while using a Probabilistic Discontinuous Tree Substitution Grammar (PDTSG) to improve disambiguation performance. Another system encodes the discontinuities in the labels of phrase structure trees, allowing for efficient context-free grammar parsing. The two systems demonstrate that tree fragments as used in tree-substitution grammar improve disambiguation performance while capturing non-local relations on an as-needed basis. Additionally, we present results of models that produce function tags, resulting in a more linguistically adequate model of the data. We report substantial accuracy improvements in discontinuous parsing for German, English, and Dutch, including results on spoken Dutch.
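    The label-encoding idea mentioned in the abstract can be illustrated with a small sketch (my own reconstruction of the general technique, not the thesis's exact transformation): a discontinuous constituent is split into its contiguous blocks, each block gets a label recording which part of the original node it is, and an ordinary CFG parser then handles the resulting projective tree; merging the starred labels back afterwards reverses the encoding.

```python
def blocks(positions):
    """Split a set of token indices into maximal contiguous blocks."""
    ps = sorted(positions)
    out, cur = [], [ps[0]]
    for p in ps[1:]:
        if p == cur[-1] + 1:
            cur.append(p)
        else:
            out.append(cur)
            cur = [p]
    out.append(cur)
    return out

def split_discontinuous(label, positions):
    """Encode a discontinuous constituent as several contiguous ones.

    A VP covering tokens {1, 2, 5} becomes VP*0 over 1-2 and VP*1 over 5;
    the starred labels let a plain CFG parser recover the original node
    by merging its parts back after parsing (the encoding is reversible).
    """
    bs = blocks(positions)
    if len(bs) == 1:
        return [(label, bs[0])]
    return [(f"{label}*{i}", b) for i, b in enumerate(bs)]

if __name__ == "__main__":
    print(split_discontinuous("VP", {1, 2, 5}))
    # [('VP*0', [1, 2]), ('VP*1', [5])]
```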

    Optimal rank reduction for Linear Context-Free Rewriting Systems with Fan-Out Two

    Linear Context-Free Rewriting Systems (LCFRSs) are a grammar formalism capable of modeling discontinuous phrases. Many parsing applications use LCFRSs where the fan-out (a measure of the discontinuity of phrases) does not exceed 2. We present an efficient algorithm for optimal reduction of the length of production right-hand sides in LCFRSs with fan-out at most 2. This results in an asymptotic running-time improvement for known parsing algorithms for this class.
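    To make the objective concrete, here is a small Python sketch (a naive greedy strategy of my own, included only to show what is being optimized; the paper's algorithm is optimal, not greedy): each right-hand-side nonterminal is represented by the set of input positions it covers, merging two nonterminals unions their positions, and the fan-out of each intermediate nonterminal, i.e., its number of contiguous blocks, drives the parsing complexity of the binarized grammar.

```python
from itertools import combinations

def fan_out(positions):
    """Fan-out = number of maximal contiguous blocks of covered positions."""
    ps = sorted(positions)
    return sum(1 for i, p in enumerate(ps) if i == 0 or p != ps[i - 1] + 1)

def greedy_binarize(rhs):
    """Reduce a rule's right-hand side to rank 2 by repeated pairwise merging.

    rhs: list of frozensets of input positions, one per RHS nonterminal.
    At each step, merge the pair whose union has the smallest fan-out;
    the fan-outs of the intermediate nonterminals determine parsing cost.
    """
    steps = []
    items = list(rhs)
    while len(items) > 1:
        a, b = min(combinations(items, 2),
                   key=lambda pair: fan_out(pair[0] | pair[1]))
        merged = a | b
        steps.append((a, b, fan_out(merged)))
        items.remove(a)
        items.remove(b)
        items.append(merged)
    return steps

if __name__ == "__main__":
    # A rule whose four RHS nonterminals cover interleaved positions:
    rhs = [frozenset({0, 4}), frozenset({1, 5}), frozenset({2}), frozenset({3})]
    for a, b, f in greedy_binarize(rhs):
        print(sorted(a), "+", sorted(b), "-> fan-out", f)
```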

    Neural Combinatory Constituency Parsing

    Doctoral thesis (Information Science), Tokyo Metropolitan University.

    Empirical Risk Minimization for Probabilistic Grammars: Sample Complexity and Hardness of Learning

    Probabilistic grammars are generative statistical models that are useful for compositional and sequential structures. They are used ubiquitously in computational linguistics. We present a framework, reminiscent of structural risk minimization, for empirical risk minimization of probabilistic grammars using the log-loss. We derive sample complexity bounds in this framework that apply both to the supervised setting and the unsupervised setting. By making assumptions about the underlying distribution that are appropriate for natural language scenarios, we are able to derive distribution-dependent sample complexity bounds for probabilistic grammars. We also give simple algorithms for carrying out empirical risk minimization using this framework in both the supervised and unsupervised settings. In the unsupervised case, we show that the problem of minimizing empirical risk is NP-hard. We therefore suggest an approximate algorithm, similar to expectation-maximization, to minimize the empirical risk. Learning from data is central to contemporary computational linguistics. It is common in such learning to estimate a model in a parametric family using the maximum likelihood principle. This principle applies in the supervised case (i.e., using annotated data).
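    In the supervised setting the abstract describes, empirical risk minimization with log-loss coincides with maximum likelihood estimation, which for probabilistic grammars has a closed form: the relative frequency of each rule among all rules sharing its left-hand side. A minimal Python sketch (function and variable names are mine):

```python
from collections import Counter, defaultdict

def supervised_erm(treebank):
    """Supervised empirical risk minimization with log-loss for a PCFG.

    Minimizing empirical log-loss over annotated trees reduces to maximum
    likelihood estimation, whose closed form is relative-frequency counting.
    treebank: iterable of rules (lhs, rhs) read off the gold trees.
    """
    counts = Counter(treebank)
    lhs_totals = defaultdict(int)
    for (lhs, _), c in counts.items():
        lhs_totals[lhs] += c
    return {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}

if __name__ == "__main__":
    rules = [("S", ("NP", "VP")), ("NP", ("D", "N")),
             ("NP", ("N",)), ("NP", ("N",)), ("VP", ("V", "NP"))]
    for rule, p in supervised_erm(rules).items():
        print(rule, round(p, 3))
```

    The unsupervised case has no such closed form; per the abstract it is NP-hard, which motivates the approximate, expectation-maximization-like algorithm the paper proposes.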