3 research outputs found
On Dependency Analysis via Contractions and Weighted FSTs
Arc contractions in syntactic dependency graphs can be used to decide which graphs are trees. The paper observes that these contractions can be expressed with weighted finite-state transducers (weighted FSTs) that operate on string-encoded trees. The observation gives rise to a finite-state parsing algorithm that computes the parse forest and extracts the best parses from it. The algorithm is customizable to functional and bilexical dependency parsing, and it can be extended to non-projective parsing via a multi-planar encoding with prior results on high recall. Our experiments support an analysis of projective parsing according to which the worst-case time complexity of the algorithm is quadratic in the sentence length, and linear in the number of overlapping arcs and the number of functional categories of the arcs. The results suggest several interesting directions towards efficient and high-precision dependency parsing that takes advantage of the flexibility and the demonstrated ambiguity-packing capacity of such a parser.
Peer reviewed
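The contraction idea can be illustrated without the FST machinery: contracting an arc merges its endpoints, and a dependency graph over n nodes is a tree exactly when it has n-1 arcs and no contraction ever produces a self-loop. A minimal sketch using union-find (function and variable names are my own; the paper's actual algorithm operates on string-encoded trees with weighted FSTs):

```python
def is_tree(n, arcs):
    """Decide whether a dependency graph over nodes 0..n-1 is a tree
    by simulating arc contractions with a union-find structure."""
    if len(arcs) != n - 1:          # a tree on n nodes has exactly n-1 arcs
        return False
    parent = list(range(n))

    def find(x):                    # root of x's contracted component
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for head, dep in arcs:
        rh, rd = find(head), find(dep)
        if rh == rd:                # contraction would create a self-loop: cycle
            return False
        parent[rd] = rh             # contract the arc: merge dep into head
    return True                     # all arcs contracted into a single node
```

With exactly n-1 arcs and no cycle, the graph is necessarily connected, so the check is complete.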
Viable Dependency Parsing as Sequence Labeling
We recast dependency parsing as a sequence labeling problem, exploring
several encodings of dependency trees as labels. While dependency parsing by
means of sequence labeling had been attempted in existing work, results
suggested that the technique was impractical. We show instead that with a
conventional BiLSTM-based model it is possible to obtain fast and accurate
parsers. These parsers are conceptually simple, not needing traditional parsing
algorithms or auxiliary structures. Moreover, experiments on the PTB and a
sample of UD treebanks show that they provide a good speed-accuracy tradeoff,
with results competitive with more complex approaches.
Comment: Camera-ready version to appear at NAACL 2019 (final peer-reviewed manuscript). 8 pages (incl. appendix)
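One simple member of the family of encodings the abstract mentions labels each token with the signed offset to its head plus the dependency relation. A minimal encoder/decoder sketch (the label format and function names are my own; the paper explores several encodings, including PoS-relative ones):

```python
def encode(heads, deprels):
    """Encode a dependency tree as one label per token.
    heads[i] is the 1-based index of token i+1's head (0 = root);
    each label is the signed head offset joined with the relation."""
    return [f"{h - i}@{rel}"
            for i, (h, rel) in enumerate(zip(heads, deprels), start=1)]

def decode(labels):
    """Invert encode(): recover head indices and relations."""
    heads, deprels = [], []
    for i, label in enumerate(labels, start=1):
        offset, rel = label.split("@")
        heads.append(i + int(offset))
        deprels.append(rel)
    return heads, deprels
```

For "She runs fast" with heads [2, 0, 2], the labels are ['1@nsubj', '-2@root', '-1@advmod'], and decode() restores the original tree, so any sequence labeling model that predicts one such label per token is implicitly a parser.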
A Linearization Framework for Dependency and Constituent Trees
[Abstract]: Parsing is a core natural language processing problem in which, given an input raw sentence, a
model automatically produces a structured output that represents its syntactic structure. The
most common formalisms in this field are constituent and dependency parsing. Although
both formalisms show differences, they also share limitations, in particular the limited speed
of the models to obtain the desired representation, and the lack of a common representation
that would allow a generic end-to-end neural system to produce either of them. Transforming both parsing
tasks into sequence labeling tasks solves both of these problems. Several tree linearizations
have been proposed in the last few years; however, there is no common suite that facilitates
their use under an integrated framework. In this work, we will develop such a system. On the
one hand, the system will be able to: (i) encode syntactic trees according to the desired syntactic
formalism and linearization function, and (ii) decode linearized trees into their original
representation. On the other hand, (iii) we will also train several neural sequence labeling
systems to perform parsing from those labels, and we will compare the results.
[Resumen]: Syntactic parsing is a core task in natural language processing in which, given a sentence, an output representing its syntactic structure is produced. The most popular formalisms are constituency and dependency. Although they are fundamentally different, they share certain limitations, such as the slowness of the models used for their prediction, or the lack of a common representation that would allow them to be predicted with general-purpose neural systems. Transforming both formalisms into a sequence labeling task solves both problems. In recent years, different ways of linearizing syntactic trees have been proposed, but a unified piece of software that could produce representations for both formalisms within a single system was still lacking. In this work, such a system is developed. On the one hand, it will allow: (i) linearizing syntactic trees in the desired formalism and linearization function, and (ii) decoding linearized trees back to their original format. On the other hand, several sequence labeling models will also be trained, and the results obtained will be compared.
Final degree project (UDC.FIC). Enxeñaría Informática. Academic year 2021/202