Data-Oriented Parsing with Discontinuous Constituents and Function Tags

Author: Bod R.
Scha R.
van Cranenburgh A.
Publication venue: 'Institute of Computer Science, Polish Academy of Sciences'
Publication date: 01/01/2016

Statistical parsers are effective but are typically limited to producing projective dependencies or constituents. On the other hand, linguistically rich parsers recognize non-local relations and analyze both form and function phenomena but rely on extensive manual grammar development. We combine advantages of the two by building a statistical parser that produces richer analyses. We investigate new techniques to implement treebank-based parsers that allow for discontinuous constituents. We present two systems. One system is based on a string-rewriting Linear Context-Free Rewriting System (LCFRS), while using a Probabilistic Discontinuous Tree Substitution Grammar (PDTSG) to improve disambiguation performance. Another system encodes the discontinuities in the labels of phrase structure trees, allowing for efficient context-free grammar parsing. The two systems demonstrate that tree fragments as used in tree-substitution grammar improve disambiguation performance while capturing non-local relations on an as-needed basis. Additionally, we present results of models that produce function tags, resulting in a more linguistically adequate model of the data. We report substantial accuracy improvements in discontinuous parsing for German, English, and Dutch, including results on spoken Dutch.

UvA-DARE

Data-Oriented Parsing with discontinuous constituents and function tags

Author: Bod Rens
Scha Remko
van Cranenburgh Andreas
Publication venue: 'Institute of Computer Science, Polish Academy of Sciences'
Publication date: 01/01/2016

Proceedings - University of Groningen

Directory of Open Access Journals

Dissertations of the University of Groningen

A Data-Oriented Model of Literary Language

Author: Bod Rens
van Cranenburgh Andreas
Publication date: 01/01/2017

We consider the task of predicting how literary a text is, with a gold standard from human ratings. Aside from a standard bigram baseline, we apply rich syntactic tree fragments, mined from the training set, and a series of hand-picked features. Our model is the first to distinguish degrees of highly and less literary novels using a variety of lexical and syntactic features, and explains 76.0% of the variation in literary ratings. Comment: To be published in EACL 2017, 11 pages.

arXiv.org e-Print Archive

Proceedings - University of Groningen

UvA-DARE

Dissertations of the University of Groningen

LAF-Fabric: a data analysis tool for Linguistic Annotation Framework with an application to the Hebrew Bible

Author: Kalkman Gino
Naaijer Martijn
Roorda Dirk
van Cranenburgh Andreas
Publication date: 01/01/2014

The Linguistic Annotation Framework (LAF) provides a general, extensible stand-off markup system for corpora. This paper discusses LAF-Fabric, a new tool to analyse LAF resources in general, with an extension to process the Hebrew Bible in particular. We first walk through the history of the Hebrew Bible as a text database in decade-wide steps. Then we describe how LAF-Fabric may serve as an analysis tool for this corpus. Finally, we describe three analytic projects/workflows that benefit from the new LAF representation: 1) the study of linguistic variation: extract co-occurrence data of common nouns between the books of the Bible (Martijn Naaijer); 2) the study of the grammar of Hebrew poetry in the Psalms: extract clause typology (Gino Kalkman); 3) construction of a parser of classical Hebrew by Data-Oriented Parsing: generate tree structures from the database (Andreas van Cranenburgh).

arXiv.org e-Print Archive

UvA-DARE

A Data-Oriented Model of Literary Language

Author: Bod Rens
van Cranenburgh Andreas
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2017