9 research outputs found

Crossings as a side effect of dependency lengths

Author: Bick
Christensen
Conover
Ferrer-i-Cancho
Futrell
Gibson
Gildea
Gómez-Rodríguez
Hays
Hochberg
Hudson
Iwatate
Jiang
Kawata
Kelih
Liu
Lu
Newman
Poirier
Popper
Prokhorov
Ramasamy
Tanaka
Temperley
Publication venue: Wiley
Publication date: 01/01/2016

The syntactic structure of sentences exhibits a striking regularity: dependencies tend not to cross when drawn above the sentence. We investigate two competing explanations. The traditional hypothesis is that this trend arises from an independent principle of syntax that reduces crossings practically to zero. An alternative to this view is the hypothesis that crossings are a side effect of dependency lengths, i.e. sentences with shorter dependency lengths should tend to have fewer crossings. We are able to reject the traditional view in the majority of languages considered. The alternative hypothesis can lead to a more parsimonious theory of language. Comment: the discussion section has been expanded significantly; in press in Complexity (Wiley)

arXiv.org e-Print Archive

CiteSeerX

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

UPCommons. Portal del coneixement obert de la UPC

A Dependency Treebank for Telugu

Author: Rama Taraka
Vajjala Sowmya
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2018

In this paper, we describe the annotation and development of a Telugu treebank following the Universal Dependencies framework. We manually annotated 1328 sentences from a Telugu grammar textbook, and the treebank is freely available from Universal Dependencies version 2.1. In this paper, we discuss some language-specific annotation issues and decisions, and report preliminary experiments with POS tagging and dependency parsing. To the best of our knowledge, this is the first freely accessible and open dependency treebank for Telugu.

Digital Repository @ Iowa State University (ISU)

HamleDT 2.0: Thirty Dependency Treebanks Stanfordized

Author: Mareček David
Mašek Jan
Popel Martin
Rosa Rudolf
Zeman Daniel
Žabokrtský Zdeněk
Publication date: 01/01/2014

We present HamleDT 2.0 (HArmonized Multi-LanguagE Dependency Treebank). HamleDT 2.0 is a collection of 30 existing treebanks harmonized into a common annotation style, the Prague Dependencies, and further transformed into Stanford Dependencies, a treebank annotation style that became popular recently. We use the newest basic Universal Stanford Dependencies, without added language-specific subtypes. We describe both of the annotation styles, including the adjustments that were necessary, and provide details about the conversion process. We also discuss the differences between the two styles, evaluating their advantages and disadvantages, and note the effects of the differences on the conversion. We regard the stanfordization as generally successful, although we admit several shortcomings, especially in the distinction between direct and indirect objects, that have to be addressed in the future. We release part of HamleDT 2.0 freely; we are not allowed to redistribute the whole dataset, but we do provide the conversion pipeline.

Biblio at Institute of Formal and Applied Linguistics

Genre as Weak Supervision for Cross-lingual Dependency Parsing

Author: Müller-Eberstein Maximilian
Plank Barbara
van der Goot Rob
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/11/2021

The IT University of Copenhagen's Repository

CLARA: A New Generation of Researchers in Common Language Resources and Their Applications

Author: Koenraad De Smedt, Erhard Hinrichs, Detmar Meurers, Inguna Skadina, Bolette Pedersen, Costanza Navarretta, Núria Bel, Krister Linden, Marketa Lopatkova, Jan Hajic, Gisle Andersen and Przemyslaw Lenkiewicz
Publication venue: European Language Resources Association (ELRA)
Publication date: 26/05/2014

CLARA (Common Language Resources and Their Applications) is a Marie Curie Initial Training Network which ran from 2009 until 2014 with the aim of providing researcher training in crucial areas related to language resources and infrastructure. The scope of the project was broad and included infrastructure design, lexical semantic modeling, domain modeling, multimedia and multi-modal communication, applications, and parsing technologies and grammar models. An international consortium of 9 partners and 12 associate partners employed researchers in 19 new positions and organized a training program consisting of 10 thematic courses and summer/winter schools. The project has resulted in new theoretical insights as well as new resources and tools. Most importantly, the project has trained a new generation of researchers who can perform advanced research and development in language resources and technologies. Peer reviewed

Helsingin yliopiston digitaalinen arkisto

CLARA: A new generation of researchers in common language resources and their applications

Author: Andersen G.
Bel N.
De Smedt K.
Hajič J.
Hinrichs E.
Lenkiewicz P.
Lindén K.
Lopatková M.
Meurers D.
Navarretta C.
Sanford Pedersen B.
Skadiņa I.
Publication date: 01/01/2014

MPG.PuRe

Massive Choice, Ample Tasks (MaChAmp): A Toolkit for Multi-task Learning in NLP

Author: Plank Barbara
Ramponi Alan
Sharaf Ibrahim
van der Goot Rob
Üstün Ahmet
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2021

Transfer learning, particularly approaches that combine multi-task learning with pre-trained contextualized embeddings and fine-tuning, has advanced the field of Natural Language Processing tremendously in recent years. In this paper we present MaChAmp, a toolkit for easy fine-tuning of contextualized embeddings in multi-task settings. The benefits of MaChAmp are its flexible configuration options and its support for a variety of natural language processing tasks in a uniform toolkit, from text classification and sequence labeling to dependency parsing, masked language modeling, and text generation. Comment: https://machamp-nlp.github.io

arXiv.org e-Print Archive

Archivio della ricerca - Fondazione Bruno Kessler

The IT University of Copenhagen's Repository

Prague Dependency Style Treebank for Tamil

Author: Ramasamy Loganathan
Žabokrtský Zdeněk
Publication date: 01/01/2012

Annotated corpora such as treebanks are important for the development of parsers and language applications, as well as for understanding the language itself. Only very few languages possess these scarce resources. In this paper, we describe our efforts in syntactically annotating a small corpus (600 sentences) of the Tamil language. Our annotation is similar to the Prague Dependency Treebank (PDT) and consists of annotation at 2 levels or layers: (i) the morphological layer (m-layer) and (ii) the analytical layer (a-layer). For both layers, we introduce annotation schemes, i.e. positional tagging for the m-layer and dependency relations for the a-layer. Finally, we discuss some of the issues in treebank development for Tamil.

Biblio at Institute of Formal and Applied Linguistics