1,775 research outputs found

Cross-lingual transfer parsing for low-resourced languages: an Irish case study

Authors: Dras Mark; Foster Jennifer; Lynn Teresa; Tounsi Lamia
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2014

We present a study of cross-lingual direct transfer parsing for the Irish language. First, we discuss the mapping of the Irish Dependency Treebank's annotation scheme to a universal dependency scheme, explaining our dependency label mapping choices and the structural changes required in the treebank. We then experiment with the universally annotated treebanks of ten languages from four language family groups to assess which languages are most useful for cross-lingual parsing of Irish: we use these treebanks to train delexicalised parsing models, which we then apply to sentences from the Irish Dependency Treebank. The best results are achieved with Indonesian, a language from the Austronesian language family.
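Delexicalised transfer works because the parser never sees word forms: it is trained only on features shared across languages, such as universal POS tags, so a model trained on one language's treebank can be run on Irish input. The sketch below assumes 10-column CoNLL-U input and uses a hypothetical helper, `delexicalise_conllu`; which columns to blank is an assumption for illustration, not the authors' preprocessing code.

```python
# Hypothetical helper: delexicalise CoNLL-U text for direct transfer parsing.
# Assumption: blank FORM, LEMMA, XPOS and FEATS; keep UPOS, HEAD and DEPREL.
# CoNLL-U column order: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
LEXICAL_COLUMNS = (1, 2, 4, 5)  # FORM, LEMMA, XPOS, FEATS


def delexicalise_conllu(text: str) -> str:
    out = []
    for line in text.splitlines():
        # Keep comment lines and blank sentence separators unchanged.
        if not line.strip() or line.startswith("#"):
            out.append(line)
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")
        for i in LEXICAL_COLUMNS:
            cols[i] = "_"  # replace lexical / language-specific values
        out.append("\t".join(cols))
    return "\n".join(out)


if __name__ == "__main__":
    sample = (
        "# toy sentence (not from any treebank)\n"
        "1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_\n"
        "2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\t_"
    )
    print(delexicalise_conllu(sample))
```

Only the universal POS tag, head index and dependency label survive, which is what lets a model trained on a source-language treebank be applied to target-language sentences tagged with the same universal tag set.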

CiteSeerX

Crossref

Irish Universities

DCU Online Research Access Service

Macquarie University ResearchOnline

Greek Dependency Treebank (GDT)

Publication date: 30/07/2014

70K words. Non-validated sentence segmentation and POS tagging; manual annotation of syntactic dependencies and dependency labels; manual annotation of semantic roles; manual annotation of events based on a shallow, domain-specific ontology (only for a 31K-word subset of GDT).

Sastavljanje Hrvatske ovisnosne banke stabala: početne etape (Building the Croatian Dependency Treebank: Initial Stages)

Author: Marko Tadić
Publication venue: Croatian Philological Society
Publication date: 01/01/2007

The paper presents work in progress on building the Croatian Dependency Treebank. It describes the treebank's design principles, the procedures followed, and the pilot corpus used, and closes with perspectives for the further development of the Croatian Dependency Treebank.

HRČAK - Portal of Croatian Scientific and Professional Journals

Active learning and the Irish treebank

Authors: Dras Mark; Foster Jennifer; Lynn Teresa; Uí Dhonnchadha Elaine
Publication date: 01/01/2012

We report on our ongoing work in developing the Irish Dependency Treebank, describe the results of two inter-annotator agreement (IAA) studies, demonstrate improvements in annotation consistency that have a knock-on effect on parsing accuracy, and present the final set of dependency labels. We then investigate the extent to which active learning can play a role in treebank and parser development by comparing an active learning bootstrapping approach with a passive approach in which sentences are chosen at random for manual revision. We show that active learning outperforms passive learning, but when annotation effort is taken into account, it is not clear how much of an advantage the active learning approach has. Finally, we present results suggesting that adding automatic parses to the training data along with manually revised parses in an active learning setup does not greatly affect parsing accuracy.
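The difference between the two set-ups lies only in how the next batch of sentences for manual revision is chosen. A minimal sketch, assuming each candidate sentence already has an uncertainty score (higher meaning the parser is less sure); how that score is computed is left open here, and the function names are hypothetical rather than the authors' implementation.

```python
from __future__ import annotations

import random

# Hypothetical selection step in a bootstrapping loop. `scores` maps a
# sentence id to an assumed uncertainty score (higher = parser less sure).


def select_active(scores: dict[str, float], k: int) -> list[str]:
    """Active learning: pick the k sentences the parser is least sure about."""
    return sorted(scores, key=scores.__getitem__, reverse=True)[:k]


def select_passive(sentence_ids: list[str], k: int, seed: int = 0) -> list[str]:
    """Passive baseline: pick k sentences uniformly at random."""
    return random.Random(seed).sample(sentence_ids, k)


if __name__ == "__main__":
    scores = {"s1": 0.10, "s2": 0.90, "s3": 0.55, "s4": 0.30}
    print(select_active(scores, 2))         # ['s2', 's3']
    print(select_passive(list(scores), 2))  # two ids chosen at random
```

Both selectors return the same number of sentences; the abstract's caveat about annotation effort suggests that a fair comparison may also need to measure cost in a finer unit than sentences, for example tokens revised.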

CiteSeerX

Irish Universities

DCU Online Research Access Service

Macquarie University ResearchOnline

Universal Dependencies Parsing for Colloquial Singaporean English

Authors: Chan GuangYong Leonard; Chieu Hai Leong; Wang Hongmin; Yang Jie; Zhang Yue
Publication date: 01/01/2017

Singlish can be of interest to the ACL community both linguistically, as a major English-based creole, and computationally, for information extraction and sentiment analysis of regional social media. We investigate dependency parsing of Singlish by constructing a dependency treebank under the Universal Dependencies scheme and then training a neural network model that integrates English syntactic knowledge into a state-of-the-art parser trained on the Singlish treebank. Results show that English knowledge can lead to a 25% relative error reduction, resulting in a parser with 84.47% accuracy. To the best of our knowledge, we are the first to use neural stacking to improve cross-lingual dependency parsing on low-resource languages. We make both our annotation and our parser available for further research. Comment: Accepted by ACL 201
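A relative error reduction is measured against the error rate (100 minus accuracy), not against accuracy itself. The small helper below makes the calculation explicit; the function name is illustrative and the numbers in the example are made up, not figures from the paper.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Fraction of the baseline's errors removed; accuracies are percentages."""
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    if baseline_err == 0:
        raise ValueError("baseline has no errors to reduce")
    return (baseline_err - new_err) / baseline_err


if __name__ == "__main__":
    # Illustrative numbers only: 80% -> 85% accuracy is 20 -> 15 points of error.
    print(relative_error_reduction(80.0, 85.0))  # 0.25
```

A 25% relative error reduction can therefore correspond to a gain of only a few points of absolute accuracy.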

arXiv.org e-Print Archive

Crossref

The risks of mixing dependency lengths from sequences of different length

Authors: Ferrer-i-Cancho Ramon; Liu Haitao
Publication venue: Walter de Gruyter GmbH
Publication date: 01/01/2014

Mixing dependency lengths from sequences of different length is a common practice in language research. However, the empirical distribution of dependency lengths in sentences of the same length differs from that in sentences of varying length, and the distribution of dependency lengths depends on sentence length, both for real sentences and under the null hypothesis that dependencies connect vertices located at random positions in the sequence. This suggests that certain results, such as the distribution of syntactic dependency lengths obtained by mixing dependencies from sentences of varying length, could be a mere consequence of that mixing. Furthermore, differences in the global averages of dependency length (mixing lengths from sentences of varying length) between two languages do not simply imply a priori that one language optimizes dependency lengths better than the other, because those differences could be due to differences in the distribution of sentence lengths and other factors. Comment: Language and referencing have been improved; Eqs. 7, 11, B7 and B8 have been corrected.
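The abstract's point can be made concrete. The length of a dependency is the distance between the positions of a word and its head, and one common way to formalise the random-position null hypothesis gives an expected length of (n + 1)/3 for a sentence of n words (the mean of |i - j| over two distinct positions drawn uniformly from 1..n). The sketch below uses toy head lists, not real treebank data, to contrast a pooled average with per-sentence-length averages; the function names are illustrative.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean

# Each sentence is a list of heads: heads[i] is the 1-based position of the
# head of word i+1, and 0 marks the root (which has no incoming dependency).


def dependency_lengths(heads: list[int]) -> list[int]:
    return [abs((i + 1) - h) for i, h in enumerate(heads) if h != 0]


def pooled_mean_length(sentences: list[list[int]]) -> float:
    """Global average that mixes sentences of different lengths."""
    return mean(d for heads in sentences for d in dependency_lengths(heads))


def mean_length_by_sentence_length(sentences: list[list[int]]) -> dict[int, float]:
    """Average dependency length computed separately for each sentence length n."""
    groups: dict[int, list[int]] = defaultdict(list)
    for heads in sentences:
        groups[len(heads)].extend(dependency_lengths(heads))
    return {n: mean(ds) for n, ds in sorted(groups.items()) if ds}


def random_baseline(n: int) -> float:
    """Expected |i - j| for two distinct positions drawn uniformly from 1..n."""
    return (n + 1) / 3


if __name__ == "__main__":
    toy = [[2, 0], [2, 0, 2], [0, 1, 1]]  # toy head lists, not real data
    print(pooled_mean_length(toy))              # 1.2
    print(mean_length_by_sentence_length(toy))  # {2: 1, 3: 1.25}
    print({n: random_baseline(n) for n in (2, 3)})
```

The pooled figure depends on how many sentences of each length happen to be in the sample, whereas each per-length average can be compared directly with the random baseline for the same n.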

arXiv.org e-Print Archive

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC