Search CORE

9 research outputs found

Irish treebanking and parsing: a preliminary evaluation

Author: Cetinoglu Ozlem
Dras Mark
Foster Jennifer
Lynn Teresa
Uí Dhonnchadha Elaine
van Genabith Josef
Publication venue
Publication date: 01/01/2012
Field of study

Language resources are essential for linguistic research and the development of NLP applications. Low- density languages, such as Irish, therefore lack significant research in this area. This paper describes the early stages in the development of new language resources for Irish – namely the first Irish dependency treebank and the first Irish statistical dependency parser. We present the methodology behind building our new treebank and the steps we take to leverage upon the few existing resources. We discuss language specific choices made when defining our dependency labelling scheme, and describe interesting Irish language characteristics such as prepositional attachment, copula and clefting. We manually develop a small treebank of 300 sentences based on an existing POS-tagged corpus and report an inter-annotator agreement of 0.7902. We train MaltParser to achieve preliminary parsing results for Irish and describe a bootstrapping approach for further stages of development

DCU Online Research Access Service

Macquarie University ResearchOnline

Recommended from our members

Hindi Complex Predicates: Linguistic and Computational Approaches

Author: Vaidya Ashwini
Publication venue: University of Colorado Boulder
Publication date: 01/01/2015
Field of study

Complex predicates that comprise of a noun and verb e.g. yaad kar 'memory do; remember' are a productive class of multi-words in Hindi. In this thesis, we examine the challenges of identification and representation for these complex predicates in Hindi. We design and implement their representation in a lexical semantic resource as well as in lexicalized computational grammars. As productive multi-word predicates, their accurate identification is a necessity for natural language processing applications. We use a combination of linguistic and computational approaches to address these challenges. We use these methods to demonstrate the semi-automatic creation of subcategorization frames for Hindi and the development of classes for nominal predicates. Finally, we demonstrate how linguistic features and computational tools can be used in tandem to automatically identify complex predicates from unseen text

CU Scholar Institutional Repository

Irish dependency treebanking and parsing

Author: Lynn Teresa
Publication venue: Dublin City University. ADAPT
Publication date: 01/03/2016
Field of study

Despite enjoying the status of an official EU language, Irish is considered a minority language. As with most minority languages, it is a `low-density' language, which means it lacks important linguistic and Natural Language Processing (NLP) resources. Relative to better-resourced languages such as English or French, for example, little research has been carried out on computational analysis or processing of Irish. Parsing is the method of analysing the linguistic structure of text, and it is an invaluable processing step that is required for many different types of language technology applications. As a verb-initial language, Irish has several features that are uncharacteristic of many languages previously studied in parsing research. Our work broadens the application of NLP methods to less studied language structures and provides a basis on which future work in Irish NLP is possible. We report on the development of a dependency treebank that serves as training data for the first full Irish dependency parser. We discuss the linguistic structures of Irish, and the motivation behind the design of our annotation scheme. Our work also examines various methods of employing semi-automated approaches to treebank development. We overcome the relatively small pool of linguistic and technological resources available for the Irish language with these approaches, and show that even in early stages of development, parsing results for Irish are promising. What counts as a sufficient number of trees for training a parser varies according to languages. Through empirical methods, we explore the impact our treebank's size and content has on parsing accuracy for Irish. We also discuss our work in crosslingual studies through converting our treebank to a universal annotation scheme. Finally we extend our Irish NLP work to the unstructured user-generated text of Irish tweets. We report on the creation of a POS-tagged corpus of Irish tweets and the training of statistical POS-tagging models. We show how existing resources can be leveraged for this domain-adapted resource development

Irish Universities

DCU Online Research Access Service

Macquarie University ResearchOnline

Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018 : 10-12 December 2018, Torino

Author: Alessandro Mazzei
Elena Cabrio
Fabio Tamburini
Publication venue: 'OpenEdition'
Publication date: 01/01/2018
Field of study

On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-‐it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted into its prestigious main lecture hall “Cavallerizza Reale”. The CLiC-‐it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges

Directory of Open Access Books (DOAB)

Identifying Signs of Syntactic Complexity for Rule-Based Sentence Simplification

Author: Evans Richard
Orasan Constantin
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 31/10/2018
Field of study

This article presents a new method to automatically simplify English sentences. The approach is designed to reduce the number of compound clauses and nominally bound relative clauses in input sentences. The article provides an overview of a corpus annotated with information about various explicit signs of syntactic complexity and describes the two major components of a sentence simplification method that works by exploiting information on the signs occurring in the sentences of a text. The first component is a sign tagger which automatically classifies signs in accordance with the annotation scheme used to annotate the corpus. The second component is an iterative rule-based sentence transformation tool. Exploiting the sign tagger in conjunction with other NLP components, the sentence transformation tool automatically rewrites long sentences containing compound clauses and nominally bound relative clauses as sequences of shorter single-clause sentences. Evaluation of the different components reveals acceptable performance in rewriting sentences containing compound clauses but less accuracy when rewriting sentences containing nominally bound relative clauses. A detailed error analysis revealed that the major sources of error include inaccurate sign tagging, the relatively limited coverage of the rules used to rewrite sentences, and an inability to discriminate between various subtypes of clause coordination. Despite this, the system performed well in comparison with two baselines. This finding was reinforced by automatic estimations of the readability of system output and by surveys of readers’ opinions about the accuracy, accessibility, and meaning of this output

Wolverhampton Intellectual Repository and E-theses

Sentence Simplification for Text Processing

Author: Evans Richard
Publication venue: University of Wolverhampton
Publication date: 01/05/2020
Field of study

A thesis submitted in partial fulfilment of the requirement of the University of Wolverhampton for the degree of Doctor of Philosophy.Propositional density and syntactic complexity are two features of sentences which affect the ability of humans and machines to process them effectively. In this thesis, I present a new approach to automatic sentence simplification which processes sentences containing compound clauses and complex noun phrases (NPs) and converts them into sequences of simple sentences which contain fewer of these constituents and have reduced per sentence propositional density and syntactic complexity. My overall approach is iterative and relies on both machine learning and handcrafted rules. It implements a small set of sentence transformation schemes, each of which takes one sentence containing compound clauses or complex NPs and converts it one or two simplified sentences containing fewer of these constituents (Chapter 5). The iterative algorithm applies the schemes repeatedly and is able to simplify sentences which contain arbitrary numbers of compound clauses and complex NPs. The transformation schemes rely on automatic detection of these constituents, which may take a variety of forms in input sentences. In the thesis, I present two new shallow syntactic analysis methods which facilitate the detection process. The first of these identifies various explicit signs of syntactic complexity in input sentences and classifies them according to their specific syntactic linking and bounding functions. I present the annotated resources used to train and evaluate this sign tagger (Chapter 2) and the machine learning method used to implement it (Chapter 3). The second syntactic analysis method exploits the sign tagger and identifies the spans of compound clauses and complex NPs in input sentences. In Chapter 4 of the thesis, I describe the development and evaluation of a machine learning approach performing this task. This chapter also presents a new annotated dataset supporting this activity. In the thesis, I present two implementations of my approach to sentence simplification. One of these exploits handcrafted rule activation patterns to detect different parts of input sentences which are relevant to the simplification process. The other implementation uses my machine learning method to identify compound clauses and complex NPs for this purpose. Intrinsic evaluation of the two implementations is presented in Chapter 6 together with a comparison of their performance with several baseline systems. The evaluation includes comparisons of system output with human-produced simplifications, automated estimations of the readability of system output, and surveys of human opinions on the grammaticality, accessibility, and meaning of automatically produced simplifications. Chapter 7 presents extrinsic evaluation of the sentence simplification method exploiting handcrafted rule activation patterns. The extrinsic evaluation involves three NLP tasks: multidocument summarisation, semantic role labelling, and information extraction. Finally, in Chapter 8, conclusions are drawn and directions for future research considered

Wolverhampton Intellectual Repository and E-theses

An investigation into deviant morphology : issues in the implementation of a deep grammar for Indonesian

Author: Mistica Meladel
Publication venue
Publication date: 10/01/2019
Field of study

This thesis investigates deviant morphology in Indonesian for the implementation of a deep grammar. In particular we focus on the implementation of the verbal suffix -kan. This suffix has been described as having many functions, which alter the kinds of arguments and the number of arguments the verb takes (Dardjowidjojo 1971; Chung 1976; Arka 1993; Vamarasi 1999; Kroeger 2007; Son and Cole 2008). Deep grammars or precision grammars (Butt et al. 1999a; Butt et al. 2003; Bender et al. 2011) have been shown to be useful for natural language processing (NLP) tasks, such as machine translation and generation (Oepen et al. 2004; Cahill and Riester 2009; Graham 2011), and information extraction (MacKinlay et al. 2012), demonstrating the need for linguistically rich information to aid NLP tasks. Although these linguistically-motivated grammars are invaluable resources to the NLP community, the biggest drawback is the time required for the manual creation and curation of the lexicon. Our work aims to expedite this process by applying methods to assign syntactic information to kan-affixed verbs automatically. The method we employ exploits the hypothesis that semantic similarity is tightly connected with syntactic behaviour (Levin 1993). Our endeavour in automatically acquiring verbal information for an Indonesian deep grammar poses a number of lingustic challenges. First of all Indonesian verbs exhibit voice marking that is characteristic of the subgrouping of its language family. In order to be able to characterise verbal behaviour in Indonesian, we first need to devise a detailed analysis of voice for implementation. Another challenge we face is the claim that all open class words in Indonesian, at least as it is spoken in some varieties (Gil 1994; Gil 2010), cannot linguistically be analysed as being distinct from each other. That is, there is no distiction between nouns, verbs or adjectives in Indonesian, and all word from the open class categories should be analysed uniformly. This poses difficulties in implementing a grammar in a linguistically motivated way, as well discovering syntactic behaviour of verbs, if verbs cannot be distinguished from nouns. As part of our investigation we conduct experiments to verify the need to employ word class categories, and we find that indeed these are linguistically motivated labels in Indonesian. Through our investigation into deviant morphological behaviour, we gain a better characterisation of the morphosyntactic effects of -kan, and we discover that, although Indonesian has been labelled as a language with no open word class distinctions, word classes can be established as being linguistically-motivated

The Australian National University

Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018

Author: Abramova Ekaterina
Adorni Giovanni
Agrawal Ruchit
Aina Laura
Albanese Teresa
Albanesi Davide
Alzetta Chiara
Amore Matteo
Antonelli Oronzo
Aprosio Alessio Palmero
Balaraman Vevake
Basile Pierpaolo
Basile Valerio
Basili Roberto
Bassignana Elisa
Bellandi Andrea
Bentivogli Luisa
Bernardi Raffaella
Bertoldi Nicola
Bondielli Alessandro
Bos Johan
Bosco Cristina
Bottini Roberto
Brunato Dominique
Brunato⋄ Dominique
Buono Maria Pia di
Busso Lucia
Büchler Marco
Cabrio Elena
Caruso Valeria
Caselli Tommaso
Cecchini Flavio
Celli Fabio
Cervone Alessandra
Chesi Cristiano
Chingacham Anupama
Chiriatti Giulia
Cimino Andrea
Cocciu• Eleonora
Colla Davide
Comandini Gloria
Cordeiro Silvio Ricardo
Crepaldi Davide
Croce Danilo
Curtoni Paolo
Cutugno Francesco
dell’Oglio Pietro
Dell’Orletta Felice
Dell’Orletta⋄ Felice
De Felice Irene
De Martino Maria
Dini Luca
Di Iorio Angelo
Di Nunzio Giorgio Maria
Draetta Lia
Ducceschi Luca
Elia Annibale
Falavigna Daniele
Federico Marcello
Feltracco Anna
Fernández Raquel
Ferro Michele
Fieromonte Martina
Franzini Greta
Gagliardi Gloria
Gala Valentina Della
Gambi Enrico
Ghezzi Ilaria
Giovannetti Emiliano
Gobbi Jacopo
Gretter Roberto
Guarasci Raffaele
Guerini Marco
Gurevych Iryna
Günther Fritz
Herzog Leonardo
Jezek Elisabetta
Koceva Forsina
Lai Mirko
Laudanna Alessandro
Lenci Alessandro
Lepri Bruno
Liano Annarita
Limpens Freddy
Louvan Samuel
Lyding Verena
Magnini Bernardo
Magnolini Simone
Mairano Paolo
Mambrini Francesco
Mana Dario
Mancuso Azzurra
Marchi Simone
Marelli Marco
Marini Costanza
Mazzei Alessandro
McGregor Stephen
Melnikova Elena
Menini Stefano
Mensa Enrico
Merenda Flavio
Mollo Eleonora
Montemagni Simonetta
Montemagni⋄ Simonetta
Monti Johanna
Moretti Giovanni
Moritz Maria
Nadalini Andrea
Negri Matteo
Nicolas Lionel
Nissim Malvina
Novielli Nicole
Okinina Nadezda
Pannitto Ludovica
Paperno Denis
Passalacqua Samuele
Passaro Lucia C.
Passarotti Marco
Patti Viviana
Pecchioli Alessandra
Pellegrini Matteo
Petrolito Ruggero
Pettenati Maria Chiara
Piantanida Giovanni
Poggi Isabella
Porporato Aureliano
Quinci Vito
Radicioni Daniele P.
Ramisch Carlos
Rapp Amon
Riccardi Giuseppe
Rossini Daniele
Rotondi Agata
Ruffolo Paolo
Russo Irene
Sagri Maria Teresa
Sangati Federico
Sanguinetti Manuela
Savary Agata
Savy Renata
Simeoni Rossana
Simi Maria
Sorgente Antonio
Speranza Manuela
Sprugnoli Rachele
Stede Manfred
Stepanov Evgeny A.
Stingo Michele
Tamburini Fabio
Tebbifakhr Amirhossein
Tonelli Sara
Torre Ilaria
Tortoreto Giuliano
Totis Pietro
Trotta Daniela
Turchi Marco
Valeriani Martina
Venturi Giulia
Venturi⋄ Giulia
Vezzani Federica
Villata Serena
Vincze Veronika
Zaghi Claudia
Zovato Enrico
Publication venue: 'OpenEdition'
Publication date: 08/04/2019
Field of study

OpenEdition

Grundlagen der Informationswissenschaft

Author
Publication venue: 'Walter de Gruyter GmbH'
Publication date
Field of study

OAPEN Library