Proceedings
Proceedings of the Ninth International Workshop
on Treebanks and Linguistic Theories.
Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti.
NEALT Proceedings Series, Vol. 9 (2010), 268 pages.
© 2010 The editors and contributors.
Published by
Northern European Association for Language
Technology (NEALT)
http://omilia.uio.no/nealt
Electronically published at
Tartu University Library (Estonia)
http://hdl.handle.net/10062/15891
Semantic chunking
Long sentences pose a challenge for natural language processing (NLP) applications. They are associated with a complex information structure leading to increased requirements for processing resources. Although the issue is present in many areas of research, there is little uniformity in the solutions used by research communities dedicated to individual NLP applications. Different aspects of the problem are addressed by different tasks, such as sentence simplification or shallow chunking.
The main contribution of this thesis is the introduction of the task of semantic chunking as a general approach to reducing the cost of processing long sentences. The goal of semantic chunking is to find semantically contained fragments of a sentence representation that can be processed independently and recombined without loss of information. We anchor its principles in established concepts of semantic theory, in particular event and situation semantics. Most of the experiments in this thesis focus on semantic chunking defined on complex semantic representations in Dependency Minimal Recursion Semantics (DMRS),
but we also demonstrate that the task can be performed on sentence strings. We present three chunking models: a) a rule-based proof-of-concept DMRS chunking system; b) a semi-supervised sequence labelling neural model for surface semantic chunking; c) a system capable of finding semantic chunk boundaries based on the inherent structure of DMRS graphs, generalisable in the form of descriptive templates. We show how semantic chunking can be applied within a divide-and-conquer processing paradigm, using as an example the task of realisation from DMRS. The application of semantic chunking yields noticeable efficiency gains without decreasing the quality of results.
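The divide-and-conquer paradigm described above can be caricatured in a few lines. The functions below are hypothetical stand-ins for the thesis's DMRS-based components and operate on a plain token list, not on semantic graphs:

```python
# Minimal sketch of divide-and-conquer chunked processing. find_boundaries,
# process_chunk and recombine are invented placeholders, not the actual
# DMRS-based components.

def find_boundaries(tokens):
    """Pretend boundary detector: split before coordinating conjunctions."""
    return [i for i, tok in enumerate(tokens) if tok in {"and", "but"}]

def process_chunk(chunk):
    """Placeholder for an expensive per-chunk computation."""
    return [tok.upper() for tok in chunk]

def recombine(chunks):
    """Reassemble independently processed chunks without loss."""
    return [tok for chunk in chunks for tok in chunk]

def chunked_process(tokens):
    bounds = [0] + find_boundaries(tokens) + [len(tokens)]
    chunks = [tokens[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]
    return recombine(process_chunk(c) for c in chunks)

print(chunked_process("the cat slept and the dog barked".split()))
# -> ['THE', 'CAT', 'SLEPT', 'AND', 'THE', 'DOG', 'BARKED']
```

The key property the thesis requires is visible even in this toy: processing the chunks independently and recombining them yields the same result as processing the whole sentence at once.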
Towards a machine-learning architecture for lexical functional grammar parsing
Data-driven grammar induction aims at producing wide-coverage grammars of human languages. Initial efforts in this field produced relatively shallow linguistic representations such as phrase-structure trees, which only encode constituent structure. Recent work on inducing deep grammars from treebanks addresses this shortcoming by also
recovering non-local dependencies and grammatical relations. My aim is to investigate the issues arising when adapting an existing Lexical Functional Grammar (LFG) induction method to a new language and treebank, and find solutions which will generalize robustly across multiple languages.
The research hypothesis is that by exploiting machine-learning algorithms to learn morphological features, lemmatization classes and grammatical functions from treebanks, we can reduce the amount of manual specification and improve robustness, accuracy and domain- and language-independence for LFG parsing systems. Function labels can often be mapped relatively straightforwardly to LFG grammatical functions. Learning them reliably permits grammar induction to depend less on language-specific LFG annotation rules. I therefore propose ways to improve acquisition of function labels from treebanks and translate those improvements into better-quality f-structure parsing.
In a lexicalized grammatical formalism such as LFG a large amount of syntactically relevant information comes from lexical entries. It is, therefore, important to be able
to perform morphological analysis in an accurate and robust way for morphologically rich languages. I propose a fully data-driven supervised method to simultaneously
lemmatize and morphologically analyze text, and obtain competitive or improved results on a range of typologically diverse languages.
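One standard way to cast data-driven lemmatization as supervised classification, in the spirit of (though not identical to) the method proposed above, is to treat each form–lemma pair as an instance of a suffix-edit class that a classifier can then predict for unseen tokens. The function names here are illustrative:

```python
# Sketch of lemmatization classes as suffix-edit rules extractable from a
# treebank. A classifier would predict the (strip, append) class per token;
# only the rule extraction/application step is shown.

def suffix_rule(form, lemma):
    """Derive a (strip, append) class mapping the form to its lemma."""
    k = 0
    while k < min(len(form), len(lemma)) and form[k] == lemma[k]:
        k += 1                          # longest common prefix
    return (form[k:], lemma[k:])        # e.g. ("ies", "y") for flies -> fly

def apply_rule(form, rule):
    strip, append = rule
    assert form.endswith(strip)
    return form[: len(form) - len(strip)] + append

rule = suffix_rule("flies", "fly")
print(rule)                       # ('ies', 'y')
print(apply_rule("tries", rule))  # 'try'
```

Because the classes generalise across words sharing an inflectional pattern, the same rule learned from one verb applies to unseen verbs, which is what makes the approach viable for morphologically rich languages.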
Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan languages
Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan Languages contains 17 papers presented at the conference, held in Dubrovnik, Croatia, 4–6 October 2010.
Proceedings of the Second Workshop on Annotation of Corpora for Research in the Humanities (ACRH-2). 29 November 2012, Lisbon, Portugal
Proceedings of the Second Workshop on Annotation of Corpora for Research in the Humanities (ACRH-2), held in Lisbon, Portugal on 29 November 2012.
Automatic grammar induction from free text using insights from cognitive grammar
Automatic identification of the grammatical structure of a sentence is useful in many Natural Language
Processing (NLP) applications such as Document Summarisation, Question Answering systems and
Machine Translation. With the availability of syntactic treebanks, supervised parsers have been
developed successfully for many major languages. However, for low-resourced minority languages with
fewer digital resources, this poses more of a challenge. Moreover, there are a number of syntactic
annotation schemes, motivated by different linguistic theories and formalisms, which are sometimes
language-specific and cannot always be adapted for developing syntactic parsers across different
language families.
This project aims to develop a linguistically motivated approach to the automatic induction of
grammatical structures from raw sentences. Such an approach can be readily adapted to different
languages including low-resourced minority languages. We draw the basic approach to linguistic analysis
from usage-based, functional theories of grammar such as Cognitive Grammar, Computational Paninian
Grammar, and insights from psycholinguistic studies. Our approach identifies the grammatical structure
of a sentence by recognising domain-independent, general, cognitive patterns of conceptual organisation
that occur in natural language. It also reflects some of the general psycholinguistic properties of parsing
by humans, such as incrementality, connectedness and expectation.
Our implementation has three components: Schema Definition, Schema Assembly and Schema
Prediction. Schema Definition and Schema Assembly components were implemented algorithmically as
a dictionary and rules. An Artificial Neural Network was trained for Schema Prediction. Using
part-of-speech tags to bootstrap the simplest case of token-level schema definitions, a sentence is passed
through all three components incrementally until all the words are exhausted and the entire
sentence is analysed as an instance of one final construction schema. The order in which all intermediate
schemas are assembled to form the final schema can be viewed as the parse of the sentence. Parsers
for English and Welsh (a low-resource minority language) were developed using the same approach with
some changes to the Schema Definition component. We evaluated parser performance through
(a) quantitative evaluation, comparing the parsed chunks against the constituents in a phrase-structure
tree; (b) manual evaluation, listing the range of linguistic constructions covered by the parser and
performing error analysis on the parser outputs; (c) evaluation by identifying the number of edits
required for a correct assembly; and (d) qualitative evaluation based on Likert scales in online surveys.
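The incremental pipeline of Schema Definition, Assembly and Prediction can be sketched as a toy parser. The schema dictionary and the merge rules below are invented for illustration, standing in for the POS-bootstrapped definitions and the trained neural predictor:

```python
# Toy incremental schema assembly: each word maps to a token-level schema
# (here a hand-written POS dictionary), and adjacent schemas are greedily
# merged by assembly rules until one construction schema spans the sentence.
# Both tables are illustrative only.

POS = {"the": "DET", "dog": "N", "barked": "V"}

MERGE = {("DET", "N"): "NP", ("NP", "V"): "CLAUSE"}  # assembly rules

def parse(words):
    stack = []
    for w in words:                      # incremental: one word at a time
        stack.append(POS[w])
        while len(stack) >= 2 and tuple(stack[-2:]) in MERGE:
            right = stack.pop()
            left = stack.pop()
            stack.append(MERGE[(left, right)])
    return stack

print(parse(["the", "dog", "barked"]))  # ['CLAUSE']
```

The order in which the merges fire ((DET, N) before (NP, V)) is exactly the "order of assembly" that the abstract describes as the parse of the sentence.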
Respecting Relations: Memory Access and Antecedent Retrieval in Incremental Sentence Processing
This dissertation uses the processing of anaphoric relations to probe how linguistic information is encoded in and retrieved from memory during real-time sentence comprehension. More specifically, the dissertation attempts to resolve a tension between the demands of a linguistic processor implemented in a general-purpose cognitive architecture and the demands of abstract grammatical constraints that govern language use. The source of the tension is the role that abstract configurational relations (such as c-command, Reinhart 1983) play in constraining computations.
Anaphoric dependencies are governed by formal grammatical constraints stated in terms of relations. For example, Binding Principle A (Chomsky 1981) requires that antecedents for local anaphors (like the English reciprocal each other) bear the c-command relation to those anaphors. In incremental sentence processing, antecedents of anaphors must be retrieved from memory. Recent research has motivated a model of processing that exploits a cue-based, associative retrieval process in content-addressable memory (e.g. Lewis, Vasishth & Van Dyke 2006) in which relations such as c-command are difficult to use as cues for retrieval. As such, the c-command constraints of formal grammars are predicted to be poorly implemented by the retrieval mechanism.
I examine retrieval's sensitivity to three constraints on anaphoric dependencies: Principle A (via Hindi local reciprocal licensing), the Scope Constraint on bound-variable pronoun licensing (often stated as a c-command constraint, though see Barker 2012), and Crossover constraints on pronominal binding (Postal 1971, Wasow 1972). The data suggest that retrieval exhibits fidelity to the constraints: structurally inaccessible NPs that match an anaphoric element in morphological features do not interfere with the retrieval of an antecedent in most cases considered. In spite of this alignment, I argue that retrieval's apparent sensitivity to c-command constraints need not motivate a memory access procedure that makes direct reference to c-command relations. Instead, proxy features and general parsing operations conspire to mimic the extension of a system that respects c-command constraints. These strategies provide a robust approximation of grammatical performance while remaining within the confines of an independently motivated general-purpose cognitive architecture.
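The cue-based retrieval model at issue can be illustrated with a toy content-addressable memory: retrieval scores stored items by feature match, and a relational property like c-command has no column of its own to match against, only proxies such as clause membership. The items and features here are invented:

```python
# Toy cue-based retrieval over content-addressable memory. Each NP is a
# feature bundle; retrieval returns the best cue match. Note there is no
# "c-commands the anaphor" feature -- only proxy features -- which is why
# relational constraints are hard for this architecture to enforce directly.

memory = [
    {"word": "boys",  "num": "pl", "local_clause": True},
    {"word": "girls", "num": "pl", "local_clause": False},  # inaccessible
]

def retrieve(cues):
    """Return the memory item matching the most retrieval cues."""
    score = lambda item: sum(item.get(k) == v for k, v in cues.items())
    return max(memory, key=score)

# Cues for a local reciprocal: plural antecedent in the local clause.
print(retrieve({"num": "pl", "local_clause": True})["word"])  # boys
```

The dissertation's argument is precisely that proxy cues like `local_clause`, together with general parsing operations, can approximate the extension of a c-command-respecting grammar without the retrieval mechanism ever consulting c-command itself.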
Automatic Generation of Morpheme Level Reordering Rules for Korean to English Machine Translation
Master's thesis, Department of Linguistics, Graduate School, Seoul National University, February 2017. Supervisor: 신효필 (Hyopil Shin). Word order is one of the main challenges that machine translation systems must overcome when dealing with any linguistically divergent language pair, such as Korean and English. Statistical machine translation (SMT) models are often insufficient at long-distance reordering due to the distortion penalties they impose. Rule-based systems, on the other hand, are often costly, in both time and money, to build and maintain.
The present research proposes a new hybrid approach for Korean to English machine translation. While previous approaches have focused on the word, our approach considers the morpheme as the basic unit of translation for this
language pair. We begin by developing a classification model to disambiguate Korean functional morphemes based on alignment pairs and context feature data.
Then, according to our automatically generated rules, we apply this model in a preprocessing step to reorder the morphemes to better match English sentence structure.
After retraining our statistical translation system, Moses, results indicate an improvement in overall translation quality. When the SMT system's internal lexicalized reordering is restricted, we note an increase in the BLEU score of 3.5% over the SMT-only baseline. In the case where we do not limit decoding-time reordering, an even greater BLEU score increase of 4.42% is observed. We also
find evidence to suggest that our changes enable Moses to execute additional reordering operations at decoding time that it was previously unable to perform.
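The preprocessing step described above, reordering source-side morphemes before SMT training and decoding, can be sketched with a single hand-written rule. The rule format and the example sentence are invented; the actual system generates its rules automatically from alignment and context-feature data:

```python
# Illustrative morpheme-reordering preprocessing: rewrite a matched tag
# pattern into a new order to approximate English SVO structure. The rule
# format and tags are hypothetical, not the thesis's actual representation.

def reorder(morphemes, rules):
    """Apply (pattern, new_order) rules over (morpheme, tag) pairs in place."""
    for pattern, new_order in rules:
        n = len(pattern)
        for i in range(len(morphemes) - n + 1):
            tags = [tag for _, tag in morphemes[i:i + n]]
            if tags == pattern:
                morphemes[i:i + n] = [morphemes[i + j] for j in new_order]
    return morphemes

# One SOV -> SVO style rule: move the verb ahead of its object.
rules = [(["NP-SBJ", "NP-OBJ", "V"], [0, 2, 1])]
sent = [("나/NP", "NP-SBJ"), ("사과/NNG", "NP-OBJ"), ("먹/VV", "V")]
print([m for m, _ in reorder(sent, rules)])  # ['나/NP', '먹/VV', '사과/NNG']
```

Because the reordered source now resembles English word order, the SMT system's remaining reordering burden at decoding time is reduced, which is the intuition behind the reported BLEU gains.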