Annotation Schema Oriented Validation for Dependency Parsing Evaluation

Author: Bosco Cristina
Lavelli Alberto
Publication date: 29/11/2010

Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories. Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti. NEALT Proceedings Series, Vol. 9 (2010), 19-30. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/15891

DSpace at Tartu University Library

Annotation Schema Oriented Validation for Dependency Parsing Evaluation

Author: Alberto Lavelli
Bosco Cristina
Publication venue: NEALT - Northern European Association for Language Technology
Publication date: 01/01/2010

Institutional Research Information System University of Turin

Recovering Punctuations in Instant Messages - Towards the prosody norm in IM

Author: Zhou Lina
Zhuang Ziming
Publication venue: AIS Electronic Library (AISeL)
Publication date: 31/12/2004

AIS Electronic Library (AISeL)

Corpora compilation for prosody-informed speech processing

Author: Bonafonte Antonio
Farrús Mireia
Öktem Alp
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2021

Research on speech technologies requires spoken data, which is usually obtained as read, recorded speech specifically adapted to the research needs. When the aim is to deal with the prosody involved in speech, the available data must reflect natural, conversational speech, which is usually costly and difficult to obtain. This paper presents a machine learning-oriented toolkit for collecting, handling, and visualizing speech data using prosodic heuristics. We present two corpora resulting from these methodologies: the PANTED corpus, containing 250 h of English speech from TED Talks, and the Heroes corpus, containing 8 h of parallel English and Spanish movie speech. We demonstrate their use in two deep learning-based applications: punctuation restoration and machine translation. The presented corpora are freely available to the research community.

UPF Digital Repository

Diposit Digital de la Universitat de Barcelona

Automated Annotation and Visualization of Rhetorical Figures

Author: Gawryjolek Jakub Jan
Publication venue: University of Waterloo
Publication date: 11/05/2009

Linguistic annotation provides additional information, added for a particular purpose, to a document or other piece of information. It is widely used in various fields, from computing and bioinformatics, through imaging, to law and linguistics. There is also a clear distinction between what is communicated through written or spoken natural language and how it is communicated. A new problem in linguistic annotation is the annotation of classical rhetorical figures: patterns of text in which a characteristic syntactic form modifies the standard meanings of words and leads to a change or an extension of meaning. Rhetoric studies the effectiveness of language comprehensively, including its emotional impact as much as its propositional content. The annotation of rhetorical figures is therefore important not only from the linguistic point of view, but also for discovering different styles of writing and the purpose and effect of written documents, and for better natural language understanding in general. The purpose of this thesis is the automated annotation of rhetorical figures. We primarily focus on the figures of repetition, which include the repetition of words, phrases, and clauses. Additionally, we describe our work on the detection and annotation of figures of parallelism, as well as figures that pertain more to semantics than to syntax or positioning. We have developed a rhetorical figure annotation tool dubbed JANTOR (Java ANnotation Tool Of Rhetoric), which enables manual and automated annotation of files in HTML format. We have applied a lexicalized probabilistic context-free grammar parser to recognize the figures of repetition. We also describe a simple parse tree distance for calculating the difference between similarly structured phrases, which is necessary for recognizing some of the figures of parallelism.
Moreover, we have applied the semantic relationships contained in the WordNet lexical database and an extended Porter stemmer algorithm to find derivationally related words. Finally, we present a method for finding pairs of words that are ordinarily contradictory, which is crucial for detecting an interesting figure of speech: the oxymoron. For this purpose, typed dependency grammars are used together with WordNet. Our experiments on the detection of a selected subset of rhetorical figures have yielded very promising results. Lastly, we present a visualization of the occurrences of the figures and a comparison of the inaugural addresses of 14 American presidents, including the most recent one by President Barack Obama. The provocative results of this comparison show that a) automated analysis of meaningful rhetorical information is possible and tractable, and b) such analysis can help us understand what makes a successful orator.

University of Waterloo's Institutional Repository

An apostrophe to Scots: the invention and diffusion of the Scots apostrophe in eighteenth-century Scottish verse

Author: Selfe David William
Publication date: 01/01/2021

The intention of this thesis is to challenge three fundamental assumptions about the function of the ‘apologetic apostrophe’ (described henceforth as the ‘Scots apostrophe’), which have, until now, exclusively characterised the scholarly understanding of this linguistic form in Scots literary history: 1. The function of apostrophised spelling forms in Scots is to indicate elision. 2. The use of apostrophised forms undermines perceptions of Scots as a language independent from English and serves solely to make the text accessible to an English readership. 3. Scots is intrinsically linked with Scottishness: as an agent of anglicisation, the use of apostrophised forms therefore contributes to the erosion of Scottish cultural identity. Situated within historical pragmatics, and combining corpus and philological analysis, this study investigates the origin and diffusion of the Scots apostrophe in eighteenth-century Scottish literary verse, with particular attention paid to the influential poetic miscellanies of James Watson, Allan Ramsay, Robert Burns, and Walter Scott. First and foremost, this thesis establishes a theoretical framework for understanding the function of the Scots apostrophe in literary Scots, one that simultaneously contests unscholarly myth-making with regard to linguistic practices. In broader terms, the research herein demonstrates the value of non-lexical markers, such as the apostrophe, as a capacious avenue for future historical pragmatic research.

Glasgow Theses Service