11 research outputs found

    Annotation Schema Oriented Validation for Dependency Parsing Evaluation

    Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories. Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti. NEALT Proceedings Series, Vol. 9 (2010), 19-30. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/15891

    Recovering Punctuations in Instant Messages - Towards the prosody norm in IM


    Corpora compilation for prosody-informed speech processing

    Research on speech technologies requires spoken data, which is usually obtained from recordings of read speech and adapted specifically to the research needs. When the aim is to deal with the prosody involved in speech, the available data must reflect natural, conversational speech, which is usually costly and difficult to obtain. This paper presents a machine learning-oriented toolkit for collecting, handling, and visualizing speech data using prosodic heuristics. We present two corpora resulting from these methodologies: the PANTED corpus, containing 250 h of English speech from TED Talks, and the Heroes corpus, containing 8 h of parallel English and Spanish movie speech. We demonstrate their use in two deep learning-based applications: punctuation restoration and machine translation. The presented corpora are freely available to the research community.

    Automated Annotation and Visualization of Rhetorical Figures

    Linguistic annotation provides additional information asserted with a particular purpose in a document or other piece of information. It is widely used in various fields, from computing and bioinformatics, through imaging, to law and linguistics. There is also a clear distinction between what is communicated through written or spoken natural language and how it is passed on. A new problem of linguistic annotation is the annotation of classical rhetorical figures: patterns of text in which a characteristic syntactic form modifies the standard meanings of words and leads to a change or an extension of meaning. Rhetoric studies the effectiveness of language comprehensively, including its emotional impact as much as its propositional content. The annotation of rhetorical figures is therefore important not only from the linguistic point of view, but also for discovering different styles of writing, the purpose and effect of written documents, and for better natural language understanding in general. The purpose of this thesis is the automated annotation of rhetorical figures. In the thesis we primarily focus on the figures of repetition, which include the repetition of words, phrases, and clauses. Additionally, we describe the work we have done on the detection and annotation of figures of parallelism, as well as those that pertain more to the semantics than to the syntax or positioning. We have developed a rhetorical figure annotation tool dubbed JANTOR (Java ANnotation Tool Of Rhetoric), which enables manual and automated annotation of files in HTML format. We have applied a lexicalized probabilistic context-free grammar parser for the recognition of the figures of repetition. We also describe a simple parse tree distance used for calculating the difference between similarly structured phrases, which is necessary for the recognition of some of the figures of parallelism. Moreover, we have applied the semantic relationships contained in the WordNet lexical database and an extended Porter stemmer algorithm for finding derivationally related words. Finally, we present a method for finding pairs of words which are ordinarily contradictory, which is crucial for detecting one interesting figure of speech: the oxymoron. For this purpose, typed dependency grammars together with WordNet are used. The experiments we have conducted on the detection of a selected subset of rhetorical figures have yielded very promising results. Lastly, we present the visualization of the occurrences of the figures and a comparison between 14 American presidents' inaugural addresses, including the most recent one by President Barack Obama. The provocative results of this comparison show that a) automated analysis of meaningful rhetorical information is possible and tractable, and b) such analysis helps us understand what creates a successful orator.

    An apostrophe to Scots: the invention and diffusion of the Scots apostrophe in eighteenth-century Scottish verse

    The intention of this thesis is to challenge three fundamental assumptions about the function of the ‘apologetic apostrophe’ – described henceforth as the ‘Scots apostrophe’ – which have, until now, exclusively characterised the scholarly understanding of this linguistic form in Scots literary history: 1. The function of apostrophised spelling forms in Scots is to indicate elision. 2. The use of apostrophised forms undermines perceptions of Scots as a language independent from English and is solely for the benefit of accessibility for an English readership. 3. Scots is intrinsically linked with Scottishness: as an agent of anglicisation, the use of apostrophised forms therefore contributes to the erosion of Scottish cultural identity. Situated within historical pragmatics – and combining corpus and philological analysis – this study investigates the origin and diffusion of the Scots apostrophe in eighteenth-century Scottish literary verse, with particular attention paid to the influential poetic miscellanies of James Watson, Allan Ramsay, Robert Burns, and Walter Scott. First and foremost, this thesis establishes a theoretical framework with which to understand the function of the Scots apostrophe in literary Scots that simultaneously contests unscholarly myth-making with regard to linguistic practices. In broader terms, the research herein demonstrates the value of non-lexical markers, like the apostrophe, as a capacious avenue for future historical pragmatic research.