
    Spoken content retrieval: A survey of techniques and technologies

    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.

    Spoken Document Retrieval in a Highly Inflectional Language

    Proceedings of the 16th Nordic Conference of Computational Linguistics NODALIDA-2007. Editors: Joakim Nivre, Heiki-Jaan Kaalep, Kadri Muischnek and Mare Koit. University of Tartu, Tartu, 2007. ISBN 978-9985-4-0513-0 (online) ISBN 978-9985-4-0514-7 (CD-ROM) pp. 44-50

    On The Way To Linguistic Representation: Neuromagnetic Evidence of Early Auditory Abstraction in the Perception of Speech and Pitch

    The goal of this dissertation is to show that even at the earliest (non-invasive) recordable stages of auditory cortical processing, we find evidence that cortex is calculating abstract representations from the acoustic signal. Looking across two distinct domains (inferential pitch perception and vowel normalization), I present evidence demonstrating that the M100, an automatic evoked neuromagnetic component that localizes to primary auditory cortex, is sensitive to abstract computations. The M100 typically responds to physical properties of the stimulus in auditory and speech perception and integrates only over the first 25 to 40 ms of stimulus onset, providing a reliable dependent measure that allows us to tap into early stages of auditory cortical processing. In Chapter 2, I briefly present the episodicist position on speech perception and discuss research indicating that the strongest episodicist position is untenable. I then review findings from the mismatch negativity literature, where proposals have been made that the MMN allows access into linguistic representations supported by auditory cortex. Finally, I conclude the chapter with a discussion of the previous findings on the M100/N1. In Chapter 3, I present neuromagnetic data showing that the response properties of the M100 are sensitive to the missing fundamental component using well-controlled stimuli. These findings suggest that listeners are reconstructing the inferred pitch by 100 ms after stimulus onset. In Chapter 4, I propose a novel formant ratio algorithm in which the third formant (F3) is the normalizing factor. The goal of formant ratio proposals is to provide an explicit algorithm that successfully "eliminates" speaker-dependent acoustic variation of auditory vowel tokens. Results from two MEG experiments suggest that auditory cortex is sensitive to formant ratios and that the perceptual system shows heightened sensitivity to tokens located in more densely populated regions of the vowel space. In Chapter 5, I report MEG results that suggest early auditory cortical processing is sensitive to violations of a phonological constraint on sound sequencing, suggesting that listeners make highly specific, knowledge-based predictions about rather abstract anticipated properties of the upcoming speech signal and that violations of these predictions are evident in early cortical processing.

    Not all geminates are created equal: evidence from Maltese glottal consonants

    Many languages distinguish short and long consonants or singletons and geminates. At a phonetic level, research has established that duration is the main cue to such distinctions but that other, sometimes language-specific, cues contribute to the distinction as well. Different proposals for representing geminates share one assumption: The difference between a singleton and a geminate is relatively uniform for all consonants in a given language. In this paper, Maltese glottal consonants are shown to challenge this view. In production, secondary cues, such as the amount of voicing during closure and the spectral properties of frication noises, are stronger for glottal consonants than for oral ones, and, in perception, the role of secondary cues and duration also varies across consonants. Contrary to the assumption that gemination is a uniform process in a given language, the results show that the relative role of secondary cues and duration may differ across consonants and that gemination may involve language-specific phonetic knowledge that is specific to each consonant. These results question the idea that lexical access in speech processing can be achieved through features.

    Intertextuality and ideology in interpreter-mediated communication: the case of the European Parliament

    This doctoral thesis explores simultaneous interpreting (SI) as a social practice by investigating EU institutional hegemony and interpreter axiology in the institutional setting of the European Parliament (EP). Theoretical research is complemented by a corpus study of the interplay between these two forces in SI-mediated EP plenary debates. A multilayered understanding of discourse as a set of practices is developed before exploring the relationship between ideology and axiology manifest in discourse manifest in text. Bakhtin's term dialogised heteroglossia is used in this context to refer to the centripetal forces and centrifugal forces of language. The Gramscian theory of hegemony as shifting alliances is applied to EU institutional hegemony, before the concept of axiology is introduced to address subjective interpreter ethics and evaluation. Corpus analysis concentrates on intertextuality (manifest and latent), lexical repetition of key institutional terms, and metaphor strings characteristic of EU institutional hegemony. Results suggest that EU institutional hegemony is strengthened by SI, and that interpreter mediation in the form of interpreter axiology occurs and is constrained by institutional hegemony. This 'socially orientated' approach therefore contradicts the conduit view of communication. In this study, the simultaneous interpreter is shown to be an additional subjective actor in heteroglot communication.

    First verbs: On the way to mini-paradigms

    This 18th issue of ZAS-Papers in Linguistics consists of papers on the development of verb acquisition in 9 languages from the very early stages up to the onset of paradigm construction. Each of the 10 papers deals with first-language developmental processes in one or two children studied via longitudinal data. The languages involved are French, Spanish, Russian, Croatian, Lithuanian, Finnish, English and German. For German two different varieties are examined, one from Berlin and one from Vienna. All papers are based on presentations at the workshop 'Early verbs: On the way to mini-paradigms' held at the ZAS (Berlin) on the 30./31. of September 2000. This workshop brought to a close the first phase of cooperation between two projects on language acquisition which started in October 1999: a) the project on "Syntaktische Konsequenzen des Morphologieerwerbs" at the ZAS (Berlin) headed by Juergen Weissenborn and Ewald Lang, and financially supported by the Deutsche Forschungsgemeinschaft, and b) the international "Crosslinguistic Project on Pre- and Protomorphology in Language Acquisition" coordinated by Wolfgang U. Dressler on behalf of the Austrian Academy of Sciences.

    Speech Recognition

    Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation of speech signals and the methods for speech feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes.

    A Sound Approach to Language Matters: In Honor of Ocke-Schwen Bohn

    The contributions in this Festschrift were written by Ocke’s current and former PhD students, colleagues and research collaborators. The Festschrift is divided into six sections, moving from the smallest building blocks of language, through gradually expanding objects of linguistic inquiry, to the highest levels of description, all of which have formed a part of Ocke’s career, in connection with his teaching and/or his academic productions: “Segments”, “Perception of Accent”, “Between Sounds and Graphemes”, “Prosody”, “Morphology and Syntax” and “Second Language Acquisition”. Each one of these illustrates a sound approach to language matters.

    Models and analysis of vocal emissions for biomedical applications

    This book of proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003, Firenze, Italy. The workshop is organised every two years and aims to stimulate contacts between specialists active in research and industrial developments in the area of voice analysis for biomedical applications. The scope of the workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.