Search CORE

874 research outputs found

Detecting autism, emotions and social signals using AdaBoost

Author: Busa-Fekete Róbert
Gosztolya Gábor
Tóth László
Publication venue: Interspeech
Publication date: 01/01/2013
Field of study

SZTE Publicatio Repozitórium - SZTE - Repository of Publications

Sound : eine Arbeitsbibliographie

Author: Wulff Hans Jürgen
Publication venue
Publication date: 01/02/2010
Field of study

Hochschulschriftenserver - Universität Frankfurt am Main

Soundscapes of the Urban Past: Staged Sound as Mediated Cultural Heritage

Author
Publication venue: Bielefeld
Publication date: 01/01/2013
Field of study

We cannot simply listen to our urban past. Yet we encounter a rich cultural heritage of city sounds presented in text, radio and film. How can such "staged sounds" express the changing identities of cities? This volume presents a collection of studies on the staging of Amsterdam, Berlin and London soundscapes in historical documents, radio plays and films, and offers insights into themes such as film sound theory and museum audio guides. In doing so, this book puts contemporary controversies on urban sound in historical perspective, and contextualises iconic presentations of cities. It addresses academics, students, and museum workers alike

SSOAR - Social Science Open Access Repository

Visual Speech Synthesis using Dynamic Visemes and Deep Learning Architectures

Author: Thangthai Ausdang
Publication venue
Publication date: 01/04/2018
Field of study

The aim of this work is to improve the naturalness of visual speech synthesis produced automatically from a linguistic input over existing methods. Firstly, the most important contribution is on the investigation of the most suitable speech units for the visual speech synthesis. We propose the use of dynamic visemes instead of phonemes or static visemes and found that dynamic visemes can generate better visual speech than either phone or static viseme units. Moreover, best performance is obtained by a combined phoneme-dynamic viseme system. Secondly, we examine the most appropriate model between hidden Markov model (HMM) and different deep learning models that include feedforward and recurrent structures consisting of one-to-one, many-to-one and many-to-many architectures. Results suggested that that frame-by-frame synthesis from deep learning approach outperforms state-based synthesis from HMM approaches and an encoder-decoder many-to-many architecture is better than the one-to-one and many-to-one architectures. Thirdly, we explore the importance of contextual features that include information at varying linguistic levels, from frame level up to the utterance level. Our findings found that frame level information is the most valuable feature, as it is able to avoid discontinuities in the visual feature sequence and produces a smooth and realistic animation output. Fourthly, we found that the two most common objective measures of correlation and root mean square error are not able to indicate realism and naturalness of human perceived quality. We introduce an alternative objective measure and show that the global variance is a better indicator of human perception of quality. Finally, we propose a novel method to convert a given text input and phoneme transcription into a dynamic viseme transcription in the case when a reference dynamic viseme sequence is not available. Subjective preference tests confirmed that our proposed method is able to produce animation, that are statistically indistinguishable from animation produced using reference data

University of East Anglia digital repository

Ease of use thematic research for S&V (project contracts)

Author: Eggen J.H.
Publication venue: Instituut voor Perceptie Onderzoek (IPO)
Publication date: 30/05/1996
Field of study

Pure OAI Repository

MOG 2007:Workshop on Multimodal Output Generation: CTIT Proceedings

Author: Krahmer Emiel
Publication venue: Centre for Telematics and Information Technology (CTIT)
Publication date: 25/01/2007
Field of study

This volume brings together presents a wide variety of work offering different perspectives on multimodal generation. Two different strands of work can be distinguished: half of the gathered papers present current work on embodied conversational agents (ECA’s), while the other half presents current work on multimedia applications. Two general research questions are shared by all: what output modalities are most suitable in which situation, and how should different output modalities be combined

University of Twente Research Information

Investigating Speech Perception in Evolutionary Perspective: Comparisons of Chimpanzee (Pan troglodytes) and Human Capabilities

Author: Heimbauer Lisa A
Publication venue: ScholarWorks @ Georgia State University
Publication date: 01/08/2012
Field of study

There has been much discussion regarding whether the capability to perceive speech is uniquely human. The “Speech is Special” (SiS) view proposes that humans possess a specialized cognitive module for speech perception (Mann & Liberman, 1983). In contrast, the “Auditory Hypothesis” (Kuhl, 1988) suggests spoken-language evolution took advantage of existing auditory-system capabilities. In support of the Auditory Hypothesis, there is evidence that Panzee, a language-trained chimpanzee (Pan troglodytes), perceives speech in synthetic “sine-wave” and “noise-vocoded” forms (Heimbauer, Beran, & Owren, 2011). Human comprehension of these altered forms of speech has been cited as evidence for specialized cognitive capabilities (Davis, Johnsrude, Hervais-Adelman, Taylor, & McGettigan, 2005). In light of Panzee’s demonstrated abilities, three experiments extended these investigations of the cognitive processes underlying her speech perception. The first experiment investigated the acoustic cues that Panzee and humans use when identifying sine-wave and noise-vocoded speech. The second experiment examined Panzee’s ability to perceive “time-reversed” speech, in which individual segments of the waveform are reversed in time. Humans are able to perceive such speech if these segments do not much exceed average phoneme length. Finally, the third experiment tested Panzee’s ability to generalize across both familiar and novel talkers, a perceptually challenging task that humans seem to perform effortlessly. Panzee’s performance was similar to that of humans in all experiments. In Experiment 1, results demonstrated that Panzee likely attends to the same “spectro-temporal” cues in sine-wave and noise-vocoded speech that humans are sensitive to. In Experiment 2, Panzee showed a similar intelligibility pattern as a function of reversal-window length as found in human listeners. In Experiment 3, Panzee readily recognized words not only from a variety of familiar adult males and females, but also from unfamiliar adults and children of both sexes. Overall, results suggest that a combination of general auditory processing and sufficient exposure to meaningful spoken language is sufficient to account for speech-perception evidence previously proposed to require specialized, uniquely human mechanisms. These findings in turn suggest that speech-perception capabilities were already present in latent form in the common evolutionary ancestors of modern chimpanzees and humans

ScholarWorks @ Georgia State University

Recommended from our members

Perceptual learning of context-sensitive phonetic detail

Author: Barden Katharine
Publication venue: University of Cambridge
Publication date: 12/07/2011
Field of study

[Abstract abbreviated due to inability of DSpace@Cambridge to display phonetic symbols. Please see the full abstract in the attached pdf file.] Although familiarity with a talker or accent is known to facilitate perception, it is not clear what underlies this phenomenon. Previous research has focused primarily on whether listeners can learn to associate novel phonetic characteristics with low-level units such as features or phonemes. However, this neglects the potential role of phonetic information at many other levels of representation. To address this shortcoming, this thesis investigated perceptual learning of systematic phonetic detail relating to higher levels of linguistic structure, including prosodic, grammatical and morphological contexts. Furthermore, in contrast to many previous studies, this research used relatively natural stimuli and tasks, thus maximising its relevance to perceptual learning in ordinary listening situations. This research shows that listeners can update their phonetic representations in response to incoming information and its relation to linguistic-structural context. In addition, certain patterns of systematic phonetic detail were more learnable than others. These findings are used to inform an account of how new information is integrated with prior experience in speech processing, within a framework that emphasises the importance of phonetic detail at multiple levels of representation.This work was funded by an AHRC grant

Apollo (Cambridge)

Presence 2005: the eighth annual international workshop on presence, 21-23 September, 2005 University College London (Conference proceedings)

Author
Publication venue: Department of Computer Science, UCL (University College London)
Publication date: 21/09/2005
Field of study

OVERVIEW (taken from the CALL FOR PAPERS) Academics and practitioners with an interest in the concept of (tele)presence are invited to submit their work for presentation at PRESENCE 2005 at University College London in London, England, September 21-23, 2005. The eighth in a series of highly successful international workshops, PRESENCE 2005 will provide an open discussion forum to share ideas regarding concepts and theories, measurement techniques, technology, and applications related to presence, the psychological state or subjective perception in which a person fails to accurately and completely acknowledge the role of technology in an experience, including the sense of 'being there' experienced by users of advanced media such as virtual reality. The concept of presence in virtual environments has been around for at least 15 years, and the earlier idea of telepresence at least since Minsky's seminal paper in 1980. Recently there has been a burst of funded research activity in this area for the first time with the European FET Presence Research initiative. What do we really know about presence and its determinants? How can presence be successfully delivered with today's technology? This conference invites papers that are based on empirical results from studies of presence and related issues and/or which contribute to the technology for the delivery of presence. Papers that make substantial advances in theoretical understanding of presence are also welcome. The interest is not solely in virtual environments but in mixed reality environments. Submissions will be reviewed more rigorously than in previous conferences. High quality papers are therefore sought which make substantial contributions to the field. Approximately 20 papers will be selected for two successive special issues for the journal Presence: Teleoperators and Virtual Environments. PRESENCE 2005 takes place in London and is hosted by University College London. The conference is organized by ISPR, the International Society for Presence Research and is supported by the European Commission's FET Presence Research Initiative through the Presencia and IST OMNIPRES projects and by University College London

UCL Discovery