874 research outputs found

    Soundscapes of the Urban Past: Staged Sound as Mediated Cultural Heritage

    Get PDF
    We cannot simply listen to our urban past. Yet we encounter a rich cultural heritage of city sounds presented in text, radio and film. How can such "staged sounds" express the changing identities of cities? This volume presents a collection of studies on the staging of Amsterdam, Berlin and London soundscapes in historical documents, radio plays and films, and offers insights into themes such as film sound theory and museum audio guides. In doing so, this book puts contemporary controversies on urban sound in historical perspective, and contextualises iconic presentations of cities. It addresses academics, students, and museum workers alike

    Visual Speech Synthesis using Dynamic Visemes and Deep Learning Architectures

    Get PDF
    The aim of this work is to improve the naturalness of visual speech synthesis produced automatically from a linguistic input over existing methods. Firstly, the most important contribution is on the investigation of the most suitable speech units for the visual speech synthesis. We propose the use of dynamic visemes instead of phonemes or static visemes and found that dynamic visemes can generate better visual speech than either phone or static viseme units. Moreover, best performance is obtained by a combined phoneme-dynamic viseme system. Secondly, we examine the most appropriate model between hidden Markov model (HMM) and different deep learning models that include feedforward and recurrent structures consisting of one-to-one, many-to-one and many-to-many architectures. Results suggested that that frame-by-frame synthesis from deep learning approach outperforms state-based synthesis from HMM approaches and an encoder-decoder many-to-many architecture is better than the one-to-one and many-to-one architectures. Thirdly, we explore the importance of contextual features that include information at varying linguistic levels, from frame level up to the utterance level. Our findings found that frame level information is the most valuable feature, as it is able to avoid discontinuities in the visual feature sequence and produces a smooth and realistic animation output. Fourthly, we found that the two most common objective measures of correlation and root mean square error are not able to indicate realism and naturalness of human perceived quality. We introduce an alternative objective measure and show that the global variance is a better indicator of human perception of quality. Finally, we propose a novel method to convert a given text input and phoneme transcription into a dynamic viseme transcription in the case when a reference dynamic viseme sequence is not available. Subjective preference tests confirmed that our proposed method is able to produce animation, that are statistically indistinguishable from animation produced using reference data

    Ease of use thematic research for S&V (project contracts)

    Get PDF

    MOG 2007:Workshop on Multimodal Output Generation: CTIT Proceedings

    Get PDF
    This volume brings together presents a wide variety of work offering different perspectives on multimodal generation. Two different strands of work can be distinguished: half of the gathered papers present current work on embodied conversational agents (ECA’s), while the other half presents current work on multimedia applications. Two general research questions are shared by all: what output modalities are most suitable in which situation, and how should different output modalities be combined

    Investigating Speech Perception in Evolutionary Perspective: Comparisons of Chimpanzee (Pan troglodytes) and Human Capabilities

    Get PDF
    There has been much discussion regarding whether the capability to perceive speech is uniquely human. The “Speech is Special” (SiS) view proposes that humans possess a specialized cognitive module for speech perception (Mann & Liberman, 1983). In contrast, the “Auditory Hypothesis” (Kuhl, 1988) suggests spoken-language evolution took advantage of existing auditory-system capabilities. In support of the Auditory Hypothesis, there is evidence that Panzee, a language-trained chimpanzee (Pan troglodytes), perceives speech in synthetic “sine-wave” and “noise-vocoded” forms (Heimbauer, Beran, & Owren, 2011). Human comprehension of these altered forms of speech has been cited as evidence for specialized cognitive capabilities (Davis, Johnsrude, Hervais-Adelman, Taylor, & McGettigan, 2005). In light of Panzee’s demonstrated abilities, three experiments extended these investigations of the cognitive processes underlying her speech perception. The first experiment investigated the acoustic cues that Panzee and humans use when identifying sine-wave and noise-vocoded speech. The second experiment examined Panzee’s ability to perceive “time-reversed” speech, in which individual segments of the waveform are reversed in time. Humans are able to perceive such speech if these segments do not much exceed average phoneme length. Finally, the third experiment tested Panzee’s ability to generalize across both familiar and novel talkers, a perceptually challenging task that humans seem to perform effortlessly. Panzee’s performance was similar to that of humans in all experiments. In Experiment 1, results demonstrated that Panzee likely attends to the same “spectro-temporal” cues in sine-wave and noise-vocoded speech that humans are sensitive to. In Experiment 2, Panzee showed a similar intelligibility pattern as a function of reversal-window length as found in human listeners. In Experiment 3, Panzee readily recognized words not only from a variety of familiar adult males and females, but also from unfamiliar adults and children of both sexes. Overall, results suggest that a combination of general auditory processing and sufficient exposure to meaningful spoken language is sufficient to account for speech-perception evidence previously proposed to require specialized, uniquely human mechanisms. These findings in turn suggest that speech-perception capabilities were already present in latent form in the common evolutionary ancestors of modern chimpanzees and humans

    Presence 2005: the eighth annual international workshop on presence, 21-23 September, 2005 University College London (Conference proceedings)

    Get PDF
    OVERVIEW (taken from the CALL FOR PAPERS) Academics and practitioners with an interest in the concept of (tele)presence are invited to submit their work for presentation at PRESENCE 2005 at University College London in London, England, September 21-23, 2005. The eighth in a series of highly successful international workshops, PRESENCE 2005 will provide an open discussion forum to share ideas regarding concepts and theories, measurement techniques, technology, and applications related to presence, the psychological state or subjective perception in which a person fails to accurately and completely acknowledge the role of technology in an experience, including the sense of 'being there' experienced by users of advanced media such as virtual reality. The concept of presence in virtual environments has been around for at least 15 years, and the earlier idea of telepresence at least since Minsky's seminal paper in 1980. Recently there has been a burst of funded research activity in this area for the first time with the European FET Presence Research initiative. What do we really know about presence and its determinants? How can presence be successfully delivered with today's technology? This conference invites papers that are based on empirical results from studies of presence and related issues and/or which contribute to the technology for the delivery of presence. Papers that make substantial advances in theoretical understanding of presence are also welcome. The interest is not solely in virtual environments but in mixed reality environments. Submissions will be reviewed more rigorously than in previous conferences. High quality papers are therefore sought which make substantial contributions to the field. Approximately 20 papers will be selected for two successive special issues for the journal Presence: Teleoperators and Virtual Environments. PRESENCE 2005 takes place in London and is hosted by University College London. The conference is organized by ISPR, the International Society for Presence Research and is supported by the European Commission's FET Presence Research Initiative through the Presencia and IST OMNIPRES projects and by University College London
    corecore