
    Speech-driven facial animations improve speech-in-noise comprehension of humans

    Understanding speech becomes a demanding task when the environment is noisy. Comprehension of speech in noise can be substantially improved by looking at the speaker’s face, and this audiovisual benefit is even more pronounced in people with hearing impairment. Recent advances in AI have made it possible to synthesize photorealistic talking faces from a speech recording and a still image of a person’s face in an end-to-end manner. However, it has remained unknown whether such facial animations improve speech-in-noise comprehension. Here we consider facial animations produced by a recently introduced generative adversarial network (GAN), and show that humans cannot distinguish between the synthesized and the natural videos. Importantly, we then show that the end-to-end synthesized videos significantly aid humans in understanding speech in noise, although the natural facial motions yield an even higher audiovisual benefit. We further find that an audiovisual speech recognizer (AVSR) benefits from the synthesized facial animations as well. Our results suggest that synthesizing facial motions from speech can be used to aid speech comprehension in difficult listening environments.

    Walter J. Ong, S.J.: A retrospective

    Communication Research Trends usually charts current communication research, introducing its readers to recent developments across the range of inquiry into communication. This issue, however, takes a different tack, looking back on the writings of Walter J. Ong, S.J., who died at the age of 90 in August 2003. Ong spent his scholarly career at Saint Louis University, where he served as University Professor of Humanities, the William E. Haren Professor of English, and Professor of Humanities in Psychiatry at the Saint Louis University School of Medicine. In a career that spanned 60 years, Ong published 16 books, 245 articles, and 108 reviews. In addition, he edited a number of works and gave interviews that further explored his wide-ranging interests. Readers interested in a full bibliography of Ong’s works should refer to the web site prepared by Professor Betty Youngkin at the University of Dayton, at http://homepages.udayton.edu/~youngkin/biblio.htm. From the perspective of an interest in connections among many areas of human knowledge over such a long career, he explored a whole gamut of activities by careful observation of the threads that run through western culture and by insightful analysis of what he observed. Communication forms one of those many threads in the West, perhaps the dominant one, and so it occupies a similar place in Ong’s work. The tapestry Ong weaves has, bit by bit, influenced thinking about communication as well as research. And so, Communication Research Trends looks back on the writings of Walter Ong, S.J.

    Self-organizing distributed digital library supporting audio-video

    The StreamOnTheFly network combines peer-to-peer networking and open-archive principles for community radio channels and TV stations in Europe. StreamOnTheFly demonstrates new methods of archive management and personalization technologies for both audio and video. It also provides a collaboration platform for community purposes that suits the flexible activity patterns of these kinds of broadcaster communities.

    Ashitaka: an audiovisual instrument

    This thesis looks at how sound and visuals may be linked in a musical instrument, with a view to creating such an instrument. Though it appears to be an area of significant interest, at the time of writing there is very little existing written or theoretical research available in this domain. Therefore, based on Michel Chion’s notion of synchresis in film, the concept of a fused, inseparable audiovisual material is presented. The thesis then looks at how such a material may be created and manipulated in a performance situation. A software environment named Heilan was developed in order to provide a base for experimenting with different approaches to the creation of audiovisual instruments. The software and a number of experimental instruments are discussed prior to a discussion and evaluation of the final ‘Ashitaka’ instrument. This instrument represents the culmination of the work carried out for this thesis, and is intended as a first step in identifying the issues and complications involved in the creation of such an instrument.

    FSD50K: an Open Dataset of Human-Labeled Sound Events

    Most existing datasets for sound event recognition (SER) are relatively small and/or domain-specific, with the exception of AudioSet, based on a massive amount of audio tracks from YouTube videos and encompassing over 500 classes of everyday sounds. However, AudioSet is not an open dataset: its release consists of pre-computed audio features (instead of waveforms), which limits the adoption of some SER methods. Downloading the original audio tracks is also problematic due to constituent YouTube videos gradually disappearing and usage rights issues, which casts doubts over the suitability of this resource for systems’ benchmarking. To provide an alternative benchmark dataset and thus foster SER research, we introduce FSD50K, an open dataset containing over 51k audio clips totalling over 100h of audio manually labeled using 200 classes drawn from the AudioSet Ontology. The audio clips are licensed under Creative Commons licenses, making the dataset freely distributable (including waveforms). We provide a detailed description of the FSD50K creation process, tailored to the particularities of Freesound data, including challenges encountered and solutions adopted. We include a comprehensive dataset characterization along with discussion of limitations and key factors to allow its audio-informed usage. Finally, we conduct sound event classification experiments to provide baseline systems as well as insight on the main factors to consider when splitting Freesound audio data for SER. Our goal is to develop a dataset to be widely adopted by the community as a new open benchmark for SER research.
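    The abstract above describes FSD50K’s multi-label annotation scheme, with each clip carrying one or more of the 200 AudioSet Ontology classes. As a rough illustration, the following sketch parses a ground-truth table in a CSV layout of the kind such datasets ship with (here assumed to be `fname,labels,mids,split`); the column names and sample rows are illustrative assumptions, not real FSD50K data.

    ```python
    import csv
    import io

    # Illustrative ground-truth rows in an assumed FSD50K-style layout:
    # one row per clip, comma-separated labels, AudioSet machine ids, and a split tag.
    sample_csv = """fname,labels,mids,split
    12345,"Bark,Dog,Animal","/m/05tny_,/m/0bt9lr,/m/0jbk",train
    67890,"Meow,Cat,Animal","/m/07qrkrw,/m/01yrx,/m/0jbk",val
    """

    def clips_with_label(csv_text, label):
        """Return the clip ids whose multi-label annotation contains `label`."""
        reader = csv.DictReader(io.StringIO(csv_text))
        return [row["fname"] for row in reader
                if label in row["labels"].split(",")]

    # Both clips carry the broad 'Animal' label; only one carries 'Bark'.
    print(clips_with_label(sample_csv, "Animal"))
    print(clips_with_label(sample_csv, "Bark"))
    ```

    Filtering on a shared ancestor label like this is one reason multi-label annotations drawn from an ontology are convenient: broad and specific classes can coexist on the same clip.
    
    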

    Proceedings of the 7th Sound and Music Computing Conference

    Proceedings of the SMC2010 - 7th Sound and Music Computing Conference, July 21st - July 24th 2010

    Subsidia: Tools and Resources for Speech Sciences

    This book, the result of a collaboration among researchers who are experts in their respective areas, aims to serve the scientific community by compiling and describing a series of highly useful materials for continuing to advance research.

    Proceedings of the 3rd Swiss conference on barrier-free communication (BfC 2020)


    Beyond Media Borders, Volume 1

    This open access book promotes the idea that all media types are multimodal and that comparing media types, through an intermedial lens, necessarily involves analysing these multimodal traits. The collection includes a series of interconnected articles that illustrate and clarify how the concepts developed in Elleström’s influential article The Modalities of Media: A Model for Understanding Intermedial Relations (Palgrave Macmillan, 2010) can be used for methodical investigation and interpretation of media traits and media interrelations. The authors work with a wide range of old and new media types that are traditionally investigated through limited, media-specific concepts. The publication is a significant contribution to interdisciplinary research, advancing the frontiers of conceptual as well as practical understanding of media interrelations. This is the first of two volumes. It contains Elleström’s revised article and six other contributions focusing especially on media integration: how media products and media types are combined and merged in various ways.