
Text-based Editing of Talking-head Video

Author: Agrawala M.
Finkelstein A.
Fried O.
Genova K.
Goldman D.
Jin Z.
Shechtman E.
Tewari A.
Theobalt C.
Zollhöfer M.
Publication date: 01/01/2019

Editing talking-head video to change the speech content or to remove filler words is challenging. We propose a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e. no jump cuts). Our method automatically annotates an input talking-head video with phonemes, visemes, 3D face pose and geometry, reflectance, expression and scene illumination per frame. To edit a video, the user only has to edit the transcript, and an optimization strategy then chooses segments of the input corpus as base material. The annotated parameters corresponding to the selected segments are seamlessly stitched together and used to produce an intermediate video representation in which the lower half of the face is rendered with a parametric face model. Finally, a recurrent video generation network transforms this representation to a photorealistic video that matches the edited transcript. We demonstrate a large variety of edits, such as the addition, removal, and alteration of words, as well as convincing language translation and full sentence synthesis.

MPG.PuRe

RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars

Author: Cheng Wei
Dai Bo
Fan Siming
Lin Dahua
Lin Kwan-Yee
Liu Shengqi
Liu Ziwei
Loy Chen Change
Luo Huiwen
Pan Dongwei
Piao Jingtan
Qian Chen
Wang Yuxin
Wu Wayne
Yang Lei
Zhuo Long
Publication date: 22/05/2023

Synthesizing high-fidelity head avatars is a central problem for computer vision and graphics. While head avatar synthesis algorithms have advanced rapidly, the best ones still face great obstacles in real-world scenarios. One of the vital causes is inadequate datasets: 1) current public datasets only allow researchers to explore high-fidelity head avatars in one or two task directions; 2) these datasets usually contain digital head assets with limited data volume and a narrow distribution over different attributes. In this paper, we present RenderMe-360, a comprehensive 4D human head dataset to drive advances in head avatar research. It contains massive data assets, with 243+ million complete head frames and over 800k video sequences from 500 different identities captured by synchronized multi-view cameras at 30 FPS. It is a large-scale digital library for head avatars with three key attributes: 1) High Fidelity: all subjects are captured by 60 synchronized, high-resolution 2K cameras in 360 degrees. 2) High Diversity: the collected subjects span different ages, eras, ethnicities, and cultures, providing abundant materials with distinctive styles in appearance and geometry. Moreover, each subject is asked to perform various motions, such as expressions and head rotations, which further extend the richness of assets. 3) Rich Annotations: we provide annotations with different granularities: camera parameters, matting, scan, 2D/3D facial landmarks, FLAME fitting, and text description. Based on the dataset, we build a comprehensive benchmark for head avatar research, with 16 state-of-the-art methods evaluated on five main tasks: novel view synthesis, novel expression synthesis, hair rendering, hair editing, and talking head generation. Our experiments uncover the strengths and weaknesses of current methods. RenderMe-360 opens the door for future exploration in head avatars.
Comment: Technical Report; Project Page: 36; Github Link: https://github.com/RenderMe-360/RenderMe-36

arXiv.org e-Print Archive

iPod therefore I am: Using PC Videos to Aid the Teaching of the History of Political Philosophy

Author: Duckworth Glenn
Woodcock Pete
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 16/02/2010

This article outlines our experiences at the University of Huddersfield of (a) producing and using mini-lectures on the history of political philosophy that were available to students as MP4 and progressive download PC video files (and MP3 audio files), and (b) the student feedback on these files which will help future development. This article largely avoids pedagogical issues regarding the use of technology in teaching and focuses more on student feedback and use of these technologies, along with practical issues regarding the production and hosting of these teaching tools.

Crossref

University of Huddersfield Repository

Huddersfield Research Portal

You said that?

Author: Chung Joon Son
Jamaludin Amir
Zisserman Andrew
Publication date: 01/01/2017

We present a method for generating a video of a talking face. The method takes as inputs: (i) still images of the target face, and (ii) an audio speech segment; and outputs a video of the target face lip synched with the audio. The method runs in real time and is applicable to faces and audio not seen at training time. To achieve this we propose an encoder-decoder CNN model that uses a joint embedding of the face and audio to generate synthesised talking face video frames. The model is trained on tens of hours of unlabelled videos. We also show results of re-dubbing videos using speech from a different person.
Comment: https://youtu.be/LeufDSb15Kc British Machine Vision Conference (BMVC), 201

arXiv.org e-Print Archive

Oxford University Research Archive

Creating shareable representations of practice

Author: Goodyear Peter
Steeples Christine
Publication venue: 'Informa UK Limited'
Publication date: 01/01/1998

This paper reports work on the use of asynchronous multimedia conferencing (AMC) to support collaborative continuing professional development. In particular it explores how we may use multimedia communications technologies to enable key elements of real‐world working knowledge, that are tacit and embedded in working practices, to be rendered into shareable forms for professional learning. We believe multimedia communications technology can offer innovative ways of capturing rich examples of working practices and tacit knowledge, and for sharing and subjecting these artefacts to scrutiny, debate and refinement within a community of learners. More explicitly, we see participants in a geographically distributed community of practice being able to create, annotate, discuss and reflect upon videoclips of their working practices within the multimedia conferencing environment. This paper summarizes some studies that cast light on how representations of practice may be captured for use in an AMC environment.

CiteSeerX

Crossref

ALT Open Access Repository

Directory of Open Access Journals

Captioning Multiple Speakers using Speech Recognition to Assist Disabled People

Author: Wald Mike
Publication date: 01/07/2008

Meetings and seminars involving many people speaking can be some of the hardest situations for deaf people to be able to follow what is being said and also for people with physical, visual or cognitive disabilities to take notes or remember key points. People may also be absent during important interactions or they may arrive late or leave early. Real time captioning using phonetic keyboards can provide an accurate live as well as archived transcription of what has been said but is often not available because of the cost and shortage of highly skilled and trained stenographers. This paper describes the development of applications that use speech recognition to provide automatic real time text transcriptions in situations when there can be many people speaking.

Southampton (e-Prints Soton)