Joint Attention for Automated Video Editing
Joint attention refers to the shared focal points of attention for occupants in a space. In this work, we introduce a computational definition of joint attention for the automated editing of meetings in multi-camera environments from the AMI corpus. Using extracted head pose and individual headset amplitude as features, we developed three editing methods: (1) a naive audio-based method that selects the camera using only the headset input, (2) a rule-based edit that selects cameras at a fixed pacing using pose data, and (3) an editing algorithm using joint attention learned by an LSTM (long short-term memory) network from both pose and audio data, trained on expert edits. The methods are evaluated qualitatively against the human edit, and quantitatively in a user study with 22 participants. Results indicate that LSTM-trained joint attention produces edits that are comparable to the expert edit, offering a wider range of camera views than the audio-based method while being more generalizable than the rule-based method.
Interactive context-aware user-driven metadata correction in digital libraries
Personal name variants are a common problem in digital libraries, reducing the precision of searches and complicating browsing-based interaction. The book-centric approach of name authority control has not scaled to match the growth and diversity of digital repositories. In this paper, we present a novel system for user-driven integration of name variants when interacting with web-based information systems, in particular digital library systems. We approach these issues via a client-side JavaScript browser extension that can reorganize web content and also integrate remote data sources. The system is designed to be agnostic towards the web sites it is applied to, and we illustrate the proof-of-concept through worked examples using three different digital libraries. We discuss the extensibility of the approach in the context of other user-driven information systems and the growth of the Semantic Web.
Personalized Cinemagraphs using Semantic Understanding and Collaborative Learning
Cinemagraphs are a compelling way to convey dynamic aspects of a scene. In
these media, dynamic and still elements are juxtaposed to create an artistic
and narrative experience. Creating a high-quality, aesthetically pleasing
cinemagraph requires isolating objects in a semantically meaningful way and
then selecting good start times and looping periods for those objects to
minimize visual artifacts (such as tearing). To achieve this, we present a new
technique that uses object recognition and semantic segmentation as part of an
optimization method to automatically create cinemagraphs from videos that are
both visually appealing and semantically meaningful. Given a scene with
multiple objects, there are many cinemagraphs one could create. Our method
evaluates these multiple candidates and presents the best one, as determined by
a model trained to predict human preferences in a collaborative way. We
demonstrate the effectiveness of our approach with multiple results and a user
study.
Comment: To appear in ICCV 2017. Total 17 pages including the supplementary
material.
Digital Image Access & Retrieval
The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval.
Collaboration in the Semantic Grid: a Basis for e-Learning
The CoAKTinG project aims to advance the state of the art in collaborative mediated spaces for the Semantic Grid. This paper presents an overview of the hypertext and knowledge-based tools which have been deployed to augment existing collaborative environments, and the ontology which is used to exchange structure, promote enhanced process tracking, and aid navigation of resources before, after, and while a collaboration occurs. While the primary focus of the project has been supporting e-Science, this paper also explores the similarities and application of CoAKTinG technologies as part of a human-centred design approach to e-Learning.
You said that?
We present a method for generating a video of a talking face. The method
takes as inputs: (i) still images of the target face, and (ii) an audio speech
segment; and outputs a video of the target face lip synched with the audio. The
method runs in real time and is applicable to faces and audio not seen at
training time.
To achieve this we propose an encoder-decoder CNN model that uses a joint
embedding of the face and audio to generate synthesised talking face video
frames. The model is trained on tens of hours of unlabelled videos.
We also show results of re-dubbing videos using speech from a different
person.
Comment: https://youtu.be/LeufDSb15Kc British Machine Vision Conference
(BMVC), 201