End-to-end Audiovisual Speech Activity Detection with Bimodal Recurrent Neural Models

Authors: Busso Carlos; Tao Fei
Publication date: 12/09/2018

Speech activity detection (SAD) plays an important role in current speech processing systems, including automatic speech recognition (ASR). SAD is particularly difficult in environments with acoustic noise. A practical solution is to incorporate visual information, increasing the robustness of the SAD approach. An audiovisual system has the advantage of being robust to different speech modes (e.g., whisper speech) or background noise. Recent advances in audiovisual speech processing using deep learning have opened opportunities to capture in a principled way the temporal relationships between acoustic and visual features. This study explores this idea by proposing a bimodal recurrent neural network (BRNN) framework for SAD. The approach models the temporal dynamics of the sequential audiovisual data, improving the accuracy and robustness of the proposed SAD system. Instead of estimating hand-crafted features, the study investigates an end-to-end training approach, where acoustic and visual features are directly learned from the raw data during training. The experimental evaluation considers a large audiovisual corpus with over 60.8 hours of recordings, collected from 105 speakers. The results demonstrate that the proposed framework leads to absolute improvements of up to 1.2% under practical scenarios over an audio-only VAD baseline implemented with a deep neural network (DNN). The proposed approach achieves a 92.7% F1-score when it is evaluated using the sensors from a portable tablet in a noisy acoustic environment, which is only 1.0% lower than the performance obtained under ideal conditions (e.g., clean speech obtained with a high definition camera and a close-talking microphone). Comment: Submitted to Speech Communicatio

arXiv.org e-Print Archive

Access to recorded interviews: A research agenda

Authors: Heeren W.F.L.; Jong F.M.G. de; Oard D.W.; Ordelman R.J.F.
Publication venue: ACM
Publication date: 01/01/2008

Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state of the art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed.

University of Twente Research Information

Mobilizing The Open University: case studies in strategic mobile development

Author: Thomas Rhodri Curwen
Publication date: 01/04/2010

This paper presents an overview of many activities undertaken in the Mobile Learner Support project area in The Open University (OU). Although many of the project strands involve strategic development that is embedded in the OU’s institution-wide teaching and learning systems, we hope that some of the data and findings will be of use to others undertaking work in related areas. In addition to the core work in implementing a Mobile VLE and associated resources, an overview of related mobile audio eAssessment and eBook format development project strands is given, leading to the development of a blend of web application software and native or client applications. The OU delivers significant proportions of online content and collaboration as part of its supported open learning distance education model to over 200,000 part-time students at any given time. In particular, over the past 4 years, adapting open source technologies for around 600 course websites has delivered the requirement to support course activities for up to 4,700 students per course cohort, with a corresponding 250 variations of a single course to provide online tutorial spaces. Throughout its history, the OU has also adapted to increasingly flexible and personalised modes of delivering and interacting with multimedia and audiovisual content as part of a blended approach, most recently aiming to disaggregate content and allow remixing through its open educational resources initiative. For updates on the Mobile Learner Support project, please visit http://www.open.ac.uk/blogs/mLear

Open Research Online (The Open University)

Time-delay neural network for continuous emotional dimension prediction from facial expression sequences

Authors: Hongying Meng; Jinkuang Cheng; John Cosmas; Nadia Bianchi-Berthouze; Yangdong Deng
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/04/2016

"(c) 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works."

Automatic continuous affective state prediction from naturalistic facial expression is a very challenging research topic but very important in human-computer interaction. One of the main challenges is modeling the dynamics that characterize naturalistic expressions. In this paper, a novel two-stage automatic system is proposed to continuously predict affective dimension values from facial expression videos. In the first stage, traditional regression methods are used to classify each individual video frame, while in the second stage, a Time-Delay Neural Network (TDNN) is proposed to model the temporal relationships between consecutive predictions. The two-stage approach separates the emotional state dynamics modeling from an individual emotional state prediction step based on input features. As a result, the temporal information used by the TDNN is not biased by the high variability between features of consecutive frames, which allows the network to more easily exploit the slowly changing dynamics between emotional states. The system was fully tested and evaluated on three different facial expression video datasets. Our experimental results demonstrate that the use of a two-stage approach combined with the TDNN to take into account previously classified frames significantly improves the overall performance of continuous emotional state estimation in naturalistic facial expressions. The proposed approach won the affect recognition sub-challenge of the third international Audio/Visual Emotion Recognition Challenge (AVEC2013).

CiteSeerX

Crossref

UCL Discovery

Brunel University Research Archive

Leveraging video annotations in video-based e-learning

Authors: Aubert Olivier; Canellas Camila; Prié Yannick
Publication date: 01/04/2014

The e-learning community has been producing and using video content for a long time, and in recent years, the advent of MOOCs greatly relied on video recordings of teacher courses. Video annotations are information pieces that can be anchored in the temporality of the video so as to sustain various processes ranging from active reading to rich media editing. In this position paper we study how video annotations can be used in an e-learning context - especially MOOCs - from the triple point of view of pedagogical processes, current technical platform functionalities, and current challenges. Our analysis is that there is still plenty of room for leveraging video annotations in MOOCs beyond simple active reading, namely live annotation, performance annotation and annotation for assignment; and that new developments are needed to accompany this evolution. Comment: 7th International Conference on Computer Supported Education (CSEDU), Barcelona, Spain (2014)

arXiv.org e-Print Archive

CiteSeerX

'Breaking the glass': preserving social history in virtual environments

Authors: Kuksa I; Tuck D
Publication venue: Inderscience Publishers
Publication date: 01/01/2011

New media technologies play an important role in the evolution of our society. Traditional museums and heritage sites have evolved from the ‘cabinets of curiosity’ that focused mainly on the authority of the voice organising content, to places that offer interactivity as a means to experience historical and cultural events of the past. They attempt to break down the division between visitors and historical artefacts, employing modern technologies that allow the audience to perceive a range of perspectives on the historical event. In this paper, we discuss virtual reconstruction and interactive storytelling techniques as a research methodology and as educational and presentation practices for cultural heritage sites. We present the Narrating the Past project as a case study, in order to illustrate recent changes in the preservation of social history and guided tourist trails that aim to make the visitor’s experience more than just an architectural walk-through.

Crossref

Nottingham Trent Institutional Repository (IRep)