End-to-end Audiovisual Speech Activity Detection with Bimodal Recurrent Neural Models
Speech activity detection (SAD) plays an important role in current speech
processing systems, including automatic speech recognition (ASR). SAD is
particularly difficult in environments with acoustic noise. A practical
solution is to incorporate visual information, increasing the robustness of the
SAD approach. An audiovisual system has the advantage of being robust to
different speech modes (e.g., whisper speech) or background noise. Recent
advances in audiovisual speech processing using deep learning have opened
opportunities to capture in a principled way the temporal relationships between
acoustic and visual features. This study explores this idea by proposing a
\emph{bimodal recurrent neural network} (BRNN) framework for SAD. The approach
models the temporal dynamics of the sequential audiovisual data, improving the
accuracy and robustness of the proposed SAD system. Instead of estimating
hand-crafted features, the study investigates an end-to-end training approach,
where acoustic and visual features are directly learned from the raw data
during training. The experimental evaluation considers a large audiovisual
corpus with over 60.8 hours of recordings, collected from 105 speakers. The
results demonstrate that the proposed framework leads to absolute improvements
of up to 1.2% under practical scenarios over an audio-only voice activity
detection (VAD) baseline implemented with a deep neural network (DNN). The
proposed approach achieves a 92.7% F1-score when evaluated using the sensors of
a portable tablet in a noisy acoustic environment, which is only 1.0% lower
than the performance obtained under ideal conditions (e.g., clean speech
captured with a high-definition camera and a close-talking microphone).
Comment: Submitted to Speech Communication
Access to recorded interviews: A research agenda
Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state of the art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed.
Mobilizing The Open University: case studies in strategic mobile development
This paper presents an overview of many activities undertaken in the Mobile Learner Support project area at The Open University (OU). Please note that while many of the project strands involve strategic development embedded in the OU's institution-wide teaching and learning systems, we hope that some of the data and findings will be of use to others undertaking work in related areas. In addition to the core work of implementing a Mobile VLE and associated resources, an overview of the related mobile audio eAssessment and eBook format development project strands is given, leading to the development of a blend of web application software and native or client applications.
The OU delivers significant proportions of online content and collaboration as part of its supported open learning distance education model to over 200,000 part-time students at any given time. In particular, over the past 4 years, adapting open source technologies for around 600 course websites has met the requirement to support course activities for up to 4,700 students per course cohort, with up to 250 variations of a single course providing online tutorial spaces. The OU has also throughout its history adapted to increasingly flexible and personalised modes of delivering and interacting with multimedia and audiovisual content as part of a blended approach, most recently aiming to disaggregate content and allow remixing through its open educational resources initiative.
For updates on the Mobile Learner Support project, please visit http://www.open.ac.uk/blogs/mLear
Time-delay neural network for continuous emotional dimension prediction from facial expression sequences
"(c) 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works."
Automatic continuous affective state prediction from naturalistic facial expressions is a very challenging research topic but very important in human-computer interaction. One of the main challenges is modeling the dynamics that characterize naturalistic expressions. In this paper, a novel two-stage automatic system is proposed to continuously predict affective dimension values from facial expression videos. In the first stage, traditional regression methods are used to classify each individual video frame, while in the second stage, a Time-Delay Neural Network (TDNN) is proposed to model the temporal relationships between consecutive predictions. The two-stage approach separates the modeling of emotional state dynamics from the per-frame emotional state prediction step based on input features. In doing so, the temporal information used by the TDNN is not biased by the high variability between features of consecutive frames, allowing the network to more easily exploit the slowly changing dynamics between emotional states. The system was fully tested and evaluated on three different facial expression video datasets. Our experimental results demonstrate that the two-stage approach, combined with the TDNN taking previously classified frames into account, significantly improves the overall performance of continuous emotional state estimation in naturalistic facial expressions. The proposed approach won the affect recognition sub-challenge of the third international Audio/Visual Emotion Recognition Challenge (AVEC 2013).
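The two-stage idea (frame-wise prediction followed by a time-delay layer over consecutive predictions) can be illustrated with a toy example. Here stage 1 is simulated as a noisy per-frame regressor, and the TDNN is reduced to a single causal 1-D convolution over a window of stage-1 outputs with placeholder uniform weights; the real system learns these weights from data, so this is a sketch of the mechanism, not the published model.

```python
import numpy as np

def tdnn_smooth(frame_preds, weights):
    """Stage 2: re-estimate each frame's value from a window of the
    stage-1 predictions (a causal 1-D convolution over time)."""
    k = len(weights)
    pad = np.pad(frame_preds, (k - 1, 0), mode="edge")  # causal left context
    return np.array([pad[t:t + k] @ weights for t in range(len(frame_preds))])

rng = np.random.default_rng(1)

# Stage 1 stands in for any per-frame regressor: noisy estimates of a
# slowly varying affective dimension (e.g., valence) over 200 frames.
true_valence = np.sin(np.linspace(0.0, 3.0, 200))
stage1 = true_valence + rng.normal(0.0, 0.3, 200)

# Uniform window weights are a placeholder for trained TDNN weights.
stage2 = tdnn_smooth(stage1, np.full(9, 1.0 / 9.0))

err1 = np.mean((stage1 - true_valence) ** 2)  # frame-wise error
err2 = np.mean((stage2 - true_valence) ** 2)  # error after temporal modeling
```

Because affective states change slowly relative to frame-level feature noise, the temporal layer reduces the error of the frame-wise predictions, which is the intuition the abstract describes.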
Leveraging video annotations in video-based e-learning
The e-learning community has been producing and using video content for a
long time, and in the last years, the advent of MOOCs greatly relied on video
recordings of teacher courses. Video annotations are information pieces that
can be anchored in the temporality of the video so as to sustain various
processes ranging from active reading to rich media editing. In this position
paper, we study how video annotations can be used in an e-learning context,
especially MOOCs, from the triple point of view of pedagogical processes,
the functionalities of current technical platforms, and current challenges. Our
analysis is that there is still plenty of room for leveraging video annotations
in MOOCs beyond simple active reading, namely live annotation, performance
annotation and annotation for assignment; and that new developments are needed
to accompany this evolution.
Comment: 7th International Conference on Computer Supported Education (CSEDU), Barcelona, Spain (2014)
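An annotation "anchored in the temporality of the video", as described above, can be modeled as a time span plus a body. The structure below is a hypothetical sketch: the field names and the `annotations_at` helper are our own illustration, not the API of any existing MOOC platform.

```python
from dataclasses import dataclass, field

@dataclass
class VideoAnnotation:
    """An information piece anchored to a time span of the video."""
    begin: float               # seconds from the start of the video
    end: float                 # end of the anchored span, in seconds
    body: str                  # annotation content (note, question, ...)
    author: str = "anonymous"
    tags: list = field(default_factory=list)

def annotations_at(annotations, t):
    """Return the annotations active at playback time t, e.g. to overlay
    them on the player or drive an active-reading side panel."""
    return [a for a in annotations if a.begin <= t < a.end]

notes = [
    VideoAnnotation(10.0, 25.0, "Key definition introduced here", tags=["review"]),
    VideoAnnotation(20.0, 30.0, "Quiz: what does this theorem assume?"),
]
active = annotations_at(notes, 22.0)   # both annotations overlap t = 22 s
```

The same anchoring supports the richer uses the paper lists: live annotation appends time-stamped entries during playback, and annotation-for-assignment attaches grading criteria to specific spans.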
'Breaking the glass': preserving social history in virtual environments
New media technologies play an important role in the evolution of our society. Traditional museums and heritage sites have evolved from the 'cabinets of curiosity' that focused mainly on the authority of the voice organising content, to places that offer interactivity as a means to experience historical and cultural events of the past. They attempt to break down the division between visitors and historical artefacts, employing modern technologies that allow the audience to perceive a range of perspectives on the historical event. In this paper, we discuss virtual reconstruction and interactive storytelling techniques as a research methodology and as educational and presentation practices for cultural heritage sites. We present the Narrating the Past project as a case study, in order to illustrate recent changes in the preservation of social history and in guided tourist trails that aim to make the visitor's experience more than just an architectural walk-through.