End-to-end Audiovisual Speech Activity Detection with Bimodal Recurrent Neural Models
Speech activity detection (SAD) plays an important role in current speech
processing systems, including automatic speech recognition (ASR). SAD is
particularly difficult in environments with acoustic noise. A practical
solution is to incorporate visual information, increasing the robustness of the
SAD approach. An audiovisual system has the advantage of being robust to
different speech modes (e.g., whisper speech) or background noise. Recent
advances in audiovisual speech processing using deep learning have opened
opportunities to capture in a principled way the temporal relationships between
acoustic and visual features. This study explores this idea by proposing a
bimodal recurrent neural network (BRNN) framework for SAD. The approach
models the temporal dynamics of the sequential audiovisual data, improving the
accuracy and robustness of the proposed SAD system. Instead of estimating
hand-crafted features, the study investigates an end-to-end training approach,
where acoustic and visual features are directly learned from the raw data
during training. The experimental evaluation considers a large audiovisual
corpus with over 60.8 hours of recordings, collected from 105 speakers. The
results demonstrate that the proposed framework leads to absolute improvements
of up to 1.2% under practical scenarios over an audio-only VAD baseline
implemented with a deep neural network (DNN). The proposed approach achieves
an F1-score of 92.7% when evaluated using the sensors of a portable tablet
in a noisy acoustic environment, which is only 1.0% lower than the performance
obtained under ideal conditions (e.g., clean speech obtained with a high
definition camera and a close-talking microphone).
Comment: Submitted to Speech Communicatio
Research instrumentation for tornado electromagnetics emissions detection
Instrumentation for receiving, processing, and recording HF/VHF electromagnetic emissions from severe weather activity is described. Both airborne and ground-based instrumentation units are described on system and subsystem levels. Design considerations, design decisions, and the rationale behind the decisions are given. Performance characteristics are summarized and recommendations for improvements are given. The objectives, procedures, and test results of the following are presented: (1) airborne flight test in the Midwest U.S.A. (Spring 1975) and at the Kennedy Space Center, Florida (Summer 1975); (2) ground-based data collected in North Georgia (Summer/Fall 1975); and (3) airborne flight test in the Midwest (late Spring 1976) and at the Kennedy Space Center, Florida (Summer 1976). The Midwest tests concentrated on severe weather with tornadic activity; the Florida and Georgia tests monitored air mass convective thunderstorm characteristics. Supporting ground truth data from weather radars and sferics DF nets are described.
Local Visual Microphones: Improved Sound Extraction from Silent Video
Sound waves cause small vibrations in nearby objects. A few techniques exist
in the literature that can extract sound from video. In this paper we study
local vibration patterns at different image locations. We show that different
locations in the image vibrate differently. We carefully aggregate local
vibrations and produce sound quality that improves on the state of the art. We show
that local vibrations could have a time delay because sound waves take time to
travel through the air. We use this phenomenon to estimate sound direction. We
also present a novel algorithm that speeds up sound extraction by two to three
orders of magnitude and reaches real-time performance on 20 kHz video.
Comment: Accepted to BMVC 201
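The idea that a time delay between two local vibration signals carries directional information can be illustrated with a generic cross-correlation delay estimate. This is a minimal sketch under stated assumptions, not the algorithm from the paper: the function name `estimate_delay`, its `max_lag` search window, and the choice of plain cross-correlation are all illustrative.

```python
def estimate_delay(sig_a, sig_b, sample_rate, max_lag):
    """Estimate how many seconds sig_b lags behind sig_a.

    Hypothetical helper: searches integer lags in [-max_lag, max_lag]
    and returns the lag with the highest cross-correlation score.
    A positive result means sig_b is a delayed copy of sig_a.
    """
    # Remove the mean so a constant offset does not dominate the score.
    mean_a = sum(sig_a) / len(sig_a)
    mean_b = sum(sig_b) / len(sig_b)
    a = [x - mean_a for x in sig_a]
    b = [x - mean_b for x in sig_b]
    n = min(len(a), len(b))

    def score(lag):
        # Correlate a[i] with b[i + lag] over the indices valid for both.
        start = max(0, -lag)
        stop = min(n, n - lag)
        return sum(a[i] * b[i + lag] for i in range(start, stop))

    best_lag = max(range(-max_lag, max_lag + 1), key=score)
    return best_lag / sample_rate
```

Given two vibration signals sampled at the same rate from different image locations, the sign of the returned delay indicates which location the sound reached first. Turning that into a direction would additionally require the scene geometry, which this sketch does not model.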
Aerospace Medicine and Biology: A continuing bibliography with indexes, supplement 217, March 1981
Approximately 130 reports, articles, and other documents introduced into the NASA scientific and technical information system in February 1981 are included in this bibliography. Topics include aerospace medicine and biology.
Analysis of Disengagements in Semi-Autonomous Vehicles: Drivers’ Takeover Performance and Operational Implications
This report analyzes the reactions of human drivers placed in simulated Autonomous Technology disengagement scenarios. The study was executed in a human-in-the-loop setting, within a high-fidelity integrated car simulator capable of handling both manual and autonomous driving. A population of 40 individuals was tested, with metrics for control takeover quantification given by: i) response times (considering inputs of steering, throttle, and braking); ii) vehicle drift from the lane centerline after takeover as well as overall (integral) drift over an S-turn curve compared to a baseline obtained in manual driving; and iii) accuracy metrics to quantify human factors associated with the simulation experiment. Independent variables considered for the study were the age of the driver, the speed at the time of disengagement, and the time at which the disengagement occurred (i.e., how long automation was engaged for). The study shows that changes in the vehicle speed significantly affect all the variables investigated, pointing to the importance of setting up thresholds for maximum operational speed of vehicles driven in autonomous mode when the human driver serves as back-up. The results show that the establishment of an operational threshold could reduce the maximum drift and lead to better control during takeover, perhaps warranting a lower speed limit than for conventional vehicles. With regard to the age variable, neither the response time analysis nor the drift analysis provides support for any claim to limit the age of drivers of semi-autonomous vehicles.
Bipedal steps in the development of rhythmic behavior in humans
We contrast two related hypotheses of the evolution of dance: H1: Maternal bipedal walking influenced the fetal experience of sound and associated movement patterns; H2: The human transition to bipedal gait produced more isochronous/predictable locomotion sound, resulting in early music-like behavior associated with the acoustic advantages conferred by moving bipedally in pace. The cadence of walking is around 120 beats per minute, similar to the tempo of dance and music. Human walking displays long-term constancies. Dyads often subconsciously synchronize steps. The major amplitude component of the step is a distinctly produced beat. Human locomotion influences, and interacts with, emotions, and passive listening to music activates brain motor areas. Across dance genres, the footwork is most often performed in time to the musical beat. Brain development is largely shaped by early sensory experience, with hearing developed from week 18 of gestation. Newborns react to sounds, melodies, and rhythmic poems to which they have been exposed in utero. If the sound and vibrations produced by the footfalls of a walking mother are transmitted to the fetus in coordination with the cadence of the motion, a connection between isochronous sound and rhythmical movement may be developed. The rhythmical sounds of human maternal locomotion differ substantially from those of nonhuman primates, while the maternal heartbeat heard is likely to have a similar isochronous character across primates, suggesting a relatively more influential role of footfall in the development of rhythmic/musical abilities in humans. Associations of gait, music, and dance are numerous. The apparent absence of musical and rhythmic abilities in nonhuman primates, which display little bipedal locomotion, corroborates that bipedal gait may be linked to the development of rhythmic abilities in humans. Bipedal stimuli in utero may primarily boost the ontogenetic development. The acoustical advantage hypothesis proposes a mechanism in the phylogenetic development.
Agent Street: An Environment for Exploring Agent-Based Models in Second Life
Urban models can be seen on a continuum between iconic and symbolic. Generally speaking, iconic models are physical versions of the real world at some scaled-down representation, while symbolic models represent the system in terms of the way it functions, replacing the physical or material system with logical and/or mathematical formulae. Traditionally, iconic and symbolic models were distinct classes of model, but with the rise of digital computing the distinction between the two is becoming blurred, with symbolic models being embedded into iconic models. However, such models tend to be single user. This paper demonstrates how 3D symbolic models in the form of agent-based simulations can be embedded into iconic models using the multi-user virtual world of Second Life. Furthermore, the paper demonstrates Second Life's potential for social science simulation. To demonstrate this, we first introduce Second Life and provide two exemplar models, Conway's Game of Life and Schelling's Segregation Model, which highlight how symbolic models can be viewed in an iconic environment. We then present a simple pedestrian evacuation model which merges the iconic and symbolic together and extends the model to directly incorporate avatars and agents in the same environment, illustrating how 'real' participants can influence simulation outcomes. Such examples demonstrate the potential for creating highly visual, immersive, interactive agent-based models for social scientists in multi-user real-time virtual worlds. The paper concludes with some final comments on problems with representing models in current virtual worlds and future avenues of research.
Keywords: Agent-Based Modelling, Pedestrian Evacuation, Segregation, Virtual Worlds, Second Life
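To make the notion of a symbolic model concrete, one generation of Conway's Game of Life can be written as a pure rule over a set of live cells. This is a minimal sketch of the standard rules on an unbounded grid; it is not the Second Life implementation described in the paper, and the function name `life_step` is an illustrative choice.

```python
from collections import Counter


def life_step(live):
    """Return the next generation for a set of live (x, y) cells."""
    # Count, for every cell, how many live neighbours it has.
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # A cell is alive next step with exactly 3 live neighbours,
    # or with exactly 2 if it is already alive.
    return {
        cell
        for cell, count in neighbour_counts.items()
        if count == 3 or (count == 2 and cell in live)
    }
```

The model here is only the update rule. Rendering the resulting cell set as objects in a 3D world is what would place it in an iconic environment.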