
    End-to-end Audiovisual Speech Activity Detection with Bimodal Recurrent Neural Models

    Speech activity detection (SAD) plays an important role in current speech processing systems, including automatic speech recognition (ASR). SAD is particularly difficult in environments with acoustic noise. A practical solution is to incorporate visual information, increasing the robustness of the SAD approach. An audiovisual system has the advantage of being robust to different speech modes (e.g., whisper speech) and to background noise. Recent advances in audiovisual speech processing using deep learning have opened opportunities to capture, in a principled way, the temporal relationships between acoustic and visual features. This study explores this idea by proposing a bimodal recurrent neural network (BRNN) framework for SAD. The approach models the temporal dynamics of the sequential audiovisual data, improving the accuracy and robustness of the proposed SAD system. Instead of relying on hand-crafted features, the study investigates an end-to-end training approach in which acoustic and visual features are learned directly from the raw data during training. The experimental evaluation considers a large audiovisual corpus with over 60.8 hours of recordings collected from 105 speakers. The results demonstrate that the proposed framework yields absolute improvements of up to 1.2% under practical scenarios over an audio-only voice activity detection (VAD) baseline implemented with a deep neural network (DNN). The proposed approach achieves a 92.7% F1-score when evaluated using the sensors of a portable tablet in a noisy acoustic environment, only 1.0% lower than the performance obtained under ideal conditions (e.g., clean speech captured with a high-definition camera and a close-talking microphone).
    Comment: Submitted to Speech Communication
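
    The abstract does not include code; as a rough sketch of the kind of bimodal recurrent architecture it describes, the PyTorch-style snippet below fuses learned acoustic and visual embeddings and classifies each frame as speech or non-speech. All layer sizes, names, and the concatenation-based fusion are assumptions, not the authors' design.

```python
# Hypothetical sketch of a bimodal recurrent SAD model (not the authors' code).
# Assumes per-frame raw audio windows and mouth-region crops as inputs; all
# dimensions and the fusion-by-concatenation choice are illustrative guesses.
import torch
import torch.nn as nn

class BimodalRNNSAD(nn.Module):
    def __init__(self, audio_dim=640, video_dim=32 * 32, hidden=128):
        super().__init__()
        # Feature extractors learned end-to-end from the raw inputs
        self.audio_net = nn.Sequential(nn.Linear(audio_dim, 256), nn.ReLU())
        self.video_net = nn.Sequential(nn.Linear(video_dim, 256), nn.ReLU())
        # Recurrent layer models the temporal dynamics of the fused sequence
        self.rnn = nn.LSTM(512, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, 1)  # speech / non-speech per frame

    def forward(self, audio, video):
        # audio: (batch, time, audio_dim), video: (batch, time, video_dim)
        fused = torch.cat([self.audio_net(audio), self.video_net(video)], dim=-1)
        out, _ = self.rnn(fused)
        return torch.sigmoid(self.classifier(out)).squeeze(-1)
```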

    Research instrumentation for tornado electromagnetics emissions detection

    Instrumentation for receiving, processing, and recording HF/VHF electromagnetic emissions from severe weather activity is described. Both airborne and ground-based instrumentation units are described at the system and subsystem levels. Design considerations, design decisions, and the rationale behind the decisions are given. Performance characteristics are summarized and recommendations for improvements are given. The objectives, procedures, and test results of the following are presented: (1) airborne flight tests in the Midwest U.S.A. (Spring 1975) and at the Kennedy Space Center, Florida (Summer 1975); (2) ground-based data collected in North Georgia (Summer/Fall 1975); and (3) airborne flight tests in the Midwest (late Spring 1976) and at the Kennedy Space Center, Florida (Summer 1976). The Midwest tests concentrated on severe weather with tornadic activity; the Florida and Georgia tests monitored air-mass convective thunderstorm characteristics. Supporting ground-truth data from weather radars and sferics direction-finding (DF) networks are described.

    Local Visual Microphones: Improved Sound Extraction from Silent Video

    Sound waves cause small vibrations in nearby objects. A few techniques in the literature can extract sound from video by measuring these vibrations. In this paper we study local vibration patterns at different image locations and show that different locations in the image vibrate differently. We carefully aggregate the local vibrations to produce reconstructed sound whose quality improves on the state of the art. We also show that local vibrations can exhibit time delays relative to one another because sound waves take time to travel through the air, and we use this phenomenon to estimate sound direction. Finally, we present a novel algorithm that speeds up sound extraction by two to three orders of magnitude and reaches real-time performance on 20 kHz video.
    Comment: Accepted to BMVC 201
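
    The abstract describes aggregating local vibration signals and exploiting inter-location time delays to estimate sound direction. As an illustrative sketch only (not the paper's algorithm), the snippet below estimates the relative lag between two already-recovered vibration signals by cross-correlation and forms a simple quality-weighted average; the function names and the weighting scheme are assumptions.

```python
# Illustrative sketch, not the paper's method: operate on vibration signals
# that have already been recovered at several image locations.
import numpy as np

def relative_delay(sig_a, sig_b):
    """Lag (in samples) between two vibration signals, via cross-correlation."""
    corr = np.correlate(sig_a - sig_a.mean(), sig_b - sig_b.mean(), mode="full")
    return int(np.argmax(corr)) - (len(sig_b) - 1)

def aggregate(signals, weights):
    """Weighted average of per-location vibration signals of equal length."""
    w = np.asarray(weights, dtype=float)
    return (np.asarray(signals) * w[:, None]).sum(axis=0) / w.sum()
```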

    Aerospace Medicine and Biology: A continuing bibliography with indexes, supplement 217, March 1981

    Approximately 130 reports, articles, and other documents introduced into the NASA scientific and technical information system in February 1981 are included in this bibliography. Topics include aerospace medicine and biology.

    Analysis of Disengagements in Semi-Autonomous Vehicles: Drivers’ Takeover Performance and Operational Implications

    This report analyzes the reactions of human drivers placed in simulated Autonomous Technology disengagement scenarios. The study was executed in a human-in-the-loop setting, within a high-fidelity integrated car simulator capable of handling both manual and autonomous driving. A population of 40 individuals was tested, with control takeover quantified by: i) response times (considering steering, throttle, and braking inputs); ii) vehicle drift from the lane centerline after takeover, as well as overall (integral) drift over an S-turn curve compared to a baseline obtained in manual driving; and iii) accuracy metrics to quantify human factors associated with the simulation experiment. The independent variables considered were the age of the driver, the speed at the time of disengagement, and the time at which the disengagement occurred (i.e., how long automation had been engaged). The study shows that changes in vehicle speed significantly affect all the variables investigated, pointing to the importance of setting thresholds for the maximum operational speed of vehicles driven in autonomous mode when the human driver serves as back-up. The results show that establishing such an operational threshold could reduce the maximum drift and lead to better control during takeover, perhaps warranting a lower speed limit than for conventional vehicles. With regard to the age variable, neither the response-time analysis nor the drift analysis provides support for any claim to limit the age of drivers of semi-autonomous vehicles.
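
    As a hedged illustration of the takeover metrics described above (the report's own computation is not shown), the short NumPy sketch below computes a maximum-drift value and an integral drift over a road segment; the array names and units are assumptions.

```python
# Illustrative sketch of drift metrics, not the report's code. Inputs are
# assumed to be arrays of lateral offset from the lane centerline (m) sampled
# along the distance travelled (m) over a segment such as an S-turn.
import numpy as np

def max_drift(lateral_offset):
    """Peak absolute deviation from the lane centerline, in metres."""
    return float(np.max(np.abs(lateral_offset)))

def integral_drift(lateral_offset, distance):
    """Area between driven path and centerline (m^2), trapezoid rule."""
    off = np.abs(np.asarray(lateral_offset, dtype=float))
    d = np.asarray(distance, dtype=float)
    return float(np.sum(0.5 * (off[1:] + off[:-1]) * np.diff(d)))
```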

    Bipedal steps in the development of rhythmic behavior in humans

    We contrast two related hypotheses of the evolution of dance: H1: maternal bipedal walking influenced the fetal experience of sound and associated movement patterns; H2: the human transition to bipedal gait produced more isochronous, predictable locomotion sound, resulting in early music-like behavior associated with the acoustic advantages conferred by moving bipedally in pace. The cadence of walking is around 120 beats per minute, similar to the tempo of dance and music. Human walking displays long-term constancies, and dyads often subconsciously synchronize their steps. The major amplitude component of the step is a distinctly produced beat. Human locomotion influences, and interacts with, emotions, and passive listening to music activates motor areas of the brain. Across dance genres, footwork is most often performed in time to the musical beat. Brain development is largely shaped by early sensory experience, with hearing developed from week 18 of gestation. Newborns react to sounds, melodies, and rhythmic poems to which they have been exposed in utero. If the sound and vibrations produced by the footfalls of a walking mother are transmitted to the fetus in coordination with the cadence of the motion, a connection between isochronous sound and rhythmical movement may develop. The rhythmical sounds of human maternal locomotion differ substantially from those of nonhuman primates, while the maternal heartbeat is likely to have a similar isochronous character across primates, suggesting a relatively more influential role of footfall in the development of rhythmic and musical abilities in humans. Associations among gait, music, and dance are numerous. The apparent absence of musical and rhythmic abilities in nonhuman primates, which display little bipedal locomotion, supports the idea that bipedal gait may be linked to the development of rhythmic abilities in humans. Bipedal stimuli in utero may primarily boost ontogenetic development, while the acoustical-advantage hypothesis proposes a mechanism for the phylogenetic development.

    Agent Street: An Environment for Exploring Agent-Based Models in Second Life

    Urban models can be seen as lying on a continuum between iconic and symbolic. Generally speaking, iconic models are physical versions of the real world at some scaled-down representation, while symbolic models represent the system in terms of the way it functions, replacing the physical or material system with logical and/or mathematical formulae. Traditionally, iconic and symbolic models were distinct classes of model, but with the rise of digital computing the distinction between the two is becoming blurred, with symbolic models being embedded into iconic models. However, such models tend to be single-user. This paper demonstrates how 3D symbolic models, in the form of agent-based simulations, can be embedded into iconic models using the multi-user virtual world of Second Life. Furthermore, the paper demonstrates Second Life's potential for social science simulation. To demonstrate this, we first introduce Second Life and provide two exemplar models, Conway's Game of Life and Schelling's Segregation Model, which highlight how symbolic models can be viewed in an iconic environment. We then present a simple pedestrian evacuation model which merges the iconic and symbolic together, and extend the model to directly incorporate avatars and agents in the same environment, illustrating how 'real' participants can influence simulation outcomes. Such examples demonstrate the potential for creating highly visual, immersive, interactive agent-based models for social scientists in multi-user, real-time virtual worlds. The paper concludes with some final comments on problems with representing models in current virtual worlds and future avenues of research.
    Keywords: Agent-Based Modelling, Pedestrian Evacuation, Segregation, Virtual Worlds, Second Life
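
    Schelling's Segregation Model, one of the exemplar models mentioned above, is compact enough to sketch. The snippet below is a minimal grid-based Python version, not the paper's Second Life implementation; the grid representation and the similarity threshold are illustrative choices. A driver would call schelling_step repeatedly until it returns False, at which point every agent is satisfied.

```python
# Minimal grid-based sketch of Schelling's segregation model (illustrative,
# not the paper's Second Life implementation). grid is a square list of lists
# holding "A", "B", or None; empty_cells lists the (row, col) of None cells.
import random

def schelling_step(grid, empty_cells, threshold=0.375):
    """Move one unhappy agent to a random empty cell; return True if anyone moved."""
    size = len(grid)
    occupied = [(r, c) for r in range(size) for c in range(size) if grid[r][c]]
    random.shuffle(occupied)
    for r, c in occupied:
        neighbours = [grid[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                      if (dr or dc)
                      and 0 <= r + dr < size and 0 <= c + dc < size
                      and grid[r + dr][c + dc]]
        same = sum(n == grid[r][c] for n in neighbours)
        if neighbours and same / len(neighbours) < threshold:
            # Agent is unhappy: relocate it to a randomly chosen empty cell.
            er, ec = empty_cells.pop(random.randrange(len(empty_cells)))
            grid[er][ec], grid[r][c] = grid[r][c], None
            empty_cells.append((r, c))
            return True
    return False
```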