
    Modeling the auditory scene: predictive regularity representations and perceptual objects

    Predictive processing of information is essential for goal-directed behavior. We offer an account of auditory perception suggesting that representations of predictable patterns, or ‘regularities’, extracted from the incoming sounds serve as auditory perceptual objects. The auditory system continuously searches for regularities within the acoustic signal. Primitive regularities may be encoded by neurons adapting their response to specific sounds. Such neurons have been observed in many parts of the auditory system. Representations of the detected regularities produce predictions of upcoming sounds, as well as alternative solutions for parsing the composite input into coherent sequences potentially emitted by putative sound sources. The accuracy of the predictions can be utilized for selecting the most likely interpretation of the auditory input. Thus, in our view, perception generates hypotheses about the causal structure of the world.
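
    As a toy illustration of this account (not from the paper itself; the tone sequence, the period hypotheses, and the error measure are all illustrative assumptions), competing parsings of a sound sequence can be scored by how well the regularity each one implies predicts upcoming events:

        # Minimal sketch: score hypothesised regularities by prediction error
        # and pick the one that best predicts the incoming sounds.
        import numpy as np

        def prediction_error(sequence, period):
            """Predict each event from the one `period` steps earlier and
            return the mean squared prediction error."""
            seq = np.asarray(sequence, dtype=float)
            if period >= len(seq):
                return np.inf
            predicted = seq[:-period]      # hypothesis: pattern repeats every `period`
            observed = seq[period:]
            return float(np.mean((observed - predicted) ** 2))

        # An interleaved high/low tone sequence (frequencies in Hz): A-B-A-B-...
        tones = [440, 880, 440, 880, 440, 880, 440, 880]

        for period in (1, 2):
            print(f"period={period}: error={prediction_error(tones, period):.1f}")
        # The period-2 regularity predicts perfectly (error 0.0), so it would be
        # selected as the more likely interpretation of the input.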

    UbiEar: Bringing location-independent sound awareness to the hard-of-hearing people with smartphones

    Non-speech sound-awareness is important for improving the quality of life of deaf and hard-of-hearing (DHH) people. DHH people, especially the young, are not always satisfied with their hearing aids. According to interviews with 60 young hard-of-hearing students, a ubiquitous sound-awareness tool for emergency and social events that works in diverse environments is desired. In this paper, we design UbiEar, a smartphone-based acoustic event sensing and notification system. The core techniques in UbiEar are a light-weight deep convolutional neural network that enables location-independent acoustic event recognition on commodity smartphones, and a set of mechanisms for prompt and energy-efficient acoustic sensing. We conducted both controlled experiments and user studies with 86 DHH students and showed that UbiEar can assist young DHH students in becoming aware of important acoustic events in their daily lives.
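
    As a rough sketch of the kind of light-weight convolutional classifier described above (the layer sizes, log-mel input shape, and number of event classes are illustrative assumptions, not the authors' architecture):

        # Minimal sketch of a small CNN for acoustic event recognition.
        import torch
        import torch.nn as nn

        class SmallAudioCNN(nn.Module):
            def __init__(self, n_classes=6):       # e.g. doorbell, alarm, knock, ...
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                    nn.MaxPool2d(2),                # halve time/frequency resolution
                    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1),        # global pooling keeps the model tiny
                )
                self.classifier = nn.Linear(32, n_classes)

            def forward(self, x):                   # x: (batch, 1, mel_bins, frames)
                h = self.features(x).flatten(1)
                return self.classifier(h)

        model = SmallAudioCNN()
        logits = model(torch.randn(1, 1, 40, 101))  # one short clip, 40 mel bins
        print(logits.shape)                         # torch.Size([1, 6])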

    Multi-Modal Perception for Selective Rendering

    A major challenge in generating high-fidelity virtual environments (VEs) is to provide realism at interactive rates. High-fidelity simulation of light and sound is still unachievable in real time, as such physical accuracy is very computationally demanding. Only recently has visual perception been used in high-fidelity rendering to improve performance through a series of novel exploitations, such as rendering the parts of the scene that are not currently attended to by the viewer at a much lower quality, without the difference being perceived. This paper investigates the effect that spatialised directional sound has on the visual attention of a user viewing rendered images. These perceptual effects are utilised in selective rendering pipelines via the use of multi-modal maps. The multi-modal maps are tested through psychophysical experiments to examine their applicability to selective rendering algorithms with a series of fixed-cost rendering functions, and are found to perform significantly better than image saliency maps naively applied to multi-modal virtual environments.
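
    One plausible reading of a multi-modal map, sketched below, is an image saliency map re-weighted toward the on-screen location of a spatialised sound; the Gaussian weighting and its width are illustrative assumptions, not the paper's actual construction:

        # Minimal sketch: modulate visual saliency by a directional sound cue.
        import numpy as np

        def multimodal_map(saliency, sound_xy, sigma=40.0):
            """Boost saliency near the image location `sound_xy` that the
            directional sound draws attention to."""
            h, w = saliency.shape
            ys, xs = np.mgrid[0:h, 0:w]
            d2 = (xs - sound_xy[0]) ** 2 + (ys - sound_xy[1]) ** 2
            audio_weight = np.exp(-d2 / (2 * sigma ** 2))
            combined = saliency * (1.0 + audio_weight)   # audio modulates vision
            return combined / combined.max()             # renormalise to [0, 1]

        saliency = np.random.rand(240, 320)              # stand-in image saliency map
        mm = multimodal_map(saliency, sound_xy=(160, 120))
        # A selective renderer could allocate more rays/samples where `mm` is high.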

    On the Perceptual Organization of Speech

    A general account of auditory perceptual organization has developed over the past two decades. It relies on primitive devices akin to the Gestalt principles of organization to assign sensory elements to probable groupings, and invokes secondary schematic processes to confirm or repair the possible organization. Although this conceptualization is intended to apply universally, the variety and arrangement of the acoustic constituents of speech violate Gestalt principles at numerous junctures, yet cohere perceptually nonetheless. The authors report three experiments on organization in phonetic perception, using sine wave synthesis to evade the Gestalt rules and the schematic processes alike. These findings falsify a general auditory account, showing that phonetic perceptual organization is achieved by specific sensitivity to the acoustic modulations characteristic of speech signals.
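
    Sine wave synthesis, the technique used here, replaces each formant of an utterance with a single time-varying sinusoid, preserving the speech-like modulations while discarding the usual acoustic cues. A minimal sketch with made-up formant trajectories (real stimuli use tracks measured from natural speech):

        # Minimal sketch of two-tone sine wave speech from formant tracks.
        import numpy as np

        fs = 16000
        t = np.arange(int(0.5 * fs)) / fs          # 0.5 s of signal

        # Illustrative "formant" frequency tracks (Hz)
        f1 = 300 + 400 * t                          # F1 glides 300 -> 500 Hz
        f2 = 2200 - 1200 * t                        # F2 glides 2200 -> 1600 Hz

        def tone_from_track(freq_track):
            # Integrate instantaneous frequency to get a smoothly varying phase.
            phase = 2 * np.pi * np.cumsum(freq_track) / fs
            return np.sin(phase)

        sine_wave_speech = 0.6 * tone_from_track(f1) + 0.4 * tone_from_track(f2)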

    Vision-Guided Robot Hearing

    Natural human-robot interaction (HRI) in complex and unpredictable environments is important and has many potential applications. While vision-based HRI has been thoroughly investigated, robot hearing and audio-based HRI are emerging research topics in robotics. In typical real-world scenarios, humans are at some distance from the robot, and hence the sensory (microphone) data are strongly impaired by background noise, reverberation, and competing auditory sources. In this context, the detection and localization of speakers plays a key role that enables several tasks, such as improving the signal-to-noise ratio for speech recognition, speaker recognition, speaker tracking, etc. In this paper we address the problem of how to detect and localize people that are both seen and heard. We introduce a hybrid deterministic/probabilistic model. The deterministic component allows us to map 3D visual data onto a 1D auditory space. The probabilistic component of the model enables the visual features to guide the grouping of the auditory features in order to form audiovisual (AV) objects. The proposed model and the associated algorithms are implemented in real time (17 FPS) using a stereoscopic camera pair and two microphones embedded in the head of the humanoid robot NAO. We perform experiments with (i) synthetic data, (ii) publicly available data gathered with an audiovisual robotic head, and (iii) data acquired using the NAO robot. The results validate the approach and encourage further investigation of how vision and hearing could be combined for robust HRI.
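
    A minimal geometric sketch of the deterministic step described above, taking the 1D auditory space to be the interaural time difference (ITD) between two microphones; the microphone placement and far-field-free direct-path computation are assumptions, not the paper's exact mapping:

        # Minimal sketch: project a 3D visual position onto a 1D auditory axis (ITD).
        import numpy as np

        SPEED_OF_SOUND = 343.0                      # m/s
        MIC_LEFT = np.array([-0.06, 0.0, 0.0])      # mics 12 cm apart on the head
        MIC_RIGHT = np.array([0.06, 0.0, 0.0])

        def expected_itd(point_3d):
            """Time difference of arrival (seconds) for a source at `point_3d`."""
            d_left = np.linalg.norm(point_3d - MIC_LEFT)
            d_right = np.linalg.norm(point_3d - MIC_RIGHT)
            return (d_left - d_right) / SPEED_OF_SOUND

        # A face detected 2 m away, 30 degrees to the robot's right:
        face = np.array([2 * np.sin(np.radians(30)), 0.0, 2 * np.cos(np.radians(30))])
        print(f"expected ITD: {expected_itd(face) * 1e3:.3f} ms")
        # Auditory features whose measured ITD matches this prediction can be
        # grouped with the visual detection to form an audiovisual object.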

    Auditory Perceptual Organisation

    Traveling pressure waves (i.e., sounds) are produced by the movements or actions of objects, so sounds primarily convey information about what is happening in the environment. In addition, some information about the structure of the environment and the surface features of objects can be extracted by determining how the original (self-generated or exogenous) sounds are filtered or distorted by the environment (e.g., the notion of “acoustic daylight”; Fay 2009). In this article we consider how the auditory system processes sound signals to extract information about the environment and the objects within it.

    Deep Learning for Distant Speech Recognition

    Deep learning is an emerging technology that is considered one of the most promising directions for reaching higher levels of artificial intelligence. Among other achievements, building computers that understand speech represents a crucial leap towards intelligent machines. Despite the great efforts of the past decades, however, natural and robust human-machine speech interaction still appears to be out of reach, especially when users interact with a distant microphone in noisy and reverberant environments. These disturbances severely hamper the intelligibility of the speech signal, making Distant Speech Recognition (DSR) one of the major open challenges in the field. This thesis addresses this scenario and proposes novel techniques, architectures, and algorithms to improve the robustness of distant-talking acoustic models. We first elaborate on methodologies for realistic data contamination, with particular emphasis on DNN training with simulated data. We then investigate approaches for better exploiting speech contexts, proposing original methodologies for both feed-forward and recurrent neural networks. Lastly, inspired by the idea that cooperation across different DNNs could be the key to counteracting the harmful effects of noise and reverberation, we propose a novel deep learning paradigm called a network of deep neural networks. The analysis of the original concepts was based on extensive experimental validation conducted on both real and simulated data, considering different corpora, microphone configurations, environments, noise conditions, and ASR tasks.
    Comment: PhD Thesis Unitn, 201
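
    A minimal sketch of the data-contamination idea mentioned above: convolve a clean utterance with a room impulse response and mix in noise at a chosen SNR to simulate distant-talking conditions for training. The synthetic exponentially decaying impulse response stands in for a measured or simulated one; all signals here are random placeholders:

        # Minimal sketch: reverberate clean speech and add noise at a target SNR.
        import numpy as np

        rng = np.random.default_rng(0)
        fs = 16000
        clean = rng.standard_normal(fs)             # stand-in for 1 s of clean speech

        # Toy room impulse response: direct path plus exponentially decaying tail.
        n_rir = int(0.3 * fs)
        rir = rng.standard_normal(n_rir) * np.exp(-np.linspace(0, 8, n_rir))
        rir[0] = 1.0

        reverberant = np.convolve(clean, rir)[: len(clean)]

        def add_noise(signal, noise, snr_db):
            """Scale `noise` so the mixture has the requested signal-to-noise ratio."""
            p_sig = np.mean(signal ** 2)
            p_noise = np.mean(noise ** 2)
            scale = np.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
            return signal + scale * noise

        contaminated = add_noise(reverberant,
                                 rng.standard_normal(len(reverberant)),
                                 snr_db=10)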

    RV Sonne Cruise 200, 11 Jan-11 Mar 2009. Jakarta - Jakarta

    All plate boundaries are divided into segments - pieces of fault that are distinct from one another, either separated by gaps or with different orientations. The maximum size of an earthquake on a fault system is controlled by the degree to which the propagating rupture can cross the boundaries between such segments. A large earthquake may rupture a whole segment of plate boundary, but a great earthquake usually ruptures more than one segment at once. The December 26th 2004 Mw 9.3 earthquake and the March 28th 2005 Mw 8.7 earthquake ruptured, respectively, 1200-1300 km and 300-400 km of the subduction boundary between the Indian-Australian plate and the Burman and Sumatra blocks. Rupture in the 2004 event started at the southern end of the fault segment, and propagated northwards. The observation that the slip did not propagate significantly southwards in December 2004, even though the magnitude of slip was high at the southern end of the rupture, strongly suggests a barrier at that place. Maximum slip in the March 2005 earthquake occurred within ~100 km of the barrier between the 2004 and 2005 ruptures, confirming both the physical importance of the barrier and the loading of the March 2005 rupture zone by the December 2004 earthquake.

    The Sumatran Segmentation Project, funded by the Natural Environment Research Council (NERC), aims to characterise the boundaries between these great earthquakes (in terms of both subduction zone structure at scales of 10^1-10^4 m and rock physical properties), record seismic activity, improve and link earthquake slip distribution to the structure of the subduction zone, and determine the sedimentological record of great earthquakes (both recent and historic) along this part of the margin. The Project is focussed on the areas around two earthquake segment boundaries: Segment Boundary 1 (SB1) between the 2004 and 2005 ruptures at Simeulue Island, and SB2 between the 2005 and smaller 1935 ruptures between Nias and the Batu Islands.

    Cruise SO200 is the third of three cruises which will provide a combined geophysical and geological dataset in the source regions of the 2004 and 2005 subduction zone earthquakes. SO200 was divided into two Legs. Leg 1 (SO200-1), Jakarta to Jakarta between January 22nd and February 22nd, was composed of three main operations: long-term deployment OBS retrieval, TOBI sidescan sonar survey, and coring. Leg 2 (SO200-2), Jakarta to Jakarta between February 23rd and March 11th, was composed of two main operations: multichannel seismic reflection (MCS) profiles and heat-flow probe transects.