25,231 research outputs found
Eventness: Object Detection on Spectrograms for Temporal Localization of Audio Events
In this paper, we introduce the concept of Eventness for audio event
detection, which can, in part, be thought of as an analogue to Objectness from
computer vision. The key observation behind the eventness concept is that audio
events reveal themselves as 2-dimensional time-frequency patterns with specific
textures and geometric structures in spectrograms. These time-frequency
patterns can then be viewed analogously to objects occurring in natural images
(with the exception that scaling and rotation invariance properties do not
apply). With this key observation in mind, we pose the problem of detecting
monophonic or polyphonic audio events as an equivalent visual object(s)
detection problem under partial occlusion and clutter in spectrograms. We adapt
a state-of-the-art visual object detection model to evaluate the audio event
detection task on publicly available datasets. The proposed network has
comparable results with a state-of-the-art baseline and is more robust on
minority events. Provided large-scale datasets, we hope that our proposed
conceptual model of eventness will be beneficial to the audio signal processing
community towards improving performance of audio event detection.Comment: 5 pages, 3 figures, accepted to ICASSP 201
Deep Room Recognition Using Inaudible Echos
Recent years have seen the increasing need of location awareness by mobile
applications. This paper presents a room-level indoor localization approach
based on the measured room's echos in response to a two-millisecond single-tone
inaudible chirp emitted by a smartphone's loudspeaker. Different from other
acoustics-based room recognition systems that record full-spectrum audio for up
to ten seconds, our approach records audio in a narrow inaudible band for 0.1
seconds only to preserve the user's privacy. However, the short-time and
narrowband audio signal carries limited information about the room's
characteristics, presenting challenges to accurate room recognition. This paper
applies deep learning to effectively capture the subtle fingerprints in the
rooms' acoustic responses. Our extensive experiments show that a two-layer
convolutional neural network fed with the spectrogram of the inaudible echos
achieve the best performance, compared with alternative designs using other raw
data formats and deep models. Based on this result, we design a RoomRecognize
cloud service and its mobile client library that enable the mobile application
developers to readily implement the room recognition functionality without
resorting to any existing infrastructures and add-on hardware.
Extensive evaluation shows that RoomRecognize achieves 99.7%, 97.7%, 99%, and
89% accuracy in differentiating 22 and 50 residential/office rooms, 19 spots in
a quiet museum, and 15 spots in a crowded museum, respectively. Compared with
the state-of-the-art approaches based on support vector machine, RoomRecognize
significantly improves the Pareto frontier of recognition accuracy versus
robustness against interfering sounds (e.g., ambient music).Comment: 29 page
- …