Search CORE

1,965 research outputs found

Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection

Authors: Heittola Toni
Huttunen Heikki
Parascandolo Giambattista
Virtanen Tuomas
Çakır Emre
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 21/02/2017

Sound events often occur in unstructured environments where they exhibit wide variations in their frequency content and temporal structure. Convolutional neural networks (CNNs) are able to extract higher-level features that are invariant to local spectral and temporal variations. Recurrent neural networks (RNNs) are powerful in learning the longer-term temporal context in audio signals. CNNs and RNNs as classifiers have recently shown improved performance over established methods in various sound recognition tasks. We combine these two approaches in a Convolutional Recurrent Neural Network (CRNN) and apply it to a polyphonic sound event detection task. We compare the performance of the proposed CRNN method with CNN, RNN, and other established methods, and observe a considerable improvement on four different datasets consisting of everyday sound events.

Comment: Accepted for IEEE Transactions on Audio, Speech and Language Processing, Special Issue on Sound Scene and Event Analysis

arXiv.org e-Print Archive

Trepo - Institutional Repository of Tampere University
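
The abstract does not spell out how the CRNN's output becomes a list of detected events. A common convention in polyphonic sound event detection, assumed here rather than taken from the paper, is one sigmoid output per class per frame, thresholded independently so that several classes can be active at the same time. The minimal Python sketch below shows only that post-processing step; the function `activity_to_events`, the 0.5 threshold, and the frame hop are illustrative assumptions, and the network itself is omitted.

```python
from typing import List, Tuple


def activity_to_events(
    probs: List[List[float]],
    class_names: List[str],
    frame_hop_s: float,
    threshold: float = 0.5,  # assumed decision threshold, not from the paper
) -> List[Tuple[str, float, float]]:
    """Turn frame-wise class probabilities into (class, onset_s, offset_s) events.

    probs[t][c] is the (assumed sigmoid) probability that class c is active in
    frame t. Each class is thresholded independently, so events may overlap,
    which is what makes the detection polyphonic.
    """
    events = []
    n_frames = len(probs)
    for c, name in enumerate(class_names):
        onset = None  # frame index where the current event started, if any
        for t in range(n_frames):
            active = probs[t][c] >= threshold
            if active and onset is None:
                onset = t
            elif not active and onset is not None:
                events.append((name, onset * frame_hop_s, t * frame_hop_s))
                onset = None
        if onset is not None:  # event still active in the last frame
            events.append((name, onset * frame_hop_s, n_frames * frame_hop_s))
    events.sort(key=lambda e: (e[1], e[0]))  # order by onset time
    return events


probs = [[0.9, 0.1], [0.8, 0.7], [0.6, 0.9], [0.2, 0.8], [0.1, 0.3]]
print(activity_to_events(probs, ["car", "speech"], frame_hop_s=0.5))
```

With these made-up probabilities, the hypothetical "car" and "speech" events overlap between 0.5 s and 1.5 s, which a single softmax over classes could not express.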

Deep Room Recognition Using Inaudible Echos

Authors: Gu Chaojie
Song Qun
Tan Rui
Publication date: 01/01/2018

Recent years have seen an increasing need for location awareness in mobile applications. This paper presents a room-level indoor localization approach based on the measured room's echos in response to a two-millisecond single-tone inaudible chirp emitted by a smartphone's loudspeaker. Unlike other acoustics-based room recognition systems that record full-spectrum audio for up to ten seconds, our approach records audio in a narrow inaudible band for only 0.1 seconds to preserve the user's privacy. However, the short-time and narrowband audio signal carries limited information about the room's characteristics, presenting challenges to accurate room recognition. This paper applies deep learning to effectively capture the subtle fingerprints in the rooms' acoustic responses. Our extensive experiments show that a two-layer convolutional neural network fed with the spectrogram of the inaudible echos achieves the best performance, compared with alternative designs using other raw data formats and deep models. Based on this result, we design a RoomRecognize cloud service and its mobile client library that enable mobile application developers to readily implement room recognition functionality without resorting to any existing infrastructure or add-on hardware. Extensive evaluation shows that RoomRecognize achieves 99.7%, 97.7%, 99%, and 89% accuracy in differentiating 22 and 50 residential/office rooms, 19 spots in a quiet museum, and 15 spots in a crowded museum, respectively. Compared with state-of-the-art approaches based on support vector machines, RoomRecognize significantly improves the Pareto frontier of recognition accuracy versus robustness against interfering sounds (e.g., ambient music).

Comment: 29 pages

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)
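
To make the input representation in the RoomRecognize abstract concrete, the following pure-Python sketch emits a 2 ms single-tone burst, simulates a 0.1 s recording containing one delayed reflection, and computes a magnitude spectrogram restricted to a narrow band around the tone. The 48 kHz sample rate, the 20 kHz tone, the 19 to 21 kHz band, the window and hop sizes, and the single-echo room model (`simulate_recording`) are all assumptions for illustration, not parameters from the paper; the two-layer CNN that consumes the spectrogram is not shown.

```python
import math
from typing import List, Tuple

FS = 48_000        # assumed smartphone sample rate (Hz)
TONE_HZ = 20_000   # assumed frequency of the inaudible single-tone chirp (Hz)


def make_chirp(duration_s: float = 0.002) -> List[float]:
    """Single-tone burst; 2 ms long by default, as in the abstract."""
    n = int(round(duration_s * FS))
    return [math.sin(2 * math.pi * TONE_HZ * i / FS) for i in range(n)]


def simulate_recording(length_s: float = 0.1, echo_delay_s: float = 0.005,
                       echo_gain: float = 0.3) -> List[float]:
    """Hypothetical 0.1 s capture: direct chirp plus one attenuated, delayed reflection."""
    rec = [0.0] * int(round(length_s * FS))
    delay = int(round(echo_delay_s * FS))
    for i, s in enumerate(make_chirp()):
        rec[i] += s                      # direct path, loudspeaker to microphone
        rec[i + delay] += echo_gain * s  # one reflection (a real room has many)
    return rec


def narrowband_spectrogram(x: List[float], win: int = 256, hop: int = 128,
                           f_lo: float = 19_000.0, f_hi: float = 21_000.0
                           ) -> Tuple[List[float], List[List[float]]]:
    """Hann-windowed magnitude STFT evaluated only at DFT bins in [f_lo, f_hi]."""
    bin_hz = FS / win
    bins = list(range(math.ceil(f_lo / bin_hz), math.floor(f_hi / bin_hz) + 1))
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / win) for n in range(win)]
    frames = []
    for start in range(0, len(x) - win + 1, hop):
        seg = [x[start + n] * hann[n] for n in range(win)]
        row = []
        for k in bins:
            # Direct DFT at bin k: only a handful of bins are needed, so no FFT.
            re = sum(v * math.cos(2 * math.pi * k * n / win) for n, v in enumerate(seg))
            im = -sum(v * math.sin(2 * math.pi * k * n / win) for n, v in enumerate(seg))
            row.append(math.hypot(re, im))
        frames.append(row)
    return [k * bin_hz for k in bins], frames


freqs, spec = narrowband_spectrogram(simulate_recording())
print(len(freqs), "bins x", len(spec), "frames")
```

Evaluating only the in-band bins keeps each frame small (11 values with these settings), which mirrors the privacy-motivated idea of keeping nothing outside the inaudible band; a production version would more likely band-pass filter and use an FFT library.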

IntoxiGait Deep Learning

Authors: Bremner Joseph S
Cheung Nicholas Gwan
Huang Sam
Lam Quoc Ho
Publication venue: Digital WPI
Publication date: 23/03/2018

Alcohol abuse has been a pervasive problem worldwide, causing 88,000 annual deaths. Recently, several projects have attempted to estimate a user's level of intoxication by measuring gait with mobile sensors. The goal of this project was to compare a deep learning approach with previous methods for predicting a user's blood alcohol concentration (BAC), by training a convolutional neural network and creating a mobile app that could accurately determine intoxication level. We gathered data from 38 participants over the course of 12 weeks, collecting accelerometer and gyroscope data simultaneously from both a smartwatch and a smartphone. When classifying an input containing two seconds of data into 5 BAC ranges, our neural network's accuracy is roughly 64% on the test set and 69% on the training set.

DigitalCommons@WPI
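
The IntoxiGait abstract classifies two-second inputs into 5 BAC ranges. A minimal Python sketch of that data preparation might look like the following. The 50 Hz sampling rate, the 1 s hop between overlapping windows, the 12-value sample layout, and the bin edges in `BAC_EDGES` are hypothetical, since the abstract does not state them; the CNN itself is not shown.

```python
from typing import List, Sequence

SAMPLE_RATE_HZ = 50  # hypothetical sensor sampling rate
WINDOW_S = 2.0       # two seconds of data per input (from the abstract)

# Hypothetical BAC bin edges (g/dL); the project's actual 5 ranges are not given.
BAC_EDGES = (0.02, 0.05, 0.08, 0.12)


def bac_to_class(bac: float) -> int:
    """Map a BAC reading to one of 5 classes (0..4) using BAC_EDGES."""
    for i, edge in enumerate(BAC_EDGES):
        if bac < edge:
            return i
    return len(BAC_EDGES)


def make_windows(samples: Sequence[Sequence[float]],
                 hop_s: float = 1.0) -> List[List[Sequence[float]]]:
    """Split a stream of per-timestep sensor vectors into fixed 2 s windows.

    Each sample is assumed to hold the accelerometer and gyroscope axes of both
    the watch and the phone (e.g. 3 + 3 axes per device, 12 values in total).
    Consecutive windows start hop_s seconds apart, so they may overlap.
    """
    win = int(WINDOW_S * SAMPLE_RATE_HZ)
    hop = int(hop_s * SAMPLE_RATE_HZ)
    return [list(samples[s:s + win]) for s in range(0, len(samples) - win + 1, hop)]


# Five seconds of placeholder (all-zero) sensor data, labelled with one BAC reading.
stream = [[0.0] * 12 for _ in range(5 * SAMPLE_RATE_HZ)]
windows = make_windows(stream)
labels = [bac_to_class(0.06)] * len(windows)
print(len(windows), "windows of", len(windows[0]), "samples, class", labels[0])
```

Overlapping windows are one common way to get more training examples from limited recording sessions; whether the project did this is not stated in the abstract.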