Audio Inpainting
Published version: IEEE Transactions on Audio, Speech, and Language Processing 20(3): 922-932, Mar 2012. DOI: 10.1109/TASL.2011.2168211
Virtual Audio - Three-Dimensional Audio in Virtual Environments
Three-dimensional interactive audio has a variety of potential uses in human-machine interfaces. After lagging seriously behind the visual components, the importance of sound is now becoming increasingly accepted.
This paper discusses the background and techniques needed to implement three-dimensional audio in computer interfaces. A case study of a system for three-dimensional audio, implemented by the author, is described in detail. The audio system was furthermore integrated with a virtual reality system, and conclusions from user tests and use of the audio system are presented at the end of the paper, along with proposals for future work.
The thesis begins with a definition of three-dimensional audio and a survey of the human auditory system, giving the reader the necessary background on what three-dimensional audio is and how human auditory perception works.
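As a rough illustration of the kind of technique such a system builds on, the sketch below spatialises a mono source by convolving it with a left/right head-related impulse response (HRIR) pair; the HRIR arrays and sample rate here are placeholders for the example, not details from the thesis.

```python
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(mono, hrir_left, hrir_right):
    """Spatialise a mono signal by convolving it with an HRIR pair.

    mono:        1-D array of samples
    hrir_left:   HRIR measured at the left ear for the desired direction
    hrir_right:  HRIR measured at the right ear for the same direction
    Returns an (n_samples, 2) stereo array for headphone playback.
    """
    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    n = max(len(left), len(right))
    out = np.zeros((n, 2))
    out[:len(left), 0] = left
    out[:len(right), 1] = right
    return out

# Placeholder HRIRs: a real system would look these up per direction
# from a measured set indexed by azimuth/elevation.
rng = np.random.default_rng(0)
hrir_l = rng.normal(size=128) * np.hanning(128)
hrir_r = np.roll(hrir_l, 8)          # crude interaural-delay stand-in
signal = rng.normal(size=44100)      # 1 s of noise at 44.1 kHz
stereo = render_binaural(signal, hrir_l, hrir_r)
```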
An open dataset for research on audio field recording archives: freefield1010
We introduce a free and open dataset of 7690 audio clips sampled from the field-recording tag in the Freesound audio archive. The dataset is designed for use in research related to data mining in audio archives of field recordings and soundscapes. The audio is standardised, and both audio and metadata are Creative Commons licensed. We describe the data preparation process, characterise the dataset descriptively, and illustrate its use through an auto-tagging experiment.
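By way of illustration, a minimal auto-tagging pipeline over such a clip collection might look like the sketch below; the file paths, binary tag, and choice of mean-MFCC features are assumptions for the example, not details from the paper.

```python
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def clip_features(path, sr=44100):
    """Summarise one audio clip as its mean MFCC vector."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)

# Hypothetical layout: (wav_path, binary_tag) pairs, e.g. whether
# a "birdsong" tag applies to the clip.
dataset = [("clips/0001.wav", 1), ("clips/0002.wav", 0)]  # ...

X = np.stack([clip_features(p) for p, _ in dataset])
y = np.array([t for _, t in dataset])

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X, y)
```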
Deep Learning of Human Perception in Audio Event Classification
In this paper, we introduce our recent studies on human perception in audio event classification using different deep learning models. In particular, the pre-trained VGGish model is used as a feature extractor to process audio data, and a DenseNet is trained on and used as a feature extractor for our electroencephalography (EEG) data. The correlation between audio stimuli and EEG is learned in a shared space. In the experiments, we record brain activities (EEG signals) of several subjects while they listen to music events of 8 audio categories selected from Google AudioSet, using a 16-channel EEG headset with active electrodes. Our experimental results demonstrate that i) audio event classification can be improved by exploiting the power of human perception, and ii) the correlation between audio stimuli and EEG can be learned to complement audio event understanding.
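A minimal sketch of the shared-space idea, assuming 128-dimensional VGGish audio embeddings and flattened per-trial EEG features; the dimensions, projection heads, and cosine-similarity loss are illustrative choices, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class SharedSpace(nn.Module):
    """Project audio and EEG features into a common embedding space."""
    def __init__(self, audio_dim=128, eeg_dim=16 * 512, shared_dim=64):
        super().__init__()
        self.audio_proj = nn.Sequential(nn.Linear(audio_dim, 256), nn.ReLU(),
                                        nn.Linear(256, shared_dim))
        self.eeg_proj = nn.Sequential(nn.Linear(eeg_dim, 256), nn.ReLU(),
                                      nn.Linear(256, shared_dim))

    def forward(self, audio_feat, eeg_feat):
        return self.audio_proj(audio_feat), self.eeg_proj(eeg_feat)

model = SharedSpace()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# Dummy batch: 8 paired (audio, EEG) examples.
audio = torch.randn(8, 128)          # e.g. VGGish embeddings
eeg = torch.randn(8, 16 * 512)       # e.g. 16 channels x 512 samples

a, e = model(audio, eeg)
# Pull paired embeddings together by maximising cosine similarity.
loss = 1.0 - nn.functional.cosine_similarity(a, e, dim=1).mean()
opt.zero_grad()
loss.backward()
opt.step()
```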
General audio tagging with ensembling convolutional neural network and statistical features
Audio tagging aims to infer descriptive labels from audio clips. It is challenging due to the limited size of data and noisy labels. In this paper, we describe our solution for the DCASE 2018 Task 2 general audio tagging challenge. The contributions of our solution are as follows. We investigated a variety of convolutional neural network architectures for the audio tagging task. Statistical features are applied to capture statistical patterns of audio features and improve classification performance. Ensemble learning is applied to combine the outputs of the deep classifiers and exploit their complementary information. A sample re-weighting strategy is employed during ensemble training to address the noisy label problem. Our system achieves a mean average precision (mAP@3) of 0.958, outperforming the baseline system's 0.704. Our system ranked 1st and 4th out of 558 submissions on the public and private leaderboards of the DCASE 2018 Task 2 challenge. Our code is available at https://github.com/Cocoxili/DCASE2018Task2/.
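For reference, mAP@3 credits a correct label appearing at rank k within the top 3 predictions with a precision of 1/k. A minimal sketch of the metric, together with simple probability averaging across ensemble members, is given below; the array shapes and the 41-class setup are assumptions for the example.

```python
import numpy as np

def map_at_3(probs, labels):
    """Mean average precision at 3 for single-label clips.

    probs:  (n_clips, n_classes) class probabilities
    labels: (n_clips,) true class indices
    A true label at rank k (k <= 3) scores 1/k, otherwise 0.
    """
    top3 = np.argsort(-probs, axis=1)[:, :3]
    scores = []
    for pred, true in zip(top3, labels):
        hits = np.where(pred == true)[0]
        scores.append(1.0 / (hits[0] + 1) if len(hits) else 0.0)
    return float(np.mean(scores))

# Simple ensemble: average the per-model class probabilities.
rng = np.random.default_rng(0)
member_probs = rng.random((5, 100, 41))   # 5 models, 100 clips, 41 classes
member_probs /= member_probs.sum(axis=2, keepdims=True)
ensembled = member_probs.mean(axis=0)
labels = rng.integers(0, 41, size=100)
print(map_at_3(ensembled, labels))
```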
