SubSpectralNet - Using Sub-Spectrogram based Convolutional Neural Networks for Acoustic Scene Classification
Acoustic Scene Classification (ASC) is one of the core research problems in
the field of Computational Sound Scene Analysis. In this work, we present
SubSpectralNet, a novel model that captures discriminative features by
incorporating frequency band-level differences to model soundscapes. Using
mel-spectrograms, we propose taking band-wise crops of the input
time-frequency representation and training a convolutional neural network
(CNN) on these crops. We also propose a modification to the training method
for more efficient learning of the CNN models. We first motivate the use of
sub-spectrograms through intuitive and statistical analyses, and then
develop a sub-spectrogram-based CNN architecture for ASC. The system is
evaluated on the public ASC development dataset provided for the "Detection and
Classification of Acoustic Scenes and Events" (DCASE) 2018 Challenge. Our best
model achieves a +14% improvement in classification accuracy over the
DCASE 2018 baseline system. Code and figures are available at
https://github.com/ssrp/SubSpectralNet
Comment: Accepted to IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2019
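To make the band-wise cropping concrete, here is a minimal sketch in Python, assuming a precomputed mel-spectrogram; the function name, band size, and hop are illustrative and not taken from the authors' repository (linked above).

```python
# Minimal sketch of band-wise sub-spectrogram cropping (illustrative, not the
# authors' implementation; see the GitHub repository above for their code).
import numpy as np

def make_sub_spectrograms(mel_spec, band_size=20, hop=10):
    """Split a (n_mels, n_frames) mel-spectrogram into overlapping
    frequency-band crops, each of shape (band_size, n_frames)."""
    n_mels, _ = mel_spec.shape
    crops = [mel_spec[start:start + band_size, :]
             for start in range(0, n_mels - band_size + 1, hop)]
    return np.stack(crops)  # (n_bands, band_size, n_frames)

# Example: a 40-band mel-spectrogram with 500 frames yields 3 overlapping
# 20-band crops; each crop would feed its own CNN (or a shared one).
mel = np.random.rand(40, 500).astype(np.float32)
subs = make_sub_spectrograms(mel)
print(subs.shape)  # (3, 20, 500)
```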
Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article
provides a review of the state-of-the-art deep learning techniques for audio
signal processing. Speech, music, and environmental sound processing are
considered side-by-side, in order to point out similarities and differences
between the domains, highlighting general methods, problems, key references,
and potential for cross-fertilization between areas. The dominant feature
representations (in particular, log-mel spectra and raw waveform) and deep
learning models are reviewed, including convolutional neural networks, variants
of the long short-term memory architecture, as well as more audio-specific
neural network models. Subsequently, prominent deep learning application areas
are covered, i.e. audio recognition (automatic speech recognition, music
information retrieval, environmental sound detection, localization and
tracking) and synthesis and transformation (source separation, audio
enhancement, generative models for speech, sound, and music synthesis).
Finally, key issues and future questions regarding deep learning applied to
audio signal processing are identified.
Comment: 15 pages, 2 PDF figures
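As an illustration of the log-mel representation the review identifies as dominant, the following sketch extracts log-mel spectra with librosa; the parameter values are typical choices, not prescribed by the article, and the waveform is a synthetic stand-in.

```python
# Typical log-mel feature extraction, as commonly used by the deep learning
# pipelines surveyed in the article (parameter values are illustrative).
import numpy as np
import librosa

sr = 16000
y = np.random.randn(sr * 2).astype(np.float32)   # stand-in 2-second waveform
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                     hop_length=512, n_mels=64)
log_mel = librosa.power_to_db(mel)   # log compression (dB)
print(log_mel.shape)                 # (64, n_frames)
```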
Classification of Arrhythmia by Using Deep Learning with 2-D ECG Spectral Image Representation
The electrocardiogram (ECG) is one of the most extensively used signals in
the diagnosis and prediction of cardiovascular diseases (CVDs). ECG
signals capture the heart's rhythmic irregularities, commonly known as
arrhythmias. A careful study of ECG signals is crucial for precise diagnoses of
patients' acute and chronic heart conditions. In this study, we propose a
two-dimensional (2-D) convolutional neural network (CNN) model for the
classification of ECG signals into eight classes: normal beat,
premature ventricular contraction beat, paced beat, right bundle branch block
beat, left bundle branch block beat, atrial premature contraction beat,
ventricular flutter wave beat, and ventricular escape beat. The one-dimensional
ECG time series signals are transformed into 2-D spectrograms through
short-time Fourier transform. The 2-D CNN model consisting of four
convolutional layers and four pooling layers is designed for extracting robust
features from the input spectrograms. Our proposed methodology is evaluated on
the publicly available MIT-BIH arrhythmia dataset. We achieve a
state-of-the-art average classification accuracy of 99.11%, surpassing
recently reported results for similar types of arrhythmias. Performance is
also strong on other metrics, including sensitivity and specificity,
indicating the success of the proposed method.
Comment: 14 pages, 5 figures, accepted for future publication in the MDPI journal Remote Sensing
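A hedged sketch of the described pipeline follows: a short-time Fourier transform turns a 1-D ECG segment into a 2-D spectrogram, which feeds a CNN with four convolutional and four pooling layers over eight classes. The STFT parameters and layer widths here are assumptions, not the paper's exact configuration.

```python
# Sketch of the abstract's pipeline: 1-D ECG segment -> STFT spectrogram ->
# 2-D CNN with four convolutional and four pooling layers, eight beat classes.
# STFT parameters and layer widths are assumptions, not the paper's values.
import numpy as np
import torch
import torch.nn as nn
from scipy.signal import stft

fs = 360                               # MIT-BIH sampling rate (Hz)
ecg = np.random.randn(fs * 3)          # placeholder 3-second beat segment
_, _, Z = stft(ecg, fs=fs, nperseg=64)
spec = np.log1p(np.abs(Z))             # log-magnitude spectrogram
x = torch.tensor(spec, dtype=torch.float32)[None, None]  # (1, 1, F, T)

cnn = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(64, 8),    # eight arrhythmia classes
)
print(cnn(x).shape)  # torch.Size([1, 8])
```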
Basic Filters for Convolutional Neural Networks Applied to Music: Training or Design?
When convolutional neural networks are used to tackle learning problems based
on music or, more generally, time series data, raw one-dimensional data are
commonly pre-processed to obtain spectrogram or mel-spectrogram coefficients,
which are then used as input to the actual neural network. In this
contribution, we investigate, both theoretically and experimentally, the
influence of this pre-processing step on the network's performance and ask
whether replacing it with adaptive or learned filters applied directly to the
raw data can improve learning success. The theoretical results show that it
is in principle possible to approximately reproduce mel-spectrogram
coefficients by applying adaptive filters and subsequent time-averaging. We
also conducted extensive experimental work on the task of singing voice
detection in music. The results of these experiments show that, for
classification based on convolutional neural networks, features obtained
from adaptive filter banks followed by time-averaging perform better than the
canonical Fourier-transform-based mel-spectrogram coefficients. Alternative
adaptive approaches with center frequencies or time-averaging lengths learned
from training data perform equally well.
Comment: Completely revised version; 21 pages, 4 figures
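A minimal sketch of the adaptive-filterbank idea, assuming a learnable 1-D convolutional filterbank whose squared responses are time-averaged to produce mel-spectrogram-like coefficients; the filter count, filter length, and averaging window are illustrative, not the paper's configuration.

```python
# Sketch of the adaptive-filterbank idea: learned 1-D filters applied to the
# raw waveform, followed by time-averaging of the squared filter responses,
# yielding mel-spectrogram-like coefficients. Sizes here are assumptions.
import torch
import torch.nn as nn

class AdaptiveFilterbank(nn.Module):
    def __init__(self, n_filters=80, filter_len=512, avg_len=1024):
        super().__init__()
        # Each output channel is one learnable band-pass-like filter.
        self.filters = nn.Conv1d(1, n_filters, filter_len,
                                 stride=filter_len // 2, bias=False)
        # Time-averaging plays the role of the spectrogram's magnitude pooling.
        self.avg = nn.AvgPool1d(avg_len // (filter_len // 2))

    def forward(self, wav):                 # wav: (batch, 1, samples)
        responses = self.filters(wav) ** 2  # squared filter responses
        return torch.log1p(self.avg(responses))

fb = AdaptiveFilterbank()
wav = torch.randn(1, 1, 16000)              # one second at 16 kHz
print(fb(wav).shape)                        # (1, 80, n_frames)
```

The center frequencies (and, implicitly, the averaging lengths) are free parameters here, matching the abstract's observation that learning them from training data performs as well as fixed designs.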