Gabor frames and deep scattering networks in audio processing
This paper introduces Gabor scattering, a feature extractor based on Gabor
frames and Mallat's scattering transform. By using a simple signal model for
audio signals, specific properties of Gabor scattering are studied. It is shown
that for each layer, specific invariances to certain signal characteristics
occur. Furthermore, deformation stability of the coefficient vector generated
by the feature extractor is derived by using a decoupling technique which
exploits the contractivity of general scattering networks. Deformations are
introduced as changes in spectral shape and frequency modulation. The
theoretical results are illustrated by numerical examples and experiments.
Evaluation on a synthetic and a "real" data set provides numerical evidence
that the invariances encoded by the Gabor scattering transform lead to higher
performance than using the Gabor transform alone, especially when few
training samples are available.
Comment: 26 pages, 8 figures, 4 tables. Repository for reproducibility:
https://gitlab.com/hararticles/gs-gt . Keywords: machine learning; scattering
transform; Gabor transform; deep learning; time-frequency analysis; CNN.
Accepted and published after peer revision
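The layered construction can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (their code is in the repository linked above) and rests on assumptions of our own: Gaussian windows, circular boundary handling, each Gabor layer scaled by its upper frame bound, and plain time averaging as the output stage. The function names, window lengths and hop sizes are hypothetical. Layer 1 is a spectrogram-like modulus; layer 2 applies a second Gabor transform to the time envelope of each layer-1 frequency channel, which is where amplitude-modulation structure shows up.

```python
import numpy as np


def gaussian_window(length, sigma):
    """Gaussian window with unit l2 norm (sigma is in samples)."""
    n = np.arange(length) - (length - 1) / 2
    g = np.exp(-0.5 * (n / sigma) ** 2)
    return g / np.linalg.norm(g)


def gabor_modulus(x, win, hop):
    """Modulus of a sampled Gabor transform of the real signal x.

    Circular boundary handling; returns shape (n_freq, n_time). Coefficients are
    divided by sqrt(B), B being the upper frame bound, so the map is non-expansive.
    """
    L, n_win = len(x), len(win)
    if n_win > L or L % hop != 0:
        raise ValueError("need len(win) <= len(x) and len(x) divisible by hop")
    times = np.arange(0, L, hop)
    idx = (times[:, None] + np.arange(n_win)[None, :]) % L   # (n_time, n_win)
    frames = x[idx] * win[None, :]
    # Painless case (window length <= FFT length): the frame operator is diagonal,
    # so B = n_win * max_n sum_j |win(n - t_j)|^2.
    env = np.zeros(L)
    np.add.at(env, idx, np.tile(win ** 2, (len(times), 1)))
    bound = n_win * env.max()
    # rfft keeps only non-negative frequencies; dropping channels cannot raise the norm.
    coeffs = np.fft.rfft(frames, axis=1) / np.sqrt(bound)
    return np.abs(coeffs).T


def gabor_scattering(x, win1, hop1, win2, hop2):
    """Two-layer sketch: time-averaged layer-1 and layer-2 moduli, concatenated."""
    U1 = gabor_modulus(x, win1, hop1)                     # (F1, T1), spectrogram-like
    out1 = U1.mean(axis=1)                                # average over time
    # Layer 2: Gabor transform of each frequency channel's time envelope.
    U2 = np.stack([gabor_modulus(row, win2, hop2) for row in U1])  # (F1, F2, T2)
    out2 = U2.mean(axis=2).ravel()
    return np.concatenate([out1, out2])


# Usage: a 440 Hz tone with 40 Hz amplitude modulation (hypothetical parameters).
fs = 8000
t = np.arange(1024) / fs
x = np.sin(2 * np.pi * 440 * t) * (1 + 0.5 * np.sin(2 * np.pi * 40 * t))
win1 = gaussian_window(64, 10.0)
win2 = gaussian_window(16, 3.0)
phi = gabor_scattering(x, win1, 16, win2, 4)
print(phi.shape)  # (330,)
```

Because each layer is scaled by its frame bound and the modulus is 1-Lipschitz, the whole map in this sketch is non-expansive, mirroring the contractivity that the decoupling argument exploits. With circular boundaries, shifting the input by hop1 * hop2 samples (here 64) leaves the output unchanged, a toy instance of the layer-wise invariances mentioned in the abstract.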
Basic Filters for Convolutional Neural Networks Applied to Music: Training or Design?
When convolutional neural networks are used to tackle learning problems based
on music or, more generally, time series data, raw one-dimensional data are
commonly pre-processed to obtain spectrogram or mel-spectrogram coefficients,
which are then used as input to the actual neural network. In this
contribution, we investigate, both theoretically and experimentally, the
influence of this pre-processing step on the network's performance and ask
whether replacing it with adaptive or learned filters applied directly to the
raw data can improve learning success. The theoretical results show
that approximately reproducing mel-spectrogram coefficients by applying
adaptive filters and subsequent time-averaging is in principle possible. We
also conducted extensive experimental work on the task of singing voice
detection in music. The results of these experiments show that, for
classification based on convolutional neural networks, the features obtained
from adaptive filter banks followed by time-averaging perform better than the
canonical Fourier-transform-based mel-spectrogram coefficients. Alternative
adaptive approaches with center frequencies or time-averaging lengths learned
from training data perform equally well.
Comment: Completely revised version; 21 pages, 4 figures
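The pipeline examined in this contribution (filters applied to the raw signal, then time averaging) can be sketched in NumPy as follows. This is an illustration under assumptions of our own, not the authors' filters: Gaussian band-pass responses applied circularly in the frequency domain, center frequencies equally spaced on the HTK mel scale, bandwidths equal to the local center spacing, non-overlapping block averaging, and log compression with a 1e-10 offset. All function names and parameter values are hypothetical. In a learned variant, `centers`, `bandwidths` or `avg_len` would be fitted to training data rather than fixed.

```python
import numpy as np


def hz_to_mel(f):
    """HTK-style mel scale."""
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_center_frequencies(n_bands, f_min, f_max):
    """Center frequencies equally spaced on the mel scale."""
    return mel_to_hz(np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_bands))


def filter_bank_features(x, fs, centers, bandwidths, avg_len):
    """Band-pass filter the raw signal, take squared magnitude, average over time.

    Each filter has a Gaussian frequency response around +fc and is applied
    circularly via the FFT. Energies are averaged over non-overlapping blocks
    of avg_len samples and log-compressed. Returns shape (n_bands, n_blocks).
    """
    freqs = np.fft.fftfreq(len(x), d=1.0 / fs)
    X = np.fft.fft(x)
    n_blocks = len(x) // avg_len
    feats = np.empty((len(centers), n_blocks))
    for i, (fc, bw) in enumerate(zip(centers, bandwidths)):
        H = np.exp(-0.5 * ((freqs - fc) / bw) ** 2)   # Gaussian band around +fc only
        energy = np.abs(np.fft.ifft(X * H)) ** 2        # squared envelope of the band
        blocks = energy[: n_blocks * avg_len].reshape(n_blocks, avg_len)
        feats[i] = blocks.mean(axis=1)                  # time averaging
    return np.log(feats + 1e-10)


# Usage: 1 s of a 1 kHz tone at 16 kHz, 40 mel-spaced bands (hypothetical setup).
fs = 16000
x = np.sin(2 * np.pi * 1000.0 * np.arange(fs) / fs)
centers = mel_center_frequencies(40, 50.0, 7000.0)
bandwidths = np.gradient(centers)                       # bandwidth ~ local spacing
S = filter_bank_features(x, fs, centers, bandwidths, avg_len=512)
print(S.shape)  # (40, 31)
```

For the 1 kHz test tone, the band with the largest averaged coefficient is the one whose center frequency lies closest to 1 kHz.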
Gabor Frames and Deep Scattering Networks in Audio Processing
This paper introduces Gabor scattering, a feature extractor based on Gabor frames and Mallat’s scattering transform. By using a simple signal model for audio signals, specific properties of Gabor scattering are studied. It is shown that, for each layer, specific invariances to certain signal characteristics occur. Furthermore, deformation stability of the coefficient vector generated by the feature extractor is derived by using a decoupling technique which exploits the contractivity of general scattering networks. Deformations are introduced as changes in spectral shape and frequency modulation. The theoretical results are illustrated by numerical examples and experiments. Evaluation on a synthetic and a “real” dataset provides numerical evidence that the invariances encoded by the Gabor scattering transform lead to higher performance than using the Gabor transform alone, especially when few training samples are available.
© 2019 by the author