An optimally concentrated Gabor transform for localized time-frequency components
Gabor analysis is one of the most common instances of time-frequency signal
analysis. Choosing a suitable window for the Gabor transform of a signal is
often a challenge for practical applications, in particular in audio signal
processing. Many time-frequency (TF) patterns of different shapes may be
present in a signal and they cannot all be sparsely represented in the same
spectrogram. We propose several algorithms, which provide optimal windows for a
user-selected TF pattern with respect to different concentration criteria. We
base our optimization algorithm on $\ell^p$-norms as a measure of TF spreading. For
a given number of sampling points in the TF plane we also propose optimal
lattices to be used with the obtained windows. We illustrate the potential
of the method on selected numerical examples.
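As a rough illustration of the idea in this abstract, the sketch below scores a few candidate Gaussian windows by the $\ell^1$ norm of the $\ell^2$-normalized Gabor coefficients and keeps the width with the smallest score. It is a brute-force search over a hypothetical grid of widths, not the optimization algorithm proposed in the paper; the helper names, the toy signal, and the choice p = 1 are all assumptions.

```python
import numpy as np

def gabor_coefficients(signal, window, hop):
    """Windowed FFT frames: a plain discrete Gabor (STFT) analysis, circular in time."""
    n = len(signal)
    half = len(window) // 2
    frames = []
    for start in range(0, n, hop):
        idx = (np.arange(len(window)) + start - half) % n  # circular indexing
        frames.append(np.fft.fft(signal[idx] * window))
    return np.array(frames)

def lp_spread(coeffs, p=1.0):
    """l^p norm of the l^2-normalized coefficients; for p < 2, smaller means more concentrated."""
    c = np.abs(coeffs).ravel()
    c = c / np.linalg.norm(c)
    return float(np.sum(c ** p) ** (1.0 / p))

def gaussian_window(length, sigma):
    """Gaussian window of the given length, with standard deviation sigma in samples."""
    t = np.arange(length) - (length - 1) / 2
    return np.exp(-0.5 * (t / sigma) ** 2)

# Toy TF pattern (assumption): a stationary sinusoid, which favors long windows.
n = 1024
t = np.arange(n)
signal = np.cos(2 * np.pi * 64 * t / n)

# Hypothetical candidate widths; score each one and keep the most concentrated.
sigmas = [4.0, 16.0, 64.0]
spreads = {
    s: lp_spread(gabor_coefficients(signal, gaussian_window(256, s), hop=32), p=1.0)
    for s in sigmas
}
best_sigma = min(spreads, key=spreads.get)
```

For a transient-like pattern the ranking would be expected to reverse, which is why a single window rarely suits every component of a signal.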
Basic Filters for Convolutional Neural Networks Applied to Music: Training or Design?
When convolutional neural networks are used to tackle learning problems based
on music or, more generally, time series data, raw one-dimensional data are
commonly pre-processed to obtain spectrogram or mel-spectrogram coefficients,
which are then used as input to the actual neural network. In this
contribution, we investigate, both theoretically and experimentally, the
influence of this pre-processing step on the network's performance and ask
whether replacing it with adaptive or learned filters applied directly
to the raw data can improve learning success. The theoretical results show
that approximately reproducing mel-spectrogram coefficients by applying
adaptive filters and subsequent time-averaging is in principle possible. We
also conducted extensive experimental work on the task of singing voice
detection in music. The results of these experiments show that for
classification based on Convolutional Neural Networks the features obtained
from adaptive filter banks followed by time-averaging perform better than the
canonical Fourier-transform-based mel-spectrogram coefficients. Alternative
adaptive approaches with center frequencies or time-averaging lengths learned
from training data perform equally well.
Comment: Completely revised version; 21 pages, 4 figures
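The "filter, then time-average" pipeline mentioned in this abstract can be sketched as follows. This is only a minimal illustration under stated assumptions: the filter is a Hann-windowed complex exponential, the center frequencies are arbitrary rather than mel-spaced, and the function names and default lengths are hypothetical, not taken from the paper.

```python
import numpy as np

def bandpass_filter(center, sample_rate, length=256):
    """Complex band-pass FIR filter: a Hann window modulated to `center` Hz."""
    t = np.arange(length) - (length - 1) / 2
    return np.hanning(length) * np.exp(2j * np.pi * center * t / sample_rate)

def averaged_band_energy(x, center, sample_rate, avg_len=512):
    """Filter the raw signal, square the magnitude, then smooth with a moving average."""
    y = np.convolve(x, bandpass_filter(center, sample_rate), mode="same")
    power = np.abs(y) ** 2
    return np.convolve(power, np.ones(avg_len) / avg_len, mode="same")

# Toy input (assumption): one second of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

energy_on_band = averaged_band_energy(tone, 440.0, sample_rate).mean()
energy_off_band = averaged_band_energy(tone, 2000.0, sample_rate).mean()
```

In a learned variant, `center` and `avg_len` would be the kind of quantities adapted from training data; here they are fixed by hand.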
Sparse and Cosparse Audio Dequantization Using Convex Optimization
The paper shows the potential of sparsity-based methods in restoring
quantized signals. Following up on the study of Brauer et al. (IEEE ICASSP
2016), we significantly extend the range of the evaluation scenarios: we
introduce the analysis (cosparse) model, we use more effective algorithms, and we
experiment with another time-frequency transform. The paper shows that the
analysis-based model performs comparably to the synthesis model, but the Gabor
transform produces better results than the originally used cosine transform.
Last but not least, we provide code and data in a reproducible way.
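Two building blocks that sparsity-based dequantization typically relies on are a projection onto the set of signals consistent with the observed quantization levels and a soft-thresholding step that promotes sparsity. The sketch below combines them in a simple alternating heuristic; it is not one of the convex algorithms evaluated in the paper, and the unitary FFT stands in for the Gabor or cosine transform. All names, the step size, and the threshold are assumptions.

```python
import numpy as np

def quantize(x, step):
    """Uniform mid-tread quantizer: round each sample to the nearest multiple of `step`."""
    return step * np.round(x / step)

def project_consistent(x, quantized, step):
    """Project x onto the box of signals lying within half a step of `quantized`."""
    return np.clip(x, quantized - step / 2, quantized + step / 2)

def soft_threshold(c, tau):
    """Proximal operator of tau times the l1 norm; also valid for complex coefficients."""
    mag = np.abs(c)
    return c * np.maximum(1 - tau / np.maximum(mag, 1e-12), 0)

# Toy signal (assumption): two sinusoids, coarsely quantized.
n = 512
t = np.arange(n)
clean = 0.6 * np.sin(2 * np.pi * 5 * t / n) + 0.3 * np.sin(2 * np.pi * 12 * t / n)
step = 0.25
observed = quantize(clean, step)

# Alternate between sparsifying in the transform domain and restoring consistency.
estimate = observed.copy()
for _ in range(50):
    coeffs = soft_threshold(np.fft.fft(estimate, norm="ortho"), tau=0.05)
    estimate = project_consistent(np.fft.ifft(coeffs, norm="ortho").real, observed, step)
```

Because the loop ends with the projection, the final estimate always stays within half a quantization step of the observed samples.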
Superposition frames for adaptive time-frequency analysis and fast reconstruction
In this article we introduce a broad family of adaptive, linear
time-frequency representations termed superposition frames, and show that they
admit desirable fast overlap-add reconstruction properties akin to standard
short-time Fourier techniques. This approach stands in contrast to many
adaptive time-frequency representations in the extant literature, which, while
more flexible than standard fixed-resolution approaches, typically fail to
provide efficient reconstruction and often lack the regular structure necessary
for precise frame-theoretic analysis. Our main technical contributions come
through the development of properties which ensure that this construction
provides for a numerically stable, invertible signal representation. Our
primary algorithmic contributions come via the introduction and discussion of
specific signal adaptation criteria in deterministic and stochastic settings,
based respectively on time-frequency concentration and nonstationarity
detection. We conclude with a short speech enhancement example that serves to
highlight potential applications of our approach.
Comment: 16 pages, 6 figures; revised version
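For reference, the standard short-time Fourier overlap-add reconstruction that this abstract uses as its point of comparison can be sketched as below. This is the classical fixed-resolution scheme, not the superposition-frame construction itself; the function names, the periodic Hann window, and the 50% overlap are assumptions chosen so that the shifted windows sum to one.

```python
import numpy as np

def periodic_hann(n):
    """Periodic Hann window; copies shifted by n/2 sum to exactly one."""
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)

def analyze(x, win, hop):
    """Return (start, spectrum) pairs for each full windowed frame of x."""
    starts = range(0, len(x) - len(win) + 1, hop)
    return [(s, np.fft.rfft(x[s:s + len(win)] * win)) for s in starts]

def overlap_add(frames, win_len, length):
    """Invert each frame with an inverse FFT and sum the results in place."""
    y = np.zeros(length)
    for s, spec in frames:
        y[s:s + win_len] += np.fft.irfft(spec, n=win_len)
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
win_len, hop = 128, 64
frames = analyze(x, periodic_hann(win_len), hop)
y = overlap_add(frames, win_len, len(x))

# Away from the first and last half-window, every sample is covered by two frames.
interior_error = np.max(np.abs(x[hop:-hop] - y[hop:-hop]))
```

The cost is one FFT pair per frame, which is the efficiency an adaptive scheme has to match.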
Nonlinear approximation with nonstationary Gabor frames
We consider sparseness properties of adaptive time-frequency representations
obtained using nonstationary Gabor frames (NSGFs). NSGFs generalize classical
Gabor frames by allowing for adaptivity in either time or frequency. It is
known that the concept of painless nonorthogonal expansions generalizes to the
nonstationary case, providing perfect reconstruction and an FFT based
implementation for compactly supported window functions sampled at a certain
density. It is also known that for some signal classes, NSGFs with flexible
time resolution tend to provide sparser expansions than can be obtained with
classical Gabor frames. In this article we show, for the continuous case, that
sparseness of a nonstationary Gabor expansion is equivalent to smoothness in an
associated decomposition space. In this way we characterize signals with sparse
expansions relative to NSGFs with flexible time resolution. Based on this
characterization we prove an upper bound on the approximation error occurring
when thresholding the coefficients of the corresponding frame expansions. We
complement the theoretical results with numerical experiments, estimating the
rate of approximation obtained from thresholding the coefficients of both
stationary and nonstationary Gabor expansions.
Comment: 19 pages, 2 figures
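The thresholding experiment described here amounts to keeping the largest coefficients of an expansion and measuring the resulting error. The sketch below does this with a unitary DFT as a stand-in for a Gabor or nonstationary Gabor frame, which is an assumption made to keep the example short; the function name and the toy signal are hypothetical.

```python
import numpy as np

def n_term_error(x, n_keep):
    """l2 error after keeping only the n_keep largest-magnitude unitary DFT coefficients."""
    c = np.fft.fft(x, norm="ortho")
    order = np.argsort(np.abs(c))[::-1]      # indices sorted by decreasing magnitude
    kept = np.zeros_like(c)
    kept[order[:n_keep]] = c[order[:n_keep]]
    return float(np.linalg.norm(x - np.fft.ifft(kept, norm="ortho")))

# Toy signal (assumption): a decaying oscillation with a little noise.
rng = np.random.default_rng(1)
t = np.arange(256)
x = np.exp(-t / 64) * np.sin(2 * np.pi * 10 * t / 256) + 0.01 * rng.standard_normal(256)

terms = [4, 16, 64, 256]
errors = [n_term_error(x, n) for n in terms]
```

Plotting `errors` against `terms` on a log-log scale gives an empirical estimate of the approximation rate for this particular signal and transform.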
Introducing Latent Timbre Synthesis
We present the Latent Timbre Synthesis (LTS), a new audio synthesis method
using Deep Learning. The synthesis method allows composers and sound designers
to interpolate and extrapolate between the timbre of multiple sounds using the
latent space of audio frames. We provide the details of two Variational
Autoencoder architectures for LTS, and compare their advantages and drawbacks.
The implementation includes a fully working application with graphical user
interface, called interpolate_two, which enables practitioners to
explore the timbre between two audio excerpts of their selection using
interpolation and extrapolation in the latent space of audio frames. Our
implementation is open-source, and we aim to improve the accessibility of this
technology by providing a guide for users with any technical background.
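Interpolation and extrapolation in a latent space reduce to an affine combination of latent vectors, applied frame by frame. The sketch below shows only that step, assuming the latent codes have already been produced by some encoder; the function name, the array shapes, and the random stand-in latents are hypothetical and do not reflect the actual LTS implementation.

```python
import numpy as np

def mix_latents(z_a, z_b, alpha):
    """Affine mix of two latent sequences: alpha in [0, 1] interpolates, outside extrapolates."""
    return (1 - alpha) * z_a + alpha * z_b

# Stand-in latents (assumption): 100 audio frames, 16 latent dimensions each.
rng = np.random.default_rng(2)
z_a = rng.standard_normal((100, 16))
z_b = rng.standard_normal((100, 16))

halfway = mix_latents(z_a, z_b, 0.5)   # timbre between the two sounds
beyond_b = mix_latents(z_a, z_b, 1.5)  # extrapolation past the second sound
```

Each mixed frame would then be passed through the decoder to obtain the synthesized audio frame.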