98 research outputs found
Histogram of gradients of Time-Frequency Representations for Audio scene detection
This paper addresses the problem of audio scenes classification and
contributes to the state of the art by proposing a novel feature. We build this
feature by considering histogram of gradients (HOG) of time-frequency
representation of an audio scene. Contrarily to classical audio features like
MFCC, we make the hypothesis that histogram of gradients are able to encode
some relevant informations in a time-frequency {representation:} namely, the
local direction of variation (in time and frequency) of the signal spectral
power. In addition, in order to gain more invariance and robustness, histogram
of gradients are locally pooled. We have evaluated the relevance of {the novel
feature} by comparing its performances with state-of-the-art competitors, on
several datasets, including a novel one that we provide, as part of our
contribution. This dataset, that we make publicly available, involves
classes and contains about minutes of audio scene recording. We thus
believe that it may be the next standard dataset for evaluating audio scene
classification algorithms. Our comparison results clearly show that our
HOG-based features outperform its competitor
DC Proximal Newton for Non-Convex Optimization Problems
We introduce a novel algorithm for solving learning problems where both the
loss function and the regularizer are non-convex but belong to the class of
difference of convex (DC) functions. Our contribution is a new general purpose
proximal Newton algorithm that is able to deal with such a situation. The
algorithm consists in obtaining a descent direction from an approximation of
the loss function and then in performing a line search to ensure sufficient
descent. A theoretical analysis is provided showing that the iterates of the
proposed algorithm {admit} as limit points stationary points of the DC
objective function. Numerical experiments show that our approach is more
efficient than current state of the art for a problem with a convex loss
functions and non-convex regularizer. We have also illustrated the benefit of
our algorithm in high-dimensional transductive learning problem where both loss
function and regularizers are non-convex
Importance sampling strategy for non-convex randomized block-coordinate descent
As the number of samples and dimensionality of optimization problems related
to statistics an machine learning explode, block coordinate descent algorithms
have gained popularity since they reduce the original problem to several
smaller ones. Coordinates to be optimized are usually selected randomly
according to a given probability distribution. We introduce an importance
sampling strategy that helps randomized coordinate descent algorithms to focus
on blocks that are still far from convergence. The framework applies to
problems composed of the sum of two possibly non-convex terms, one being
separable and non-smooth. We have compared our algorithm to a full gradient
proximal approach as well as to a randomized block coordinate algorithm that
considers uniform sampling and cyclic block coordinate descent. Experimental
evidences show the clear benefit of using an importance sampling strategy
Fast object detection in compressed JPEG Images
Object detection in still images has drawn a lot of attention over past few
years, and with the advent of Deep Learning impressive performances have been
achieved with numerous industrial applications. Most of these deep learning
models rely on RGB images to localize and identify objects in the image.
However in some application scenarii, images are compressed either for storage
savings or fast transmission. Therefore a time consuming image decompression
step is compulsory in order to apply the aforementioned deep models. To
alleviate this drawback, we propose a fast deep architecture for object
detection in JPEG images, one of the most widespread compression format. We
train a neural network to detect objects based on the blockwise DCT (discrete
cosine transform) coefficients {issued from} the JPEG compression algorithm. We
modify the well-known Single Shot multibox Detector (SSD) by replacing its
first layers with one convolutional layer dedicated to process the DCT inputs.
Experimental evaluations on PASCAL VOC and industrial dataset comprising images
of road traffic surveillance show that the model is about faster than
regular SSD with promising detection performances. To the best of our
knowledge, this paper is the first to address detection in compressed JPEG
images
Dilated Spatial Generative Adversarial Networks for Ergodic Image Generation
Generative models have recently received renewed attention as a result of
adversarial learning. Generative adversarial networks consist of samples
generation model and a discrimination model able to distinguish between genuine
and synthetic samples. In combination with convolutional (for the
discriminator) and de-convolutional (for the generator) layers, they are
particularly suitable for image generation, especially of natural scenes.
However, the presence of fully connected layers adds global dependencies in the
generated images. This may lead to high and global variations in the generated
sample for small local variations in the input noise. In this work we propose
to use architec-tures based on fully convolutional networks (including among
others dilated layers), architectures specifically designed to generate
globally ergodic images, that is images without global dependencies. Conducted
experiments reveal that these architectures are well suited for generating
natural textures such as geologic structures
Foreground-Background Ambient Sound Scene Separation
Ambient sound scenes typically comprise multiple short events occurring on
top of a somewhat stationary background. We consider the task of separating
these events from the background, which we call foreground-background ambient
sound scene separation. We propose a deep learning-based separation framework
with a suitable feature normaliza-tion scheme and an optional auxiliary network
capturing the background statistics, and we investigate its ability to handle
the great variety of sound classes encountered in ambient sound scenes, which
have often not been seen in training. To do so, we create single-channel
foreground-background mixtures using isolated sounds from the DESED and
Audioset datasets, and we conduct extensive experiments with mixtures of seen
or unseen sound classes at various signal-to-noise ratios. Our experimental
findings demonstrate the generalization ability of the proposed approach
- …