Towards End-to-End Acoustic Localization using Deep Learning: from Audio Signal to Source Position Coordinates
This paper presents a novel approach for indoor acoustic source localization
using microphone arrays and based on a Convolutional Neural Network (CNN). The
proposed solution is, to the best of our knowledge, the first published work in
which the CNN is designed to directly estimate the three-dimensional position
of an acoustic source, using the raw audio signal as the input information and
avoiding the use of hand-crafted audio features. Given the limited amount of
available localization data, we propose in this paper a training strategy based
on two steps. We first train our network using semi-synthetic data generated
from close-talk speech recordings, in which we simulate the time delays and
distortion that the signal suffers as it propagates from the source to the
microphone array. We then fine-tune this network using a small amount of real
data. Our experimental results show that this strategy is able to produce
networks that significantly improve on existing localization methods based on
SRP-PHAT strategies. In addition, our experiments show that our CNN
method is more robust to the gender of the speaker and to different window
sizes than the other methods.
Comment: 18 pages, 3 figures, 8 tables
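The semi-synthetic data step can be illustrated with a minimal sketch. It is not the authors' simulation, whose details the abstract does not give; it assumes a free-field model with whole-sample delays, a 1/r amplitude loss, a speed of sound of 343 m/s, and a hypothetical function name `simulate_array_signals`.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def simulate_array_signals(clean, source, mics, sample_rate=16000):
    """Delay and attenuate one clean signal for each microphone position.

    clean: list of samples from a close-talk recording.
    source, mics: 3D coordinates in metres.
    """
    outputs = []
    for mic in mics:
        distance = math.dist(source, mic)
        # Propagation delay rounded to a whole number of samples (assumption).
        delay = round(distance / SPEED_OF_SOUND * sample_rate)
        # Simple 1/r spreading loss; reverberation and noise are ignored here.
        gain = 1.0 / max(distance, 1e-3)
        delayed = [0.0] * delay + [gain * s for s in clean]
        outputs.append(delayed[: len(clean)])
    return outputs
```

Feeding a unit impulse through this function places a scaled copy of the impulse at each microphone's delay, which is the kind of inter-channel time difference a localization network has to learn from.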
DDSL: Deep Differentiable Simplex Layer for Learning Geometric Signals
We present a Deep Differentiable Simplex Layer (DDSL) for neural networks for
geometric deep learning. The DDSL is a differentiable layer compatible with
deep neural networks for bridging simplex mesh-based geometry representations
(point clouds, line mesh, triangular mesh, tetrahedral mesh) with raster images
(e.g., 2D/3D grids). The DDSL uses Non-Uniform Fourier Transform (NUFT) to
perform differentiable, efficient, anti-aliased rasterization of simplex-based
signals. We present a complete theoretical framework for the process as well as
an efficient backpropagation algorithm. Compared to previous differentiable
renderers and rasterizers, the DDSL generalizes to arbitrary simplex degrees
and dimensions. In particular, we explore its applications to 2D shapes and
illustrate two applications of this method: (1) mesh editing and optimization
guided by neural network outputs, and (2) using DDSL for a differentiable
rasterization loss to facilitate end-to-end training of polygon generators. We
validate the effectiveness of gradient-based shape optimization
with the example of airfoil optimization, and using the differentiable
rasterization loss to facilitate end-to-end training, we surpass the state of
the art for polygonal image segmentation given ground-truth bounding boxes.
DeepSphere: Efficient spherical Convolutional Neural Network with HEALPix sampling for cosmological applications
Convolutional Neural Networks (CNNs) are a cornerstone of the Deep Learning
toolbox and have led to many breakthroughs in Artificial Intelligence. These
networks have mostly been developed for regular Euclidean domains such as those
supporting images, audio, or video. Because of their success, CNN-based methods
are becoming increasingly popular in Cosmology. Cosmological data often come
as spherical maps, which makes the use of traditional CNNs more complicated.
The commonly used pixelization scheme for spherical maps is the Hierarchical
Equal Area isoLatitude Pixelisation (HEALPix). We present a spherical CNN for
analysis of full and partial HEALPix maps, which we call DeepSphere. The
spherical CNN is constructed by representing the sphere as a graph. Graphs are
versatile data structures that can act as a discrete representation of a
continuous manifold. Using the graph-based representation, we define many of
the standard CNN operations, such as convolution and pooling. With filters
restricted to being radial, our convolutions are equivariant to rotation on the
sphere, and DeepSphere can be made invariant or equivariant to rotation. This
way, DeepSphere is a special case of a graph CNN, tailored to the HEALPix
sampling of the sphere. This approach is computationally more efficient than
using spherical harmonics to perform convolutions. We demonstrate the method on
a classification problem of weak lensing mass maps from two cosmological models
and compare the performance of the CNN with that of two baseline classifiers.
The results show that the performance of DeepSphere is always superior or equal
to both of these baselines. For high noise levels and for data covering only a
smaller fraction of the sphere, DeepSphere typically achieves 10% better
classification accuracy than those baselines. Finally, we show how learned
filters can be visualized to introspect the neural network.
Comment: arXiv admin note: text overlap with arXiv:astro-ph/0409513 by other
authors
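The idea of defining convolution on a graph can be sketched on a toy example. The code below is an illustration only: it assumes a ring graph in place of the HEALPix pixel graph, and it assumes filters written as polynomials of the graph Laplacian, which is one common way to obtain direction-independent filters and is not confirmed by the abstract as the paper's exact parameterization. The names `ring_laplacian` and `graph_conv` are hypothetical.

```python
def ring_laplacian(n):
    """Combinatorial Laplacian L = D - A of a ring graph with n >= 3 vertices."""
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        lap[i][i] = 2.0
        lap[i][(i - 1) % n] = -1.0
        lap[i][(i + 1) % n] = -1.0
    return lap


def graph_conv(lap, x, theta):
    """Return y = sum_k theta[k] * L^k x without forming powers of L."""
    y = [0.0] * len(x)
    power = list(x)  # holds L^k x, starting with k = 0
    for t in theta:
        y = [yi + t * pi for yi, pi in zip(y, power)]
        # Advance to the next power with one matrix-vector product.
        power = [sum(l * p for l, p in zip(row, power)) for row in lap]
    return y
```

On the ring, such a filter commutes with cyclic shifts of the signal: shifting the input and then filtering gives the same result as filtering and then shifting. This is a small analogue of the rotation equivariance described above.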
Robust sound event detection in bioacoustic sensor networks
Bioacoustic sensors, sometimes known as autonomous recording units (ARUs),
can record sounds of wildlife over long periods of time in scalable and
minimally invasive ways. Deriving per-species abundance estimates from these
sensors requires detection, classification, and quantification of animal
vocalizations as individual acoustic events. Yet, variability in ambient noise,
both over time and across sensors, hinders the reliability of current automated
systems for sound event detection (SED), such as convolutional neural networks
(CNN) in the time-frequency domain. In this article, we develop, benchmark, and
combine several machine listening techniques to improve the generalizability of
SED models across heterogeneous acoustic environments. As a case study, we
consider the problem of detecting avian flight calls from a ten-hour recording
of nocturnal bird migration, recorded by a network of six ARUs in the presence
of heterogeneous background noise. Starting from a CNN yielding
state-of-the-art accuracy on this task, we introduce two noise adaptation
techniques, respectively integrating short-term (60 milliseconds) and long-term
(30 minutes) context. First, we apply per-channel energy normalization (PCEN)
in the time-frequency domain, which applies short-term automatic gain control
to every subband in the mel-frequency spectrogram. Second, we replace the
last dense layer in the network with a context-adaptive neural network (CA-NN)
layer. Combining them yields state-of-the-art results that are unmatched by
artificial data augmentation alone. We release a pre-trained version of our
best performing system under the name of BirdVoxDetect, a ready-to-use detector
of avian flight calls in field recordings.
Comment: 32 pages, in English. Submitted to PLOS ONE journal in February 2019;
revised August 2019; published October 2019
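Per-channel energy normalization can be sketched as follows. This is a minimal illustration of the commonly published PCEN formula, where each subband energy E is divided by a smoothed version M of itself and then root-compressed: PCEN = (E / (eps + M)^alpha + delta)^r - delta^r, with M[t] = (1 - s) M[t-1] + s E[t]. The parameter values and the choice to initialize M with the first frame are assumptions for illustration, not the settings used in the paper.

```python
def pcen(energy, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Apply PCEN to energy[t][f], a list of frames of non-negative subband energies."""
    smoothed = list(energy[0])  # assumed initial smoother state: the first frame
    out = []
    for frame in energy:
        # First-order IIR smoother per subband: M[t] = (1 - s) * M[t-1] + s * E[t]
        smoothed = [(1 - s) * m + s * e for m, e in zip(smoothed, frame)]
        # Automatic gain control (division by M^alpha), then root compression.
        out.append([
            (e / (eps + m) ** alpha + delta) ** r - delta ** r
            for e, m in zip(frame, smoothed)
        ])
    return out
```

With alpha set to 1, a steady loud subband and a steady quiet subband are mapped to nearly the same value, which is the gain-control behaviour that makes the representation less sensitive to differences in ambient noise level across sensors.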
Sparse Modeling for Image and Vision Processing
In recent years, a large amount of multi-disciplinary research has been
conducted on sparse models and their applications. In statistics and machine
learning, the sparsity principle is used to perform model selection, that is,
automatically selecting a simple model from a large collection of them. In
signal processing, sparse coding consists of representing data with linear
combinations of a few dictionary elements. Subsequently, the corresponding
tools have been widely adopted by several scientific communities such as
neuroscience, bioinformatics, or computer vision. The goal of this monograph is
to offer a self-contained view of sparse modeling for visual recognition and
image processing. More specifically, we focus on applications where the
dictionary is learned and adapted to data, yielding a compact representation
that has been successful in various contexts.
Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics
and Vision
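Representing a signal as a linear combination of a few dictionary elements can be sketched with matching pursuit, a simple greedy sparse coder. This is an illustration with a fixed, hand-written dictionary; the monograph focuses on dictionaries learned from data, and the function name `matching_pursuit` and the unit-norm assumption on the atoms are choices made here for the example.

```python
def matching_pursuit(signal, atoms, n_nonzero):
    """Greedy sparse code: signal ~ sum(coeffs[j] * atoms[j]) with few terms.

    atoms are assumed to be unit-norm vectors with the same length as signal.
    Returns the coefficient dictionary and the final residual.
    """
    residual = list(signal)
    coeffs = {}
    for _ in range(n_nonzero):
        # Correlate the residual with every atom and keep the best match.
        scores = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        best = max(range(len(atoms)), key=lambda j: abs(scores[j]))
        if abs(scores[best]) < 1e-12:
            break  # the residual is already fully explained
        coeffs[best] = coeffs.get(best, 0.0) + scores[best]
        # Subtract the selected atom's contribution from the residual.
        residual = [r - scores[best] * a for r, a in zip(residual, atoms[best])]
    return coeffs, residual
```

For example, with the three coordinate axes plus one diagonal atom as the dictionary, the signal `[3.0, 0.0, -2.0]` is coded with two nonzero coefficients, on atoms 0 and 2, and the residual is zero.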
Emerging Approaches for THz Array Imaging: A Tutorial Review and Software Tool
Accelerated by the increasing attention drawn by 5G, 6G, and Internet of
Things applications, communication and sensing technologies have rapidly
evolved from millimeter-wave (mmWave) to terahertz (THz) in recent years.
Enabled by significant advancements in electromagnetic (EM) hardware, mmWave
and THz frequency regimes spanning 30 GHz to 300 GHz and 300 GHz to 3000 GHz,
respectively, can be employed for a host of applications. The main feature of
THz systems is high-bandwidth transmission, enabling ultra-high-resolution
imaging and high-throughput communications; however, challenges in both the
hardware and algorithmic arenas remain for the ubiquitous adoption of THz
technology. Spectra comprising mmWave and THz frequencies are well-suited for
synthetic aperture radar (SAR) imaging at sub-millimeter resolutions for a wide
spectrum of tasks like material characterization and nondestructive testing
(NDT). This article provides a tutorial review of systems and algorithms for
THz SAR in the near-field with an emphasis on emerging algorithms that combine
signal processing and machine learning techniques. As part of this study, an
overview of classical and data-driven THz SAR algorithms is provided, focusing
on object detection for security applications and SAR image super-resolution.
We also discuss relevant issues, challenges, and future research directions for
emerging algorithms and THz SAR, including standardization of system and
algorithm benchmarking, adoption of state-of-the-art deep learning techniques,
signal processing-optimized machine learning, and hybrid data-driven signal
processing algorithms...
Comment: Submitted to Proceedings of IEEE
Multiscale Mesh Deformation Component Analysis with Attention-based Autoencoders
Deformation component analysis is a fundamental problem in geometry
processing and shape understanding. Existing approaches mainly extract
deformation components in local regions at a similar scale while deformations
of real-world objects are usually distributed in a multi-scale manner. In this
paper, we propose a novel method to extract multiscale deformation components
automatically with a stacked attention-based autoencoder. The attention
mechanism is designed to learn to softly weight multi-scale deformation
components in active deformation regions, and the stacked attention-based
autoencoder is learned to represent the deformation components at different
scales. Quantitative and qualitative evaluations show that our method
outperforms state-of-the-art methods. Furthermore, with the multiscale
deformation components extracted by our method, the user can edit shapes in a
coarse-to-fine fashion which facilitates effective modeling of new shapes.
Comment: 15 pages