Bi-class classification of humpback whale sound units against complex background noise with Deep Convolution Neural Network
Automatically detecting sound units of humpback whales in complex, time-varying background noise is a current challenge for scientists. In this paper, we explore the applicability of the Convolutional Neural Network (CNN) method to this task. In the evaluation stage, we present six bi-class classification experiments on whale sound detection against different background noise types (e.g., rain, wind). Compared with classical FFT-based representations such as spectrograms, we show that image-based pretrained CNN features yield higher performance in classifying whale sounds and background noise.
Comment: arXiv admin note: text overlap with arXiv:1702.02741 by other authors
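As a point of reference for the FFT-based baseline mentioned in the abstract, here is a minimal short-time-FFT spectrogram sketch in plain NumPy; the window and hop sizes are illustrative assumptions, not the authors' settings:

```python
import numpy as np

def spectrogram(signal, win=256, hop=128):
    """Magnitude spectrogram via a short-time FFT (the classical baseline)."""
    window = np.hanning(win)
    frames = [signal[i:i + win] * window
              for i in range(0, len(signal) - win + 1, hop)]
    # One FFT per windowed frame; keep only the non-negative frequencies.
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

# Toy input: a 1 kHz tone sampled at 8 kHz, a stand-in for a sound unit.
sr = 8000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 1000 * t))
peak_bin = spec.mean(axis=0).argmax()
print(peak_bin * sr / 256)  # frequency of the strongest bin -> 1000.0
```

A 2-D array like `spec` is the kind of time-frequency image that can be fed either to a classifier directly or to a pretrained image CNN as a feature extractor.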
Deep Cross-Modal Audio-Visual Generation
Cross-modal audio-visual perception has been a long-lasting topic in
psychology and neurology, and various studies have discovered strong
correlations in human perception of auditory and visual stimuli. Despite work
in computational multimodal modeling, the problem of cross-modal audio-visual
generation has not been systematically studied in the literature. In this
paper, we make the first attempt to solve this cross-modal generation problem
leveraging the power of deep generative adversarial training. Specifically, we
use conditional generative adversarial networks to achieve cross-modal
audio-visual generation of musical performances. We explore different encoding
methods for audio and visual signals, and work on two scenarios:
instrument-oriented generation and pose-oriented generation. Being the first to
explore this new problem, we compose two new datasets with pairs of images and
sounds of musical performances of different instruments. Our experiments using
both classification and human evaluations demonstrate that our model has the
ability to generate one modality, i.e., audio/visual, from the other modality,
i.e., visual/audio, to a good extent. Our experiments on various design choices, along with the datasets, will facilitate future research in this new problem space.
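The adversarial training the abstract builds on follows the standard conditional GAN objective, written here in generic notation (the encoders and the form of the condition c for the audio/visual pairs are the paper's design choices and are not shown):

```latex
\min_G \max_D \;
\mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x \mid c)\right]
+ \mathbb{E}_{z \sim p_z}\!\left[\log\bigl(1 - D(G(z \mid c) \mid c)\bigr)\right]
```

Here the generator G maps noise z and condition c (one modality) to a sample in the other modality, while the discriminator D judges samples jointly with the condition.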
Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article
provides a review of the state-of-the-art deep learning techniques for audio
signal processing. Speech, music, and environmental sound processing are
considered side-by-side, in order to point out similarities and differences
between the domains, highlighting general methods, problems, key references,
and potential for cross-fertilization between areas. The dominant feature
representations (in particular, log-mel spectra and raw waveform) and deep
learning models are reviewed, including convolutional neural networks, variants
of the long short-term memory architecture, as well as more audio-specific
neural network models. Subsequently, prominent deep learning application areas
are covered, i.e., audio recognition (automatic speech recognition, music
information retrieval, environmental sound detection, localization and
tracking) and synthesis and transformation (source separation, audio
enhancement, generative models for speech, sound, and music synthesis).
Finally, key issues and future questions regarding deep learning applied to
audio signal processing are identified.
Comment: 15 pages, 2 PDF figures
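To make the log-mel representation highlighted by the review concrete, here is a minimal mel-filterbank sketch; the filter count, FFT size, and sample rate are arbitrary illustrative choices:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for b in range(left, centre):          # rising slope
            fb[i - 1, b] = (b - left) / max(centre - left, 1)
        for b in range(centre, right):         # falling slope
            fb[i - 1, b] = (right - b) / max(right - centre, 1)
    return fb

fb = mel_filterbank(n_mels=40, n_fft=512, sr=16000)
# A log-mel frame: filterbank applied to a power spectrum, then log-compressed.
power = np.abs(np.fft.rfft(np.random.randn(512))) ** 2
log_mel = np.log(fb @ power + 1e-10)
print(log_mel.shape)  # (40,)
```

Stacking such frames over time gives the log-mel spectrogram commonly used as CNN input in the reviewed systems.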
Quantum computing and the brain: quantum nets, dessins d'enfants and neural networks
In this paper, we will discuss a formal link between neural networks and
quantum computing. For that purpose, we will present a simple model that describes the neural network by forming sub-graphs of the whole network sharing the same or a similar state. We will describe the interaction between
these areas by closed loops, the feedback loops. The change of the graph is
given by the deformations of the loops. This fact can be mathematically
formalized by the fundamental group of the graph. Furthermore, each neuron has two basic states: a ground state and an excited state. The whole state of an area of neurons is a linear combination of these two basic states, with complex coefficients representing the signals (with three parameters: amplitude, frequency, and phase) along the neurons. Then it can be
shown that the set of all signals forms a manifold (character variety) and all
properties of the network must be encoded in this manifold. In the paper, we
will discuss how to interpret learning and intuition in this model. Using the
Morgan-Shalen compactification, the limit for signals with large amplitude can
be analyzed by using quasi-Fuchsian groups as represented by dessins d'enfants
(graphs used to analyze Riemann surfaces). As shown by Planat and collaborators,
these dessins d'enfants are a direct bridge to (topological) quantum computing
with permutation groups. The normalization of the signal reduces the whole
model to a quantum network. Then we have a direct
connection to quantum circuits. This network can be transformed into operations
on tensor networks. Formally, we will obtain a link between machine learning and quantum computing.
Comment: 17 pages, 3 figures, accepted for the proceedings of the QTech 2018 conference (September 2018, Paris)
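The two-state neuron with complex coefficients described above matches the usual qubit formalism; the following is one plausible reading (the explicit form of the coefficient in terms of the three signal parameters is an interpretation, not stated in the abstract):

```latex
|\psi\rangle = \alpha\,|0\rangle + \beta\,|1\rangle,
\qquad \alpha, \beta \in \mathbb{C},
\qquad |\alpha|^2 + |\beta|^2 = 1,
\qquad \text{e.g.}\;\; \alpha = A\, e^{i(\omega t + \varphi)}
```

with |0⟩ the ground state, |1⟩ the excited state, and amplitude A, frequency ω, and phase φ playing the role of the three signal parameters; the normalization condition is what reduces the signal space to quantum states.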
Mosquito Detection with Neural Networks: The Buzz of Deep Learning
Many real-world time-series analysis problems are characterised by scarce
data. Solutions typically rely on hand-crafted features extracted from the time
or frequency domain allied with classification or regression engines which
condition on this (often low-dimensional) feature vector. The huge advances
enjoyed by many application domains in recent years have been fuelled by the
use of deep learning architectures trained on large data sets. This paper
presents an application of deep learning for acoustic event detection in a
challenging, data-scarce, real-world problem. Our candidate challenge is to
accurately detect the presence of a mosquito from its acoustic signature. We
develop convolutional neural networks (CNNs) operating on wavelet
transformations of audio recordings. Furthermore, we interrogate the network's
predictive power by visualising statistics of network-excitatory samples. These
visualisations offer a deep insight into the relative informativeness of
components in the detection problem. We include comparisons with conventional
classifiers, conditioned on both hand-tuned and generic features, to stress the
strength of automatic deep feature learning. Detection is achieved with
performance metrics significantly surpassing those of existing algorithmic
methods, as well as marginally exceeding those attained by individual human
experts.
Comment: For data and software related to this paper, see http://humbug.ac.uk/kiskin2017/. Submitted as a conference paper to ECML 201
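A generic wavelet front end of the kind described can be sketched with a Morlet continuous wavelet transform; this is an illustrative reimplementation, not the authors' code, and the `w0` parameter, wavelet support, and frequency grid are assumptions:

```python
import numpy as np

def morlet_cwt(signal, sr, freqs, w0=6.0):
    """Magnitude continuous wavelet transform with a Morlet wavelet."""
    out = np.empty((len(freqs), len(signal)))
    t = np.arange(-0.1, 0.1, 1.0 / sr)   # wavelet support, shorter than the signal
    for i, f in enumerate(freqs):
        s = w0 / (2 * np.pi * f)          # scale matching centre frequency f
        wavelet = np.exp(1j * w0 * t / s) * np.exp(-0.5 * (t / s) ** 2)
        wavelet /= np.sqrt(s)             # rough energy normalisation
        out[i] = np.abs(np.convolve(signal, wavelet, mode="same"))
    return out

# Toy input: a 500 Hz tone standing in for a mosquito wingbeat harmonic.
sr = 2000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 500 * t)
freqs = np.array([250.0, 500.0, 1000.0])
scalogram = morlet_cwt(tone, sr, freqs)
print(freqs[scalogram.mean(axis=1).argmax()])  # strongest band -> 500.0
```

The resulting time-scale image (`scalogram`) is the kind of 2-D representation a CNN can then classify for presence or absence of the target sound.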