5,435 research outputs found

    Bi-class classification of humpback whale sound units against complex background noise with Deep Convolution Neural Network

    Automatically detecting the sound units of humpback whales in complex, time-varying background noise is a current challenge for scientists. In this paper, we explore the applicability of the Convolutional Neural Network (CNN) method to this task. In the evaluation stage, we present six bi-class classification experiments that detect whale sounds against different background noise types (e.g., rain, wind). Compared with classical FFT-based representations such as spectrograms, we show that image-pretrained CNN features yield higher performance in classifying whale sounds against background noise.
    Comment: arXiv admin note: text overlap with arXiv:1702.02741 by other authors
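    The pipeline this abstract describes — spectrogram images fed through an image-pretrained CNN whose features drive a bi-class classifier — can be sketched briefly. A minimal sketch assuming librosa, torchvision, and scikit-learn (the abstract does not name its tooling); the file lists, ResNet-18 backbone, and logistic-regression head are illustrative assumptions, not the paper's setup:

```python
# Sketch: whale-sound vs. background-noise classification using features
# from an image-pretrained CNN applied to spectrogram images.
# All names and choices here are illustrative, not the paper's.
import librosa
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.linear_model import LogisticRegression

def spectrogram_image(path, sr=22050):
    """Load audio and return a 3-channel spectrogram 'image' tensor."""
    y, sr = librosa.load(path, sr=sr)
    S = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
    S = (S - S.min()) / (S.max() - S.min() + 1e-8)    # scale to [0, 1]
    img = torch.tensor(S, dtype=torch.float32).unsqueeze(0).repeat(3, 1, 1)
    return T.Resize((224, 224))(img)                   # CNN input size

# Image-pretrained CNN used as a fixed feature extractor.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()                      # drop the classifier head
backbone.eval()

def features(paths):
    with torch.no_grad():
        batch = torch.stack([spectrogram_image(p) for p in paths])
        return backbone(batch).numpy()

# Hypothetical file lists: whale sound units vs. one noise type (e.g., rain).
whale_files = ["whale_000.wav", "whale_001.wav"]       # placeholder paths
rain_files = ["rain_000.wav", "rain_001.wav"]          # placeholder paths

X = features(whale_files + rain_files)
y = np.array([1] * len(whale_files) + [0] * len(rain_files))
clf = LogisticRegression(max_iter=1000).fit(X, y)      # bi-class classifier
```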

    Deep Cross-Modal Audio-Visual Generation

    Cross-modal audio-visual perception has been a long-standing topic in psychology and neurology, and various studies have discovered strong correlations in human perception of auditory and visual stimuli. Despite work in computational multimodal modeling, the problem of cross-modal audio-visual generation has not been systematically studied in the literature. In this paper, we make the first attempt to solve this cross-modal generation problem by leveraging the power of deep generative adversarial training. Specifically, we use conditional generative adversarial networks to achieve cross-modal audio-visual generation of musical performances. We explore different encoding methods for audio and visual signals and work on two scenarios: instrument-oriented generation and pose-oriented generation. Being the first to explore this new problem, we compose two new datasets with paired images and sounds of musical performances on different instruments. Our experiments, using both classification and human evaluation, demonstrate that our model can generate one modality (audio or visual) from the other to a good extent. Our experiments on various design choices, along with the datasets, will facilitate future research in this new problem space.
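    The core mechanism — a generator conditioned on the other modality's encoding, with a discriminator judging (sample, condition) pairs — can be sketched in a few lines. A minimal PyTorch sketch; the dimensions, MLP architecture, and random stand-in for an audio encoder's output are placeholders, not the paper's networks:

```python
# Sketch: conditional GAN where one modality's encoding conditions
# generation of the other (here: audio embedding -> image).
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100, cond_dim=128, img_dim=64 * 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + cond_dim, 512), nn.ReLU(),
            nn.Linear(512, img_dim), nn.Tanh(),        # image pixels in [-1, 1]
        )
    def forward(self, z, cond):
        return self.net(torch.cat([z, cond], dim=1))

class Discriminator(nn.Module):
    def __init__(self, cond_dim=128, img_dim=64 * 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim + cond_dim, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, 1),                         # real/fake logit
        )
    def forward(self, img, cond):
        return self.net(torch.cat([img, cond], dim=1))

G, D = Generator(), Discriminator()
bce = nn.BCEWithLogitsLoss()
z = torch.randn(8, 100)        # noise batch
cond = torch.randn(8, 128)     # stand-in for an audio encoder's output
fake = G(z, cond)
# The discriminator learns to separate real pairs from generated ones;
# the generator is trained to fool it (standard adversarial objective).
g_loss = bce(D(fake, cond), torch.ones(8, 1))
```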

    Deep Learning for Audio Signal Processing

    Given the recent surge in developments of deep learning, this article provides a review of state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side by side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, and more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e., audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, and generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified.
    Comment: 15 pages, 2 PDF figures
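    Since log-mel spectra are singled out as a dominant feature representation, here is a minimal sketch of computing one, using librosa as one common tool (the review itself is library-agnostic); the frame sizes and mel-band count are typical defaults, not prescribed values:

```python
# Sketch: computing a log-mel spectrogram, one of the dominant input
# representations the review discusses. The example clip ships with librosa.
import librosa
import numpy as np

y, sr = librosa.load(librosa.example("trumpet"))   # bundled demo recording
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                     hop_length=512, n_mels=128)
log_mel = librosa.power_to_db(mel, ref=np.max)     # log compression (dB)
print(log_mel.shape)   # (n_mels, n_frames) time-frequency matrix
```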

    Quantum computing and the brain: quantum nets, dessins d'enfants and neural networks

    In this paper, we will discuss a formal link between neural networks and quantum computing. For that purpose, we will present a simple model for the description of the neural network by forming sub-graphs of the whole network with the same or a similar state. We will describe the interaction between these areas by closed loops, the feedback loops. The change of the graph is given by the deformations of the loops. This fact can be mathematically formalized by the fundamental group of the graph. Furthermore, the neuron has two basic states, $|0\rangle$ (ground state) and $|1\rangle$ (excited state). The whole state of an area of neurons is a linear combination of the two basic states with complex coefficients representing the signals (with three parameters: amplitude, frequency, and phase) along the neurons. Then it can be shown that the set of all signals forms a manifold (character variety), and all properties of the network must be encoded in this manifold. In the paper, we will discuss how to interpret learning and intuition in this model. Using the Morgan-Shalen compactification, the limit for signals with large amplitude can be analyzed by using quasi-Fuchsian groups as represented by dessins d'enfants (graphs used to analyze Riemann surfaces). As shown by Planat and collaborators, these dessins d'enfants are a direct bridge to (topological) quantum computing with permutation groups. The normalization of the signal reduces to the group $SU(2)$, and the whole model to a quantum network. Then we have a direct connection to quantum circuits. This network can be transformed into operations on tensor networks. Formally, we obtain a link between machine learning and quantum computing.
    Comment: 17 pages, 3 figures, accepted for the proceedings of the QTech 2018 conference (September 2018, Paris)
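    The two-state neuron model can be restated in standard notation. A brief sketch of the abstract's claim — the normalized neuron state and the resulting $SU(2)$ structure — written out for orientation, not a derivation from the paper:

```latex
% Neuron state as a normalized superposition of ground and excited states,
% with complex coefficients carrying the signal's amplitude and phase:
\[
  |\psi\rangle = \alpha\,|0\rangle + \beta\,|1\rangle,
  \qquad \alpha, \beta \in \mathbb{C},
  \qquad |\alpha|^2 + |\beta|^2 = 1 .
\]
% After normalization (and up to a global phase), state changes are
% unitaries with unit determinant, i.e. elements of SU(2):
\[
  U = \begin{pmatrix} a & -\bar{b} \\ b & \bar{a} \end{pmatrix},
  \qquad |a|^2 + |b|^2 = 1 ,
\]
% which is exactly the single-qubit gate group of a quantum network.
```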

    Mosquito Detection with Neural Networks: The Buzz of Deep Learning

    Many real-world time-series analysis problems are characterised by scarce data. Solutions typically rely on hand-crafted features extracted from the time or frequency domain, allied with classification or regression engines that condition on this (often low-dimensional) feature vector. The huge advances enjoyed by many application domains in recent years have been fuelled by the use of deep learning architectures trained on large data sets. This paper presents an application of deep learning for acoustic event detection in a challenging, data-scarce, real-world problem. Our candidate challenge is to accurately detect the presence of a mosquito from its acoustic signature. We develop convolutional neural networks (CNNs) operating on wavelet transformations of audio recordings. Furthermore, we interrogate the network's predictive power by visualising statistics of network-excitatory samples. These visualisations offer deep insight into the relative informativeness of components in the detection problem. We include comparisons with conventional classifiers, conditioned on both hand-tuned and generic features, to stress the strength of automatic deep feature learning. Detection is achieved with performance metrics significantly surpassing those of existing algorithmic methods, as well as marginally exceeding those attained by individual human experts.
    Comment: For data and software related to this paper, see http://humbug.ac.uk/kiskin2017/. Submitted as a conference paper to ECML 2017
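    The input pipeline — a wavelet transform of the recording feeding a small CNN that outputs mosquito/background scores — can be sketched as follows. A minimal sketch assuming PyWavelets and PyTorch; the Morlet wavelet, scale range, and tiny network are illustrative stand-ins for the paper's actual configuration:

```python
# Sketch: CNN over a wavelet "scalogram" of an audio clip, the style of
# input representation the paper uses for mosquito detection.
import numpy as np
import pywt
import torch
import torch.nn as nn

signal = np.random.randn(8000)                 # stand-in for a 1 s clip at 8 kHz
scales = np.arange(1, 65)                      # 64 wavelet scales
coeffs, _ = pywt.cwt(signal, scales, "morl")   # continuous wavelet transform
scalogram = torch.tensor(np.abs(coeffs), dtype=torch.float32)[None, None]

detector = nn.Sequential(                      # tiny CNN: mosquito vs. background
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(4),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 2),                          # two-class logits
)
logits = detector(scalogram)                   # shape: (1, 2)
```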