Rejecting noise in Baikal-GVD data with neural networks
Baikal-GVD is a large (∼1 km³) underwater neutrino telescope
installed in the fresh waters of Lake Baikal. The deep lake water environment
is pervaded by background light, which produces detectable signals in the
Baikal-GVD photosensors. We introduce a neural network that efficiently
separates these noise hits from the signal hits produced by the
propagation of relativistic particles through the detector. The neural network
has a U-net-like architecture and exploits the temporal (causal) structure of
events. On Monte Carlo simulated data, it reaches 99% signal purity (precision)
and 98% survival efficiency (recall). The benefits of using a neural network for
data analysis are discussed, and other possible neural network architectures,
including graph-based ones, are examined.
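The purity and survival-efficiency figures above are the precision and recall of a binary signal-vs-noise hit classifier. A minimal sketch of how they are computed, assuming each hit carries a boolean ground-truth label (True = signal) and a boolean prediction; the function name and the toy labels are hypothetical, not Baikal-GVD data or code.

```python
from typing import Sequence, Tuple


def hit_metrics(truth: Sequence[bool], predicted: Sequence[bool]) -> Tuple[float, float]:
    """Return (purity, efficiency) for signal-vs-noise hit classification.

    purity     = precision = TP / (TP + FP): share of hits kept as signal that are truly signal
    efficiency = recall    = TP / (TP + FN): share of true signal hits that survive the selection
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp = sum(t and p for t, p in zip(truth, predicted))
    fp = sum((not t) and p for t, p in zip(truth, predicted))
    fn = sum(t and (not p) for t, p in zip(truth, predicted))
    purity = tp / (tp + fp) if tp + fp else 0.0
    efficiency = tp / (tp + fn) if tp + fn else 0.0
    return purity, efficiency


# Hypothetical toy event: 4 true signal hits followed by 6 noise hits.
truth = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]
print(hit_metrics(truth, predicted))  # (0.75, 0.75)
```

Here one signal hit is lost (lowering efficiency) and one noise hit is kept (lowering purity), which is why the two numbers are reported separately.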
Towards Unified All-Neural Beamforming for Time and Frequency Domain Speech Separation
Recently, frequency domain all-neural beamforming methods have achieved
remarkable progress in multichannel speech separation. In parallel, the
integration of time domain network structures with beamforming has also gained
significant attention. This study proposes a novel all-neural beamforming
method in time domain and makes an attempt to unify the all-neural beamforming
pipelines for time domain and frequency domain multichannel speech separation.
The proposed model consists of two modules: separation and beamforming. Both
modules perform temporal-spectral-spatial modeling and are trained
end-to-end using a joint loss function. The novelty of this study is twofold.
First, a time domain directional feature conditioned on the direction
of the target speaker is proposed, which can be jointly optimized within the
time domain architecture to enhance target signal estimation. Second, an
all-neural beamforming network in the time domain is designed to refine the
pre-separated results. This module features parametric time-variant
beamforming coefficient estimation, without explicitly following the derivation
of optimal filters, which may impose an upper bound on performance. The proposed method is
evaluated on simulated reverberant overlapped speech data derived from the
AISHELL-1 corpus. Experimental results demonstrate significant performance
improvements over frequency domain state-of-the-art methods, ideal magnitude masks and
existing time domain neural beamforming methods.
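To make "time-variant beamforming coefficients in the time domain" concrete, the sketch below shows only the generic filter-and-sum operation such coefficients would feed: within each frame, every channel is passed through its own short causal FIR filter and the channels are summed. It assumes the coefficients are simply given (in the paper a network estimates them), and the frame length, filter length and function name are hypothetical; it is not the authors' implementation.

```python
from typing import List

Signal = List[float]


def filter_and_sum(channels: List[Signal], weights: List[List[List[float]]], frame_len: int) -> Signal:
    """Time-variant filter-and-sum beamformer (illustrative sketch).

    channels:  C microphone signals of equal length T.
    weights:   weights[f][c] is the FIR filter applied to channel c during frame f
               (assumed to be supplied externally, e.g. predicted by a network).
    frame_len: frame f covers samples [f * frame_len, (f + 1) * frame_len).

    y[t] = sum_c sum_k weights[t // frame_len][c][k] * x_c[t - k], with x_c[n] = 0 for n < 0.
    """
    num_samples = len(channels[0])
    if any(len(x) != num_samples for x in channels):
        raise ValueError("all channels must have the same length")
    if len(weights) * frame_len < num_samples:
        raise ValueError("need one set of filters per frame")
    out = [0.0] * num_samples
    for t in range(num_samples):
        frame_filters = weights[t // frame_len]  # coefficients may change from frame to frame
        for c, x in enumerate(channels):
            for k, w in enumerate(frame_filters[c]):
                if t - k >= 0:  # causal FIR: only current and past samples contribute
                    out[t] += w * x[t - k]
    return out


# Toy delay-and-sum special case: channel 0 lags channel 1 by one sample,
# so channel 1 is delayed by one tap before averaging (numbers are illustrative).
ch0 = [0.0, 1.0, 2.0, 3.0]
ch1 = [1.0, 2.0, 3.0, 4.0]
w = [[[0.5, 0.0], [0.0, 0.5]]] * 2  # identical filters in both frames
print(filter_and_sum([ch0, ch1], w, frame_len=2))  # [0.0, 1.0, 2.0, 3.0]
```

Classical delay-and-sum is the case where the filters are fixed delays; letting the filters differ per frame is what makes the beamformer time-variant.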
A Digital Neuromorphic Architecture Efficiently Facilitating Complex Synaptic Response Functions Applied to Liquid State Machines
Information in neural networks is represented as weighted connections, or
synapses, between neurons. This poses a problem because the primary computational
bottleneck for neural networks is the vector-matrix multiplication of inputs by the
network weights. Conventional processing architectures
are not well suited for simulating neural networks, often requiring large
amounts of energy and time. Additionally, synapses in biological neural
networks are not binary connections, but exhibit a nonlinear response function
as neurotransmitters are emitted and diffuse between neurons. Inspired by
neuroscience principles, we present a digital neuromorphic architecture, the
Spiking Temporal Processing Unit (STPU), capable of modeling arbitrary complex
synaptic response functions without requiring additional hardware components.
We consider the paradigm of spiking neurons with temporally coded information
as opposed to non-spiking rate coded neurons used in most neural networks. In
this paradigm we examine liquid state machines applied to speech recognition
and show how a liquid state machine with temporal dynamics maps onto the
STPU, demonstrating the flexibility and efficiency of the STPU for instantiating
neural algorithms.
Comment: 8 pages, 4 Figures, Preprint of 2017 IJCN
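The contrast drawn above between binary connections and nonlinear synaptic response functions can be illustrated in discrete time: each presynaptic spike injects a weighted, time-extended response kernel into the postsynaptic current, which then drives a spiking neuron. The sketch assumes an alpha-shaped kernel and a leaky integrate-and-fire neuron with hypothetical parameters; it is a generic illustration, not the STPU's datapath.

```python
import math
from typing import List


def alpha_kernel(tau: float, length: int) -> List[float]:
    """Assumed alpha-shaped synaptic response h[n] = (n / tau) * exp(1 - n / tau).

    It rises from 0, peaks at 1.0 when n == tau, then decays; any other
    response function could be substituted.
    """
    return [(n / tau) * math.exp(1.0 - n / tau) for n in range(length)]


def synaptic_current(spikes: List[int], weight: float, kernel: List[float]) -> List[float]:
    """Convolve a binary presynaptic spike train with a weighted response kernel."""
    current = [0.0] * len(spikes)
    for t, s in enumerate(spikes):
        if s:
            for n, h in enumerate(kernel):
                if t + n < len(current):
                    current[t + n] += weight * h
    return current


def lif(current: List[float], leak: float = 0.9, threshold: float = 1.0) -> List[int]:
    """Leaky integrate-and-fire neuron: v <- leak * v + I; emit a spike and reset v to 0 at threshold."""
    v, out = 0.0, []
    for i in current:
        v = leak * v + i
        if v >= threshold:
            out.append(1)
            v = 0.0
        else:
            out.append(0)
    return out


print(synaptic_current([1, 0, 0, 0], 0.5, [0.0, 0.5, 1.0]))  # [0.0, 0.25, 0.5, 0.0]
print(lif([0.6, 0.6, 0.0, 1.2]))  # [0, 1, 0, 1]
```

Because the response is spread over several time steps, the timing of input spikes, not just their count, determines when the neuron fires, which is the temporal coding the abstract refers to.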
Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article
provides a review of the state-of-the-art deep learning techniques for audio
signal processing. Speech, music, and environmental sound processing are
considered side-by-side, in order to point out similarities and differences
between the domains, highlighting general methods, problems, key references,
and potential for cross-fertilization between areas. The dominant feature
representations (in particular, log-mel spectra and raw waveform) and deep
learning models are reviewed, including convolutional neural networks, variants
of the long short-term memory architecture, as well as more audio-specific
neural network models. Subsequently, prominent deep learning application areas
are covered, i.e. audio recognition (automatic speech recognition, music
information retrieval, environmental sound detection, localization and
tracking) and synthesis and transformation (source separation, audio
enhancement, generative models for speech, sound, and music synthesis).
Finally, key issues and future questions regarding deep learning applied to
audio signal processing are identified.
Comment: 15 pages, 2 pdf figure
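As a concrete reference for the log-mel spectra named above as a dominant feature representation, the sketch below builds triangular filters spaced evenly on the mel scale and applies them to one frame's power spectrum before taking a log. It assumes the HTK-style mel formula mel = 2595 log10(1 + f / 700), filters spanning 0 Hz to the Nyquist frequency, and an arbitrary example configuration; common libraries differ in details such as mel-scale variant and filter normalization.

```python
import math
from typing import List


def hz_to_mel(f: float) -> float:
    """HTK-style mel scale (assumed variant)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels: int, n_fft: int, sample_rate: float) -> List[List[float]]:
    """Triangular filters equally spaced on the mel scale between 0 Hz and Nyquist.

    Returns n_mels rows, each holding n_fft // 2 + 1 weights over the FFT bins.
    """
    n_bins = n_fft // 2 + 1
    bin_freqs = [k * sample_rate / n_fft for k in range(n_bins)]
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies: filter m rises from edges[m], peaks at edges[m + 1], falls to edges[m + 2]
    edges = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bank = []
    for m in range(n_mels):
        lo, center, hi = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_freqs:
            if lo < f <= center:
                row.append((f - lo) / (center - lo))  # rising slope
            elif center < f < hi:
                row.append((hi - f) / (hi - center))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_mel(power_spectrum: List[float], bank: List[List[float]], eps: float = 1e-10) -> List[float]:
    """Project one frame's power spectrum onto the mel filters and take the natural log."""
    return [math.log(sum(w * p for w, p in zip(row, power_spectrum)) + eps) for row in bank]


# Example configuration (arbitrary): 10 mel bands, 512-point FFT, 16 kHz audio.
bank = mel_filterbank(n_mels=10, n_fft=512, sample_rate=16000.0)
flat_spectrum = [1.0] * (512 // 2 + 1)  # toy power spectrum
print([round(v, 2) for v in log_mel(flat_spectrum, bank)])
```

The small `eps` keeps the logarithm finite for silent frames; a full front end would compute the power spectrum of each windowed frame with an FFT and stack the resulting log-mel vectors over time.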
Multi-Resolution Fully Convolutional Neural Networks for Monaural Audio Source Separation
In deep neural networks with convolutional layers, each layer typically has
a fixed-size (single-resolution) receptive field (RF). Convolutional layers with a
large RF capture global information from the input features, while layers with
a small RF capture local details with high resolution from the input
features. In this work, we introduce novel deep multi-resolution fully
convolutional neural networks (MR-FCNN), in which each layer has multiple RF
sizes to extract multi-resolution features that capture both the global information and
the local details of its input features. The proposed MR-FCNN is applied to
separate a target audio source from a mixture of many audio sources.
Experimental results show that using MR-FCNN improves the performance compared
to feedforward deep neural networks (DNNs) and single resolution deep fully
convolutional neural networks (FCNNs) on the audio source separation problem.
Comment: arXiv admin note: text overlap with arXiv:1703.0801
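The idea of one layer with several receptive-field sizes can be sketched as parallel convolutions with different kernel lengths applied to the same input, each producing its own output channel. This is a minimal 1D illustration under stated assumptions (odd kernel lengths, zero "same" padding, stride 1, one filter per kernel size, ReLU, hand-picked kernels); the actual MR-FCNN layer configuration is not specified here, and the function names are hypothetical.

```python
from typing import List


def conv1d_same(x: List[float], kernel: List[float]) -> List[float]:
    """1D cross-correlation with zero 'same' padding (odd kernel length, stride 1)."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd for symmetric 'same' padding")
    half = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            i = t + k - half  # input index covered by kernel tap k when centered at t
            if 0 <= i < len(x):
                acc += w * x[i]
        out.append(acc)
    return out


def multi_resolution_layer(x: List[float], kernels: List[List[float]]) -> List[List[float]]:
    """Apply kernels of different lengths (receptive fields) to the same input in parallel.

    Returns one ReLU-activated output channel per kernel: short kernels preserve
    local detail, long kernels summarize wider context.
    """
    return [[max(0.0, v) for v in conv1d_same(x, k)] for k in kernels]


# An impulse shows each branch's receptive field directly (toy kernels).
impulse = [0.0, 0.0, 1.0, 0.0, 0.0]
for channel in multi_resolution_layer(impulse, [[1.0], [1 / 3] * 3, [0.2] * 5]):
    print(channel)
```

The length-1 branch returns the impulse unchanged, while the length-3 and length-5 branches spread it over 3 and 5 positions; a following layer sees all of these channels at once.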