1,047 research outputs found
Deep Speech Denoising with Vector Space Projections
We propose an algorithm to denoise speakers from a single microphone in the
presence of non-stationary and dynamic noise. Our approach is inspired by the
recent success of neural network models separating speakers from other speakers
and singers from instrumental accompaniment. Unlike prior art, we leverage
embedding spaces produced with source-contrastive estimation, a technique
derived from negative sampling techniques in natural language processing, while
simultaneously obtaining a continuous inference mask. Our embedding space
directly optimizes for the discrimination of speaker and noise by jointly
modeling their characteristics. This space is generalizable in that it is not
speaker or noise specific and is capable of denoising speech even if the model
has not seen the speaker in the training set. Parameters are trained with dual
objectives: one that promotes a selective bandpass filter that eliminates noise
at time-frequency positions that exceed signal power, and another that
proportionally splits time-frequency content between signal and noise. We
compare against state-of-the-art algorithms as well as traditional sparse
non-negative matrix factorization solutions. The resulting algorithm avoids
severe computational burden by providing a more intuitive and easily optimized
approach, while achieving competitive accuracy.
Comment: arXiv admin note: text overlap with arXiv:1705.0466
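The two training objectives correspond to two classic time-frequency masks: a binary mask that keeps a bin only where signal power exceeds noise power (the "selective bandpass filter"), and a ratio mask that splits each bin proportionally between signal and noise. A minimal numpy sketch on toy spectrograms (random magnitudes stand in for the network's inputs; this is not the paper's model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy magnitude spectrograms for speech and noise (freq x time).
speech = rng.random((64, 100))
noise = rng.random((64, 100))
mixture = speech + noise

# Binary mask: keep a T-F bin only where speech power exceeds noise power.
binary_mask = (speech > noise).astype(float)

# Ratio ("soft") mask: split each bin proportionally between speech and noise.
ratio_mask = speech / (speech + noise)

est_binary = binary_mask * mixture
est_ratio = ratio_mask * mixture

err_binary = np.mean((est_binary - speech) ** 2)
err_ratio = np.mean((est_ratio - speech) ** 2)
```

On an additive toy mixture the ratio mask reconstructs the source exactly, which is why a continuous inference mask is attractive; the binary mask discards in-bin signal energy.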
One-dimensional Deep Image Prior for Time Series Inverse Problems
We extend the Deep Image Prior (DIP) framework to one-dimensional signals.
DIP uses a randomly initialized convolutional neural network (CNN) to solve
linear inverse problems by optimizing over weights to fit the observed
measurements. Our main finding is that properly tuned one-dimensional
convolutional architectures provide an excellent Deep Image Prior for various
types of temporal signals including audio, biological signals, and sensor
measurements. We show that our network can be used in a variety of recovery
tasks including missing value imputation, blind denoising, and compressed
sensing from random Gaussian projections. The key challenge is avoiding
overfitting, which requires carefully tuning early stopping, total variation,
and weight decay regularization. Our method requires up to 4 times fewer measurements than
Lasso and outperforms NLM-VAMP for random Gaussian measurements on audio
signals, has similar imputation performance to a Kalman state-space model on a
variety of data, and outperforms wavelet filtering in removing additive noise
from air-quality sensor readings.
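The deep prior itself needs a CNN framework, but the Lasso baseline the abstract compares against can be sketched directly: recover a sparse signal from random Gaussian projections with ISTA (proximal gradient descent). All sizes and constants below are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sparse ground truth and a random Gaussian measurement matrix.
n, m, k = 200, 80, 5                # signal length, measurements, sparsity
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k)
A = rng.normal(size=(m, n)) / np.sqrt(m)
y = A @ x_true                      # compressed measurements

# ISTA for the Lasso objective: 0.5*||Ax - y||^2 + lam*||x||_1.
lam = 0.05
step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
x = np.zeros(n)
for _ in range(3000):
    x = x - step * (A.T @ (A @ x - y))                        # gradient step
    x = np.sign(x) * np.maximum(np.abs(x) - step * lam, 0.0)  # soft threshold

rel_err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
```

DIP replaces the explicit sparsity prior with the implicit bias of an untrained one-dimensional CNN, optimized against the same measurement model.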
Monaural Audio Speaker Separation with Source Contrastive Estimation
We propose an algorithm to separate simultaneously speaking persons from each
other, the "cocktail party problem", using a single microphone. Our approach
involves a deep recurrent neural network regressing to a vector space that is
descriptive of independent speakers. Such a vector space can embed empirically
determined speaker characteristics and is optimized by distinguishing between
speaker masks. We call this technique source-contrastive estimation. The
methodology is inspired by negative sampling, which has seen success in natural
language processing, where an embedding is learned by correlating and
de-correlating a given input vector with output weights. Although the matrix
determined by the output weights is dependent on a set of known speakers, we
only use the input vectors during inference. Doing so ensures that source
separation is explicitly speaker-independent. Our approach is similar to recent
deep neural network clustering and permutation-invariant training research; we
use weighted spectral features and masks to augment individual speaker
frequencies while filtering out other speakers. We avoid, however, the severe
computational burden of other approaches with our technique. Furthermore, by
training a vector space rather than combinations of different speakers or
differences thereof, we avoid the so-called permutation problem during
training. Our algorithm offers an intuitive, computationally efficient response
to the cocktail party problem, and most importantly boasts better empirical
performance than other current techniques.
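The correlate/de-correlate idea can be illustrated in a toy setting: per-bin embeddings are compared against per-speaker anchor vectors (standing in for the output weights), and each time-frequency bin is assigned to the speaker it correlates with most strongly. This is a deliberately simplified sketch, not the paper's inference-time procedure, which clusters input-side embeddings only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-time-frequency-bin embeddings (bins x dim).
d, bins = 8, 500
anchor_a = rng.normal(size=d)        # stands in for speaker A's output weights
anchor_b = rng.normal(size=d)        # stands in for speaker B's output weights
labels = rng.integers(0, 2, bins)    # which speaker dominates each bin

# Embeddings are pulled toward the dominant speaker's anchor, plus noise.
emb = np.where(labels[:, None] == 0, anchor_a, anchor_b)
emb = emb + 0.1 * rng.normal(size=(bins, d))

# Contrastive assignment: a bin belongs to the speaker whose anchor it
# correlates with most strongly (dot product), yielding a binary mask.
scores = np.stack([emb @ anchor_a, emb @ anchor_b], axis=1)
mask = scores.argmax(axis=1)

accuracy = np.mean(mask == labels)
```

Training pushes a bin's embedding toward its own speaker's vector and away from the other's; the mask then falls out of simple dot products, which is where the computational savings over combinatorial alternatives come from.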
RARE: Image Reconstruction using Deep Priors Learned without Ground Truth
Regularization by denoising (RED) is an image reconstruction framework that
uses an image denoiser as a prior. Recent work has shown the state-of-the-art
performance of RED with learned denoisers corresponding to pre-trained
convolutional neural nets (CNNs). In this work, we propose to broaden the
current denoiser-centric view of RED by considering priors corresponding to
networks trained for more general artifact-removal. The key benefit of the
proposed family of algorithms, called regularization by artifact-removal
(RARE), is that it can leverage priors learned on datasets containing only
undersampled measurements. This makes RARE applicable to problems where it is
practically impossible to have fully-sampled ground-truth data for training. We
validate RARE on both simulated and experimentally collected data by
reconstructing free-breathing whole-body 3D MRIs into ten respiratory phases
from heavily undersampled k-space measurements. Our results corroborate the
potential of learning regularizers for iterative inversion directly on
undersampled and noisy measurements.
Comment: In press for the IEEE Journal of Selected Topics in Signal Processing
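The RED mechanics behind RARE can be sketched with a stand-in prior: the gradient of the data term is combined with a residual term x - D(x) from a denoiser D. Here D is a 3-tap moving average rather than a learned artifact-removal network, and the forward model is the identity for brevity; every constant is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise(x):
    # Stand-in denoiser: 3-tap moving average. In RED this slot holds a
    # pre-trained CNN denoiser; in RARE, a network trained for general
    # artifact removal on undersampled data.
    return np.convolve(x, np.ones(3) / 3, mode="same")

# Noisy observation of a smooth signal (identity forward model for brevity).
n = 128
t = np.linspace(0, 1, n)
x_true = np.sin(2 * np.pi * 3 * t)
y = x_true + 0.3 * rng.normal(size=n)

# RED gradient iteration on 0.5*||x - y||^2 + 0.5*tau*x^T (x - D(x)).
tau, step = 2.0, 0.2
x = y.copy()
for _ in range(500):
    grad = (x - y) + tau * (x - denoise(x))
    x -= step * grad
```

The key point the abstract makes is that D need not be a denoiser at all: any artifact-removal network trained on undersampled data can occupy the same slot in the iteration.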
Regularizing Autoencoder-Based Matrix Completion Models via Manifold Learning
Autoencoders are popular among neural-network-based matrix completion models
due to their ability to retrieve potential latent factors from the partially
observed matrices. Nevertheless, when training data is scarce, their
performance is significantly degraded due to overfitting. In this paper, we mitigate
overfitting with a data-dependent regularization technique that relies on the
principles of multi-task learning. Specifically, we propose an
autoencoder-based matrix completion model that performs prediction of the
unknown matrix values as a main task, and manifold learning as an auxiliary
task. The latter acts as an inductive bias, leading to solutions that
generalize better. The proposed model outperforms the existing
autoencoder-based models designed for matrix completion, achieving high
reconstruction accuracy on well-known datasets.
Comment: 5 pages, Eusipco 201
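The auxiliary manifold task is typically a graph-Laplacian penalty that pulls the latent codes of similar rows together. A numpy sketch of the combined loss, with random stand-ins for the autoencoder's output and codes (all weights and sizes here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Partially observed matrix (rows x columns) with an observation mask.
X = rng.random((20, 15))
mask = rng.random(X.shape) < 0.4              # ~40% of entries observed
X_hat = X + 0.05 * rng.normal(size=X.shape)   # stand-in for autoencoder output
Z = rng.normal(size=(20, 4))                  # stand-in for latent codes

# Row-similarity graph and its Laplacian.
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-D2)
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W

# Main task: error on observed entries only. Auxiliary manifold task:
# trace(Z^T L Z) = 0.5 * sum_ij W_ij * ||z_i - z_j||^2.
main_loss = np.sum(mask * (X_hat - X) ** 2) / mask.sum()
manifold_loss = np.trace(Z.T @ L @ Z)
loss = main_loss + 0.1 * manifold_loss
```

Because the Laplacian term only constrains relative positions of latent codes, it acts as an inductive bias rather than a hard target, which is what multi-task regularization relies on.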
Denoising Gravitational Waves using Deep Learning with Recurrent Denoising Autoencoders
Gravitational wave astronomy is a rapidly growing field of modern
astrophysics, with observations being made frequently by the LIGO detectors.
Gravitational wave signals are often extremely weak and the data from the
detectors, such as LIGO, is contaminated with non-Gaussian and non-stationary
noise, often containing transient disturbances which can obscure real signals.
Traditional denoising methods, such as principal component analysis and
dictionary learning, are not optimal for dealing with this non-Gaussian noise,
especially for low signal-to-noise ratio gravitational wave signals.
Furthermore, these methods are computationally expensive on large datasets. To
overcome these issues, we apply state-of-the-art signal processing techniques,
based on recent groundbreaking advancements in deep learning, to denoise
gravitational wave signals embedded either in Gaussian noise or in real LIGO
noise. We introduce SMTDAE, a Staired Multi-Timestep Denoising Autoencoder,
based on sequence-to-sequence bi-directional Long Short-Term Memory recurrent
neural networks. We demonstrate the advantages of using our unsupervised deep
learning approach and show that, after training only using simulated Gaussian
noise, SMTDAE achieves superior recovery performance for gravitational wave
signals embedded in real non-Gaussian LIGO noise.
Comment: 5 pages, 2 figures
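SMTDAE itself is a sequence-to-sequence LSTM, but the train-on-simulated-Gaussian-noise, test-on-other-noise protocol can be illustrated with a far smaller stand-in: a linear denoiser fit in closed form on chirp-like waveforms in Gaussian noise, then applied under heavy-tailed noise it never saw. The waveform and noise levels below are invented for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def chirp(n=256):
    # Toy chirp standing in for a gravitational-wave template.
    t = np.linspace(0, 1, n)
    return np.sin(2 * np.pi * (10 + 30 * t) * t)

# Training pairs: clean chirps (random amplitude) in simulated Gaussian noise.
n, n_train = 256, 400
clean = np.stack([rng.uniform(0.5, 1.5) * chirp(n) for _ in range(n_train)])
noisy = clean + 0.5 * rng.normal(size=clean.shape)

# Closed-form linear denoiser W minimizing ||clean - noisy @ W||^2 (ridge).
lam = 1e-2
W = np.linalg.solve(noisy.T @ noisy + lam * np.eye(n), noisy.T @ clean)

# Evaluate on noise the model never saw: heavy-tailed (Laplace) noise.
x = chirp(n)
y = x + 0.5 * rng.laplace(size=n)
x_hat = y @ W
```

Even this linear stand-in transfers across noise distributions because it learns the signal subspace; the abstract's claim is that the recurrent autoencoder does so far more robustly at low SNR.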
End-to-End Learning for Structured Prediction Energy Networks
Structured Prediction Energy Networks (SPENs) are a simple, yet expressive
family of structured prediction models (Belanger and McCallum, 2016). An energy
function over candidate structured outputs is given by a deep network, and
predictions are formed by gradient-based optimization. This paper presents
end-to-end learning for SPENs, where the energy function is discriminatively
trained by back-propagating through gradient-based prediction. In our
experiments, the approach is substantially more accurate than the structured SVM
method of Belanger and McCallum (2016), as it allows us to use more
sophisticated non-convex energies. We provide a collection of techniques for
improving the speed, accuracy, and memory requirements of end-to-end SPENs, and
demonstrate the power of our method on 7-Scenes image denoising and CoNLL-2005
semantic role labeling tasks. In both, inexact minimization of non-convex SPEN
energies is superior to baseline methods that use simplistic energy functions
that can be minimized exactly.
Comment: ICML 201
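Prediction-as-optimization is the core mechanic: a network assigns an energy to each candidate output, and inference runs gradient descent on y. A toy energy with fixed random weights (not a trained SPEN; all shapes and constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy energy over candidate outputs y for input x:
# E(x, y) = ||y - f(x)||^2 + y^T C y, with f a fixed random "network".
W1 = 0.5 * rng.normal(size=(4, 6))
def f(x):
    return np.tanh(W1 @ x)

C = 0.05 * rng.normal(size=(4, 4))
C = C + C.T                          # symmetric pairwise coupling term

def energy(x, y):
    return np.sum((y - f(x)) ** 2) + y @ C @ y

def energy_grad_y(x, y):
    return 2 * (y - f(x)) + 2 * C @ y

# Prediction = gradient descent on y (inference-time optimization).
x = rng.normal(size=6)
y = np.zeros(4)
for _ in range(300):
    y -= 0.05 * energy_grad_y(x, y)
```

End-to-end training then back-propagates the task loss through this inner gradient descent to shape the energy itself, which is what lets non-convex energies be trained discriminatively.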
Multigrid Backprojection Super-Resolution and Deep Filter Visualization
We introduce a novel deep-learning architecture for image upscaling by large
factors (e.g. 4x, 8x) based on examples of pristine high-resolution images. Our
target is to reconstruct high-resolution images from their downscaled versions.
The proposed system performs a multi-level progressive upscaling, starting from
small factors (2x) and updating for higher factors (4x and 8x). The system is
recursive as it repeats the same procedure at each level. It is also residual
since we use the network to update the outputs of a classic upscaler. The
network residuals are improved by Iterative Back-Projections (IBP) computed in
the features of a convolutional network. To work at multiple levels, we extend
the standard back-projection algorithm using a recursion analogous to
Multi-Grid algorithms commonly used as solvers of large systems of linear
equations. We finally show how the network can be interpreted as a standard
upsampling-and-filter upscaler with a space-variant filter that adapts to the
geometry. This approach allows us to visualize how the network learns to
upscale. Finally, our system reaches state-of-the-art quality for models with
a relatively small number of parameters.
Comment: Spotlight paper in the Thirty-Third AAAI Conference on Artificial
Intelligence (AAAI-19)
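The back-projection step enforces consistency with the low-resolution input: downscale the current estimate, compare against the observation, upscale the residual, and add it back. A 1D numpy sketch with a 2x box-filter forward model (a toy stand-in, not the paper's multigrid network):

```python
import numpy as np

def downscale(x):
    # Forward model: 2x downscaling by averaging adjacent pairs.
    return x.reshape(-1, 2).mean(axis=1)

def upscale(x):
    # Back-projection operator: nearest-neighbour (repeat) 2x upscaling.
    return np.repeat(x, 2)

hi = np.sin(np.linspace(0, 4 * np.pi, 64))   # "high-resolution" 1D signal
lo = downscale(hi)                           # observed low-resolution input

# Initial estimate from a classic upscaler (linear interpolation), then
# Iterative Back-Projection to enforce downscaling consistency.
x = np.interp(np.arange(64), np.arange(0.5, 64, 2.0), lo)
for _ in range(10):
    residual = lo - downscale(x)
    x = x + upscale(residual)

consistency = np.linalg.norm(downscale(x) - lo)
```

The paper's contribution is to run this correction recursively across scales inside a network's feature space, analogous to the coarse-to-fine cycles of multigrid solvers.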
Generalized K-fan Multimodal Deep Model with Shared Representations
Multimodal learning with deep Boltzmann machines (DBMs) is a generative
approach to fuse multimodal inputs, and can learn the shared representation via
Contrastive Divergence (CD) for classification and information retrieval tasks.
However, it is a 2-fan DBM model, and cannot effectively handle multiple
prediction tasks. Moreover, this model cannot recover the hidden
representations well by sampling from the conditional distribution when more
than one modality is missing. In this paper, we propose a K-fan deep
structure model, which can handle multi-input and multi-output learning
problems effectively. In particular, the deep structure has K branches for
different inputs, where each branch can be composed of a multi-layer deep model,
and a shared representation is learned in a discriminative manner to tackle
multimodal tasks. Given the deep structure, we propose two objective functions
to handle two multi-input and multi-output tasks: joint visual restoration and
labeling, and multi-view multi-class object recognition. To estimate
the model parameters, we initialize the deep model parameters with CD to
maximize the joint distribution, and then use backpropagation to update the
model according to the specific objective function. The experimental results
demonstrate that the model effectively leverages multi-source information
and predicts multiple tasks well relative to competitive baselines.
Comment: 11 pages, 5 figures
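The CD initialization applies to each branch's restricted Boltzmann machine layers. One CD-1 weight update for a small binary RBM, in numpy (sizes and learning rate are arbitrary; bias terms are omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A small binary RBM: visible units v, hidden units h, weights W.
n_vis, n_hid, batch = 12, 6, 32
W = 0.01 * rng.normal(size=(n_vis, n_hid))
v0 = (rng.random((batch, n_vis)) < 0.5).astype(float)

# Positive phase: hidden probabilities and samples given the data.
ph0 = sigmoid(v0 @ W)
h0 = (rng.random(ph0.shape) < ph0).astype(float)

# Negative phase: one Gibbs step back to a reconstruction.
pv1 = sigmoid(h0 @ W.T)
v1 = (rng.random(pv1.shape) < pv1).astype(float)
ph1 = sigmoid(v1 @ W)

# CD-1 gradient estimate: data correlations minus reconstruction correlations.
lr = 0.1
dW = lr * (v0.T @ ph0 - v1.T @ ph1) / batch
W += dW
```

After this generative initialization, the shared representation is fine-tuned discriminatively with backpropagation against the chosen objective.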
Rapid Feature Learning with Stacked Linear Denoisers
We investigate unsupervised pre-training of deep architectures as feature
generators for "shallow" classifiers. Stacked Denoising Autoencoders (SdA),
when used as feature pre-processing tools for SVM classification, can lead to
significant improvements in accuracy - however, at the price of a substantial
increase in computational cost. In this paper we create a simple algorithm
which mimics the layer-by-layer training of SdAs. However, in contrast to SdAs,
our algorithm requires no training through gradient descent as the parameters
can be computed in closed form. It can be implemented in fewer than 20 lines of
MATLAB™ and reduces the computation time from several hours to mere seconds. We
show that our feature transformation reliably improves the results of SVM
classification significantly on all our data sets - often outperforming SdAs
and even deep neural networks in three out of four deep learning benchmarks.
Comment: 10 pages