Low-rank and Sparse Soft Targets to Learn Better DNN Acoustic Models
Conventional deep neural networks (DNN) for speech acoustic modeling rely on
Gaussian mixture models (GMM) and hidden Markov models (HMM) to obtain binary
class labels as the targets for DNN training. Subword classes in speech
recognition systems correspond to context-dependent tied states or senones. The
present work addresses some limitations of GMM-HMM senone alignments for DNN
training. We hypothesize that the senone probabilities obtained from a DNN
trained with binary labels can provide more accurate targets to learn better
acoustic models. However, DNN outputs bear inaccuracies which are exhibited as
high-dimensional unstructured noise, whereas the informative components are
structured and low-dimensional. We exploit principal component analysis (PCA)
and sparse coding to characterize the senone subspaces. Enhanced probabilities
obtained from low-rank and sparse reconstructions are used as soft targets for
DNN acoustic modeling, which also enables training with untranscribed data.
Experiments conducted on the AMI corpus show a 4.6% relative reduction in word
error rate.
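The low-rank reconstruction idea in this abstract can be illustrated with a minimal sketch. It assumes a single global PCA on log-posteriors followed by a softmax renormalization; the abstract does not specify these details, and the function name `low_rank_soft_targets`, the `rank` parameter, and the synthetic data are hypothetical choices for illustration only.

```python
import numpy as np

def low_rank_soft_targets(posteriors, rank, eps=1e-8):
    """Project log-posteriors onto their top principal components.

    posteriors: (n_frames, n_senones) array, each row a probability vector.
    Returns an array of the same shape whose rows again sum to one.
    Assumption: one global PCA in the log domain (not taken from the paper).
    """
    log_p = np.log(posteriors + eps)
    mean = log_p.mean(axis=0, keepdims=True)
    centered = log_p - mean
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:rank]                              # (rank, n_senones)
    recon = centered @ basis.T @ basis + mean      # low-rank reconstruction
    # Softmax maps the reconstruction back onto the probability simplex.
    recon = recon - recon.max(axis=1, keepdims=True)
    probs = np.exp(recon)
    return probs / probs.sum(axis=1, keepdims=True)

# Synthetic stand-in for DNN senone posteriors (illustration only).
rng = np.random.default_rng(0)
logits = rng.normal(size=(200, 10))
posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
soft_targets = low_rank_soft_targets(posteriors, rank=3)
```

The resulting `soft_targets` could then replace one-hot labels in a cross-entropy loss; how the rank is chosen is left open here.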
Structured Sparsity Models for Multiparty Speech Recovery from Reverberant Recordings
We tackle the multi-party speech recovery problem by modeling the
acoustics of reverberant chambers. Our approach exploits structured sparsity
models to perform room modeling and speech recovery. We propose a scheme for
characterizing the room acoustics from the unknown competing speech sources
relying on localization of the early images of the speakers by sparse
approximation of the spatial spectra of the virtual sources in a free-space
model. The images are then clustered exploiting the low-rank structure of the
spectro-temporal components belonging to each source. This enables us to
identify the early support of the room impulse response function and its unique
map to the room geometry. To further tackle the ambiguity of the reflection
ratios, we propose a novel formulation of the reverberation model and estimate
the absorption coefficients through a convex optimization exploiting a joint
sparsity model formulated upon spatio-spectral sparsity of concurrent speech
representation. The acoustic parameters are then incorporated for separating
individual speech signals through either structured sparse recovery or inverse
filtering of the acoustic channels. The experiments conducted on real data
recordings demonstrate the effectiveness of the proposed approach for
multi-party speech recovery and recognition.
Comment: 31 pages
Low Rank and Sparsity Analysis Applied to Speech Enhancement via Online Estimated Dictionary
In this letter, we propose a single-channel speech enhancement algorithm based on an online estimated local dictionary and on low-rank and sparse matrix decomposition. In the proposed algorithm, a noisy speech spectrogram is decomposed into low-rank background noise components and an activation of the online speech dictionary, on which both low-rank and sparsity constraints are imposed. This decomposition takes advantage of the high expressiveness of locally estimated exemplars for speech components and also accommodates nonstationary background noise. The local dictionary is obtained by estimating the speech presence probability (SPP) with the expectation-maximization algorithm, in which a generalized Gamma prior for the speech magnitude spectrum is used. The proposed algorithm is evaluated using signal-to-distortion ratio and perceptual evaluation of speech quality. The results show that the proposed algorithm achieves significant improvements at various SNRs when compared to four other speech enhancement algorithms: an improved Karhunen-Loeve transform approach, SPP-based MMSE, nonnegative matrix factorization-based robust principal component analysis (RPCA), and RPCA.
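For orientation, a plain low-rank plus sparse split of a matrix can be sketched as below. This is a generic RPCA-style block coordinate descent on the objective 0.5‖M − L − S‖² + τ‖L‖* + λ‖S‖₁, closer to the RPCA baseline named in the abstract than to the proposed method: it includes neither the online dictionary nor the SPP estimation. The function names, the parameters `tau` and `lam`, and the synthetic "spectrogram" are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise shrinkage: proximal operator of t * l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def singular_value_threshold(x, t):
    """Shrink singular values: proximal operator of t * nuclear norm."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * np.maximum(s - t, 0.0)) @ vt

def low_rank_plus_sparse(m, tau, lam, n_iter=50):
    """Alternate exact minimization over the low-rank and sparse blocks."""
    low = np.zeros_like(m)
    sparse = np.zeros_like(m)
    for _ in range(n_iter):
        low = singular_value_threshold(m - sparse, tau)
        sparse = soft_threshold(m - low, lam)
    return low, sparse

# Synthetic example: rank-1 "background" plus a few large spikes.
rng = np.random.default_rng(0)
background = np.outer(rng.random(40), rng.random(60))
spikes = np.zeros((40, 60))
spikes[rng.integers(0, 40, 30), rng.integers(0, 60, 30)] = 5.0
m = background + spikes
lam = 0.5
low, sparse = low_rank_plus_sparse(m, tau=1.0, lam=lam)
```

In an enhancement setting the low-rank part would play the role of slowly varying noise and the sparse part the role of speech activity, but the thresholds would need tuning on real spectrograms.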
A Unified Framework for Sparse Non-Negative Least Squares using Multiplicative Updates and the Non-Negative Matrix Factorization Problem
We study the sparse non-negative least squares (S-NNLS) problem. S-NNLS
occurs naturally in a wide variety of applications where an unknown,
non-negative quantity must be recovered from linear measurements. We present a
unified framework for S-NNLS based on a rectified power exponential scale
mixture prior on the sparse codes. We show that the proposed framework
encompasses a large class of S-NNLS algorithms and provide a computationally
efficient inference procedure based on multiplicative update rules. Such update
rules are convenient for solving large sets of S-NNLS problems simultaneously,
which is required in contexts like sparse non-negative matrix factorization
(S-NMF). We provide theoretical justification for the proposed approach by
showing that the local minima of the objective function being optimized are
sparse and the S-NNLS algorithms presented are guaranteed to converge to a set
of stationary points of the objective function. We then extend our framework to
S-NMF, showing that our framework leads to many well-known S-NMF algorithms
under specific choices of prior and providing a guarantee that a popular
subclass of the proposed algorithms converges to a set of stationary points of
the objective function. Finally, we study the performance of the proposed
approaches on synthetic and real-world data.
Comment: To appear in Signal Processing
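A multiplicative update of the kind this abstract refers to can be sketched for one simple special case: an ℓ1-penalized non-negative least squares objective, 0.5‖Y − AX‖² + λ·sum(X) with X ≥ 0. This is an assumed instance, not the paper's general prior, and it requires A and Y to be non-negative. The names `sparse_nnls_mu` and `objective` are hypothetical. Stacking the unknowns as columns of X shows how many problems are solved at once.

```python
import numpy as np

def sparse_nnls_mu(a, y, lam, n_iter=200, eps=1e-12):
    """Multiplicative updates for l1-penalized NNLS, one column per problem.

    a: (n_measurements, n_atoms) non-negative matrix.
    y: (n_measurements, n_problems) non-negative observations.
    Returns x of shape (n_atoms, n_problems) with non-negative entries.
    """
    x = np.ones((a.shape[1], y.shape[1]))
    aty = a.T @ y          # fixed numerator
    ata = a.T @ a
    for _ in range(n_iter):
        # Elementwise ratio keeps x non-negative without any projection.
        x *= aty / np.maximum(ata @ x + lam, eps)
    return x

def objective(a, y, x, lam):
    """Penalized least-squares cost summed over all problems."""
    return 0.5 * np.sum((y - a @ x) ** 2) + lam * np.sum(x)

# Synthetic data: sparse non-negative codes observed through a random matrix.
rng = np.random.default_rng(0)
a = rng.random((30, 50))
x_true = rng.random((50, 8)) * (rng.random((50, 8)) < 0.1)
y = a @ x_true
lam = 0.1
x_hat = sparse_nnls_mu(a, y, lam)
```

Because the update is a single pair of matrix products per iteration, it extends directly to alternating updates of both factors in sparse NMF.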
LRRNet: A Novel Representation Learning Guided Fusion Network for Infrared and Visible Images
Deep learning based fusion methods have been achieving promising performance
in image fusion tasks. This is attributed to the network architecture that
plays a very important role in the fusion process. However, in general, it is
hard to specify a good fusion architecture, and consequently, the design of
fusion networks is still a black art, rather than science. To address this
problem, we formulate the fusion task mathematically, and establish a
connection between its optimal solution and the network architecture that can
implement it. This approach leads to a novel method for constructing a
lightweight fusion network. It avoids the time-consuming
empirical network design by a trial-and-test strategy. In particular, we adopt a
learnable representation approach to the fusion task, in which the construction
of the fusion network architecture is guided by the optimisation algorithm
producing the learnable model. The low-rank representation (LRR) objective is
the foundation of our learnable model. The matrix multiplications, which are at
the heart of the solution, are transformed into convolutional operations, and
the iterative process of optimisation is replaced by a special feed-forward
network. Based on this novel network architecture, an end-to-end lightweight
fusion network is constructed to fuse infrared and visible light images. Its
successful training is facilitated by a detail-to-semantic information loss
function proposed to preserve the image details and to enhance the salient
features of the source images. Our experiments show that the proposed fusion
network exhibits better fusion performance than the state-of-the-art fusion
methods on public datasets. Interestingly, our network requires fewer
training parameters than other existing methods.
Comment: 14 pages, 15 figures, 8 tables