1,663 research outputs found
Single channel speech music separation using nonnegative matrix factorization and spectral masks
This work proposes a single-channel speech-music separation algorithm based on nonnegative matrix factorization (NMF) with spectral masks. The algorithm uses training data of speech and music signals with NMF, followed by masking, to separate the mixed signal. In the training stage, NMF learns a set of basis vectors for each source from the training data in the magnitude spectrum domain. After observing the mixed signal, NMF decomposes its magnitude spectra into a linear combination of the trained bases for both sources. The decomposition results are used to build a mask, which describes the contribution of each source to the mixed signal. Experimental results show that applying masks after NMF improves separation even when NMF is run with fewer iterations, which yields a faster separation process.
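The train, decompose, and mask steps described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the NMF cost function or the mask formula, so the Euclidean multiplicative updates, the mask exponent `p`, and the names `nmf` and `separate` are all assumptions made for the example.

```python
import numpy as np

def nmf(V, rank, n_iter=200, W=None, seed=0, eps=1e-9):
    """Multiplicative-update NMF (Euclidean cost, an assumption).

    V is a nonnegative magnitude spectrogram (freq x frames).
    If W is given, the bases stay fixed and only the gains H are updated.
    """
    rng = np.random.default_rng(seed)
    fixed = W is not None
    if not fixed:
        W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        if not fixed:
            W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def separate(V_mix, W_speech, W_music, n_iter=50, p=2, eps=1e-9):
    """Decompose the mixture on the trained bases, then apply a spectral mask."""
    W = np.hstack([W_speech, W_music])          # concatenated, fixed bases
    _, H = nmf(V_mix, W.shape[1], n_iter=n_iter, W=W)
    k = W_speech.shape[1]
    S = W_speech @ H[:k]                        # initial speech estimate
    M = W_music @ H[k:]                         # initial music estimate
    mask = S**p / (S**p + M**p + eps)           # soft mask (hypothetical form)
    return mask * V_mix, (1.0 - mask) * V_mix

# Toy usage with random "spectrograms" standing in for real training data.
rng = np.random.default_rng(1)
V_speech = rng.random((16, 40))
V_music = rng.random((16, 40))
W_speech, _ = nmf(V_speech, rank=4)
W_music, _ = nmf(V_music, rank=4)
V_mix = V_speech[:, :10] + V_music[:, :10]
speech_est, music_est = separate(V_mix, W_speech, W_music)
```

Because the mask and its complement sum to one, the two estimates add back up to the observed mixture magnitude, which the raw NMF reconstructions `S` and `M` generally do not.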
Proceedings of the second "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'14)
The implicit objective of the biennial "international Traveling Workshop on
Interactions between Sparse models and Technology" (iTWIST) is to foster
collaboration between international scientific teams by disseminating ideas
through both specific oral/poster presentations and free discussions. For its
second edition, the iTWIST workshop took place in the medieval and picturesque
town of Namur in Belgium, from Wednesday August 27th till Friday August 29th,
2014. The workshop was conveniently located in "The Arsenal" building within
walking distance of both hotels and town center. iTWIST'14 has gathered about
70 international participants and has featured 9 invited talks, 10 oral
presentations, and 14 posters on the following themes, all related to the
theory, application and generalization of the "sparsity paradigm":
Sparsity-driven data sensing and processing; Union of low dimensional
subspaces; Beyond linear and convex inverse problem; Matrix/manifold/graph
sensing/processing; Blind inverse problems and dictionary learning; Sparsity
and computational neuroscience; Information theory, geometry and randomness;
Complexity/accuracy tradeoffs in numerical methods; Sparsity? What's next?;
Sparse machine learning and inference.
Comment: 69 pages, 24 extended abstracts, iTWIST'14 website:
http://sites.google.com/site/itwist1
Deep clustering: Discriminative embeddings for segmentation and separation
We address the problem of acoustic source separation in a deep learning
framework we call "deep clustering." Rather than directly estimating signals or
masking functions, we train a deep network to produce spectrogram embeddings
that are discriminative for partition labels given in training data. Previous
deep network approaches provide great advantages in terms of learning power and
speed, but previously it has been unclear how to use them to separate signals
in a class-independent way. In contrast, spectral clustering approaches are
flexible with respect to the classes and number of items to be segmented, but
it has been unclear how to leverage the learning power and speed of deep
networks. To obtain the best of both worlds, we use an objective function to
train embeddings that yield a low-rank approximation to an ideal pairwise
affinity matrix, in a class-independent way. This avoids the high cost of
spectral factorization and instead produces compact clusters that are amenable
to simple clustering methods. The segmentations are therefore implicitly
encoded in the embeddings, and can be "decoded" by clustering. Preliminary
experiments show that the proposed method can separate speech: when trained on
spectrogram features containing mixtures of two speakers, and tested on
mixtures of a held-out set of speakers, it can infer masking functions that
improve signal quality by around 6dB. We show that the model can generalize to
three-speaker mixtures despite training only on two-speaker mixtures. The
framework can be used without class labels, and therefore has the potential to
be trained on a diverse set of sound types, and to generalize to novel sources.
We hope that future work will lead to segmentation of arbitrary sounds, with
extensions to microphone array methods as well as image segmentation and other
domains.
Comment: Originally submitted on June 5, 201
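The low-rank affinity objective mentioned in the deep clustering abstract above can be illustrated with a short sketch. The symbols are assumptions, since the abstract does not define them: `V` is an N x D matrix of embeddings (one row per time-frequency bin) and `Y` is an N x C one-hot matrix of partition labels, so that `Y @ Y.T` is the ideal pairwise affinity matrix. The second function expands the same Frobenius norm so that the N x N matrices are never formed; both function names are hypothetical.

```python
import numpy as np

def affinity_loss_naive(V, Y):
    """||V V^T - Y Y^T||_F^2, building the full N x N affinity matrices."""
    return np.sum((V @ V.T - Y @ Y.T) ** 2)

def affinity_loss_lowrank(V, Y):
    """Same value, using only D x D, D x C, and C x C products."""
    return (np.sum((V.T @ V) ** 2)
            - 2.0 * np.sum((V.T @ Y) ** 2)
            + np.sum((Y.T @ Y) ** 2))

# Toy usage: 50 bins, 8-dimensional unit-norm embeddings, 3 sources.
rng = np.random.default_rng(0)
N, D, C = 50, 8, 3
V = rng.standard_normal((N, D))
V /= np.linalg.norm(V, axis=1, keepdims=True)
Y = np.eye(C)[rng.integers(0, C, N)]          # one-hot partition labels
loss_naive = affinity_loss_naive(V, Y)
loss_lowrank = affinity_loss_lowrank(V, Y)
```

The labels enter only through `Y @ Y.T`, which is unchanged by relabeling the sources. This is one way to read the abstract's "class-independent" claim, and at test time the partition is recovered by clustering the rows of `V`.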
Rotationally-invariant mapping of scalar and orientational metrics of neuronal microstructure with diffusion MRI
We develop a general analytical and numerical framework for estimating intra-
and extra-neurite water fractions and diffusion coefficients, as well as
neurite orientational dispersion, in each imaging voxel. By employing a set of
rotational invariants and their expansion in the powers of diffusion weighting,
we analytically uncover the nontrivial topology of the parameter estimation
landscape, showing that multiple branches of parameters describe the
measurement almost equally well, with only one of them corresponding to the
biophysical reality. A comprehensive acquisition shows that the branch choice
varies across the brain. Our framework reveals hidden degeneracies in MRI
parameter estimation for neuronal tissue, provides microstructural and
orientational maps in the whole brain without constraints or priors, and
connects modern biophysical modeling with clinical MRI.
Comment: 25 pages, 12 figures, elsarticle two-colum