Deep clustering: Discriminative embeddings for segmentation and separation
We address the problem of acoustic source separation in a deep learning
framework we call "deep clustering." Rather than directly estimating signals or
masking functions, we train a deep network to produce spectrogram embeddings
that are discriminative for partition labels given in training data. Previous
deep network approaches provide great advantages in terms of learning power and
speed, but previously it has been unclear how to use them to separate signals
in a class-independent way. In contrast, spectral clustering approaches are
flexible with respect to the classes and number of items to be segmented, but
it has been unclear how to leverage the learning power and speed of deep
networks. To obtain the best of both worlds, we use an objective function to
train embeddings that yield a low-rank approximation to an ideal pairwise
affinity matrix, in a class-independent way. This avoids the high cost of
spectral factorization and instead produces compact clusters that are amenable
to simple clustering methods. The segmentations are therefore implicitly
encoded in the embeddings, and can be "decoded" by clustering. Preliminary
experiments show that the proposed method can separate speech: when trained on
spectrogram features containing mixtures of two speakers, and tested on
mixtures of a held-out set of speakers, it can infer masking functions that
improve signal quality by around 6dB. We show that the model can generalize to
three-speaker mixtures despite training only on two-speaker mixtures. The
framework can be used without class labels, and therefore has the potential to
be trained on a diverse set of sound types, and to generalize to novel sources.
We hope that future work will lead to segmentation of arbitrary sounds, with
extensions to microphone array methods as well as image segmentation and other
domains.
Comment: Originally submitted on June 5, 201
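The low-rank affinity objective described in this abstract can be illustrated with a small sketch. It assumes the objective is the squared Frobenius distance between the embedding affinity `V V^T` and the ideal affinity `Y Y^T`, where `V` (N x D embeddings) and `Y` (N x C one-hot partition labels) are names chosen here for illustration; the abstract does not spell out the exact form. Under that assumption, the loss can be evaluated from small D x D, D x C and C x C products, without ever forming an N x N matrix.

```python
import numpy as np

def affinity_loss(V, Y):
    """||V V^T - Y Y^T||_F^2 computed from small Gram matrices only.

    V: (N, D) embeddings, Y: (N, C) one-hot partition indicators.
    Both names and the exact loss form are assumptions for this sketch.
    """
    VtV = V.T @ V  # (D, D)
    VtY = V.T @ Y  # (D, C)
    YtY = Y.T @ Y  # (C, C)
    return float(np.sum(VtV ** 2) - 2.0 * np.sum(VtY ** 2) + np.sum(YtY ** 2))

def affinity_loss_naive(V, Y):
    """Same quantity, but builds the full N x N affinity matrices."""
    return float(np.sum((V @ V.T - Y @ Y.T) ** 2))
```

When the embeddings reproduce the partition exactly (for example `V = Y`), the loss is zero, and the segmentation can then be recovered by a simple clustering method such as k-means on the rows of `V`.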
Cortical spatio-temporal dimensionality reduction for visual grouping
The visual system of many mammals, including humans, is able to integrate
the geometric information of visual stimuli and to perform cognitive tasks
already at the first stages of the cortical processing. This is thought to be
the result of a combination of mechanisms, which include feature extraction at
the single-cell level and geometric processing by means of cell connectivity. We
present a geometric model of such connectivities in the space of detected
features associated with spatio-temporal visual stimuli, and show how they can be
used to obtain low-level object segmentation. The main idea is that of defining
a spectral clustering procedure with anisotropic affinities over datasets
consisting of embeddings of the visual stimuli into higher-dimensional spaces.
Neural plausibility of the proposed arguments will be discussed.
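As a rough illustration of spectral clustering with an anisotropic affinity, the sketch below lifts each stimulus to a feature vector (position plus orientation) and weights each feature dimension differently. The Gaussian kernel with per-dimension `scales` is a stand-in chosen for this example, not the cortical connectivity kernel of the paper; orientation is also treated as an ordinary coordinate, ignoring its periodicity. All function names are hypothetical.

```python
import numpy as np

def anisotropic_affinity(X, scales):
    """Gaussian affinity with a separate length scale per feature dimension.

    X: (N, F) lifted features, e.g. (x, y, orientation).
    scales: length-F sequence; a small scale makes that feature more selective.
    """
    Z = X / np.asarray(scales, dtype=float)
    sq_dist = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist)

def spectral_bipartition(W):
    """Two-way split from the second eigenvector of the normalized Laplacian."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    fiedler = vecs[:, 1] * d_inv_sqrt    # generalized eigenvector of (D - W, D)
    return (fiedler > 0).astype(int)

# Two rows of oriented elements that are close in space but differ in orientation.
row_a = np.array([[x, 0.0, 0.0] for x in range(5)], dtype=float)
row_b = np.array([[x, 0.5, 1.5] for x in range(5)], dtype=float)
X = np.vstack([row_a, row_b])
labels = spectral_bipartition(anisotropic_affinity(X, scales=(2.0, 2.0, 0.4)))
```

With a tight orientation scale, the two rows are grouped separately even though they nearly overlap in position, which is the kind of behavior an anisotropic affinity is meant to provide.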
Joint segmentation of color and depth data based on splitting and merging driven by surface fitting
This paper proposes a segmentation scheme based on the joint usage of color and depth data together with a 3D surface estimation scheme. First, a set of multi-dimensional vectors is built from color, geometry and surface orientation information. Normalized cuts spectral clustering is then applied to recursively segment the scene into two parts, thus obtaining an over-segmentation. This procedure is followed by a recursive merging stage in which close segments belonging to the same object are joined together. At each step of both procedures, a NURBS model is fitted to the computed segments, and the accuracy of the fit is used as a measure of the plausibility that a segment represents a single surface or object. By comparing this accuracy with that of the previous step, it is possible to determine whether each splitting or merging operation leads to a better scene representation and, consequently, whether to perform it. Experimental results show that the proposed method provides an accurate and reliable segmentation.
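The splitting stage described above can be sketched as a recursive two-way cut that is accepted only when the surface fit improves. This is a simplified illustration under stated assumptions: a least-squares plane replaces the NURBS fit, the feature vectors contain 3D geometry only (no color or orientation), the merging stage is omitted, and the names `plane_fit_error`, `ncut_bipartition`, `split_if_better`, `sigma`, `min_size` and `tol` are hypothetical.

```python
import numpy as np

def plane_fit_error(P):
    """RMS residual of a least-squares plane z = a*x + b*y + c (stand-in for NURBS)."""
    A = np.c_[P[:, 0], P[:, 1], np.ones(len(P))]
    coef, *_ = np.linalg.lstsq(A, P[:, 2], rcond=None)
    return float(np.sqrt(np.mean((A @ coef - P[:, 2]) ** 2)))

def ncut_bipartition(P, sigma):
    """Two-way normalized-cut style split via the second Laplacian eigenvector."""
    sq_dist = np.sum((P[:, None, :] - P[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(P)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)
    return vecs[:, 1] * d_inv_sqrt > 0

def split_if_better(P, sigma=1.0, min_size=4, tol=1e-3):
    """Recursively split points, keeping a split only if the fit error drops by more than tol."""
    def recurse(idx):
        if len(idx) < 2 * min_size:
            return [idx]
        mask = ncut_bipartition(P[idx], sigma)
        a, b = idx[mask], idx[~mask]
        if min(len(a), len(b)) < min_size:
            return [idx]
        before = plane_fit_error(P[idx])
        # Pooled RMS error of the two candidate segments.
        after = np.sqrt((len(a) * plane_fit_error(P[a]) ** 2
                         + len(b) * plane_fit_error(P[b]) ** 2) / len(idx))
        if before - after <= tol:
            return [idx]  # splitting does not improve the representation
        return recurse(a) + recurse(b)
    return recurse(np.arange(len(P)))

# Two parallel 4x4 planar patches, one at z = 0 and one at z = 4.
grid = np.array([[x, y] for x in range(4) for y in range(4)], dtype=float)
P = np.vstack([np.c_[grid, np.zeros(16)], np.c_[grid, np.full(16, 4.0)]])
segments = split_if_better(P)
```

In this toy scene the first cut separates the two patches because a single plane fits them poorly, and further cuts are rejected because each patch is already fitted exactly.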