12 research outputs found
Kernel Spectral Curvature Clustering (KSCC)
Multi-manifold modeling is increasingly used in segmentation and data
representation tasks in computer vision and related fields. While the general
problem, modeling data by mixtures of manifolds, is very challenging, several
approaches exist for modeling data by mixtures of affine subspaces (which is
often referred to as hybrid linear modeling). We translate some important
instances of multi-manifold modeling to hybrid linear modeling in embedded
spaces, without explicitly performing the embedding but applying the kernel
trick. The resulting algorithm, Kernel Spectral Curvature Clustering, uses
kernels at two levels - both as an implicit embedding method to linearize
nonflat manifolds and as a principled method to convert a multiway affinity
problem into a spectral clustering one. We demonstrate the effectiveness of the
method by comparing it with other state-of-the-art methods on both synthetic
data and a real-world problem of segmenting multiple motions from two
perspective camera views.
Comment: accepted to the 2009 ICCV Workshop on Dynamical Vision
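For intuition, the two-level use of kernels can be illustrated with ordinary kernel spectral clustering: an RBF kernel implicitly embeds the data so that non-flat structure becomes (near-)linearly separable, and spectral clustering is run on the resulting affinities. This is a minimal NumPy sketch under those assumptions, not the paper's multi-way curvature affinities; the RBF kernel, the choice of gamma, and the toy two-circles data are all illustrative.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Pairwise squared distances -> Gaussian (RBF) affinities.
    sq = (X ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0)
    return np.exp(-gamma * d2)

def spectral_clusters(K, k, n_iter=50):
    # Normalized spectral clustering on an affinity matrix K.
    d_inv_sqrt = 1.0 / np.sqrt(K.sum(axis=1))
    M = d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)           # eigenvalues in ascending order
    U = vecs[:, -k:]                      # top-k eigenvectors
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
    # Deterministic k-means on the spectral embedding,
    # with farthest-point initialization.
    centers = [U[0]]
    for _ in range(k - 1):
        gaps = np.min([((U - c) ** 2).sum(-1) for c in centers], axis=0)
        centers.append(U[np.argmax(gaps)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((U[:, None] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = U[labels == j].mean(axis=0)
    return labels

# Two concentric circles: not linearly separable in the input space,
# but separable after the implicit RBF embedding.
t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
circle = np.c_[np.cos(t), np.sin(t)]
X = np.vstack([circle, 3 * circle])
labels = spectral_clusters(rbf_kernel(X, gamma=2.0), k=2)
```

The kernel matrix here plays both roles the abstract describes: it is the implicit embedding and the affinity input to spectral clustering.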
Neural Collaborative Subspace Clustering
We introduce the Neural Collaborative Subspace Clustering, a neural model
that discovers clusters of data points drawn from a union of low-dimensional
subspaces. In contrast to previous attempts, our model runs without the aid of
spectral clustering, which makes it one of the few algorithms that can
gracefully scale to large datasets. At its heart, our neural model benefits
from a classifier which determines whether a pair of points lies on the same
subspace or not. Essential to our model is the construction of two affinity
matrices, one from the classifier and the other from a notion of subspace
self-expressiveness, to supervise training in a collaborative scheme. We
thoroughly assess and contrast the performance of our model against various
state-of-the-art clustering algorithms, including deep subspace-based ones.
Comment: accepted to ICML 2019
Nearness to Local Subspace Algorithm for Subspace and Motion Segmentation
There is a growing interest in computer science, engineering, and mathematics
for modeling signals in terms of union of subspaces and manifolds. Subspace
segmentation and clustering of high dimensional data drawn from a union of
subspaces are especially important with many practical applications in computer
vision, image and signal processing, communications, and information theory.
This paper presents a clustering algorithm for high dimensional data that comes
from a union of lower dimensional subspaces of equal and known dimensions. Such
cases occur in many data clustering problems, such as motion segmentation and
face recognition. The algorithm is reliable in the presence of noise, and
applied to the Hopkins 155 Dataset, it generates the best results to date for
motion segmentation. The two motion, three motion, and overall segmentation
rates for the video sequences are 99.43%, 98.69%, and 99.24%, respectively.
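The pipeline named in the title (fit a local subspace to each point's neighborhood, measure every point's distance to each local subspace, then threshold into a binary similarity matrix) can be sketched as below. The neighborhood size, the fixed threshold, and the toy two-line data are illustrative assumptions, not the paper's data-driven choices.

```python
import numpy as np

def local_subspace_distances(X, d, k_nn):
    # For each point i: fit a d-dimensional local subspace to its k_nn
    # nearest neighbours (centered SVD), then compute the distance of
    # every point in X to that local subspace.
    n = len(X)
    D2 = ((X[:, None] - X[None]) ** 2).sum(-1)
    dist = np.empty((n, n))
    for i in range(n):
        nbrs = np.argsort(D2[i])[:k_nn]          # includes i itself
        P = X[nbrs]
        c = P.mean(axis=0)
        _, _, Vt = np.linalg.svd(P - c, full_matrices=False)
        B = Vt[:d]                               # orthonormal local basis
        R = X - c
        dist[i] = np.linalg.norm(R - (R @ B.T) @ B, axis=1)
    return dist

# Two well-separated 1-D lines in R^3, 20 points each.
t = np.linspace(0.5, 1.5, 20)[:, None]
X = np.vstack([t * np.array([1.0, 0.0, 0.0]),
               t * np.array([0.0, 1.0, 1.0])])
dist = local_subspace_distances(X, d=1, k_nn=5)
S = (dist < 0.1).astype(int)   # the paper uses a data-driven threshold; fixed here
```

On this toy data the binary similarity matrix S comes out exactly block diagonal, so the final segmentation step (clustering S) is trivial.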
Graph decomposition using a modified spectral clustering method
Among the many problems posed on graphs, those concerning the placement of objects
so as to increase the information content of complex multi-parameter systems find
wide practical application (for example, in transport and computer networks, piping
systems, and image processing). Despite years of research, no exact and efficient
algorithms are known for placement problems. We propose to treat the placement
problem as a decomposition of the initial network into k regions, in each of which
a vertex with some centrality property is sought. This article surveys the literature
on placement problems in graphs, as well as methods for decomposing graph structures.
Following the main results of spectral clustering theory, the drawbacks of the
standard partitioning criteria Rcut and Ncut are identified. It is shown that the
distance-minimization criterion Dcut proposed in this paper yields high-quality
graph decompositions. The results are illustrated by searching for sensor placement
vertices in the well-known ZJ and D-Town networks of the EPANET hydraulic
modeling system.
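The spectral machinery that the Rcut/Ncut discussion builds on can be illustrated with a standard two-way Ncut-style split: threshold the Fiedler vector of the normalized graph Laplacian. This sketches the baseline criteria the abstract critiques, not the proposed Dcut; the toy two-clique graph is an assumption for illustration.

```python
import numpy as np

def ncut_bipartition(A):
    # Two-way Ncut relaxation: threshold the Fiedler vector (eigenvector of
    # the 2nd-smallest eigenvalue) of the symmetric normalized Laplacian
    # L = I - D^{-1/2} A D^{-1/2}.
    d = A.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(A)) - d_inv_sqrt @ A @ d_inv_sqrt
    _, vecs = np.linalg.eigh(L)           # eigenvalues in ascending order
    return (vecs[:, 1] > 0).astype(int)

# Two 4-cliques joined by a single bridge edge; the minimum-Ncut split
# removes the bridge.
A = np.zeros((8, 8))
A[:4, :4] = 1.0
A[4:, 4:] = 1.0
np.fill_diagonal(A, 0.0)
A[3, 4] = A[4, 3] = 1.0
labels = ncut_bipartition(A)
```

Recursing this bipartition (or using k eigenvectors plus k-means) gives the k-region decomposition the abstract starts from, with Rcut obtained analogously from the unnormalized Laplacian.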
Subspace Segmentation And High-Dimensional Data Analysis
This thesis develops theory and associated algorithms to solve the subspace segmentation problem. Given a set of data W={w_1,...,w_N} in R^D drawn from a union of subspaces, we focus on determining a nonlinear model of the form U={S_i}_{i in I}, where each S_i is a subspace, that is nearest to W. The model is then used to classify W into clusters. Our first approach is based on the binary reduced row echelon form of the data matrix. We prove that, in the absence of noise, this approach can find the number of subspaces, their dimensions, and an orthonormal basis for each subspace S_i. We provide a comprehensive analysis of our theory and determine its limitations and strengths in the presence of outliers and noise. Our second approach is based on nearness to local subspaces; it handles noise effectively, but it works only in a special case of the general subspace segmentation problem (subspaces of equal and known dimensions). This approach computes a binary similarity matrix for the data points: a local subspace is first estimated for each data point; a distance matrix is then generated by computing the distances between the local subspaces and the points; and the distance matrix is converted to a similarity matrix by applying a data-driven threshold. The problem is thereby transformed into segmentation of subspaces of dimension 1 instead of dimension d. Applied to the Hopkins 155 Dataset, the algorithm generated the best results to date.
Novel methods for Intrinsic dimension estimation and manifold learning
One of the most challenging problems in modern science is how to deal with
the huge amount of data that today's technologies provide. Several difficulties may
arise. For instance, the number of samples may be too big, and the stream of
incoming data may arrive faster than the algorithms that must process it. Another
common problem is that as the data dimension grows, so does the volume of the space,
leading to a sparsification of the available data. This may cause problems
in statistical analysis, since the amount of data needed to support a conclusion often
grows exponentially with the dimension. This problem is commonly referred to
as the Curse of Dimensionality, and it is one of the reasons why high dimensional
data cannot be analyzed efficiently with traditional methods. Classical methods
for dimensionality reduction, like principal component analysis and factor analysis,
may fail due to a nonlinear structure of the data. In recent years several methods
for nonlinear dimensionality reduction have been proposed.

A general way to model a high dimensional data set is to represent the observations
as noisy samples drawn from a probability distribution mu in the real coordinate
space of D dimensions. It has been observed that the essential support of mu can
often be well approximated by low dimensional sets. These sets can be assumed to be
low dimensional manifolds embedded in the ambient dimension D. A manifold is a
topological space which globally may not be Euclidean, but which in a small
neighborhood of each point behaves like a Euclidean space. In this setting we call
the dimension of the manifold the intrinsic dimension, which is usually much lower
than the ambient dimension D. Roughly speaking, the intrinsic dimension of a data
set can be described as the minimum number of variables needed to represent the
data without significant loss of information.

In this work we propose different methods aimed at estimating the intrinsic
dimension. The first method we present models the neighbors of each point as
stochastic processes, in such a way that a closed form likelihood function can
be written. This leads to a closed form maximum likelihood estimator (MLE) for
the intrinsic dimension, which has all the good features that an MLE can have. The
second method is based on a multiscale singular value decomposition (MSVD) of the
data. This method performs singular value decomposition (SVD) on neighborhoods of
increasing size and finds an estimate of the intrinsic dimension by studying the
behavior of the singular values as the radius of the neighborhood increases.

We also introduce an algorithm to estimate the model parameters when the data are
assumed to be sampled around an unknown number of planes with different intrinsic
dimensions, embedded in a high dimensional space. Models of this kind have many
applications in computer vision and pattern recognition, where the data can be
described by multiple linear structures or need to be clustered into groups that
can be represented by low dimensional hyperplanes. The algorithm relies on both
MSVD and spectral clustering, and it is able to estimate the number of planes and
their dimensions, as well as their arrangement in the ambient space.

Finally, we propose a novel method for manifold reconstruction based on a
multiscale approach, which approximates the manifold from coarse to fine scales
with increasing precision. The basic idea is to produce, at a generic scale j, a
piecewise linear approximation of the manifold using a collection of low
dimensional planes, and to use those planes to create clusters for the data. At
scale j + 1, each cluster is independently approximated by another collection of
low dimensional planes. The process is iterated until the desired precision is
achieved. This algorithm is fast because it is highly parallelizable, and its
computational time is independent of the sample size. Moreover, this method
automatically constructs a tree structure for the data. This feature can be
particularly useful in applications that require an a priori tree data structure.
The aim of the collection of methods proposed in this work is to provide
algorithms to learn and estimate the underlying structure of high dimensional
datasets.
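The MSVD idea described above (singular values of tangent directions grow with the neighborhood radius, while noise directions do not) can be sketched as follows. The relative-gap threshold, the fixed radii, and the toy noisy-plane data are illustrative assumptions, not the thesis's actual estimator.

```python
import numpy as np

def msvd_dimensions(X, x0, radii, rel_gap=0.2):
    # Multiscale SVD sketch: at each radius, take the centered points
    # falling inside the ball around x0 and inspect the singular value
    # spectrum. The per-scale dimension estimate counts normalized
    # singular values above rel_gap.
    est = {}
    for r in radii:
        P = X[np.linalg.norm(X - x0, axis=1) < r]
        s = np.linalg.svd(P - P.mean(axis=0), compute_uv=False)
        est[r] = int((s / s[0] > rel_gap).sum())
    return est

rng = np.random.default_rng(0)
# 500 noisy samples from a 2-D plane embedded in R^5.
coeff = rng.uniform(-1, 1, (500, 2))
basis = np.linalg.qr(rng.normal(size=(5, 2)))[0]   # orthonormal 2-D basis
X = coeff @ basis.T + 0.01 * rng.normal(size=(500, 5))
estimates = msvd_dimensions(X, np.zeros(5), radii=[0.4, 0.8])
```

Here the estimate is stable across scales at the true intrinsic dimension 2: two singular values grow with the radius while the remaining three stay at the noise floor.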
Unsupervised Learning from Shallow to Deep
Machine learning plays a pivotal role in most state-of-the-art systems across many application domains. With the rise of deep learning, massive labeled data have become the basis of feature learning, enabling models to learn automatically. Unfortunately, a trained deep learning model is hard to adapt to other datasets without fine-tuning, and the applicability of machine learning methods is limited by the amount of available labeled data. The aim of this thesis is therefore to alleviate the limitations of supervised learning by exploring algorithms that learn good internal representations and invariant feature hierarchies from unlabelled data.
Firstly, we extend traditional dictionary learning and sparse coding algorithms to hierarchical image representations in a principled way. So that dictionary atoms capture additional information from extended receptive fields and attain improved descriptive capacity, we present a two-pass multi-resolution cascade framework for dictionary learning and sparse coding. This cascade method allows collaborative reconstructions at different resolutions using dictionary atoms of the same dimension. The jointly learned dictionary comprises atoms that adapt both to the information available at the coarsest layer, where the support of the atoms reaches its maximum range, and to the residual images, where supplementary details progressively refine the reconstruction objective. Our method generates flexible and accurate representations using only a small number of coefficients, and is computationally efficient.
In the following work, we propose to incorporate the traditional self-expressiveness property into deep learning to learn better representations for subspace clustering. This architecture is built upon deep auto-encoders, which non-linearly map the input data into a latent space. Our key idea is to introduce a novel self-expressive layer between the encoder and the decoder to mimic the ``self-expressiveness'' property that has proven effective in traditional subspace clustering. Being differentiable, our new self-expressive layer provides a simple but effective way to learn pairwise affinities between all data points through a standard back-propagation procedure. Being nonlinear, our neural-network-based method is able to cluster data points having complex (often nonlinear) structures.
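The linear-algebra core of self-expressiveness, with each point reconstructed as a combination of the other points, can be sketched outside the network as a per-point ridge regression. This is an assumption-laden stand-in for the learned self-expressive layer (the layer learns these coefficients by back-propagation; here they are solved in closed form), and the two-subspace toy data is illustrative.

```python
import numpy as np

def self_expressive_coeffs(X, lam=0.01):
    # Columns of X are data points. For each x_j solve the ridge problem
    #   min_c ||x_j - X_{-j} c||^2 + lam ||c||^2,
    # excluding x_j itself so the trivial solution C = I is ruled out.
    D, N = X.shape
    C = np.zeros((N, N))
    for j in range(N):
        idx = [i for i in range(N) if i != j]
        A = X[:, idx]
        C[idx, j] = np.linalg.solve(A.T @ A + lam * np.eye(N - 1),
                                    A.T @ X[:, j])
    return C

rng = np.random.default_rng(0)
# 20 points from two orthogonal 1-D subspaces of R^4 (10 each).
u = np.array([1.0, 0.0, 0.0, 0.0])
v = np.array([0.0, 0.0, 1.0, 1.0]) / np.sqrt(2.0)
X = np.hstack([np.outer(u, rng.uniform(1, 2, 10)),
               np.outer(v, rng.uniform(1, 2, 10))])
C = self_expressive_coeffs(X)
W = np.abs(C) + np.abs(C).T   # symmetric affinity; block-diagonal here
```

Because each point is best expressed by points from its own subspace, the induced affinity W is block diagonal, which is exactly the property the self-expressive layer exploits to produce pairwise affinities for clustering.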
However, subspace clustering algorithms are notorious for their scalability issues, because building and processing large affinity matrices is demanding. We propose two methods to tackle this problem. The first is based on k-Subspace Clustering: we introduce a method that simultaneously learns an embedding space along with subspaces within it so as to minimize a notion of reconstruction error, thus addressing subspace clustering in an end-to-end learning paradigm. This in turn frees us from needing an affinity matrix to perform clustering. The second replaces spectral clustering with a feed-forward network and learns the affinity of each data point from the "self-expressive" layer. We introduce the Neural Collaborative Subspace Clustering, which benefits from a classifier that determines whether a pair of points lies on the same subspace, under the supervision of the "self-expressive" layer. Essential to our model is the construction of two affinity matrices, one from the classifier and the other from a notion of subspace self-expressiveness, to supervise training in a collaborative scheme.
In summary, this thesis contributes methods for unsupervised learning across several tasks. It starts from the traditional sparse coding and dictionary learning perspective in low-level vision. We then explore how to incorporate unsupervised learning into convolutional neural networks without label information and how to scale subspace clustering to large datasets. Furthermore, we extend the clustering to a dense prediction task (saliency detection).