49,840 research outputs found
Constructing a Non-Negative Low Rank and Sparse Graph with Data-Adaptive Features
This paper aims at constructing a good graph for discovering intrinsic data
structures in a semi-supervised learning setting. Firstly, we propose to build
a non-negative low-rank and sparse (referred to as NNLRS) graph for the given
data representation. Specifically, the weights of edges in the graph are
obtained by seeking a nonnegative low-rank and sparse matrix that represents
each data sample as a linear combination of others. The so-obtained NNLRS-graph
can capture both the global mixture of subspaces structure (by the low
rankness) and the locally linear structure (by the sparseness) of the data,
hence is both generative and discriminative. Secondly, as good features are
extremely important for constructing a good graph, we propose to learn the data
embedding matrix and construct the graph jointly within one framework, which is
termed as NNLRS with embedded features (referred to as NNLRS-EF). Extensive
experiments on three publicly available datasets demonstrate that the proposed
method outperforms the state-of-the-art graph construction method by a large
margin for both semi-supervised classification and discriminative analysis,
which verifies the effectiveness of our proposed method
ForestHash: Semantic Hashing With Shallow Random Forests and Tiny Convolutional Networks
Hash codes are efficient data representations for coping with the ever
growing amounts of data. In this paper, we introduce a random forest semantic
hashing scheme that embeds tiny convolutional neural networks (CNN) into
shallow random forests, with near-optimal information-theoretic code
aggregation among trees. We start with a simple hashing scheme, where random
trees in a forest act as hashing functions by setting `1' for the visited tree
leaf, and `0' for the rest. We show that traditional random forests fail to
generate hashes that preserve the underlying similarity between the trees,
rendering the random forests approach to hashing challenging. To address this,
we propose to first randomly group arriving classes at each tree split node
into two groups, obtaining a significantly simplified two-class classification
problem, which can be handled using a light-weight CNN weak learner. Such
random class grouping scheme enables code uniqueness by enforcing each class to
share its code with different classes in different trees. A non-conventional
low-rank loss is further adopted for the CNN weak learners to encourage code
consistency by minimizing intra-class variations and maximizing inter-class
distance for the two random class groups. Finally, we introduce an
information-theoretic approach for aggregating codes of individual trees into a
single hash code, producing a near-optimal unique hash for each class. The
proposed approach significantly outperforms state-of-the-art hashing methods
for image retrieval tasks on large-scale public datasets, while performing at
the level of other state-of-the-art image classification techniques while
utilizing a more compact and efficient scalable representation. This work
proposes a principled and robust procedure to train and deploy in parallel an
ensemble of light-weight CNNs, instead of simply going deeper.Comment: Accepted to ECCV 201
Distributed Low-rank Subspace Segmentation
Vision problems ranging from image clustering to motion segmentation to
semi-supervised learning can naturally be framed as subspace segmentation
problems, in which one aims to recover multiple low-dimensional subspaces from
noisy and corrupted input data. Low-Rank Representation (LRR), a convex
formulation of the subspace segmentation problem, is provably and empirically
accurate on small problems but does not scale to the massive sizes of modern
vision datasets. Moreover, past work aimed at scaling up low-rank matrix
factorization is not applicable to LRR given its non-decomposable constraints.
In this work, we propose a novel divide-and-conquer algorithm for large-scale
subspace segmentation that can cope with LRR's non-decomposable constraints and
maintains LRR's strong recovery guarantees. This has immediate implications for
the scalability of subspace segmentation, which we demonstrate on a benchmark
face recognition dataset and in simulations. We then introduce novel
applications of LRR-based subspace segmentation to large-scale semi-supervised
learning for multimedia event detection, concept detection, and image tagging.
In each case, we obtain state-of-the-art results and order-of-magnitude speed
ups
A Probabilistic Interpretation of Sampling Theory of Graph Signals
We give a probabilistic interpretation of sampling theory of graph signals.
To do this, we first define a generative model for the data using a pairwise
Gaussian random field (GRF) which depends on the graph. We show that, under
certain conditions, reconstructing a graph signal from a subset of its samples
by least squares is equivalent to performing MAP inference on an approximation
of this GRF which has a low rank covariance matrix. We then show that a
sampling set of given size with the largest associated cut-off frequency, which
is optimal from a sampling theoretic point of view, minimizes the worst case
predictive covariance of the MAP estimate on the GRF. This interpretation also
gives an intuitive explanation for the superior performance of the sampling
theoretic approach to active semi-supervised classification.Comment: 5 pages, 2 figures, To appear in International Conference on
Acoustics, Speech, and Signal Processing (ICASSP) 201
- …