Robust Unsupervised Flexible Auto-weighted Local-Coordinate Concept Factorization for Image Clustering
We investigate the high-dimensional data clustering problem by proposing a
novel and unsupervised representation learning model called Robust Flexible
Auto-weighted Local-coordinate Concept Factorization (RFA-LCF). RFA-LCF
integrates the robust flexible CF, robust sparse local-coordinate coding and
the adaptive reconstruction weighting learning into a unified model. The
adaptive weighting is driven by including the joint manifold preserving
constraints on the recovered clean data, basis concepts and new representation.
Specifically, our RFA-LCF uses an L2,1-norm based flexible residue to encode the
mismatch between clean data and its reconstruction, and also applies the robust
adaptive sparse local-coordinate coding to represent the data using a few
nearby basis concepts, which can make the factorization more accurate and
robust to noise. The robust flexible factorization is also performed in the
recovered clean data space for enhancing representations. RFA-LCF also
considers preserving the local manifold structures of clean data space, basis
concept space and the new coordinate space jointly in an adaptive manner.
Extensive comparisons show that RFA-LCF can deliver enhanced clustering
results.
Comment: Accepted at the 44th IEEE International Conference on Acoustics,
Speech, and Signal Processing (ICASSP 2019)
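The L2,1-norm mentioned in the abstract above can be illustrated with a minimal NumPy sketch. The abstract does not say whether the norm groups rows or columns, so the row-wise convention used here (sum of the Euclidean norms of the rows) is an assumption, and the function name `l21_norm` and the matrix `E` are hypothetical. This is not the RFA-LCF model itself, only the norm it builds on.

```python
import numpy as np

def l21_norm(M):
    """Row-wise L2,1 norm: sum of the Euclidean norms of the rows of M.

    Assumption: rows are the groups. Some papers group columns instead,
    in which case one would pass M.T.
    """
    return float(np.sum(np.linalg.norm(M, axis=1)))

# Hypothetical residue matrix: one clean row, two rows carrying error.
E = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [5.0, 12.0]])

l21 = l21_norm(E)        # 5 + 0 + 13 = 18
fro = np.linalg.norm(E)  # Frobenius norm, sqrt(194), for comparison
print(l21, fro)
```

Each row contributes its unsquared length, so a single heavily corrupted row grows the penalty linearly rather than quadratically. This is the usual reason the norm is described as robust to sample-wise noise.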
Global and Local Structure Preserving Sparse Subspace Learning: An Iterative Approach to Unsupervised Feature Selection
As we aim at alleviating the curse of high-dimensionality, subspace learning
is becoming more popular. Existing approaches use either information about
global or local structure of the data, and few studies simultaneously focus on
global and local structures, although both contain important information.
In this paper, we propose a global and local structure preserving sparse
subspace learning (GLoSS) model for unsupervised feature selection. The model
can simultaneously realize feature selection and subspace learning. In
addition, we develop a greedy algorithm to establish a generic combinatorial
model, and an iterative strategy based on an accelerated block coordinate
descent is used to solve the GLoSS problem. We also provide whole iterate
sequence convergence analysis of the proposed iterative algorithm. Extensive
experiments are conducted on real-world datasets to show the superiority of the
proposed approach over several state-of-the-art unsupervised feature selection
approaches.
Comment: 32 pages, 6 figures and 60 references
Learning A Task-Specific Deep Architecture For Clustering
While sparse coding-based clustering methods have been shown to be successful,
their bottlenecks in both efficiency and scalability limit their practical use.
In recent years, deep learning has proven to be a highly effective,
efficient and scalable feature learning tool. In this paper, we propose to
emulate the sparse coding-based clustering pipeline in the context of deep
learning, leading to a carefully crafted deep model benefiting from both. A
feed-forward network structure, named TAGnet, is constructed based on a
graph-regularized sparse coding algorithm. It is then trained with
task-specific loss functions from end to end. We discover that connecting deep
learning to sparse coding benefits not only the model performance, but also its
initialization and interpretation. Moreover, by introducing auxiliary
clustering tasks to the intermediate feature hierarchy, we formulate DTAGnet
and obtain a further performance boost. Extensive experiments demonstrate that
the proposed model gains remarkable margins over several state-of-the-art
methods.
Multi-View Surveillance Video Summarization via Joint Embedding and Sparse Optimization
Most traditional video summarization methods are designed to generate
effective summaries for single-view videos, and thus they cannot fully exploit
the complicated intra and inter-view correlations in summarizing multi-view
videos in a camera network. In this paper, with the aim of summarizing
multi-view videos, we introduce a novel unsupervised framework via joint
embedding and sparse representative selection. The objective function is
two-fold. The first is to capture the multi-view correlations via an embedding,
which helps in extracting a diverse set of representatives. The second is to
use an ℓ2,1-norm to model the sparsity while selecting representative shots for
the summary. We propose to jointly optimize both of the objectives, such that
embedding can not only characterize the correlations, but also indicate the
requirements of sparse representative selection. We present an efficient
alternating algorithm based on half-quadratic minimization to solve the
proposed non-smooth and non-convex objective with convergence analysis. A key
advantage of the proposed approach with respect to the state-of-the-art is that
it can summarize multi-view videos without assuming any prior
correspondences/alignment between them, e.g., uncalibrated camera networks.
Rigorous experiments on several multi-view datasets demonstrate that our
approach clearly outperforms the state-of-the-art methods.
Comment: IEEE Trans. on Multimedia, 2017 (In Press)
Feature Selection: A Data Perspective
Feature selection, as a data preprocessing strategy, has been proven to be
effective and efficient in preparing data (especially high-dimensional data)
for various data mining and machine learning problems. The objectives of
feature selection include: building simpler and more comprehensible models,
improving data mining performance, and preparing clean, understandable data.
The recent proliferation of big data has presented some substantial challenges
and opportunities to feature selection. In this survey, we provide a
comprehensive and structured overview of recent advances in feature selection
research. Motivated by current challenges and opportunities in the era of big
data, we revisit feature selection research from a data perspective and review
representative feature selection algorithms for conventional data, structured
data, heterogeneous data and streaming data. Methodologically, to emphasize the
differences and similarities of most existing feature selection algorithms for
conventional data, we categorize them into four main groups: similarity based,
information theoretical based, sparse learning based and statistical based
methods. To facilitate and promote the research in this community, we also
present an open-source feature selection repository that consists of most of
the popular feature selection algorithms
(http://featureselection.asu.edu/). Also, we use it as an example to show
how to evaluate feature selection algorithms. At the end of the survey, we
present a discussion about some open problems and challenges that require more
attention in future research.
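As a small illustration of what a filter-style feature selection method does, the sketch below ranks features by their variance and keeps the top k. Variance ranking is chosen here only because it is simple; the abstract above does not name it, and placing it in the "statistical based" group is an assumption. The function name `rank_by_variance` and the toy matrix are hypothetical and are not taken from the repository mentioned in the abstract.

```python
import numpy as np

def rank_by_variance(X, k):
    """Return the indices of the k highest-variance features.

    X is an (n_samples, n_features) array. Features are scored
    independently of each other and of any label.
    """
    variances = X.var(axis=0)
    order = np.argsort(variances)[::-1]  # highest variance first
    return order[:k], variances

# Toy data: feature 0 is constant, feature 1 varies slightly,
# feature 2 varies strongly.
X = np.array([[1.0, 0.0,  10.0],
              [1.0, 0.1, -10.0],
              [1.0, 0.0,   5.0],
              [1.0, 0.1,  -5.0]])

selected, variances = rank_by_variance(X, k=2)
print(selected)  # indices of the two retained features
```

A score like this is unsupervised and cheap, but it ignores feature redundancy and scale, so in practice features would usually be normalized before ranking.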
Are You Imitating Me? Unsupervised Sparse Modeling for Group Activity Analysis from a Single Video
We present a framework for unsupervised group activity analysis from a single
video. Our working hypothesis is that human actions lie on a union of
low-dimensional subspaces, and thus can be efficiently modeled as sparse linear
combinations of atoms from a learned dictionary representing the action's
primitives. Contrary to prior art, and with the primary goal of spatio-temporal
action grouping, in this work only one single video segment is available for
both unsupervised learning and analysis without any prior training information.
After extracting simple features at a single spatio-temporal scale, we learn a
dictionary for each individual in the video during each short time lapse. These
dictionaries allow us to compare the individuals' actions by producing an
affinity matrix which contains sufficient discriminative information about the
actions in the scene leading to grouping with simple and efficient tools. With
diverse publicly available real videos, we demonstrate the effectiveness of the
proposed framework and its robustness to cluttered backgrounds, changes of
human appearance, and action variability.
Unsupervised Multi-modal Hashing for Cross-modal retrieval
With the advantage of low storage cost and high efficiency, hashing learning
has received much attention in the domain of Big Data. In this paper, we
propose a novel unsupervised hashing learning method that addresses the open
problem of directly preserving the manifold structure by hashing. To this end,
both the semantic correlation in the textual space and the local
geometric structure in the visual space are explored simultaneously in our
framework. Besides, the ℓ2,1-norm constraint is imposed on the projection
matrices to learn the discriminative hash function for each modality. Extensive
experiments are performed to evaluate the proposed method on three publicly
available datasets, and the experimental results show that our method can
achieve superior performance over the state-of-the-art methods.
Comment: 4 pages, 4 figures
Deep Sparse Subspace Clustering
In this paper, we present a deep extension of Sparse Subspace Clustering,
termed Deep Sparse Subspace Clustering (DSSC). Regularized by the unit sphere
distribution assumption for the learned deep features, DSSC can infer a new
data affinity matrix by simultaneously satisfying the sparsity principle of SSC
and the nonlinearity given by neural networks. One appealing advantage of
DSSC is that, when the original real-world data do not meet the
class-specific linear subspace distribution assumption, DSSC can employ neural
networks to make the assumption valid with its hierarchical nonlinear
transformations. To the best of our knowledge, this is among the first deep
learning based subspace clustering methods. Extensive experiments are conducted
on four real-world datasets to show the proposed DSSC is significantly superior
to 12 existing methods for subspace clustering.
Comment: The initial version is completed at the beginning of 201
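The sparsity principle of SSC that the abstract above refers to can be sketched without any deep component: each sample is written as a sparse combination of the other samples, and the coefficients are turned into a symmetric affinity matrix. The code below is a minimal sketch of that linear step only, not of DSSC. The solver is a plain proximal gradient (ISTA) loop chosen for brevity, and the function names, the value of `lam`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(V, t):
    """Elementwise soft-thresholding, the proximal operator of t * |.|_1."""
    return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)

def self_expressive_codes(X, lam=0.1, n_iter=500):
    """Approximately solve
        min_C 0.5 * ||X - X C||_F^2 + lam * ||C||_1   s.t. diag(C) = 0
    by proximal gradient descent. X holds one sample per column.
    """
    n = X.shape[1]
    G = X.T @ X                                     # Gram matrix
    step = 1.0 / max(np.linalg.norm(G, 2), 1e-12)   # 1 / Lipschitz constant
    C = np.zeros((n, n))
    for _ in range(n_iter):
        grad = G @ C - G                            # gradient of the smooth term
        C = soft_threshold(C - step * grad, step * lam)
        np.fill_diagonal(C, 0.0)                    # a sample may not represent itself
    return C

def affinity_from_codes(C):
    """Symmetric affinity W = |C| + |C|^T, ready for spectral clustering."""
    A = np.abs(C)
    return A + A.T

# Toy data: three samples on the first axis, three on the second axis,
# i.e. two orthogonal one-dimensional subspaces of R^2.
X = np.array([[1.0, 2.0, -1.5, 0.0, 0.0,  0.0],
              [0.0, 0.0,  0.0, 1.0, 2.0, -1.0]])

W = affinity_from_codes(self_expressive_codes(X))
print(np.round(W, 2))
```

In this toy case the affinity is block diagonal, since samples from one axis are never used to represent samples from the other. Real data rarely lie on such clean linear subspaces, which is the gap the abstract says the learned nonlinear transformation is meant to close.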
Cross-modal Subspace Learning for Fine-grained Sketch-based Image Retrieval
Sketch-based image retrieval (SBIR) is challenging due to the inherent
domain-gap between sketch and photo. Compared with pixel-perfect depictions of
photos, sketches are highly abstract, iconic renderings of the real world.
Therefore, matching sketch and photo directly using low-level visual cues is
insufficient, since a common low-level subspace that traverses semantically
across the two modalities is non-trivial to establish. Most existing SBIR
studies do not directly tackle this cross-modal problem. This naturally
motivates us to explore the effectiveness of cross-modal retrieval methods in
SBIR, which have been applied successfully to image-text matching. In this
paper, we introduce and compare a series of state-of-the-art cross-modal
subspace learning methods and benchmark them on two recently released
fine-grained SBIR datasets. Through thorough examination of the experimental
results, we demonstrate that subspace learning can effectively model
the sketch-photo domain-gap. In addition, we draw a few key insights to drive
future research.
Comment: Accepted by Neurocomputing
cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey
The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki
Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers
on computer vision, pattern recognition, and related fields. For this
particular review, we focused on reading all 602 conference papers
presented at CVPR2015, the premier annual computer vision event held in
June 2015, in order to grasp the trends in the field. Further, we propose
"DeepSurvey" as a mechanism embodying the entire process, from reading
all the papers and generating ideas to writing a paper.
Comment: Survey Paper