Semi-supervised Spectral Clustering for Classification
We propose a Classification Via Clustering (CVC) algorithm which enables
existing clustering methods to be efficiently employed in classification
problems. In CVC, training and test data are co-clustered and class-cluster
distributions are used to find the label of the test data. To determine an
efficient number of clusters, a Semi-supervised Hierarchical Clustering (SHC)
algorithm is proposed. Clusters are obtained by hierarchically applying two-way
NCut by using signs of the Fiedler vector of the normalized graph Laplacian. To
this end, a Direct Fiedler Vector Computation algorithm is proposed. The graph
cut is based on the data structure and does not consider labels. Labels are
used only to define the stopping criterion for the graph cut. We propose
performing clustering on Grassmannian manifolds, facilitating the formation of
spectral ensembles. The proposed algorithm outperformed state-of-the-art
image-set classification algorithms on five standard datasets.
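The two-way NCut step the abstract describes can be sketched directly: compute the normalized graph Laplacian of a similarity matrix and split the vertices by the signs of the Fiedler vector (the eigenvector of the second-smallest eigenvalue). This is a generic illustration of the standard construction, not the paper's Direct Fiedler Vector Computation algorithm.

```python
import numpy as np

# Two-way NCut via the signs of the Fiedler vector of the normalized
# graph Laplacian L_sym = I - D^{-1/2} W D^{-1/2} (generic sketch).
def fiedler_split(W):
    """Partition a graph (similarity matrix W) by the Fiedler vector's signs."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L_sym)      # eigenvalues in ascending order
    fiedler = vecs[:, 1]                    # second-smallest eigenvector
    return fiedler >= 0                     # boolean mask: the two sides of the cut

# Two cliques joined by one weak edge: the cut recovers the cliques.
W = np.zeros((6, 6))
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
W[2, 3] = W[3, 2] = 0.1
np.fill_diagonal(W, 0.0)
mask = fiedler_split(W)
```

Applied hierarchically, each resulting side can be cut again until a stopping criterion (here, as in the paper, label-based) is met.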
Spatial Context based Angular Information Preserving Projection for Hyperspectral Image Classification
Dimensionality reduction is a crucial preprocessing for hyperspectral data
analysis - finding an appropriate subspace is often required for subsequent
image classification. In recent work, we proposed supervised angular
information based dimensionality reduction methods to find effective subspaces.
Since unlabeled data are often more readily available compared to labeled data,
we propose an unsupervised projection that finds a lower dimensional subspace
where local angular information is preserved. To exploit spatial information
from the hyperspectral images, we further extend our unsupervised projection to
incorporate spatial contextual information around each pixel in the image.
Additionally, we also propose a sparse representation based classifier which is
optimized to exploit spatial information during classification - we hence
assert that our proposed projection is particularly suitable for classifiers
where local similarity and spatial context are both important. Experimental
results with two real-world hyperspectral datasets demonstrate that our
proposed methods provide robust classification performance.
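The "local angular information" the projection is designed to preserve can be made concrete with a toy check: compare pairwise cosine similarities before and after a linear projection. PCA is used here only as a hypothetical stand-in projection; the paper's unsupervised projection is constructed differently.

```python
import numpy as np

# Toy check of how well a linear projection P preserves pairwise angular
# (cosine) information -- the quantity the abstract's projection protects.
def cosine_matrix(X):
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))           # 50 "pixels", 20 spectral "bands"
U, _, _ = np.linalg.svd(X.T @ X)        # PCA directions as a stand-in projection
P = U[:, :5]                            # project 20 -> 5 dimensions
err = np.abs(cosine_matrix(X) - cosine_matrix(X @ P)).mean()
```

A projection that preserves local angular information would keep `err` small for nearby pixels; an angle-agnostic projection generally would not.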
Perceptual Visual Interactive Learning
Supervised learning methods are widely used in machine learning. However, the
lack of labels in existing data limits the application of these technologies.
Visual interactive learning (VIL), in contrast to purely computational methods,
can avoid the semantic gap and offers a new way to address the labeling problem
for small-label-quantity (SLQ) samples. To fully understand the importance of
VIL to the interaction process, we re-examine interactive learning algorithms
(e.g., clustering, classification, and retrieval) from the perspective of VIL.
Perception and cognition are the two main visual processes in VIL. On this
basis, we propose a perceptual visual interactive learning (PVIL) framework,
which adopts Gestalt principles to design the interaction strategy and
multi-dimensionality reduction (MDR) to optimize the visualization process. The
advantage of the PVIL framework is that it combines the computer's sensitivity
to detailed features with the human's holistic understanding of global tasks.
Experimental results validate that the framework is superior to traditional
computer labeling methods (such as label propagation) in both accuracy and
efficiency, achieving significant classification results on datasets with
dense distributions and sparse classes.
Large Margin Low Rank Tensor Analysis
Other than vector representations, the direct objects of human cognition are
generally high-order tensors, such as 2D images and 3D textures. From this
fact, two interesting questions naturally arise: How does the human brain
represent these tensor perceptions in a "manifold" way, and how can they be
recognized on the "manifold"? In this paper, we present a supervised model to
learn the intrinsic structure of the tensors embedded in a high dimensional
Euclidean space. With the fixed point continuation procedures, our model
automatically and jointly discovers the optimal dimensionality and the
representations of the low dimensional embeddings. This makes it an effective
simulation of the cognitive process of human brain. Furthermore, the
generalization of our model based on similarity between the learned low
dimensional embeddings can be viewed as counterpart of recognition of human
brain. Experiments on applications for object recognition and face recognition
demonstrate the superiority of our proposed model over state-of-the-art
approaches.
Comment: 30 pages
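Fixed-point continuation procedures of the kind the abstract invokes typically drive a matrix toward low rank by repeatedly applying singular-value soft-thresholding. The sketch below shows just that shrinkage operator, with a hypothetical threshold; it is not the paper's full iteration.

```python
import numpy as np

# Singular-value soft-thresholding: shrink each singular value by tau,
# the low-rank-inducing step inside fixed-point continuation schemes.
def svt(M, tau):
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(5)
M = rng.normal(size=(8, 6))
M_low = svt(M, tau=2.0)   # tau is an illustrative choice, not the paper's
```

Small singular values are zeroed outright, which is how the rank (and thus the embedding dimensionality) is discovered rather than fixed in advance.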
A Unified Semi-Supervised Dimensionality Reduction Framework for Manifold Learning
We present a general framework of semi-supervised dimensionality reduction
for manifold learning which naturally generalizes existing supervised and
unsupervised learning frameworks which apply the spectral decomposition.
Algorithms derived under our framework are able to employ both labeled and
unlabeled examples and are able to handle complex problems where data form
separate clusters of manifolds. Our framework offers simple views, explains
relationships among existing frameworks and provides further extensions which
can improve existing algorithms. Furthermore, a new semi-supervised
kernelization framework called the "KPCA trick" is proposed to handle non-linear
problems.
Comment: 22 pages, 9 figures
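As an illustration of the kind of spectral kernelization the "KPCA trick" refers to, here is a minimal kernel PCA sketch assuming an RBF kernel; the kernel choice and parameters are illustrative, not the paper's exact construction.

```python
import numpy as np

# Minimal kernel PCA: build an RBF kernel, center it in feature space,
# and embed via the top eigenvectors scaled by sqrt(eigenvalue).
def kernel_pca(X, n_components=2, gamma=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)
    n = len(K)
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one   # feature-space centering
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]  # largest eigenvalues first
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

rng = np.random.default_rng(1)
Z = kernel_pca(rng.normal(size=(30, 4)))
```

The same replace-inner-products-with-a-kernel move is what lets a linear spectral method absorb non-linear structure.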
Machine learning based hyperspectral image analysis: A survey
Hyperspectral sensors enable the study of the chemical properties of scene
materials remotely for the purpose of identification, detection, and chemical
composition analysis of objects in the environment. Hence, hyperspectral images
captured from earth observing satellites and aircraft have been increasingly
important in agriculture, environmental monitoring, urban planning, mining, and
defense. Machine learning algorithms, owing to their outstanding predictive
power, have become a key tool for modern hyperspectral image analysis. A solid
understanding of machine learning techniques has therefore become essential for
remote sensing researchers and practitioners. This paper reviews and compares
recent machine learning-based hyperspectral image analysis methods published in
the literature. We organize the methods by image analysis task and by the type
of machine learning algorithm, and present a two-way mapping between the image
analysis tasks and the types of machine learning algorithms that can be applied
to them. The paper is comprehensive in coverage of both hyperspectral image
analysis tasks and machine learning algorithms. The image analysis tasks
considered are land cover classification, target detection, unmixing, and
physical parameter estimation. The machine learning algorithms covered are
Gaussian models, linear regression, logistic regression, support vector
machines, Gaussian mixture models, latent linear models, sparse linear models,
ensemble learning, directed graphical models, undirected graphical models,
clustering, Gaussian processes, Dirichlet processes, and deep learning. We also
discuss the open challenges in the field of hyperspectral image analysis and
explore possible future directions.
Online Supervised Subspace Tracking
We present a framework for supervised subspace tracking with two time series,
one being the high-dimensional predictors and the other being the response
variables, where the subspace tracking must take both sequences into
consideration. It extends classic online subspace tracking, which can be viewed
as tracking the predictor sequence only. Our online
sufficient dimensionality reduction (OSDR) is a meta-algorithm that can be
applied to various cases including linear regression, logistic regression,
multiple linear regression, multinomial logistic regression, support vector
machine, the random dot product model and the multi-scale union-of-subspace
model. OSDR reduces data-dimensionality on-the-fly with low-computational
complexity and it can also handle missing data and dynamic data. OSDR uses an
alternating minimization scheme and updates the subspace via gradient descent
on the Grassmannian manifold. The subspace update can be performed efficiently
utilizing the fact that the Grassmannian gradient with respect to the subspace
in many settings is rank-one (or low-rank in certain cases). The optimization
problem for OSDR is non-convex and hard to analyze in general; we provide
convergence analysis of OSDR in a simple linear regression setting. The good
performance of OSDR compared with conventional unsupervised subspace tracking
is demonstrated via numerical examples on simulated and real data.
Comment: Submitted for journal publication
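The rank-one Grassmannian gradient update the abstract mentions can be sketched in the generic unsupervised flavor (in the spirit of GROUSE-style subspace tracking); the supervised OSDR step additionally folds in the response variable, which is omitted here.

```python
import numpy as np

# One gradient step on the Grassmannian for subspace tracking.
# Minimizing ||x - U w||^2 over U gives a rank-one gradient -r w^T,
# so the descent direction is +r w^T; QR retracts back to orthonormal frames.
def grassmannian_step(U, x, eta=0.1):
    """Update an orthonormal basis U (n x k) given a new sample x (n,)."""
    w = U.T @ x                # coefficients of x in the current subspace
    r = x - U @ w              # residual orthogonal to the subspace
    G = np.outer(r, w)         # rank-one Grassmannian gradient direction
    U_new, _ = np.linalg.qr(U + eta * G)
    return U_new

rng = np.random.default_rng(2)
U = np.linalg.qr(rng.normal(size=(10, 3)))[0]   # random orthonormal start
x = rng.normal(size=10)
U = grassmannian_step(U, x)
```

The rank-one structure is what keeps each update cheap: the gradient never needs to be formed as a dense n-by-k matrix in an optimized implementation.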
A literature survey of matrix methods for data science
Efficient numerical linear algebra is a core ingredient in many applications
across almost all scientific and industrial disciplines. With this survey we
want to illustrate that numerical linear algebra has played and is playing a
crucial role in enabling and improving data science computations with many new
developments being fueled by the availability of data and computing resources.
We highlight the role of various factorizations and the power of changing the
representation of the data, and discuss topics such as randomized algorithms,
functions of matrices, and high-dimensional problems. We briefly touch upon the
role of techniques from numerical linear algebra within deep learning.
Feature Selection and Feature Extraction in Pattern Analysis: A Literature Review
Pattern analysis often requires a pre-processing stage for extracting or
selecting features in order to help the classification, prediction, or
clustering stage discriminate or represent the data in a better way. The reason
for this requirement is that the raw data are complex and difficult to process
without extracting or selecting appropriate features beforehand. This paper
reviews theory and motivation of different common methods of feature selection
and extraction and introduces some of their applications. Some numerical
implementations are also shown for these methods. Finally, the methods in
feature selection and extraction are compared.
Comment: 14 pages, 1 figure, 2 tables, survey (literature review) paper
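The distinction the survey draws can be shown in a few lines: feature *selection* keeps a subset of the raw columns, while feature *extraction* derives new features. The concrete criteria below (variance ranking, top PCA component) are illustrative examples of each route, not the survey's full catalogue.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 5))
X[:, 0] *= 10.0                              # make one raw feature dominant

# Feature selection: keep the k highest-variance raw columns.
k = 2
selected = np.argsort(X.var(axis=0))[::-1][:k]

# Feature extraction: derive a new feature, here the leading PCA component.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
extracted = Xc @ Vt[0]
```

Selection preserves interpretability of the original measurements; extraction can pack more discriminative information into fewer dimensions at the cost of that interpretability.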
Improved graph Laplacian via geometric self-consistency
We address the problem of setting the kernel bandwidth used by Manifold
Learning algorithms to construct the graph Laplacian. Exploiting the connection
between manifold geometry, represented by the Riemannian metric, and the
Laplace-Beltrami operator, we set the bandwidth by optimizing the Laplacian's
ability to preserve the geometry of the data. Experiments show that this
principled approach is effective and robust.
Comment: 12 pages
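The bandwidth-selection problem can be framed as a sweep: build the graph Laplacian for each candidate bandwidth and score it. The scaffold below uses a placeholder spectral-gap score purely for illustration; the paper's criterion is the Laplacian's geometric self-consistency with the Riemannian metric, which is not reproduced here.

```python
import numpy as np

# Random-walk graph Laplacian L = I - D^{-1} W for a Gaussian kernel
# with squared bandwidth eps (a common Manifold Learning construction).
def random_walk_laplacian(X, eps):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / eps)
    np.fill_diagonal(W, 0.0)
    return np.eye(len(X)) - W / W.sum(axis=1, keepdims=True)

def pick_bandwidth(X, candidates):
    def score(L):
        # Placeholder criterion: second-smallest eigenvalue (spectral gap).
        vals = np.sort(np.linalg.eigvals(L).real)
        return vals[1]
    return max(candidates, key=lambda e: score(random_walk_laplacian(X, e)))

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 2))
eps = pick_bandwidth(X, [0.1, 0.5, 1.0, 2.0])
```

Swapping the placeholder `score` for a geometry-preservation measure is exactly where the paper's contribution would slot in.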