Segmentation Rectification for Video Cutout via One-Class Structured Learning
Recent works on interactive video object cutout mainly focus on designing
dynamic foreground-background (FB) classifiers for segmentation propagation.
However, the research on optimally removing errors from the FB classification
is sparse, and the errors often accumulate rapidly, causing significant errors
in the propagated frames. In this work, we take the initial steps toward
addressing this problem, which we call \emph{segmentation rectification}. Our
key observation is that conventional methods treat the possibly asymmetrically
distributed false positive and false negative errors equally. We instead
propose to optimally remove these two types of errors. To this end, we propose
a novel bilayer Markov Random Field (MRF) model for this new
task. We also adopt the well-established structured learning framework to learn
the optimal model from data. Additionally, we propose a novel one-class
structured SVM (OSSVM) which greatly speeds up the structured learning process.
Our method naturally extends to RGB-D videos as well. Comprehensive experiments
on both RGB and RGB-D data demonstrate that our simple and effective method
significantly outperforms the segmentation propagation methods adopted in the
state-of-the-art video cutout systems, and the results also suggest the
potential usefulness of our method in image cutout systems.
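The abstract does not give the rectification energy explicitly, but the asymmetric treatment of the two error types can be illustrated with a generic binary MRF energy; the cost weights $\lambda_{10}$, $\lambda_{01}$ and the Potts-style pairwise term below are illustrative placeholders rather than the paper's actual bilayer formulation:

$$E(\mathbf{y}) = \sum_{i} \Big[ \lambda_{10}\,\mathbf{1}(\hat{y}_i = 1,\; y_i = 0) + \lambda_{01}\,\mathbf{1}(\hat{y}_i = 0,\; y_i = 1) \Big] + \sum_{(i,j)\in\mathcal{E}} w_{ij}\,\mathbf{1}(y_i \neq y_j),$$

where $\hat{y}_i$ is the propagated FB label of pixel $i$, $y_i$ is the rectified label, and $\lambda_{10} \neq \lambda_{01}$ encodes the asymmetric costs of removing false positives versus false negatives.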
Visual Tracking via Dynamic Graph Learning
Existing visual tracking methods usually localize a target object with a
bounding box, in which the performance of the foreground object trackers or
detectors is often affected by the inclusion of background clutter. To handle
this problem, we learn a patch-based graph representation for visual tracking.
The tracked object is modeled with a graph whose nodes are a set of
non-overlapping image patches, in which the weight of each node indicates how
likely it is to belong to the foreground and each edge is weighted by the
appearance compatibility of its two neighboring nodes. This graph is
dynamically learned and applied in object tracking and model updating. During
the tracking process, the proposed algorithm performs three main steps in each
frame. First, the graph is initialized by assigning binary weights to some
image patches to indicate the object and background patches according to the
predicted bounding box. Second, the graph is optimized to refine the patch
weights by using a novel alternating direction method of multipliers. Third,
the object feature representation is updated by imposing the weights of patches
on the extracted image features. The object location is predicted by maximizing
the classification score in the structured support vector machine. Extensive
experiments show that the proposed tracking algorithm performs well against the
state-of-the-art methods on large-scale benchmark datasets. Comment: Submitted to TPAMI 201
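As a rough illustration of the third step above, here is a minimal sketch (not the authors' code) of weighting patch features by the learned graph node weights and scoring candidate locations with a linear classifier; the function names, array shapes and the plain dot-product score are assumptions standing in for the structured SVM:

import numpy as np

# Illustrative sketch (not the paper's code): weight per-patch features by the
# learned graph node weights, then score candidate locations with a linear
# classifier in the spirit of a structured SVM. Names and shapes are assumptions.

def weighted_object_feature(patch_feats, patch_weights):
    """patch_feats: (num_patches, feat_dim); patch_weights: (num_patches,) in [0, 1]."""
    return (patch_feats * patch_weights[:, None]).reshape(-1)

def best_candidate(candidate_patch_feats, patch_weights, w_svm):
    """Pick the candidate bounding box maximizing the linear classification score."""
    scores = [weighted_object_feature(f, patch_weights) @ w_svm
              for f in candidate_patch_feats]
    return int(np.argmax(scores))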
Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs
We consider the problem of zero-shot recognition: learning a visual
classifier for a category with zero training examples, just using the word
embedding of the category and its relationship to other categories for which
visual data are provided. The key to dealing with the unfamiliar or novel
category is to transfer knowledge obtained from familiar classes to describe
the unfamiliar class. In this paper, we build upon the recently introduced
Graph Convolutional Network (GCN) and propose an approach that uses both
semantic embeddings and the categorical relationships to predict the
classifiers. Given a learned knowledge graph (KG), our approach takes as input
semantic embeddings for each node (each representing a visual category). After a
series of graph convolutions, we predict the visual classifier for each
category. During training, the visual classifiers for a few categories are
given to learn the GCN parameters. At test time, these filters are used to
predict the visual classifiers of unseen categories. We show that our approach
is robust to noise in the KG. More importantly, our approach provides
significant improvement in performance compared to the current state-of-the-art
results (from 2-3% on some metrics to a whopping 20% on a few). Comment: CVPR 201
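A minimal sketch of the graph-convolution step described above (not the authors' implementation): the knowledge-graph adjacency is symmetrically normalized and the word embeddings are propagated through two layers to produce one classifier-weight vector per category; the two-layer depth, layer sizes and ReLU are assumptions.

import numpy as np

# Minimal sketch of the GCN forward pass described above (not the authors' code).
# A_hat is the normalized adjacency of the knowledge graph, X holds the word
# embeddings of all category nodes, and the output rows are predicted classifier
# weights; the two-layer depth and ReLU nonlinearity are illustrative assumptions.

def normalize_adjacency(A):
    A_tilde = A + np.eye(A.shape[0])             # add self-loops
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt     # symmetric normalization

def gcn_predict_classifiers(A, X, W1, W2):
    A_hat = normalize_adjacency(A)
    H = np.maximum(A_hat @ X @ W1, 0.0)          # graph convolution + ReLU
    return A_hat @ H @ W2                        # one row of classifier weights per category

During training, only the output rows of seen categories would be compared (e.g., with a regression loss) against classifiers learned from their images; at test time, the rows of unseen categories are used directly.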
3D Shape Classification Using Collaborative Representation based Projections
A novel 3D shape classification scheme, based on collaborative representation
learning, is investigated in this work. A data-driven feature-extraction
procedure, taking the form of a simple projection operator, is at the core of
our methodology. Given a shape database, a graph encapsulating the structural
relationships among all the available shapes is first constructed
and then employed in defining low-dimensional sparse projections. The recently
introduced method of CRPs (collaborative representation based projections),
which is based on L2-Graph, is the first variant that is included towards this
end. A second algorithm, which particularizes CRPs to shape descriptors that
are inherently nonnegative, is also introduced as a potential alternative. In
both cases, the weights in the graph reflecting the database structure are
calculated so as to approximate each shape as a sparse linear combination of
the remaining dataset objects. By way of solving a generalized eigenanalysis
problem, a linear matrix operator is designed that will act as the feature
extractor. Two popular, inherently high dimensional descriptors, namely
ShapeDNA and Global Point Signature (GPS), are employed in our experimentations
with the SHREC10, SHREC11 and SHREC15 datasets, where shape recognition is cast
as a multi-class classification problem that is tackled by means of an SVM
(support vector machine) acting within the reduced dimensional space of the
crafted projections. The results are very promising and outperform
state-of-the-art methods, providing evidence of the highly discriminative
nature of the introduced 3D shape representations. Comment: 16 pages, 6
Figures, 3 Tables. A statement that an updated version of this manuscript is
under consideration at Pattern Recognition Letters has been added
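A hedged sketch of the projection-learning pipeline outlined above: ridge-regularized collaborative-representation weights over the remaining shapes, followed by a generalized eigenproblem in the spirit of neighborhood-preserving projections. The regularizer, the NPE-style criterion and the matrix names are illustrative assumptions, not the paper's exact CRP objective:

import numpy as np
from scipy.linalg import eigh

# Illustrative sketch (not the exact objective used in the paper): collaborative-
# representation weights from ridge regression, then a generalized eigenproblem
# yielding the linear projection. The regularizer lam and the NPE-style criterion
# are assumptions.

def collaborative_weights(X, lam=0.1):
    """X: (n_shapes, dim). Returns an (n, n) weight matrix with zero diagonal."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        idx = [j for j in range(n) if j != i]
        B = X[idx]                                   # remaining shapes as dictionary
        c = np.linalg.solve(B @ B.T + lam * np.eye(n - 1), B @ X[i])
        W[i, idx] = c
    return W

def crp_projection(X, W, out_dim):
    M = (np.eye(len(X)) - W).T @ (np.eye(len(X)) - W)
    A = X.T @ M @ X                                  # scatter of reconstruction error
    B = X.T @ X + 1e-6 * np.eye(X.shape[1])          # regularized constraint matrix
    vals, vecs = eigh(A, B)                          # generalized eigenproblem
    return vecs[:, :out_dim]                         # directions with smallest eigenvalues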
Regression-based Hypergraph Learning for Image Clustering and Classification
Inspired by the recently remarkable successes of Sparse Representation (SR),
Collaborative Representation (CR) and sparse graph, we present a novel
hypergraph model named Regression-based Hypergraph (RH) which utilizes the
regression models to construct high-quality hypergraphs. Moreover, we plug
RH into two conventional hypergraph learning frameworks, namely hypergraph
spectral clustering and hypergraph transduction, to present Regression-based
Hypergraph Spectral Clustering (RHSC) and Regression-based Hypergraph
Transduction (RHT) models for addressing the image clustering and
classification issues. Sparse Representation and Collaborative Representation
are employed to instantiate two RH instances and their RHSC and RHT algorithms.
The experimental results on six popular image databases demonstrate that the
proposed RH learning algorithms achieve promising image clustering and
classification performances, and also validate that RH can inherit the
desirable properties from both hypergraph models and regression models. Comment: 11 pages
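The hypergraph construction can be illustrated as follows (a sketch, not the paper's algorithm): each sample defines one hyperedge containing itself and the samples with the largest regression coefficients; the ridge solver and the top-k hyperedge size below are assumed choices, with Sparse Representation coefficients usable in the same way:

import numpy as np

# Hedged sketch of building a regression-based hypergraph incidence matrix: each
# sample defines one hyperedge containing itself and the samples whose regression
# coefficients are largest. The ridge solver and top-k size are illustrative.

def regression_hyperedges(X, lam=0.1, k=5):
    """X: (n, dim). Returns H: (n_vertices, n_hyperedges) binary incidence matrix."""
    n = X.shape[0]
    H = np.zeros((n, n))
    for i in range(n):
        idx = [j for j in range(n) if j != i]
        B = X[idx]
        c = np.linalg.solve(B @ B.T + lam * np.eye(n - 1), B @ X[i])
        top = np.argsort(-np.abs(c))[:k]             # strongest contributors
        H[i, i] = 1.0                                 # the sample itself
        H[np.array(idx)[top], i] = 1.0                # its top-k regressors
    return H

The resulting incidence matrix H can then be plugged into standard hypergraph spectral clustering or transduction via the hypergraph Laplacian.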
PRISM: Person Re-Identification via Structured Matching
Person re-identification (re-id), an emerging problem in visual surveillance,
deals with maintaining the identities of individuals whilst they traverse various
locations surveilled by a camera network. From a visual perspective re-id is
challenging due to significant changes in visual appearance of individuals in
cameras with different pose, illumination and calibration. Globally the
challenge arises from the need to maintain structurally consistent matches
among all the individual entities across different camera views. We propose
PRISM, a structured matching method to jointly account for these challenges. We
view the global problem as a weighted graph matching problem and estimate edge
weights by learning to predict them based on the co-occurrences of visual
patterns in the training examples. These co-occurrence based scores in turn
account for appearance changes by inferring likely and unlikely visual
co-occurrences appearing in training instances. We implement PRISM on single
shot and multi-shot scenarios. PRISM uniformly outperforms the state of the art
in terms of matching rate while being computationally efficient.
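In the single-shot case, once pairwise similarity scores between identities in two camera views have been learned, globally consistent matching reduces to a bipartite assignment; the sketch below assumes a precomputed score matrix and does not reproduce the learned co-occurrence scoring:

import numpy as np
from scipy.optimize import linear_sum_assignment

# Simplified sketch of the single-shot setting: given a learned similarity score
# matrix between identities seen in two camera views, recover a globally
# consistent one-to-one matching. The scoring itself is not reproduced here.

def match_identities(score_matrix):
    """score_matrix[i, j]: learned similarity between person i in view A and j in view B."""
    rows, cols = linear_sum_assignment(-score_matrix)   # maximize total score
    return list(zip(rows.tolist(), cols.tolist()))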
Telugu OCR Framework using Deep Learning
In this paper, we address the task of Optical Character Recognition (OCR) for
the Telugu script. We present an end-to-end framework that segments the text
image, classifies the characters and extracts lines using a language model. The
segmentation is based on mathematical morphology. The classification module,
which is the most challenging task of the three, is a deep convolutional neural
network. The language is modelled as a third-order Markov chain at the glyph
level. The Telugu script is a complex alphasyllabary and the language is
agglutinative, making the problem hard. In this paper we apply the latest
advances in neural networks to achieve state-of-the-art error rates. We also
review convolutional neural networks in great detail and expound the
statistical justification behind the many tricks needed to make Deep Learning
work.
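A minimal sketch of the glyph-level language model described above: a third-order Markov chain conditions each glyph on the previous three (a 4-gram model); the add-alpha smoothing, padding symbol and class interface are illustrative assumptions, not the paper's exact model:

import math
from collections import defaultdict

# Hedged sketch of a glyph-level third-order Markov chain (4-gram model).
# Smoothing, padding and the glyph inventory are illustrative assumptions.

class GlyphMarkovLM:
    def __init__(self, order=3, alpha=1.0):
        self.order, self.alpha = order, alpha
        self.counts = defaultdict(lambda: defaultdict(int))
        self.vocab = set()

    def train(self, glyph_sequences):
        for seq in glyph_sequences:
            padded = ["<s>"] * self.order + list(seq)
            for i in range(self.order, len(padded)):
                ctx, g = tuple(padded[i - self.order:i]), padded[i]
                self.counts[ctx][g] += 1
                self.vocab.add(g)

    def log_prob(self, seq):
        padded = ["<s>"] * self.order + list(seq)
        total = 0.0
        for i in range(self.order, len(padded)):
            ctx, g = tuple(padded[i - self.order:i]), padded[i]
            num = self.counts[ctx][g] + self.alpha           # add-alpha smoothing
            den = sum(self.counts[ctx].values()) + self.alpha * max(len(self.vocab), 1)
            total += math.log(num / den)
        return total

Such a model can rescore candidate glyph sequences produced by the CNN classifier when extracting lines of text.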
Kernel-Induced Label Propagation by Mapping for Semi-Supervised Classification
Kernel methods have been successfully applied to the areas of pattern
recognition and data mining. In this paper, we mainly discuss the issue of
propagating labels in kernel space. A Kernel-Induced Label Propagation
(Kernel-LP) framework by mapping is proposed for high-dimensional data
classification using the most informative patterns of data in kernel space. The
essence of Kernel-LP is to perform joint label propagation and adaptive weight
learning in a transformed kernel space. That is, our Kernel-LP changes the task
of label propagation from the commonly-used Euclidean space in most existing
work to kernel space. The motivation of our Kernel-LP is to propagate labels and
learn the adaptive weights jointly under the assumption of an inner product
space of inputs, i.e., the original linearly inseparable inputs may be mapped to
be separable in kernel space. Kernel-LP is based on an existing positive and negative
LP model, i.e., the effects of negative label information are integrated to
improve the label prediction power. Also, Kernel-LP performs adaptive weight
construction over the same kernel space, so it can avoid the tricky process of
choosing the optimal neighborhood size that traditional criteria suffer from. Two
novel and efficient out-of-sample approaches for our Kernel-LP to involve new
test data are also presented, i.e., (1) direct kernel mapping and (2) kernel
mapping-induced label reconstruction, both of which purely depend on the kernel
matrix between the training set and the testing set. Owing to the kernel trick,
our algorithms are applicable to high-dimensional real data. Extensive results
on real datasets demonstrate the effectiveness of our approach. Comment:
Accepted by IEEE TB
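A minimal sketch of propagating labels with a kernel matrix in place of a Euclidean-neighborhood graph (the paper additionally learns adaptive weights jointly and handles out-of-sample data, both omitted here); the RBF kernel, alpha and the closed-form propagation are standard choices used purely for illustration:

import numpy as np

# Sketch: standard label propagation driven by a kernel matrix rather than a
# Euclidean-neighborhood graph. The joint adaptive-weight learning of Kernel-LP
# is omitted; kernel choice and alpha are illustrative assumptions.

def rbf_kernel(X, gamma=1.0):
    sq = np.sum(X**2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

def kernel_label_propagation(X, Y, alpha=0.9, gamma=1.0):
    """Y: (n, n_classes) with one-hot rows for labeled points, zero rows for unlabeled."""
    K = rbf_kernel(X, gamma)
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))                      # symmetric normalization
    F = np.linalg.solve(np.eye(len(X)) - alpha * S, Y)   # closed-form propagation
    return F.argmax(axis=1)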
Autoencoder Based Sample Selection for Self-Taught Learning
Self-taught learning is a technique that uses a large number of unlabeled
data as source samples to improve the task performance on target samples.
Compared with other transfer learning techniques, self-taught learning can be
applied to a broader set of scenarios due to the loose restrictions on the
source data. However, knowledge transferred from source samples that are not
sufficiently related to the target domain may negatively influence the target
learner, which is referred to as negative transfer. In this paper, we propose a
metric for the relevance between a source sample and the target samples. To be
more specific, both source and target samples are reconstructed through a
single-layer autoencoder with a linear relationship between source samples and
reconstructed target samples being simultaneously enforced. A sparsity-inducing
norm constraint is imposed on the transformation matrix to identify source
samples relevant to the target domain. Source domain samples
that are deemed relevant are assigned pseudo-labels reflecting their relevance
to target domain samples, and are combined with target samples in order to
provide an expanded training set for classifier training. Local data structures
are also preserved during source sample selection through spectral graph
analysis. Promising results in extensive experiments show the advantages of the
proposed approach. Comment: 38 pages, 4 figures, to appear in Elsevier Knowledge-Based Systems
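A hedged sketch of the sample-selection idea: reconstruct target samples as linear combinations of source samples and keep the source samples whose coefficient columns carry the most weight; a ridge-regularized least squares stands in for the paper's sparsity-inducing norm purely for brevity, and the function names are assumptions:

import numpy as np

# Sketch of relevance-based source selection. A ridge penalty replaces the
# sparsity-inducing norm of the paper; spectral-graph structure preservation
# is omitted. Names, shapes and n_keep are illustrative assumptions.

def select_relevant_sources(X_src, X_tgt, lam=0.1, n_keep=100):
    """X_src: (n_src, dim); X_tgt: (n_tgt, dim). Returns indices of selected source samples."""
    # Solve min_A ||X_tgt - A X_src||_F^2 + lam ||A||_F^2, with A of shape (n_tgt, n_src)
    G = X_src @ X_src.T + lam * np.eye(len(X_src))
    A = np.linalg.solve(G, X_src @ X_tgt.T).T
    relevance = np.linalg.norm(A, axis=0)            # per-source-sample column norm
    return np.argsort(-relevance)[:n_keep]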
Sparse Graph-based Transduction for Image Classification
Motivated by the remarkable successes of Graph-based Transduction (GT) and
Sparse Representation (SR), we present a novel Classifier named Sparse
Graph-based Classifier (SGC) for image classification. In SGC, SR is leveraged
to measure the correlation (similarity) between each pair of samples, and a graph is
constructed for encoding these correlations. Then the Laplacian eigenmapping is
adopted for deriving the graph Laplacian of the graph. Finally, SGC can be
obtained by plugging the graph Laplacian into the conventional GT framework. In
the image classification procedure, SGC utilizes the correlations, which are
encoded in the learned graph Laplacian, to infer the labels of unlabeled
images. SGC inherits the merits of both GT and SR. Compared to SR, SGC improves
the robustness and the discriminating power of GT. Compared to GT, SGC
fully exploits the whole dataset and therefore alleviates the undercomplete
dictionary issue suffered by SR. Four popular image databases are employed for
evaluation. The results demonstrate that SGC can achieve a promising
performance in comparison with the state-of-the-art classifiers, particularly
in the small training sample size case and the noisy sample case.
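A rough sketch of the SGC pipeline as described: sparse-code each sample over the remaining samples, symmetrize the coefficients into a graph, and propagate labels through the normalized graph; the Lasso penalty, the symmetrization and the closed-form propagation step are illustrative choices, not necessarily the paper's:

import numpy as np
from sklearn.linear_model import Lasso

# Hedged sketch of sparse-graph construction plus graph transduction.
# Penalty strength, symmetrization and propagation are illustrative choices.

def sparse_graph(X, alpha=0.01):
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        idx = [j for j in range(n) if j != i]
        coef = Lasso(alpha=alpha, max_iter=5000).fit(X[idx].T, X[i]).coef_
        W[i, idx] = np.abs(coef)
    return (W + W.T) / 2.0                            # symmetric correlation graph

def transduce(W, Y, alpha=0.9):
    """Y: (n, n_classes), one-hot for labeled samples, zero rows for unlabeled."""
    d = W.sum(axis=1) + 1e-12
    S = W / np.sqrt(np.outer(d, d))
    F = np.linalg.solve(np.eye(len(W)) - alpha * S, Y)
    return F.argmax(axis=1)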