Deep Human Parsing with Active Template Regression
In this work, the human parsing task, namely decomposing a human image into
semantic fashion/body regions, is formulated as an Active Template Regression
(ATR) problem, where the normalized mask of each fashion/body item is expressed
as the linear combination of the learned mask templates, and then morphed to a
more precise mask with the active shape parameters, including position, scale
and visibility of each semantic region. The mask template coefficients and the
active shape parameters together can generate the human parsing results, and
are thus called the structure outputs for human parsing. A deep Convolutional Neural Network (CNN) is used to build an end-to-end mapping between the input human image and the structure outputs for human parsing. More
specifically, the structure outputs are predicted by two separate networks. The
first CNN, which uses max-pooling, is designed to predict the template coefficients for each label mask, while the second CNN omits max-pooling to preserve sensitivity to label mask position and accurately
predict the active shape parameters. For a new image, the structure outputs of
the two networks are fused to generate the probability of each label for each
pixel, and super-pixel smoothing is finally used to refine the human parsing
result. Comprehensive evaluations on a large dataset demonstrate the significant superiority of the ATR framework over other state-of-the-art methods for human parsing. In particular, the F1-score achieved by our ATR framework is significantly higher than that of the state-of-the-art algorithm.
Comment: This manuscript is the accepted version for IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 201
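
The abstract is concrete enough about the reconstruction step to sketch it. Below is a minimal, illustrative NumPy version of how a single label mask might be rebuilt from the two networks' outputs; the function name, the nearest-neighbor morphing, and all shapes are assumptions for illustration, not the paper's actual implementation.

    import numpy as np

    def reconstruct_mask(coeffs, templates, position, scale, visibility,
                         out_shape=(150, 100)):
        # coeffs: (K,) template coefficients from the first (max-pooling) CNN
        # templates: (K, h, w) learned normalized mask templates for one label
        # position, scale, visibility: active shape parameters from the second CNN
        normalized = np.tensordot(coeffs, templates, axes=1)  # linear combination
        if not visibility:                                    # item is absent
            return np.zeros(out_shape)
        # Morph the normalized mask: rescale (nearest neighbor) and translate.
        h, w = normalized.shape
        th, tw = max(1, int(h * scale)), max(1, int(w * scale))
        rows = (np.arange(th) * h / th).astype(int)
        cols = (np.arange(tw) * w / tw).astype(int)
        resized = normalized[np.ix_(rows, cols)]
        mask = np.zeros(out_shape)
        y, x = position
        y2, x2 = min(y + th, out_shape[0]), min(x + tw, out_shape[1])
        mask[y:y2, x:x2] = resized[:y2 - y, :x2 - x]
        return mask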
Shape Recognition by Bag of Skeleton-associated Contour Parts
Contour and skeleton are two complementary representations for shape
recognition. However, combining them in a principled way is nontrivial, as they are generally abstracted by different structures (a closed string vs. a graph). This paper addresses the shape recognition problem by
combining contour and skeleton according to the correspondence between them.
The correspondence provides a straightforward way to associate skeletal
information with a shape contour. More specifically, we propose a new shape descriptor, named Skeleton-associated Shape Context (SSC), which captures the features of a contour fragment associated with skeletal information. Benefiting from this association, the proposed shape descriptor provides complementary
geometric information from both contour and skeleton parts, including the
spatial distribution and the thickness change along the shape part. To form a
meaningful shape feature vector for an overall shape, the Bag of Features
framework is applied to the SSC descriptors extracted from it. Finally, the
shape feature vector is fed into a linear SVM classifier to recognize the
shape. Encouraging experimental results demonstrate that the proposed way of combining contour and skeleton is effective for shape recognition, achieving state-of-the-art performance on several standard shape benchmarks.
Comment: 10 pages. Accepted by Pattern Recognition Letters 201
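
As a rough illustration of the last two stages (bag of features plus a linear SVM), here is a toy scikit-learn pipeline. The random arrays stand in for SSC descriptors, hard assignment replaces whatever encoding the paper actually applies, and all sizes are made up.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import LinearSVC

    # Toy stand-ins for per-shape local descriptors; in the paper these
    # would be SSC descriptors extracted along each shape's contour.
    rng = np.random.default_rng(0)
    train_labels = np.repeat(np.arange(3), 10)        # 3 classes, 10 shapes each
    train_desc = [rng.normal(loc=c, size=(60, 32)) for c in train_labels]

    # 1) Build a visual vocabulary by clustering all local descriptors.
    codebook = KMeans(n_clusters=50, n_init=4, random_state=0)
    codebook.fit(np.vstack(train_desc))

    def bof_histogram(descs):
        # Assign each descriptor to its nearest codeword and L1-normalize.
        words = codebook.predict(descs)
        hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
        return hist / hist.sum()

    # 2) One feature vector per shape, fed to a linear SVM.
    X = np.array([bof_histogram(d) for d in train_desc])
    clf = LinearSVC().fit(X, train_labels)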
Eigen Evolution Pooling for Human Action Recognition
We introduce Eigen Evolution Pooling, an efficient method to aggregate a
sequence of feature vectors. Eigen evolution pooling is designed to produce
compact feature representations for a sequence of feature vectors while preserving as much information about the sequence as possible, especially how the features evolve over time. Eigen evolution
pooling is a general pooling method that can be applied to any sequence of
feature vectors, from low-level RGB values to high-level Convolutional Neural
Network (CNN) feature vectors. We show that eigen evolution pooling is more
effective than average, max, and rank pooling for encoding the dynamics of
human actions in video. We demonstrate the power of eigen evolution pooling on
UCF101 and Hollywood2 datasets, two human action recognition benchmarks, and
achieve state-of-the-art performance.
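
A plausible minimal reading of the idea: treat the top eigenvectors of the frame sequence as dominant temporal evolution patterns and project the sequence onto them. The sketch below makes that concrete in NumPy; it is an interpretation of the abstract, not the authors' code.

    import numpy as np

    def eigen_evolution_pool(seq, k=2):
        # seq: (T, D) sequence of per-frame feature vectors.
        # SVD yields the temporal eigenvectors U without forming the
        # T x T Gram matrix explicitly.
        U, _, _ = np.linalg.svd(seq, full_matrices=False)
        # Each pooled vector weights the frames by one evolution pattern.
        pooled = U[:, :k].T @ seq          # (k, D)
        return pooled.reshape(-1)          # concatenate into one descriptor

    # Example: pool a clip of 30 frames of 512-dim CNN features.
    features = np.random.default_rng(1).normal(size=(30, 512))
    video_descriptor = eigen_evolution_pool(features, k=3)   # (3 * 512,)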
Learning Local Shape Descriptors from Part Correspondences With Multi-view Convolutional Networks
We present a new local descriptor for 3D shapes, directly applicable to a
wide range of shape analysis problems such as point correspondences, semantic
segmentation, affordance prediction, and shape-to-scan matching. The descriptor
is produced by a convolutional network that is trained to embed geometrically
and semantically similar points close to one another in descriptor space. The
network processes surface neighborhoods around points on a shape that are
captured at multiple scales by a succession of progressively zoomed out views,
taken from carefully selected camera positions. We leverage two extremely large
sources of data to train our network. First, since our network processes
rendered views in the form of 2D images, we repurpose architectures pre-trained
on massive image datasets. Second, we automatically generate a synthetic dense
point correspondence dataset by non-rigid alignment of corresponding shape
parts in a large collection of segmented 3D models. As a result of these design
choices, our network effectively encodes multi-scale local context and
fine-grained surface detail. Our network can be trained to produce either
category-specific descriptors or more generic descriptors by learning from
multiple shape categories. Once trained, at test time, the network extracts
local descriptors for shapes without requiring any part segmentation as input.
Our method can produce effective local descriptors even for shapes whose
category is unknown or different from the ones used while training. We
demonstrate through several experiments that our learned local descriptors are
more discriminative than state-of-the-art alternatives and are effective in a variety of shape analysis applications.
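
The training objective of embedding geometrically and semantically similar points close to one another is the kind of thing a contrastive loss expresses; the toy NumPy sketch below shows one standard formulation. The paper may use a different loss, so treat this purely as an assumption-laden illustration.

    import numpy as np

    def contrastive_loss(desc_a, desc_b, same, margin=1.0):
        # desc_a, desc_b: (N, D) descriptor batches from the network
        # same: (N,) boolean, True where the point pair corresponds
        dist = np.linalg.norm(desc_a - desc_b, axis=1)
        pos = np.where(same, dist ** 2, 0.0)               # pull matches together
        neg = np.where(same, 0.0,
                       np.maximum(margin - dist, 0.0) ** 2)  # push non-matches apart
        return float(np.mean(pos + neg))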
Shape Classification using Spectral Graph Wavelets
Spectral shape descriptors have been used extensively in a broad spectrum of
geometry processing applications ranging from shape retrieval and segmentation
to classification. In this paper, we propose a spectral graph wavelet approach for 3D shape classification using the bag-of-features paradigm. In an effort to capture both the local and global geometry of a 3D shape, we present a three-step feature description framework. First, local descriptors are extracted via the spectral graph wavelet transform, using the Mexican hat wavelet as a generating kernel. Second, mid-level features are obtained by embedding local descriptors into the visual vocabulary space using the soft-assignment coding step of the bag-of-features model. Third, a global descriptor is constructed by aggregating mid-level features weighted by a geodesic exponential kernel, resulting in a matrix representation that describes the frequency of appearance of nearby codewords in the vocabulary. Experimental results on two standard 3D shape benchmarks demonstrate the effectiveness of the proposed classification approach in comparison with state-of-the-art methods.
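
To make the first step concrete, here is a small NumPy sketch of a spectral graph wavelet signature computed from precomputed Laplacian eigenpairs, using g(x) = x * exp(-x) as one common spectral form of the Mexican hat generating kernel; the exact kernel, scales, and normalization in the paper may differ.

    import numpy as np

    def sgw_signature(eigvals, eigvecs, scales=(1.0, 2.0, 4.0)):
        # eigvals: (M,) smallest Laplacian eigenvalues of the shape graph
        # eigvecs: (V, M) matching eigenvectors sampled at V vertices
        feats = []
        for t in scales:
            g = (t * eigvals) * np.exp(-t * eigvals)   # spectral Mexican hat kernel
            # Wavelet energy at vertex v: sum_m g(t * lam_m) * phi_m(v)^2
            feats.append((eigvecs ** 2) @ g)
        return np.stack(feats, axis=1)                 # (V, n_scales) local descriptor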
Exploiting Deep Features for Remote Sensing Image Retrieval: A Systematic Investigation
Remote sensing (RS) image retrieval is of great significance for geological information mining. Over the past two decades, a large amount of research on
this task has been carried out, which mainly focuses on the following three
core issues: feature extraction, similarity metric and relevance feedback. Due
to the complexity and multiformity of ground objects in high-resolution remote
sensing (HRRS) images, there is still room for improvement in the current
retrieval approaches. In this paper, we analyze the three core issues of RS
image retrieval and provide a comprehensive review on existing methods.
Furthermore, with the goal of advancing the state of the art in HRRS image retrieval, we focus on the feature extraction issue and delve into how to use powerful deep representations to address this task. We conduct a systematic investigation of the factors that may affect the performance of deep features. By optimizing each factor, we obtain remarkable retrieval results on publicly available HRRS datasets. Finally, we explain the experimental phenomena in detail and draw conclusions from our analysis. Our work can serve as a guide for research on content-based RS image retrieval.
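
The retrieval loop implied by this feature-extraction focus is straightforward; as a minimal sketch (cosine similarity over precomputed deep features, with every name and shape assumed for illustration):

    import numpy as np

    def retrieve(query_feat, db_feats, top_k=10):
        # query_feat: (D,) deep feature of the query image, e.g. taken from
        # a pre-trained CNN's penultimate layer; db_feats: (N, D) archive.
        q = query_feat / np.linalg.norm(query_feat)
        db = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
        return np.argsort(-(db @ q))[:top_k]    # indices of the best matches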
A Classifier-guided Approach for Top-down Salient Object Detection
We propose a framework for top-down salient object detection that
incorporates a tightly coupled image classification module. The classifier is
trained on novel category-aware sparse codes computed on object dictionaries
used for saliency modeling. A misclassification indicates that the
corresponding saliency model is inaccurate. Hence, the classifier selects
images for which the saliency models need to be updated. The category-aware
sparse coding produces better image classification accuracy as compared to
conventional sparse coding with a reduced computational complexity. A
saliency-weighted max-pooling is proposed to improve image classification,
which is further used to refine the saliency maps. Experimental results on
Graz-02 and PASCAL VOC-07 datasets demonstrate the effectiveness of salient
object detection. Although the role of the classifier is to support salient
object detection, we evaluate its performance in image classification and also
illustrate the utility of thresholded saliency maps for image segmentation.
Comment: To appear in Signal Processing: Image Communication, Elsevier. Available online from April 201
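
As a hedged sketch of the saliency-weighted max-pooling idea, the NumPy snippet below scales each local sparse code by the saliency of its location before taking the per-dimension max, so salient regions dominate the pooled image representation; the actual weighting scheme in the paper may differ.

    import numpy as np

    def saliency_weighted_max_pool(codes, saliency):
        # codes: (N, K) sparse codes for N local patches
        # saliency: (N,) saliency of each patch location, in [0, 1]
        return np.max(codes * saliency[:, None], axis=0)   # (K,) image feature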
From BoW to CNN: Two Decades of Texture Representation for Texture Classification
Texture is a fundamental characteristic of many types of images, and texture
representation is one of the essential and challenging problems in computer
vision and pattern recognition which has attracted extensive research
attention. Since 2000, texture representations based on Bag of Words (BoW) and
on Convolutional Neural Networks (CNNs) have been extensively studied with
impressive performance. Given this period of remarkable evolution, this paper
aims to present a comprehensive survey of advances in texture representation
over the last two decades. More than 200 major publications are cited in this
survey covering different aspects of the research, which includes (i) problem
description; (ii) recent advances in the broad categories of BoW-based,
CNN-based and attribute-based methods; and (iii) evaluation issues,
specifically benchmark datasets and state-of-the-art results. Looking back at what has been achieved so far, the survey also discusses open challenges and directions for future research.
Comment: Accepted by IJC
Deep Boosting: Joint Feature Selection and Analysis Dictionary Learning in Hierarchy
This work investigates how the traditional image classification pipelines can
be extended into a deep architecture, inspired by recent successes of deep
neural networks. We propose a deep boosting framework based on layer-by-layer
joint feature boosting and dictionary learning. In each layer, we construct a
dictionary of filters by combining the filters from the lower layer, and
iteratively optimize the image representation with a joint
discriminative-generative formulation, i.e. minimization of empirical
classification error plus regularization of analysis image generation over
training images. For optimization, we alternate between two steps: i) to minimize the classification error, we select the most discriminative features using the gentle AdaBoost algorithm; ii) given the selected features, we update the filters to minimize the regularization on the analysis image representation using gradient descent. Once the optimization converges, we learn the higher-layer representation in the same way. Our model delivers several distinct advantages. First, our layer-wise optimization provides the potential to build very deep architectures. Second, the generated image representation is compact and meaningful. In several visual recognition tasks, our framework outperforms existing state-of-the-art approaches.
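
Step i) is standard gentle AdaBoost feature selection, which can be sketched with one-feature regression stumps as below; the thresholds, stump form, and interaction with the dictionary update are simplifications, not the paper's exact procedure.

    import numpy as np

    def gentle_boost_select(X, y, n_rounds=10):
        # X: (N, F) filter responses, y: (N,) labels in {-1, +1}
        w = np.full(len(y), 1.0 / len(y))
        selected = []
        for _ in range(n_rounds):
            best = None
            for f in range(X.shape[1]):
                mask = X[:, f] > np.median(X[:, f])
                if not mask.any() or mask.all():
                    continue
                # Weighted least-squares stump: value a on one side, b on the other.
                a = np.average(y[mask], weights=w[mask])
                b = np.average(y[~mask], weights=w[~mask])
                pred = np.where(mask, a, b)
                err = np.sum(w * (y - pred) ** 2)
                if best is None or err < best[0]:
                    best = (err, f, pred)
            if best is None:
                break
            _, f, pred = best
            selected.append(f)           # most discriminative feature this round
            w *= np.exp(-y * pred)       # gentle AdaBoost weight update
            w /= w.sum()
        return selected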
Recent Advance in Content-based Image Retrieval: A Literature Survey
The explosive increase and ubiquitous accessibility of visual data on the Web
have led to the prosperity of research activity in image search or retrieval.
Because they ignore visual content as a ranking clue, methods that apply text search techniques to visual retrieval may suffer from inconsistency between the text words and the visual content. Content-based image retrieval (CBIR), which makes use of the representation of visual content to identify relevant images, has attracted sustained attention over the past two decades. Such a problem is
challenging due to the intention gap and the semantic gap problems. Numerous
techniques have been developed for content-based image retrieval in the last
decade. The purpose of this paper is to categorize and evaluate those
algorithms proposed during the period of 2003 to 2016. We conclude with several
promising directions for future research.
Comment: 22 pages