On Robust Face Recognition via Sparse Encoding: the Good, the Bad, and the Ugly
In the field of face recognition, Sparse Representation (SR) has received
considerable attention during the past few years. Most of the relevant
literature focuses on holistic descriptors in closed-set identification
applications. The underlying assumption in SR-based methods is that each class
in the gallery has sufficient samples and the query lies on the subspace
spanned by the gallery of the same class. Unfortunately, such an assumption is
easily violated in the more challenging face verification scenario, where an
algorithm is required to determine if two faces (where one or both have not
been seen before) belong to the same person. In this paper, we first discuss
why previous attempts with SR might not be applicable to verification problems.
We then propose an alternative approach to face verification via SR.
Specifically, we propose to use explicit SR encoding on local image patches
rather than the entire face. The obtained sparse signals are pooled via
averaging to form multiple region descriptors, which are then concatenated to
form an overall face descriptor. Due to the deliberate loss of spatial relations
within each region (caused by averaging), the resulting descriptor is robust to
misalignment and various image deformations. Within the proposed framework, we
evaluate several SR encoding techniques: l1-minimisation, Sparse Autoencoder
Neural Network (SANN), and an implicit probabilistic technique based on
Gaussian Mixture Models. Thorough experiments on AR, FERET, exYaleB, BANCA and
ChokePoint datasets show that the proposed local SR approach obtains
considerably better and more robust performance than several previous
state-of-the-art holistic SR methods, in both verification and closed-set
identification problems. The experiments also show that l1-minimisation based
encoding has a considerably higher computational cost than the other techniques,
but leads to higher recognition rates.
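To make the local encoding-and-pooling pipeline concrete, the sketch below learns a patch dictionary, sparsely encodes the patches of each face region, average-pools the codes within each region, and concatenates the region descriptors. This is a minimal sketch using scikit-learn's dictionary learning as the l1 encoder; the patch size, region grid, and dictionary size are illustrative assumptions, not the configuration used in the paper.

```python
# Minimal sketch of local sparse encoding with average pooling (illustrative
# parameters only; not the configuration reported in the paper).
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d

def learn_dictionary(training_patches, n_atoms=128):
    # training_patches: (n_patches, patch_dim) array of vectorised grey patches
    dico = MiniBatchDictionaryLearning(n_components=n_atoms,
                                       transform_algorithm='lasso_lars',
                                       transform_alpha=0.1,
                                       random_state=0)
    return dico.fit(training_patches)

def face_descriptor(image, dico, patch_size=(8, 8), grid=(4, 4)):
    """Sparsely encode local patches, average-pool the codes per region,
    and concatenate the region descriptors into one face descriptor."""
    h, w = image.shape
    rh, rw = h // grid[0], w // grid[1]
    regions = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            block = image[i * rh:(i + 1) * rh, j * rw:(j + 1) * rw]
            patches = extract_patches_2d(block, patch_size)
            patches = patches.reshape(len(patches), -1).astype(np.float64)
            patches -= patches.mean(axis=1, keepdims=True)   # remove DC offset
            codes = dico.transform(patches)                   # sparse code per patch
            regions.append(np.abs(codes).mean(axis=0))        # average pooling
    return np.concatenate(regions)                            # length = grid cells * n_atoms
```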
Dual-Glance Model for Deciphering Social Relationships
Since the earliest civilizations, social relationships between individuals
have fundamentally formed the basis of the social structure of our daily
life. In the computer vision literature, much progress has been made in scene
understanding, such as object detection and scene parsing. Recent research
focuses on the relationships between objects based on their functionality and
geometric relations. In this work, we study the problem of social
relationship recognition in still images. We propose a dual-glance model
for social relationship recognition, where the first glance fixates on the
individual pair of interest and the second glance deploys an attention
mechanism to explore contextual cues. We have also collected a new large-scale
People in Social Context (PISC) dataset, which comprises 22,670 images and
76,568 annotated samples across 9 types of social relationships. We provide benchmark
results on the PISC dataset, and qualitatively demonstrate the efficacy of the
proposed model.
Comment: IEEE International Conference on Computer Vision (ICCV), 201
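As a rough illustration of the two-glance design, the following PyTorch sketch fuses a first-glance feature from the cropped person-pair region with a second glance that attends over contextual region features. The layer sizes, the concatenation-based fusion, and the use of pre-extracted ROI features are assumptions made for brevity, not the authors' released architecture.

```python
# Illustrative PyTorch sketch of the dual-glance idea: a first glance on the
# person-pair region and a second glance attending over contextual features.
# Layer sizes and the fusion scheme are assumptions, not the released model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualGlance(nn.Module):
    def __init__(self, feat_dim=2048, hidden=512, num_relations=9):
        super().__init__()
        self.first_glance = nn.Linear(feat_dim, hidden)    # pair-region feature
        self.context_proj = nn.Linear(feat_dim, hidden)    # contextual ROI features
        self.attn = nn.Linear(2 * hidden, 1)               # attention over context
        self.classifier = nn.Linear(2 * hidden, num_relations)

    def forward(self, pair_feat, context_feats):
        # pair_feat: (B, feat_dim); context_feats: (B, N, feat_dim)
        g1 = F.relu(self.first_glance(pair_feat))             # (B, hidden)
        ctx = F.relu(self.context_proj(context_feats))        # (B, N, hidden)
        query = g1.unsqueeze(1).expand_as(ctx)                # (B, N, hidden)
        scores = self.attn(torch.cat([query, ctx], dim=-1))   # (B, N, 1)
        weights = torch.softmax(scores, dim=1)                # attention weights
        g2 = (weights * ctx).sum(dim=1)                       # attended context
        return self.classifier(torch.cat([g1, g2], dim=-1))   # relation logits

# Random features stand in for CNN pair-region and ROI-pooled context inputs.
model = DualGlance()
logits = model(torch.randn(4, 2048), torch.randn(4, 5, 2048))
```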
Multi-Camera Action Dataset for Cross-Camera Action Recognition Benchmarking
Action recognition has received increasing attention from the computer vision
and machine learning communities in the last decade. To enable the study of
this problem, there exist a vast number of action datasets, which are recorded
under controlled laboratory settings, real-world surveillance environments, or
crawled from the Internet. Apart from the "in-the-wild" datasets, the training
and test splits of conventional datasets often possess similar environmental
conditions, which leads to near-perfect performance on constrained
datasets. In this paper, we introduce a new dataset, namely Multi-Camera Action
Dataset (MCAD), which is designed to evaluate the open view classification
problem under the surveillance environment. In total, MCAD contains 14,298
action samples from 18 action categories, which are performed by 20 subjects
and independently recorded with 5 cameras. Inspired by the well-received
evaluation approach on the LFW dataset, we designed a standard evaluation
protocol and benchmarked MCAD under several scenarios. The benchmark shows that
while an average of 85% accuracy is achieved under the closed-view scenario,
the performance suffers from a significant drop under the cross-view scenario.
In the worst-case scenario, the performance of 10-fold cross-validation drops
from 87.0% to 47.4%.
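The gap between closed-view and cross-view evaluation can be illustrated with a leave-one-camera-out split, sketched below. The per-sample record layout (camera id, label, feature) is hypothetical and does not correspond to the official MCAD protocol files.

```python
# Hypothetical closed-view vs. cross-view splits over a multi-camera action
# dataset; the sample record layout is an assumption, not the MCAD protocol.
from collections import defaultdict

def split_by_camera(samples):
    """samples: iterable of dicts with 'camera', 'label' and 'feature' keys."""
    by_cam = defaultdict(list)
    for s in samples:
        by_cam[s['camera']].append(s)
    return by_cam

def cross_view_folds(samples):
    """Yield (held-out camera, train, test): train on all other cameras and
    test on the held-out camera (the cross-view scenario)."""
    by_cam = split_by_camera(samples)
    for held_out in sorted(by_cam):
        train = [s for cam, group in by_cam.items() if cam != held_out
                 for s in group]
        yield held_out, train, by_cam[held_out]
```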
Automatic Classification of Human Epithelial Type 2 Cell Indirect Immunofluorescence Images using Cell Pyramid Matching
This paper describes a novel system for automatic classification of images
obtained from Anti-Nuclear Antibody (ANA) pathology tests on Human Epithelial
type 2 (HEp-2) cells using the Indirect Immunofluorescence (IIF) protocol. The
IIF protocol on HEp-2 cells has been the hallmark method to identify the
presence of ANAs, due to its high sensitivity and the large range of antigens
that can be detected. However, it suffers from numerous shortcomings, such as
being subjective as well as time and labour intensive. Computer Aided
Diagnostic (CAD) systems have been developed to address these problems, which
automatically classify a HEp-2 cell image into one of the known patterns (e.g.,
speckled, homogeneous). Most of the existing CAD systems use handpicked
features to represent a HEp-2 cell image, which may only work in limited
scenarios. We propose a novel automatic cell image classification method termed
Cell Pyramid Matching (CPM), which comprises regional histograms of
visual words coupled with the Multiple Kernel Learning framework. We present a
study of several variations of generating histograms and show the efficacy of
the system on two publicly available datasets: the ICPR HEp-2 cell
classification contest dataset and the SNPHEp-2 dataset.
Comment: arXiv admin note: substantial text overlap with arXiv:1304.126
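The regional-histogram component of CPM can be sketched as follows: each local patch is assigned to a visual word, and L1-normalised histograms are accumulated for the whole cell and for a coarse spatial split before being concatenated. The codebook size and the two-level region layout here are illustrative assumptions rather than the exact CPM configuration.

```python
# Minimal sketch of regional histograms of visual words over a cell image,
# in the spirit of Cell Pyramid Matching. Codebook size and the two-level
# region layout are illustrative assumptions.
import numpy as np

def regional_histograms(word_ids, positions, image_shape, n_words=256):
    """word_ids: (n,) visual-word index per local patch;
    positions: (n, 2) patch centres as (row, col);
    returns concatenated histograms for the whole cell plus its four quadrants."""
    h, w = image_shape
    def hist(mask):
        counts = np.bincount(word_ids[mask], minlength=n_words).astype(float)
        return counts / max(counts.sum(), 1.0)                # L1-normalised
    regions = [np.ones(len(word_ids), dtype=bool)]            # level 0: whole cell
    rows, cols = positions[:, 0], positions[:, 1]
    for top in (rows < h / 2, rows >= h / 2):                 # level 1: quadrants
        for left in (cols < w / 2, cols >= w / 2):
            regions.append(top & left)
    return np.concatenate([hist(m) for m in regions])
```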
STAR: Skeleton-aware Text-based 4D Avatar Generation with In-Network Motion Retargeting
The creation of 4D avatars (i.e., animated 3D avatars) from text description
typically uses text-to-image (T2I) diffusion models to synthesize 3D avatars in
the canonical space and subsequently applies animation with target motions.
However, such an optimization-by-animation paradigm has several drawbacks. (1)
For pose-agnostic optimization, the rendered images in canonical pose for naive
Score Distillation Sampling (SDS) exhibit a domain gap and cannot preserve
view consistency using only T2I priors, and (2) For post hoc animation, simply
applying the source motions to target 3D avatars yields translation artifacts
and misalignment. To address these issues, we propose Skeleton-aware Text-based
4D Avatar generation with in-network motion Retargeting (STAR). STAR considers
the geometry and skeleton differences between the template mesh and target
avatar, and corrects the mismatched source motion by resorting to the
pretrained motion retargeting techniques. With the informatively retargeted and
occlusion-aware skeleton, we embrace the skeleton-conditioned T2I and
text-to-video (T2V) priors, and propose a hybrid SDS module to coherently
provide multi-view and frame-consistent supervision signals. Hence, STAR can
progressively optimize the geometry, texture, and motion in an end-to-end
manner. The quantitative and qualitative experiments demonstrate our proposed
STAR can synthesize high-quality 4D avatars with vivid animations that align
well with the text description. Additional ablation studies show the
contributions of each component in STAR. The source code and demos are
available at: https://star-avatar.github.io.
Comment: Tech report
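A heavily simplified picture of a hybrid SDS-style update is sketched below: per-frame gradients from a T2I prior are blended with a sequence-level gradient from a T2V prior and injected into the rendered frames. The helpers `t2i_sds_grad` and `t2v_sds_grad` and the blending weight are hypothetical placeholders standing in for real diffusion priors; this is not the released STAR implementation.

```python
# Schematic hybrid SDS update: blend per-frame T2I gradients with a T2V gradient
# and inject the result into the rendered frames. All helpers are placeholders.
import torch

def hybrid_sds_step(renders, text_embed, t2i_sds_grad, t2v_sds_grad,
                    optimizer, lam=0.5):
    """renders: (T, 3, H, W) differentiable renders (non-leaf, requires grad)."""
    grad_img = torch.stack([t2i_sds_grad(f, text_embed) for f in renders])
    grad_vid = t2v_sds_grad(renders, text_embed)          # (T, 3, H, W)
    grad = (1.0 - lam) * grad_img + lam * grad_vid
    optimizer.zero_grad()
    renders.backward(gradient=grad)    # SDS-style: inject the score gradient directly
    optimizer.step()

# Dummy usage: zero "scores" stand in for the diffusion priors.
frames = torch.randn(4, 3, 64, 64, requires_grad=True)    # stands in for avatar params
renders = frames * 1.0                                     # non-leaf, as from a renderer
opt = torch.optim.Adam([frames], lr=1e-2)
hybrid_sds_step(renders, None,
                lambda f, e: torch.zeros_like(f),
                lambda v, e: torch.zeros_like(v),
                opt)
```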
Learning to Predict Gradients for Semi-Supervised Continual Learning
A key challenge for machine intelligence is to learn new visual concepts
without forgetting the previously acquired knowledge. Continual learning is
aimed towards addressing this challenge. However, there is a gap between
existing supervised continual learning and human-like intelligence, where humans
are able to learn from both labeled and unlabeled data. How unlabeled data
affects learning and catastrophic forgetting in the continual learning process
remains unknown. To explore these issues, we formulate a new semi-supervised
continual learning method, which can be generically applied to existing
continual learning models. Specifically, a novel gradient learner learns from
labeled data to predict gradients on unlabeled data. Hence, the unlabeled data
can be incorporated into the supervised continual learning method. Different from
conventional semi-supervised settings, we do not assume that the
underlying classes, which are associated with the unlabeled data, are known to
the learning process. In other words, the unlabeled data could be very distinct
from the labeled data. We evaluate the proposed method on mainstream continual
learning, adversarial continual learning, and semi-supervised learning tasks.
The proposed method achieves state-of-the-art performance on classification
accuracy and backward transfer in the continual learning setting while
achieving desired performance on classification accuracy in the semi-supervised
learning setting. This implies that the unlabeled images can enhance the
generalizability of continual learning models when predicting unseen data and
significantly alleviate catastrophic forgetting. The code is
available at https://github.com/luoyan407/grad_prediction.git.
Comment: Accepted by IEEE Transactions on Neural Networks and Learning Systems (TNNLS).
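The gradient-prediction idea can be sketched as a small auxiliary network that is fit on labeled batches to mimic the analytic gradient of the cross-entropy loss with respect to the classifier logits, and is then queried on unlabeled features. The architecture, shapes, and training step below are illustrative assumptions; consult the released repository for the actual method.

```python
# Illustrative sketch of learning to predict gradients (shapes and training
# loop are assumptions; see the repository above for the actual method).
import torch
import torch.nn as nn

class GradientLearner(nn.Module):
    """Maps features to a predicted gradient of the loss w.r.t. the logits."""
    def __init__(self, feat_dim, num_classes, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, num_classes))

    def forward(self, feats):
        return self.net(feats)

def gradient_target(logits, labels):
    # For cross-entropy, dL/dlogits = softmax(logits) - one_hot(labels).
    probs = torch.softmax(logits, dim=-1)
    one_hot = torch.zeros_like(probs).scatter_(1, labels.unsqueeze(1), 1.0)
    return probs - one_hot

feat_dim, num_classes = 512, 10
classifier = nn.Linear(feat_dim, num_classes)
grad_learner = GradientLearner(feat_dim, num_classes)
opt_g = torch.optim.Adam(grad_learner.parameters(), lr=1e-3)

# Labeled step: fit the gradient learner against the analytic gradient target.
feats = torch.randn(8, feat_dim)
labels = torch.randint(0, num_classes, (8,))
target = gradient_target(classifier(feats).detach(), labels)
loss_g = nn.functional.mse_loss(grad_learner(feats), target)
opt_g.zero_grad(); loss_g.backward(); opt_g.step()

# Unlabeled step (conceptually): grad_learner(unlabeled_feats) replaces the
# true gradient when updating the classifier on unlabeled data.
```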
