19,074 research outputs found
Collaborative Representation based Classification for Face Recognition
By coding a query sample as a sparse linear combination of all training
samples and then classifying it by evaluating which class leads to the minimal
coding residual, sparse representation based classification (SRC) leads to
interesting results for robust face recognition. It is widely believed that the
l1- norm sparsity constraint on coding coefficients plays a key role in the
success of SRC, while its use of all training samples to collaboratively
represent the query sample is rather ignored. In this paper we discuss how SRC
works, and show that the collaborative representation mechanism used in SRC is
much more crucial to its success of face classification. The SRC is a special
case of collaborative representation based classification (CRC), which has
various instantiations by applying different norms to the coding residual and
coding coefficient. More specifically, the l1 or l2 norm characterization of
coding residual is related to the robustness of CRC to outlier facial pixels,
while the l1 or l2 norm characterization of coding coefficient is related to
the degree of discrimination of facial features. Extensive experiments were
conducted to verify the face recognition accuracy and efficiency of CRC with
different instantiations.Comment: It is a substantial revision of a previous conference paper (L.
Zhang, M. Yang, et al. "Sparse Representation or Collaborative
Representation: Which Helps Face Recognition?" in ICCV 2011
Objects2action: Classifying and localizing actions without any video example
The goal of this paper is to recognize actions in video without the need for
examples. Different from traditional zero-shot approaches we do not demand the
design and specification of attribute classifiers and class-to-attribute
mappings to allow for transfer from seen classes to unseen classes. Our key
contribution is objects2action, a semantic word embedding that is spanned by a
skip-gram model of thousands of object categories. Action labels are assigned
to an object encoding of unseen video based on a convex combination of action
and object affinities. Our semantic embedding has three main characteristics to
accommodate for the specifics of actions. First, we propose a mechanism to
exploit multiple-word descriptions of actions and objects. Second, we
incorporate the automated selection of the most responsive objects per action.
And finally, we demonstrate how to extend our zero-shot approach to the
spatio-temporal localization of actions in video. Experiments on four action
datasets demonstrate the potential of our approach
- …