37 research outputs found
Unsupervised learning of clutter-resistant visual representations from natural videos
Populations of neurons in inferotemporal cortex (IT) maintain an explicit
code for object identity that also tolerates transformations of object
appearance e.g., position, scale, viewing angle [1, 2, 3]. Though the learning
rules are not known, recent results [4, 5, 6] suggest the operation of an
unsupervised temporal-association-based method e.g., Foldiak's trace rule [7].
Such methods exploit the temporal continuity of the visual world by assuming
that visual experience over short timescales will tend to have invariant
identity content. Thus, by associating representations of frames from nearby
times, a representation that tolerates whatever transformations occurred in the
video may be achieved. Many previous studies verified that such rules can work
in simple situations without background clutter, but the presence of visual
clutter has remained problematic for this approach. Here we show that temporal
association based on large class-specific filters (templates) avoids the
problem of clutter. Our system learns in an unsupervised way from natural
videos gathered from the internet, and is able to perform a difficult
unconstrained face recognition task on natural images: Labeled Faces in the
Wild [8]
Fast Landmark Localization with 3D Component Reconstruction and CNN for Cross-Pose Recognition
Two approaches are proposed for cross-pose face recognition, one is based on
the 3D reconstruction of facial components and the other is based on the deep
Convolutional Neural Network (CNN). Unlike most 3D approaches that consider
holistic faces, the proposed approach considers 3D facial components. It
segments a 2D gallery face into components, reconstructs the 3D surface for
each component, and recognizes a probe face by component features. The
segmentation is based on the landmarks located by a hierarchical algorithm that
combines the Faster R-CNN for face detection and the Reduced Tree Structured
Model for landmark localization. The core part of the CNN-based approach is a
revised VGG network. We study the performances with different settings on the
training set, including the synthesized data from 3D reconstruction, the
real-life data from an in-the-wild database, and both types of data combined.
We investigate the performances of the network when it is employed as a
classifier or designed as a feature extractor. The two recognition approaches
and the fast landmark localization are evaluated in extensive experiments, and
compared to stateof-the-art methods to demonstrate their efficacy.Comment: 14 pages, 12 figures, 4 table