Multi-scale keypoints in V1 and face detection
End-stopped cells in cortical area V1, which combine outputs of complex cells tuned to different orientations, serve to detect line and edge crossings (junctions) and points with a large curvature. In this paper we study the importance of the multi-scale keypoint representation, i.e. retinotopic keypoint maps which are tuned to different spatial frequencies (scale or Level-of-Detail). We show that this representation provides important information for Focus-of-Attention (FoA) and object detection. In particular, we show that hierarchically-structured saliency maps for FoA can be obtained, and that combinations over scales in conjunction with spatial symmetries can lead to face detection through grouping operators that deal with keypoints at the eyes, nose and mouth, especially when non-classical receptive field inhibition is employed. Although a face detector can be based on feedforward and feedback loops within area V1, such an operator must be embedded into dorsal and ventral data streams to and from higher areas for obtaining translation-, rotation- and scale-invariant face (object) detection.
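The saliency maps described above combine keypoint activity across scales. A minimal sketch of that idea, assuming simple max-normalized summation of per-scale keypoint response maps (the paper's V1 model uses biologically derived end-stopped operators; the function name and normalization here are illustrative assumptions):

```python
import numpy as np

def saliency_from_keypoint_maps(keypoint_maps):
    """Combine per-scale keypoint maps into one saliency map.

    keypoint_maps: list of 2D arrays, one per spatial scale, each holding
    keypoint (end-stopped cell) responses at that scale. Each scale is
    max-normalized so no single scale dominates, then all are averaged.
    """
    acc = np.zeros_like(keypoint_maps[0], dtype=float)
    for m in keypoint_maps:
        peak = m.max()
        if peak > 0:
            acc += m / peak  # normalize this scale before summing
    return acc / len(keypoint_maps)

# Toy example: a keypoint present at (1, 1) in both a coarse and a fine scale
coarse = np.zeros((3, 3)); coarse[1, 1] = 2.0
fine = np.zeros((3, 3)); fine[1, 1] = 1.0
s = saliency_from_keypoint_maps([coarse, fine])
```

Locations where keypoints coincide over many scales then stand out in `s`, which is the stability property the hierarchical FoA maps exploit.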
Correlates of facial expressions in the primary visual cortex
Face detection and recognition should be complemented by recognition of facial expression, for example for social robots which must react to human emotions. Our framework is based on two multi-scale representations in cortical area V1: keypoints at eyes, nose and mouth are grouped for face detection [1]; lines and edges provide information for face recognition [2].
Face normalization using multi-scale cortical keypoints
Empirical studies concerning face recognition suggest that faces may be stored in memory by a few canonical representations. Models of visual perception are based on image representations in cortical area V1 and beyond, which contain many cell layers for feature extraction. Simple, complex and end-stopped cells tuned to different spatial frequencies (scales) and/or orientations provide input for line, edge and keypoint detection. This yields a rich, multi-scale object representation that can be stored in memory in order to identify objects. The multi-scale, keypoint-based saliency maps for Focus-of-Attention can be exploited to obtain face detection and normalization, after which face recognition can be achieved using the line/edge representation. In this paper, we focus only on face normalization, showing that multi-scale keypoints can be used to construct canonical representations of faces in memory.
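Normalizing a face to a canonical representation typically means mapping detected landmarks onto fixed canonical positions. A minimal sketch, assuming the common eye-pair similarity transform (the paper derives landmarks from multi-scale V1 keypoints; the function name and the canonical 40-pixel inter-eye distance are illustrative assumptions):

```python
import numpy as np

def eye_normalisation(left_eye, right_eye, canon_dist=40.0):
    """Return the 2x2 similarity matrix (uniform scale + rotation) that maps
    the detected eye axis onto a canonical horizontal pair canon_dist apart.

    Coordinates are (x, y); applying the matrix to image coordinates
    (relative to the left eye) rotates the eye axis level and rescales it.
    """
    d = np.subtract(right_eye, left_eye).astype(float)
    dist = np.hypot(d[0], d[1])
    s = canon_dist / dist                    # uniform scale factor
    theta = -np.arctan2(d[1], d[0])          # rotation bringing eyes level
    c, si = np.cos(theta), np.sin(theta)
    return s * np.array([[c, -si], [si, c]])

# Eyes already level and 20 px apart: pure 2x scaling, no rotation
M = eye_normalisation((0.0, 0.0), (20.0, 0.0))
```

After this transform, every detected face presents its landmarks at (approximately) the same canonical coordinates, which is what makes a few stored canonical views sufficient for matching.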
Multi-scale keypoints in V1 and beyond: object segregation, scale selection, saliency maps and face detection
End-stopped cells in cortical area V1, which combine outputs of complex cells tuned to different orientations, serve to detect line and edge crossings, singularities and points with large curvature. These cells can be used to construct retinotopic keypoint maps at different spatial scales (level-of-detail). The importance of the multi-scale keypoint representation is studied in this paper. It is shown that this representation provides very important information for object recognition and face detection. Different grouping operators can be used for object segregation and automatic scale selection. Saliency maps for focus-of-attention can be constructed. Such maps can be employed for face detection by grouping facial landmarks at eyes, nose and mouth. Although a face detector can be based on processing within area V1, it is argued that such an operator must be embedded into dorsal and ventral data streams, to and from higher cortical areas, for obtaining translation-, rotation- and scale-invariant detection.
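Grouping facial landmarks at eyes, nose and mouth amounts to testing candidate keypoint constellations for face-like geometry. A minimal sketch, assuming a crude symmetry test over an (eye, eye, mouth) triplet (the paper's grouping operators work on actual V1 keypoint maps; the function name and the tolerance value are illustrative assumptions):

```python
def is_face_triplet(left_eye, right_eye, mouth, tol=0.3):
    """Crude geometric test for a candidate (eye, eye, mouth) triplet.

    Points are (x, y) image coordinates with y increasing downwards.
    Accepts the triplet when the eyes are roughly level and the mouth lies
    below them, near the vertical symmetry axis of the eye pair.
    """
    eye_dist = abs(right_eye[0] - left_eye[0])
    if eye_dist == 0:
        return False
    level = abs(left_eye[1] - right_eye[1]) <= tol * eye_dist
    mid_x = (left_eye[0] + right_eye[0]) / 2.0
    below = mouth[1] > max(left_eye[1], right_eye[1])
    centred = abs(mouth[0] - mid_x) <= tol * eye_dist
    return level and below and centred
```

In a full detector such a test would be swept over keypoint triplets at every scale, so that faces of different sizes trigger the grouping at the matching level-of-detail.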
Persistent Evidence of Local Image Properties in Generic ConvNets
Supervised training of a convolutional network for object classification should make explicit any information related to the class of objects and disregard any auxiliary information associated with the capture of the image or the variation within the object class. Does this happen in practice? Although this seems to pertain to the very final layers in the network, if we look at earlier layers we find that this is not the case. Surprisingly, strong spatial information is implicit. This paper addresses this issue, in particular exploiting the image representation at the first fully connected layer, i.e. the global image descriptor which has recently been shown to be most effective in a range of visual recognition tasks. We empirically demonstrate evidence for this finding in the contexts of four different tasks: 2d landmark detection, 2d object keypoint prediction, estimation of the RGB values of the input image, and recovery of the semantic label of each pixel. We base our investigation on a simple framework with ridge regression applied commonly across these tasks, and show results which all support our insight. Such spatial information can be used for computing correspondence of landmarks to a good accuracy, and should potentially be useful for improving the training of convolutional nets for classification purposes.
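The probing framework above reduces to fitting a linear map from network descriptors to spatial targets. A minimal sketch of the closed-form ridge regression it relies on, using synthetic data (the descriptor dimensionality, sample count and regularization value are illustrative assumptions, not the paper's experimental settings):

```python
import numpy as np

def ridge_fit(X, Y, lam=1.0):
    """Closed-form ridge regression: W = (X^T X + lam*I)^(-1) X^T Y.

    X: (n_samples, n_features) image descriptors (e.g. the first
       fully connected layer of a ConvNet),
    Y: (n_samples, n_targets) spatial targets such as 2D landmark
       coordinates or per-pixel values.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

# Sanity check on synthetic "descriptors": recover a known linear map
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
W_true = rng.standard_normal((5, 2))
Y = X @ W_true
W = ridge_fit(X, Y, lam=1e-6)
```

If a map fitted this way predicts landmark positions well from the descriptor, the descriptor must still carry the spatial information, which is exactly the paper's probing argument.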
Face recognition by cortical multi-scale line and edge representations
Empirical studies concerning face recognition suggest that faces may be stored in memory by a few canonical representations. Models of visual perception are based on image representations in cortical area V1 and beyond, which contain many cell layers for feature extraction. Simple, complex and end-stopped cells provide input for line, edge and keypoint detection. Detected events provide a rich, multi-scale object representation, and this representation can be stored in memory in order to identify objects. In this paper, the above context is applied to face recognition. The multi-scale line/edge representation is explored in conjunction with keypoint-based saliency maps for Focus-of-Attention. Recognition rates of up to 96% were achieved by combining frontal and 3/4 views, and recognition was quite robust against partial occlusions.
UcoSLAM: Simultaneous Localization and Mapping by Fusion of KeyPoints and Squared Planar Markers
This paper proposes a novel approach for Simultaneous Localization and Mapping by fusing natural and artificial landmarks. Most SLAM approaches use natural landmarks (such as keypoints). However, these are unstable over time, repetitive in many cases, or insufficient for robust tracking (e.g. in indoor buildings). On the other hand, other approaches have employed artificial landmarks (such as squared fiducial markers) placed in the environment to help tracking and relocalization. We propose a method that integrates both approaches in order to achieve long-term robust tracking in many scenarios. Our method has been compared to the state-of-the-art methods ORB-SLAM2 and LDSO on the public datasets Kitti, Euroc-MAV, TUM and SPM, obtaining better precision, robustness and speed. Our tests also show that the combination of markers and keypoints achieves better accuracy than each of them independently.

Comment: Paper submitted to Pattern Recognition
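The complementary strengths of the two landmark types can be illustrated with a toy pose blend. This is only a sketch: UcoSLAM itself fuses marker and keypoint observations as error terms in a single optimization graph rather than averaging poses, and the function name and weight below are made-up assumptions:

```python
import numpy as np

def fuse_translations(t_markers, t_keypoints, w_markers=0.7):
    """Toy weighted blend of two camera-translation estimates.

    t_markers:   translation estimated from squared planar markers,
    t_keypoints: translation estimated from natural keypoints.
    Markers give an unambiguous, drift-free fix when visible, so they
    receive the larger (illustrative) weight here.
    """
    t_m = np.asarray(t_markers, dtype=float)
    t_k = np.asarray(t_keypoints, dtype=float)
    return w_markers * t_m + (1.0 - w_markers) * t_k

t = fuse_translations([1.0, 0.0, 0.0], [0.0, 0.0, 0.0])
```

A joint optimization, as in the paper, improves on any such fixed weighting because each observation's influence is determined by its reprojection error rather than by a hand-picked constant.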