Biometric presentation attack detection: beyond the visible spectrum
The increased need for unattended authentication in
multiple scenarios has motivated a wide deployment of biometric
systems in the last few years. This has in turn led to the
disclosure of security concerns specifically related to biometric
systems. Among them, presentation attacks (PAs, i.e., attempts
to log into the system with a fake biometric characteristic or
presentation attack instrument) pose a severe threat to the
security of the system: any person could eventually fabricate
or order a gummy finger or face mask to impersonate someone
else. In this context, we present a novel fingerprint presentation
attack detection (PAD) scheme based on i) a new capture device
able to acquire images within the short wave infrared (SWIR)
spectrum, and ii) an in-depth analysis of several state-of-the-art
techniques based on both handcrafted and deep learning
features. The approach is evaluated on a database comprising
over 4700 samples, stemming from 562 different subjects and
35 different presentation attack instrument (PAI) species. The
results show the soundness of the proposed approach with a
detection equal error rate (D-EER) as low as 1.35% even in a
realistic scenario where five different PAI species are considered
only for testing purposes (i.e., unknown attacks).
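The detection equal error rate (D-EER) reported above can be computed with a simple threshold sweep. The sketch below is illustrative only, assuming higher scores indicate bona fide presentations (the function name and score convention are not taken from the paper):

```python
import numpy as np

def detection_eer(bona_fide_scores, attack_scores):
    """Sweep decision thresholds and return the operating point where the
    attack error rate (APCER) and bona fide error rate (BPCER) are closest,
    reporting their mean as the detection equal error rate (D-EER)."""
    thresholds = np.sort(np.concatenate([bona_fide_scores, attack_scores]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        apcer = np.mean(attack_scores >= t)    # attacks accepted as bona fide
        bpcer = np.mean(bona_fide_scores < t)  # bona fide presentations rejected
        if abs(apcer - bpcer) < best_gap:
            best_gap, eer = abs(apcer - bpcer), (apcer + bpcer) / 2
    return eer
```

With perfectly separated score distributions this returns 0; overlapping distributions yield the error rate at the crossover threshold.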
Infrared face recognition: a comprehensive review of methodologies and databases
Automatic face recognition is an area with immense practical potential which
includes a wide range of commercial and law enforcement applications. Hence it
is unsurprising that it continues to be one of the most active research areas
of computer vision. Even after over three decades of intense research, the
state-of-the-art in face recognition continues to improve, benefitting from
advances in a range of different research fields such as image processing,
pattern recognition, computer graphics, and physiology. Systems based on
visible spectrum images, the most researched face recognition modality, have
reached a significant level of maturity with some practical success. However,
they continue to face challenges in the presence of illumination, pose and
expression changes, as well as facial disguises, all of which can significantly
decrease recognition accuracy. Amongst various approaches which have been
proposed in an attempt to overcome these limitations, the use of infrared (IR)
imaging has emerged as a particularly promising research direction. This paper
presents a comprehensive and timely review of the literature on this subject.
Our key contributions are: (i) a summary of the inherent properties of infrared
imaging which make this modality promising in the context of face recognition,
(ii) a systematic review of the most influential approaches, with a focus on
emerging common trends as well as key differences between alternative
methodologies, (iii) a description of the main databases of infrared facial
images available to the researcher, and lastly (iv) a discussion of the most
promising avenues for future research.
Comment: Pattern Recognition, 2014.
FE-Fusion-VPR: Attention-based Multi-Scale Network Architecture for Visual Place Recognition by Fusing Frames and Events
Traditional visual place recognition (VPR), usually based on standard cameras,
is prone to failure under glare or high-speed motion. By contrast, event cameras
have the advantages of low latency, high temporal resolution, and high dynamic
range, which can deal with the above issues. Nevertheless, event cameras are
prone to failure in weakly textured or motionless scenes, while standard
cameras can still provide appearance information in this case. Thus, exploiting
the complementarity of standard cameras and event cameras can effectively
improve the performance of VPR algorithms. In this paper, we propose
FE-Fusion-VPR, an attention-based multi-scale network architecture for VPR by
fusing frames and events. First, the intensity frame and event volume are fed
into the two-stream feature extraction network for shallow feature fusion.
Next, the three-scale features are obtained through the multi-scale fusion
network and aggregated into three sub-descriptors using the VLAD layer.
Finally, the weight of each sub-descriptor is learned through the descriptor
re-weighting network to obtain the final refined descriptor. Experimental
results show that on the Brisbane-Event-VPR and DDD20 datasets, the Recall@1 of
our FE-Fusion-VPR is 29.26% and 33.59% higher than Event-VPR and
Ensemble-EventVPR, and is 7.00% and 14.15% higher than MultiRes-NetVLAD and
NetVLAD. To our knowledge, this is the first end-to-end network that fuses
frames and events directly for VPR, surpassing the existing event-based and
frame-based SOTA methods.
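The Recall@1 figures quoted above follow the standard VPR evaluation: a query is correct if its nearest database descriptor comes from the right place. A minimal sketch, assuming Euclidean descriptor distances and integer place indices (all names and the tolerance convention are illustrative, not the paper's):

```python
import numpy as np

def recall_at_n(query_descs, db_descs, query_places, db_places, n=1, tol=0):
    """Fraction of queries whose n nearest database descriptors include at
    least one from within `tol` places of the query's ground-truth place."""
    hits = 0
    for desc, place in zip(query_descs, query_places):
        dists = np.linalg.norm(db_descs - desc, axis=1)  # Euclidean distances
        nearest = np.argsort(dists)[:n]                  # indices of n closest
        if any(abs(db_places[i] - place) <= tol for i in nearest):
            hits += 1
    return hits / len(query_descs)
```

Recall@1 is this quantity with n=1; real benchmarks typically use a metric distance tolerance on GPS positions rather than index offsets.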
On-line signature recognition through the combination of real dynamic data and synthetically generated static data
This is the author's version of a work that was accepted for publication in Pattern Recognition. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Pattern Recognition, 48, 9 (2015), DOI: 10.1016/j.patcog.2015.03.019.
On-line signature verification still remains a challenging task within biometrics. Due to their behavioral nature (opposed
to anatomic biometric traits), signatures present a notable variability even between successive realizations. This
leads to higher error rates than other largely used modalities such as iris or fingerprints and is one of the main reasons
for the relatively slow deployment of this technology. As a step towards the improvement of signature recognition
accuracy, the present paper explores and evaluates a novel approach that takes advantage of the performance boost
that can be reached through the fusion of on-line and off-line signatures. In order to exploit the complementarity of the
two modalities, we propose a method for the generation of enhanced synthetic static samples from on-line data. Such
synthetic off-line signatures are used on a new on-line signature recognition architecture based on the combination
of both types of data: real on-line samples and artificial off-line signatures synthesized from the real data. The new
on-line recognition approach is evaluated on a public benchmark containing both real versions (on-line and off-line) of
the exact same signatures. Different findings and conclusions are drawn regarding the discriminative power of on-line
and off-line signatures and of their potential combination both in the random and skilled impostors scenarios.
M. D.-C. is supported by a PhD fellowship from the ULPGC and M.G.-B. is supported by an FPU fellowship from the Spanish MECD. This work has been partially supported by projects: MCINN TEC2012-38630-C04-02, Bio-Shield (TEC2012-34881) from Spanish MINECO, BEAT (FP7-SEC-284989) from EU, CECABANK and Cátedra UAM-Telefónic
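Combining real on-line and synthetic off-line matchers as described above is typically realised as score-level fusion. A minimal sketch, assuming both matcher scores are first min-max normalised; the weight and function names are placeholders, not values from the paper:

```python
def minmax(score, lo, hi):
    """Min-max normalise a raw matcher score into [0, 1]."""
    return (score - lo) / (hi - lo)

def fuse_scores(online_score, offline_score, w_online=0.7):
    """Weighted-sum fusion of two normalised matcher scores. The weight
    favouring the on-line matcher is a placeholder, not a tuned value."""
    return w_online * online_score + (1 - w_online) * offline_score
```

In practice the weight would be tuned on a development set, and more elaborate fusion rules (sum, product, trained classifiers) are common alternatives.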
Bio-Inspired Modality Fusion for Active Speaker Detection
Human beings have developed remarkable abilities to integrate information from
various sensory sources, exploiting their inherent complementarity. Perceptual
capabilities are therefore heightened enabling, for instance, the well known
"cocktail party" and McGurk effects, i.e. speech disambiguation from a panoply
of sound signals. This fusion ability is also key in refining the perception of
sound source location, as in distinguishing whose voice is being heard in a
group conversation. Furthermore, Neuroscience has successfully identified the
superior colliculus region in the brain as the one responsible for this
modality fusion, with a handful of biological models having been proposed to
approach its underlying neurophysiological process. Deriving inspiration from
one of these models, this paper presents a methodology for effectively fusing
correlated auditory and visual information for active speaker detection. Such
an ability can have a wide range of applications, from teleconferencing systems
to social robotics. The detection approach initially routes auditory and visual
information through two specialized neural network structures. The resulting
embeddings are fused via a novel layer based on the superior colliculus, whose
topological structure emulates spatial neuron cross-mapping of unimodal
perceptual fields. The validation process employed two publicly available
datasets, with achieved results confirming and greatly surpassing initial
expectations.
Comment: Submitted to IEEE RA-L with IROS option, 202
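The superior-colliculus-inspired fusion layer above is learned end-to-end; as a rough intuition only, multisensory enhancement is often modelled as a multiplicative combination of spatially aligned unimodal activity maps. The sketch below is entirely illustrative and is not the paper's layer:

```python
import numpy as np

def fuse_activity_maps(audio_map, visual_map):
    """Element-wise (multiplicative) fusion of two spatially aligned unimodal
    activity maps: locations active in both modalities are boosted, loosely
    mimicking superior-colliculus multisensory enhancement."""
    fused = audio_map * visual_map
    return fused / fused.sum()  # normalise to a spatial probability map
```

The peak of the fused map then indicates the most likely speaker location agreed on by both modalities.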
RGB-D Salient Object Detection: A Survey
Salient object detection (SOD), which simulates the human visual perception
system to locate the most attractive object(s) in a scene, has been widely
applied to various computer vision tasks. Now, with the advent of depth
sensors, depth maps with abundant spatial information, which can be beneficial
in boosting the performance of SOD, can easily be captured. Although various RGB-D
based SOD models with promising performance have been proposed over the past
several years, an in-depth understanding of these models and challenges in this
topic remains lacking. In this paper, we provide a comprehensive survey of
RGB-D based SOD models from various perspectives, and review related benchmark
datasets in detail. Further, considering that the light field can also provide
depth maps, we review SOD models and popular benchmark datasets from this
domain as well. Moreover, to investigate the SOD ability of existing models, we
carry out a comprehensive evaluation, as well as attribute-based evaluation of
several representative RGB-D based SOD models. Finally, we discuss several
challenges and open directions of RGB-D based SOD for future research. All
collected models, benchmark datasets, source code links, datasets constructed
for attribute-based evaluation, and codes for evaluation will be made publicly
available at https://github.com/taozh2017/RGBDSODsurvey
Comment: 24 pages, 12 figures. Has been accepted by Computational Visual Medi
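Comprehensive SOD evaluations like the one above commonly report the mean absolute error (MAE) between predicted saliency maps and ground-truth masks. A minimal sketch of this standard metric (the function name is illustrative):

```python
import numpy as np

def mae(saliency_map, ground_truth):
    """Mean absolute error between a predicted saliency map and the binary
    ground-truth mask, with both arrays scaled to [0, 1]. Lower is better."""
    return float(np.mean(np.abs(saliency_map.astype(np.float64)
                                - ground_truth.astype(np.float64))))
```

MAE is usually complemented by structure-aware measures such as the F-measure, S-measure, and E-measure, since pixel-wise error alone ignores object structure.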