One-shot lip-based biometric authentication: extending behavioral features with authentication phrase information
Lip-based biometric authentication (LBBA) is an authentication method based
on a person's lip movements during speech in the form of video data captured by
a camera sensor. LBBA can utilize both physical and behavioral characteristics
of lip movements without requiring any additional sensory equipment apart from
an RGB camera. State-of-the-art (SOTA) approaches use one-shot learning to
train deep siamese neural networks which produce an embedding vector out of
these features. Embeddings are further used to compute the similarity between
an enrolled user and a user being authenticated. A flaw of these approaches is
that they model behavioral features as style-of-speech without relation to what
is being said. This makes the system vulnerable to video replay attacks of the
client speaking any phrase. To solve this problem, we propose a one-shot
approach which models behavioral features to discriminate on what is being
said in addition to style-of-speech. We achieve this by customizing the GRID
dataset to obtain required triplets and training a siamese neural network based
on 3D convolutions and recurrent neural network layers. A custom triplet loss
for batch-wise hard-negative mining is proposed. Obtained results using an
open-set protocol are 3.2% FAR and 3.8% FRR on the test set of the customized
GRID dataset. Additional analysis of the results was done to quantify the
influence and discriminatory power of behavioral and physical features for
LBBA.
Comment: 28 pages, 10 figures, 7 tables
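The abstract above mentions a custom triplet loss with batch-wise hard-negative mining but does not give its form. As a rough illustration only, the following sketch shows the generic "batch-hard" triplet loss, which is an assumption here and not necessarily the authors' variant: for each anchor in a batch, it takes the farthest same-label embedding and the closest different-label embedding. The function names, the `margin` value, and the idea of using a (speaker, phrase) pair as the label are hypothetical choices made for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """Generic batch-hard triplet loss (illustrative, not the paper's loss).

    For each anchor, pick the hardest positive (farthest sample with the
    same label) and the hardest negative (closest sample with a different
    label), then average max(0, d_pos - d_neg + margin) over all anchors
    that have at least one positive and one negative in the batch.
    """
    losses = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        pos = [euclidean(anchor, e)
               for j, (e, l) in enumerate(zip(embeddings, labels))
               if l == label and j != i]
        neg = [euclidean(anchor, e)
               for e, l in zip(embeddings, labels)
               if l != label]
        if not pos or not neg:
            continue  # no valid triplet for this anchor in the batch
        losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0


# Hypothetical toy batch: labels stand in for (speaker, phrase) pairs, so the
# same speaker saying a different phrase would count as a negative.
toy_embeddings = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [3.0, 1.0]]
toy_labels = [("s1", "p1"), ("s1", "p1"), ("s1", "p2"), ("s1", "p2")]
toy_loss = batch_hard_triplet_loss(toy_embeddings, toy_labels, margin=1.5)
```

Treating the label as a (speaker, phrase) pair is one way to make the embedding sensitive to what is said as well as who says it; how the customized GRID triplets are actually built is not stated in the abstract.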
A Large-scale Distributed Video Parsing and Evaluation Platform
Visual surveillance systems have become one of the largest sources of
Big Visual Data in the real world. However, existing systems for video analysis
still struggle with scalability, extensibility, and error-proneness,
though great advances have been achieved in a number of visual
recognition tasks and surveillance applications, e.g., pedestrian/vehicle
detection, people/vehicle counting. Moreover, few algorithms explore the
specific values/characteristics in large-scale surveillance videos. To address
these problems in large-scale video analysis, we develop a scalable video
parsing and evaluation platform by combining several advanced techniques for
Big Data processing, including Spark Streaming, Kafka and Hadoop Distributed
Filesystem (HDFS). Also, a Web User Interface is designed in the system, to
collect users' degrees of satisfaction on the recognition tasks so as to
evaluate the performance of the whole system. Furthermore, the highly
extensible platform running on the long-term surveillance videos makes it
possible to develop more intelligent incremental algorithms to enhance the
performance of various visual recognition tasks.
Comment: Accepted by Chinese Conference on Intelligent Visual Surveillance 201
Semantically selective augmentation for deep compact person re-identification
We present a deep person re-identification approach that combines semantically selective, deep data augmentation with clustering-based network compression to generate high-performance, light, and fast inference networks. In particular, we propose to augment limited training data via sampling from a deep convolutional generative adversarial network (DCGAN), whose discriminator is constrained by a semantic classifier to explicitly control the domain specificity of the generation process. Thereby, we encode information in the classifier network which can be utilized to steer adversarial synthesis, and which fuels our CondenseNet ID-network training. We provide a quantitative and qualitative analysis of the approach and its variants on a number of datasets, obtaining results that outperform the state-of-the-art on the LIMA dataset for long-term monitoring in indoor living spaces.