Fast, collaborative acquisition of multi-view face images using a camera network and its impact on real-time human identification
Biometric systems have typically been designed to operate in controlled environments on previously acquired photographs and videos. Recent terror attacks, security threats and intrusion attempts, however, have necessitated a transition to modern biometric systems that can identify humans in real time under unconstrained conditions. Distributed camera networks are well suited to unconstrained scenarios because they provide multiple views of a scene, offering tolerance to variations in subject pose and to occlusions. In dynamic environments, face images continually arrive at the base station with varying quality, pose and resolution, which makes designing a fusion strategy a significant challenge. Such a scenario demands that only the relevant information be processed and that the verdict (match / no match) for a particular subject be released quickly, yet accurately, so that more subjects in the scene can be evaluated. To address these challenges, we designed a wireless data acquisition system capable of acquiring multi-view faces accurately and at a rapid rate. Epipolar geometry is exploited to achieve high multi-view face detection rates. Face images are labeled with their corresponding poses and transmitted to the base station. To evaluate the impact of face images acquired with our real-time acquisition system on overall recognition accuracy, we interface it with a face matching subsystem, creating a prototype real-time multi-view face recognition system. For frontal face matching, we use the commercial PittPatt software; for non-frontal matching, we use a Local Binary Pattern (LBP) based classifier. Matching scores obtained from the frontal and non-frontal face images are fused for the final classification. Our results show a significant improvement in recognition accuracy, especially when the frontal face images are of low resolution.
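The abstract's score-level fusion of frontal (PittPatt) and non-frontal (LBP) matcher outputs can be sketched as a normalize-then-weight scheme. This is a minimal illustrative sketch, not the paper's actual method: the score ranges, weights, and threshold below are assumptions.

```python
# Illustrative score-level fusion of a frontal matcher and a non-frontal
# (e.g., LBP-based) matcher. Ranges, weights, and threshold are assumed
# placeholders, not values from the paper.

def min_max_normalize(score, lo, hi):
    """Map a raw matcher score into [0, 1] given its observed range."""
    return (score - lo) / (hi - lo)

def fuse_scores(frontal_score, profile_score,
                frontal_range=(0.0, 10.0), profile_range=(0.0, 1.0),
                w_frontal=0.7):
    """Weighted-sum fusion of normalized frontal and profile scores."""
    f = min_max_normalize(frontal_score, *frontal_range)
    p = min_max_normalize(profile_score, *profile_range)
    return w_frontal * f + (1.0 - w_frontal) * p

def decide(fused, threshold=0.5):
    """Release a match / no-match verdict from the fused score."""
    return "match" if fused >= threshold else "no match"
```

A weighted sum after min-max normalization is one of the simplest fusion rules; it lets the (typically more reliable) frontal score dominate while still allowing strong profile evidence to tip a low-resolution frontal comparison.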
Twin identification over viewpoint change: A deep convolutional neural network surpasses humans
Deep convolutional neural networks (DCNNs) have achieved human-level accuracy
in face identification (Phillips et al., 2018), though it is unclear how
accurately they discriminate highly-similar faces. Here, humans and a DCNN
performed a challenging face-identity matching task that included identical
twins. Participants (N=87) viewed pairs of face images of three types:
same-identity, general imposter pairs (different identities from similar
demographic groups), and twin imposter pairs (identical twin siblings). The
task was to determine whether the pairs showed the same person or different
people. Identity comparisons were tested in three viewpoint-disparity
conditions: frontal to frontal, frontal to 45-degree profile, and frontal to
90-degree profile. Accuracy for discriminating matched-identity pairs from
twin-imposters and general imposters was assessed in each viewpoint-disparity
condition. Humans were more accurate for general-imposter pairs than
twin-imposter pairs, and accuracy declined with increased viewpoint disparity
between the images in a pair. A DCNN trained for face identification (Ranjan et
al., 2018) was tested on the same image pairs presented to humans. Machine
performance mirrored the pattern of human accuracy, but with performance at or
above all humans in all but one condition. Human and machine similarity scores
were compared across all image-pair types. This item-level analysis showed that
human and machine similarity ratings correlated significantly in six of nine
image-pair types [range r=0.38 to r=0.63], suggesting general accord between
the perception of face similarity by humans and the DCNN. These findings also
contribute to our understanding of DCNN performance for discriminating
high-resemblance faces, demonstrate that the DCNN performs at a level at or
above humans, and suggest a degree of parity between the features used by
humans and the DCNN.
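The item-level analysis amounts to computing a Pearson correlation between human similarity ratings and DCNN similarity scores within each image-pair type. A minimal sketch, using made-up placeholder ratings rather than data from the study:

```python
# Pearson correlation between human and machine similarity scores for one
# image-pair type. The rating vectors below are illustrative placeholders.
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# One similarity score per image pair (hypothetical values)
human = [1.0, 2.0, 3.0, 4.0, 5.0]
machine = [0.1, 0.3, 0.2, 0.5, 0.6]
r = pearson_r(human, machine)
```

Repeating this per pair type (3 pair types x 3 viewpoint-disparity conditions) yields the nine correlations the abstract reports in the range r=0.38 to r=0.63.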
The CLEAR 2007 Evaluation
Abstract. This paper is a summary of the 2007 CLEAR Evaluation on the Classification of Events, Activities, and Relationships which took place in early 2007 and culminated with a two-day workshop held in May 2007. CLEAR is an international effort to evaluate systems for the perception of people, their activities, and interactions. In its second year, CLEAR has developed a following from the computer vision and speech communities, spawning a more multimodal perspective of research evaluation. This paper describes the evaluation tasks, including metrics and databases used, and discusses the results achieved. The CLEAR 2007 tasks comprise person, face, and vehicle tracking, head pose estimation, as well as acoustic scene analysis. These include subtasks performed in the visual, acoustic and audio-visual domains for meeting room and surveillance data.
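The CLEAR evaluations popularized the CLEAR MOT metrics for the tracking tasks; a minimal sketch of MOTA (Multiple Object Tracking Accuracy), which aggregates misses, false positives, and identity switches over all annotated objects. The example counts are hypothetical.

```python
# MOTA (Multiple Object Tracking Accuracy) as used in CLEAR-style tracking
# evaluations: 1 minus the total error rate over all ground-truth objects.

def mota(misses, false_positives, id_switches, num_ground_truth):
    """MOTA = 1 - (FN + FP + IDSW) / total ground-truth objects."""
    errors = misses + false_positives + id_switches
    return 1.0 - errors / num_ground_truth

# e.g., 10 misses, 5 false positives, 2 identity switches over 100 objects
score = mota(10, 5, 2, 100)  # 0.83
```

MOTA can be negative when the error count exceeds the number of ground-truth objects, which is why it is usually reported alongside MOTP (precision of the matched positions).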
Face Tracking by Clustering Detections: Block-Wise Sequential Processing
This article describes a method for clustering the faces of a video sequence; it is based on a tracking-by-detection approach and uses a Maximum A Posteriori probabilistic model, solved by an algorithm built on a minimum-cost flow search over a graph. To cope with the density, motion and size constraints of face detections from video surveillance, the presented work makes two contributions: (1) the definition of several dissimilarities (spatial, temporal, appearance and motion) combined in a simple way, and (2) a sequential, block-of-frames version that allows video streams to be processed. The proposed method is evaluated on several annotated real-world sequences.
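The first contribution, combining spatial, temporal, appearance and motion dissimilarities "in a simple way", can be sketched as a weighted sum that produces the edge cost for the min-cost flow graph. The component measures, detection fields, and weights below are illustrative assumptions, not the paper's definitions.

```python
# Illustrative combination of four dissimilarities between two face
# detections into a single linking cost. Fields ("x", "y", "frame",
# "feat", "vx", "vy") and weights are hypothetical.
import math

def spatial_dissim(a, b):
    """Euclidean distance between detection centers (x, y)."""
    return math.hypot(a["x"] - b["x"], a["y"] - b["y"])

def temporal_dissim(a, b):
    """Frame gap between the two detections."""
    return abs(a["frame"] - b["frame"])

def appearance_dissim(a, b):
    """L1 distance between (toy) appearance feature vectors."""
    return sum(abs(x - y) for x, y in zip(a["feat"], b["feat"]))

def motion_dissim(a, b):
    """Distance between estimated velocity vectors (vx, vy)."""
    return math.hypot(a["vx"] - b["vx"], a["vy"] - b["vy"])

def combined_cost(a, b, weights=(1.0, 0.5, 2.0, 1.0)):
    """Weighted sum of the four dissimilarities (edge cost in the graph)."""
    ws, wt, wa, wm = weights
    return (ws * spatial_dissim(a, b) + wt * temporal_dissim(a, b)
            + wa * appearance_dissim(a, b) + wm * motion_dissim(a, b))

a = {"x": 0.0, "y": 0.0, "frame": 0, "feat": [0.0, 0.0], "vx": 0.0, "vy": 0.0}
b = {"x": 3.0, "y": 4.0, "frame": 1, "feat": [0.1, 0.2], "vx": 0.0, "vy": 0.0}
cost = combined_cost(a, b)
```

In a min-cost flow formulation, each such cost labels an edge between detections in consecutive frame blocks, and the MAP face partitioning corresponds to a minimum-cost set of disjoint paths through the graph.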
Simultaneous Tracking and Recognition of Facial Features and Facial Expressions
The tracking and recognition of facial activities from images has attracted great attention in the computer vision field. Facial activities can be divided into three levels. First, at the bottom level, facial feature points around each facial component (eyebrow, mouth, etc.) capture the face information point by point. Second, at the middle level, facial action units, defined in the facial action coding framework, describe the contraction of specific sets of facial muscles, e.g., lid tightener, eyebrow raiser, and so forth. Finally, at the top level, six prototypical facial expressions describe global facial muscle movement and are commonly used to convey human emotional states. Rather than the standard approaches, which generally concentrate on only one or two levels of facial activity and track them independently, this paper presents a unified model of all three levels. Advanced machine learning techniques are used to learn the model from both training data and subjective prior knowledge. Given the model and the estimated facial motion measurements, all three levels of facial activities are recognized simultaneously through an expression engine. Experiments are performed to illustrate the effectiveness of the proposed model.
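The link between the middle and top levels, from facial action units (AUs) to prototypical expressions, can be illustrated with simple FACS-style rules. The AU sets below are a rough, simplified mapping for illustration only, not the paper's learned model.

```python
# Toy mapping from detected facial action units (AUs) to one of six basic
# expressions. AU combinations are simplified FACS-style heuristics.

EXPRESSION_RULES = {
    "happiness": {6, 12},        # cheek raiser + lip corner puller
    "surprise": {1, 2, 5, 26},   # brow raisers + upper lid raiser + jaw drop
    "sadness": {1, 4, 15},
    "anger": {4, 5, 7, 23},
    "fear": {1, 2, 4, 5, 20, 26},
    "disgust": {9, 15},
}

def classify_expression(active_aus):
    """Pick the expression whose AU set best overlaps the detected AUs."""
    def overlap(rule):
        aus = EXPRESSION_RULES[rule]
        return len(aus & active_aus) / len(aus)
    return max(EXPRESSION_RULES, key=overlap)
```

A learned model such as the one the abstract describes would replace these hard rules with probabilistic dependencies among feature points, AUs, and expressions, so that evidence at each level constrains the others.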