
182 research outputs found

The Role of Riemannian Manifolds in Computer Vision: From Coding to Deep Metric Learning

Author: Faraki Masoud
Publication date: 01/01/2018

A diverse range of tasks in computer vision and machine learning benefit from representations of data that are compact yet discriminative, informative and robust to critical measurements. Two notable representations are offered by Region Covariance Descriptors (RCovD) and linear subspaces, which are naturally analyzed through the manifold of Symmetric Positive Definite (SPD) matrices and the Grassmann manifold, respectively, two widely used types of Riemannian manifolds in computer vision. As our first objective, we examine image and video-based recognition applications where the local descriptors have the aforementioned Riemannian structures, namely the SPD or linear subspace structure. Initially, we provide a solution to compute a Riemannian version of the conventional Vector of Locally Aggregated Descriptors (VLAD), using the geodesic distance of the underlying manifold as the nearness measure. Next, by having a closer look at the resulting codes, we formulate a new concept which we name Local Difference Vectors (LDV). LDVs enable us to elegantly expand our Riemannian coding techniques to any arbitrary metric as well as provide intrinsic solutions to Riemannian sparse coding and its variants when local structured descriptors are considered. We then turn our attention to two special types of covariance descriptors, namely infinite-dimensional RCovDs and rank-deficient covariance matrices, for which the underlying Riemannian structure, i.e. the manifold of SPD matrices, is to a great extent out of reach. Generally speaking, infinite-dimensional RCovDs offer better discriminatory power than their low-dimensional counterparts. To overcome this difficulty, we propose to approximate the infinite-dimensional RCovDs by making use of two feature mappings, namely random Fourier features and the Nyström method.
As for the rank-deficient covariance matrices, unlike most existing approaches that employ inference tools by predefined regularizers, we derive positive definite kernels that can be decomposed into the kernels on the cone of SPD matrices and kernels on the Grassmann manifolds and show their effectiveness for the image set classification task. Furthermore, inspired by attractive properties of Riemannian optimization techniques, we extend the recently introduced Keep It Simple and Straightforward MEtric learning (KISSME) method to scenarios where the input data is non-linearly distributed. To this end, we make use of the infinite-dimensional covariance matrices and propose techniques for projecting onto the positive cone in a Reproducing Kernel Hilbert Space (RKHS). We also address the sensitivity of KISSME to the input dimensionality. The KISSME algorithm depends heavily on Principal Component Analysis (PCA) as a preprocessing step, which can lead to difficulties, especially when the dimensionality is not meticulously set. To address this issue, based on the KISSME algorithm, we develop a Riemannian framework to jointly learn a mapping performing dimensionality reduction and a metric in the induced space. Lastly, in line with the recent trend in metric learning, we devise end-to-end learning of a generic deep network for metric learning using our derivation

The Australian National University

A survey of face recognition techniques under occlusion

Author: Spreeuwers Luuk
Veldhuis Raymond
Zeng Dan
Publication date: 19/06/2020

The limited capacity to recognize faces under occlusions is a long-standing problem that presents a unique challenge for face recognition systems and even for humans. The problem regarding occlusion is less covered by research when compared to other challenges such as pose variation, different expressions, etc. Nevertheless, occluded face recognition is imperative to exploit the full potential of face recognition for real-world applications. In this paper, we restrict the scope to occluded face recognition. First, we explore what the occlusion problem is and what inherent difficulties can arise. As a part of this review, we introduce face detection under occlusion, a preliminary step in face recognition. Second, we present how existing face recognition methods cope with the occlusion problem and classify them into three categories, which are 1) occlusion robust feature extraction approaches, 2) occlusion aware face recognition approaches, and 3) occlusion recovery based face recognition approaches. Furthermore, we analyze the motivations, innovations, pros and cons, and the performance of representative approaches for comparison. Finally, future challenges and method trends of occluded face recognition are thoroughly discussed

arXiv.org e-Print Archive

Directory of Open Access Journals

University of Twente Research Information

Dictionary Representation of Deep Features for Occlusion-Robust Face Recognition

Author: Cen Feng
Wang Guanghui
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 13/01/2021

Deep learning has achieved exciting results in face recognition; however, the accuracy is still unsatisfactory for occluded faces. To improve the robustness for occluded faces, this paper proposes a novel deep dictionary representation-based classification scheme, where a convolutional neural network is employed as the feature extractor and followed by a dictionary to linearly code the extracted deep features. The dictionary is composed of a gallery part consisting of the deep features of the training samples and an auxiliary part consisting of the mapping vectors acquired from the subjects either inside or outside the training set and associated with the occlusion patterns of the testing face samples. A squared Euclidean norm is used to regularize the coding coefficients. The proposed scheme is computationally efficient and is robust to large contiguous occlusion. In addition, the proposed scheme is generic for both the occluded and non-occluded face images and works with a single training sample per subject. The extensive experimental evaluations demonstrate the superior performance of the proposed approach over other state-of-the-art algorithms
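
Linearly coding a feature vector over a dictionary with a squared Euclidean norm regularizer, as this abstract describes, is a ridge problem with a closed-form solution. The following is a minimal sketch under stated assumptions, not the paper's implementation: the toy matrix `D` stands in for deep features, there is no auxiliary occlusion part, and the class-wise residual decision rule is borrowed from representation-based classifiers because the abstract does not state one. The names `ridge_code`, `classify` and `lam` are hypothetical.

```python
import numpy as np

def ridge_code(D, y, lam=0.1):
    """Solve min_a ||y - D a||^2 + lam * ||a||^2 in closed form."""
    k = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ y)

def classify(D, labels, y, lam=0.1):
    """Pick the class whose atoms reconstruct y with the smallest residual.

    Assumption: the residual rule is illustrative; the abstract does not
    specify how the label is decided from the coding coefficients.
    """
    a = ridge_code(D, y, lam)
    labels = np.asarray(labels)
    residuals = {}
    for c in np.unique(labels):
        mask = labels == c
        residuals[c] = np.linalg.norm(y - D[:, mask] @ a[mask])
    return min(residuals, key=residuals.get)

# Toy gallery: two subjects, two atoms each (columns are feature vectors).
D = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.9],
              [0.0, 0.0, 0.0, 0.0]])
labels = [0, 0, 1, 1]
probe = np.array([0.95, 0.05, 0.0])
predicted = classify(D, labels, probe)
```

Because the regularizer is quadratic, the coefficients come from a single linear solve, which is consistent with the abstract's claim of computational efficiency.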

KU ScholarWorks

Advanced Biometrics with Deep Learning

Publication venue: 'MDPI AG'
Publication date: 01/05/2021

Biometrics, such as fingerprint, iris, face, hand print, hand vein, speech and gait recognition, as a means of identity management have become commonplace for various applications. Biometric systems follow a typical pipeline that is composed of separate preprocessing, feature extraction and classification. Deep learning as a data-driven representation learning approach has been shown to be a promising alternative to conventional data-agnostic and handcrafted pre-processing and feature extraction for biometric systems. Furthermore, deep learning offers an end-to-end learning paradigm to unify preprocessing, feature extraction, and recognition, based solely on biometric data. This Special Issue has collected 12 high-quality, state-of-the-art research papers that deal with challenging issues in advanced biometric systems based on deep learning. The 12 papers can be divided into 4 categories according to biometric modality; namely, face biometrics, medical electronic signals (EEG and ECG), voice print, and others

Directory of Open Access Books (DOAB)

A Survey of Face Recognition

Author: Chen Bihui
Guo Yandong
Peng Jianteng
Wang Xinyi
Wang Yi
Zhang Sufang
Publication date: 26/12/2022

Recent years have witnessed the breakthrough of face recognition (FR) with deep convolutional neural networks. Dozens of papers in the field of FR are published every year. Some of them have been applied in industry and play an important role in everyday life, for example in device unlock and mobile payment. This paper provides an introduction to face recognition, including its history, pipeline, algorithms based on conventional manually designed features or deep learning, mainstream training and evaluation datasets, and related applications. We have analyzed and compared as many state-of-the-art works as possible, and also carefully designed a set of experiments to find the effect of backbone size and data distribution. This survey serves as material for the tutorial named The Practical Face Recognition Technology in the Industrial World at FG2023

arXiv.org e-Print Archive


Masked face recognition using deep learning: a review

Author: Al-Hadhrami T
Albalas F
Alzu’bi A
Bani Younis LB
Bashayreh A
Publication venue: 'MDPI AG'
Publication date: 01/10/2021

A large number of intelligent models for masked face recognition (MFR) have recently been presented and applied in various fields, such as masked face tracking for people safety or secure authentication. Exceptional hazards such as pandemics and fraud have noticeably accelerated the creation and sharing of relevant algorithms, which has introduced new challenges. Therefore, recognizing and authenticating people wearing masks will remain a long-standing research area, and more efficient methods are needed for real-time MFR. Machine learning has made progress in MFR and has significantly facilitated the intelligent process of detecting and authenticating persons with occluded faces. This survey organizes and reviews the recent works developed for MFR based on deep learning techniques, providing insights and thorough discussion on the development pipeline of MFR systems. State-of-the-art techniques are introduced according to the characteristics of deep network architectures and deep feature extraction strategies. The common benchmarking datasets and evaluation metrics used in the field of MFR are also discussed. Many challenges and promising research directions are highlighted. This comprehensive study considers a wide variety of recent approaches and achievements, aiming to shape a global view of the field of MFR

Nottingham Trent Institutional Repository (IRep)

Directory of Open Access Journals

Sparse Representations and Feature Learning for Image Set Classification and Correspondence Estimation

Author: Fathy Mohammed E.
Publication date: 01/01/2018

The use of effective features is a key component in solving many computer vision tasks including, but not limited to, image (set) classification and correspondence estimation. Many research directions have focused on finding good features for the task under consideration, traditionally by hand crafting and recently by machine learning. In our work, we present algorithms for feature extraction and sparse representation for the classification of image sets. In addition, we present an approach for deep metric learning for correspondence estimation. We start by benchmarking various image set classification methods on a mobile video dataset that we have collected and made public. The videos were acquired under three different ambient conditions to capture the type of variations caused by the 'mobility' of the devices. An inspection of these videos reveals a combination of favorable and challenging properties unique to smartphone face videos. Besides mobility, the dataset has other challenges including partial faces, occasional pose changes, blur and fiducial point localization errors. Based on the evaluation, the recognition rates drop dramatically when enrollment and test videos come from different sessions. We then present Bayesian Representation-based Classification (BRC), an approach based on sparse Bayesian regression and subspace clustering for image set classification. A Bayesian statistical framework is used to compare BRC with similar existing approaches such as Collaborative Representation-based Classification (CRC) and Sparse Representation-based Classification (SRC), where it is shown that BRC employs precision hyperpriors that are more non-informative than those of CRC/SRC. Furthermore, we present a robust probe image set handling strategy that balances the trade-off between efficiency and accuracy. Experiments on three datasets illustrate the effectiveness of our algorithm compared to state-of-the-art set-based methods. 
We then propose to represent image sets as dictionaries of hand-crafted descriptors based on Symmetric Positive Definite (SPD) matrices that are more robust to local deformations and fiducial point location errors. We then learn a tangent map for transforming the SPD matrix logarithms into a lower-dimensional Log-Euclidean space such that the transformed gallery atoms adhere to a more discriminative subspace structure. A query image set is then classified by first mapping its SPD descriptors into the computed Log-Euclidean tangent space and then using the sparse representation over the tangent space to decide a label for the image set. Experiments on four public datasets show that representation-based classification based on the proposed features outperforms many state-of-the-art methods. We then present Nonlinear Subspace Feature Enhancement (NSFE), an approach for nonlinearly embedding image sets into a space where they adhere to a more discriminative subspace structure. We describe how the structured loss function of NSFE can be optimized in a batch-by-batch fashion by a two-step alternating algorithm. The algorithm makes very few assumptions about the form of the embedding to be learned and is compatible with stochastic gradient descent and back-propagation. We evaluate NSFE with different types of input features and nonlinear embeddings and show that NSFE compares favorably to state-of-the-art image set classification methods. Finally, we propose a hierarchical approach for deep metric learning and descriptor matching for the task of point correspondence estimation. Our idea is motivated by the observation that existing metric learning approaches based on supervising and matching with only the deepest layer result in features that are suboptimal in some aspects to shallower features.
Instead, the best matching performance, as we empirically show, is obtained by combining the high invariance of deeper features with the geometric sensitivity and higher precision of shallower features. We compare our method to state-of-the-art networks as well as fusion baselines inspired by existing semantic segmentation networks and empirically show that our method is more accurate and better suited to correspondence estimation
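
The Log-Euclidean mapping of SPD descriptors mentioned in this abstract rests on the matrix logarithm, which for a symmetric positive definite matrix can be computed from its eigendecomposition. The sketch below shows only that standard building block and the resulting Log-Euclidean distance; the learned tangent map and the sparse representation step are not reproduced here. The function names `spd_log` and `log_euclidean_distance` are hypothetical.

```python
import numpy as np

def spd_log(S):
    """Matrix logarithm of a symmetric positive definite matrix.

    Uses S = V diag(w) V^T, so log(S) = V diag(log w) V^T.
    """
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def log_euclidean_distance(A, B):
    """Frobenius distance between the matrix logarithms of A and B."""
    return np.linalg.norm(spd_log(A) - spd_log(B), ord="fro")

# Toy SPD matrices: the identity and a scaled identity.
A = np.eye(2)
B = np.e * np.eye(2)
d = log_euclidean_distance(A, B)
```

After the logarithm, SPD matrices live in a flat space of symmetric matrices, which is what lets ordinary linear tools such as a learned linear map or sparse coding be applied to them.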

Digital Repository at the University of Maryland

Face Recognition from Weakly Labeled Data

Author: Chen Ching-Hui
Publication date: 01/01/2016

Recognizing the identity of a face or a person in the media usually requires lots of training data to design robust classifiers, which demands a great amount of human effort for annotation. Alternatively, the weakly labeled data is publicly available, but the labels can be ambiguous or noisy. For instance, names in the caption of a news photo provide possible candidates for faces appearing in the image. Names in the screenplays are only weakly associated with faces in the videos. Since weakly labeled data is not explicitly labeled by humans, robust learning methods that use weakly labeled data should suppress the impact of noisy instances or automatically resolve the ambiguities in noisy labels. We propose a method for character identification in a TV-series. The proposed method uses automatically extracted labels by associating the faces with names in the transcripts. Such weakly labeled data often has erroneous labels resulting from errors in detecting a face and synchronization. Our approach achieves robustness to noisy labeling by utilizing several features. We construct track nodes from face and person tracks and utilize information from facial and clothing appearances. We discover the video structure for effective inference by constructing a minimum-distance spanning tree (MST) from the track nodes. Hence, track nodes of similar appearance become adjacent to each other and are likely to have the same identity. The non-local cost aggregation step thus serves as a noise suppression step to reliably recognize the identity of the characters in the video. Another type of weakly labeled data results from labeling ambiguities. In other words, a training sample can have more than one label, and typically one of the labels is the true label. For instance, a news photo is usually accompanied by the captions, and the names provided in the captions can be used as the candidate labels for the faces appearing in the photo. 
Learning an effective subject classifier from the ambiguously labeled data is called ambiguously labeled learning. We propose a matrix completion framework for predicting the actual labels from the ambiguously labeled instances, and a standard supervised classifier that subsequently learns from the disambiguated labels to classify new data. We generalize this matrix completion framework to handle the issue of labeling imbalance, so as to avoid domination by dominant labels. In addition, an iterative candidate elimination step is integrated with the proposed approach to improve the ambiguity resolution. Recently, video-based face recognition techniques have received significant attention since faces in a video provide diverse exemplars for constructing a robust representation of the target (i.e., subject of interest). Nevertheless, the target face in the video is usually annotated with minimum human effort (i.e., a single bounding box in a video frame). Although face tracking techniques can be utilized to associate faces in a single video shot, they are ineffective for associating faces across multiple video shots. To fully utilize faces of a target in multiple-shot videos, we propose a target face association (TFA) method to obtain a set of images of the target face, and these associated images are then utilized to construct a robust representation of the target for improving the performance of the video-based face recognition task. One of the most important applications of video-based face recognition is outdoor video surveillance using a camera network. Face recognition in outdoor environments is a challenging task due to illumination changes, pose variations, and occlusions. We present the taxonomy of camera networks and discuss several techniques for continuous tracking of faces acquired by an outdoor camera network as well as a face matching algorithm.
Finally, we demonstrate the real-time video surveillance system using pan-tilt-zoom (PTZ) cameras to perform pedestrian tracking, localization, face detection, and face recognition
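
The minimum-distance spanning tree over track nodes described in this abstract can be illustrated with Prim's algorithm on a complete graph. This is a sketch under assumptions: the appearance features are made-up 2-D points, Euclidean distance stands in for whatever appearance distance the thesis uses, and the cost aggregation step is not shown. The names `minimum_spanning_tree` and `tracks` are hypothetical.

```python
import math

def minimum_spanning_tree(features):
    """Prim's algorithm over a complete graph of track nodes.

    features: list of appearance vectors, one per track node (toy values).
    Returns a list of (i, j, distance) edges in the order they were added.
    """
    n = len(features)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        best = None
        # Find the cheapest edge leaving the current tree.
        for i in in_tree:
            for j in range(n):
                if j in in_tree:
                    continue
                d = math.dist(features[i], features[j])
                if best is None or d < best[2]:
                    best = (i, j, d)
        edges.append(best)
        in_tree.add(best[1])
    return edges

# Two tight appearance clusters: nodes 0-1 and nodes 2-3.
tracks = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
mst = minimum_spanning_tree(tracks)
```

In the toy data, nodes with similar appearance end up adjacent in the tree and the two clusters are joined by a single long edge, which matches the abstract's remark that adjacent track nodes are likely to share an identity.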

Digital Repository at the University of Maryland

Exploiting Spatio-Temporal Coherence for Video Object Detection in Robotics

Author: Fernandez-Chaves David
Gonzalez-Jimenez Javier
Matez-Bandera Jose Luis
Monroy Javier
Petkov Nicolai
Ruiz-Sarmiento Jose Raul
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2021

This paper proposes a method to enhance video object detection for indoor environments in robotics. Concretely, it exploits knowledge about the camera motion between frames to propagate previously detected objects to successive frames. The proposal is rooted in the concepts of planar homography, to propose regions of interest in which to find objects, and recursive Bayesian filtering, to integrate observations over time. The proposal is evaluated on six virtual indoor environments, accounting for the detection of nine object classes over a total of ~7k frames. Results show that our proposal improves the recall and the F1-score by a factor of 1.41 and 1.27, respectively, and achieves a significant reduction of the object categorization entropy (58.8%) when compared to a two-stage video object detection method used as baseline, at the cost of small time overheads (120 ms) and precision loss (0.92).
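
Recursive Bayesian filtering over object categories, which this abstract uses to integrate observations over time, can be sketched as a repeated multiply-and-normalize update. The example is illustrative only: the class names, the per-frame detector scores, and the use of those scores as likelihoods are assumptions, and the homography-based propagation is not shown.

```python
import math

def bayes_update(prior, likelihood):
    """One recursive Bayesian step: posterior is proportional to likelihood * prior."""
    unnormalized = {c: prior[c] * likelihood[c] for c in prior}
    total = sum(unnormalized.values())
    return {c: p / total for c, p in unnormalized.items()}

def entropy(belief):
    """Shannon entropy (in nats) of a categorical belief."""
    return -sum(p * math.log(p) for p in belief.values() if p > 0)

# Hypothetical categories and per-frame detector scores for one tracked object.
uniform = {"chair": 1 / 3, "table": 1 / 3, "sofa": 1 / 3}
observations = [
    {"chair": 0.6, "table": 0.3, "sofa": 0.1},
    {"chair": 0.7, "table": 0.2, "sofa": 0.1},
]

belief = dict(uniform)
for obs in observations:
    belief = bayes_update(belief, obs)
```

Each consistent observation sharpens the belief, so its entropy falls below that of the uniform prior; this is the kind of quantity the abstract reports as categorization entropy.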

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen