73 research outputs found
Deep Sketch-Photo Face Recognition Assisted by Facial Attributes
In this paper, we present a deep coupled framework to address the problem of
matching a sketch image against a gallery of mugshots. Face sketches carry the
essential information about the spatial topology and geometric details of
faces while missing some important facial attributes such as ethnicity, hair,
eye, and skin color. We propose a coupled deep neural network architecture
that utilizes facial attributes in order to improve sketch-photo
recognition performance. The proposed Attribute-Assisted Deep Convolutional
Neural Network (AADCNN) method exploits the facial attributes and leverages the
loss functions from the facial attribute identification and face verification
tasks in order to learn rich discriminative features in a common embedding
subspace. The facial attribute identification task increases the inter-personal
variations by pushing apart the embedded features extracted from individuals
with different facial attributes, while the verification task reduces the
intra-personal variations by pulling together all the features that
belong to one person. The learned discriminative features generalize well
to new identities not seen in the training data. Compared to conventional
sketch-photo recognition methods, the proposed architecture makes full use of
the sketch and the complementary facial attribute information to train a deep
model. Extensive experiments are performed on the composite (E-PRIP) and
semi-forensic (IIIT-D semi-forensic) datasets. The results show the superiority
of our method over state-of-the-art sketch-photo recognition algorithms.
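As a rough illustration of how an attribute-identification loss and a verification loss can be combined over a shared embedding space, consider the minimal PyTorch sketch below. The branch layout, attribute count, and loss weighting are hypothetical placeholders for illustration, not the AADCNN architecture from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoupledEmbeddingNet(nn.Module):
    """Toy coupled network: separate sketch/photo branches map into a
    common embedding space; an attribute head adds an auxiliary loss."""
    def __init__(self, embed_dim=128, num_attributes=40):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, embed_dim),
            )
        self.sketch_branch = branch()
        self.photo_branch = branch()
        # Auxiliary head predicting binary facial attributes from the embedding.
        self.attr_head = nn.Linear(embed_dim, num_attributes)

    def forward(self, sketch, photo):
        zs = F.normalize(self.sketch_branch(sketch), dim=1)
        zp = F.normalize(self.photo_branch(photo), dim=1)
        return zs, zp, self.attr_head(zp)

def coupled_loss(zs, zp, attr_logits, attr_labels, same_id, margin=0.5, lam=0.1):
    # Verification (contrastive) term: pull matching sketch/photo pairs
    # together, push non-matching pairs at least `margin` apart.
    d = (zs - zp).pow(2).sum(dim=1)
    verif = same_id * d + (1 - same_id) * F.relu(margin - d.sqrt()).pow(2)
    # Attribute-identification term: multi-label binary cross-entropy.
    attr = F.binary_cross_entropy_with_logits(attr_logits, attr_labels)
    return verif.mean() + lam * attr
```

Weighting the attribute term with a small coefficient keeps the verification objective dominant while still pushing apart identities with differing attributes, mirroring the division of labor the abstract describes.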
Trading-off Mutual Information on Feature Aggregation for Face Recognition
Despite advances in the field of Face Recognition (FR), the precision of
existing methods is not yet sufficient. To improve FR performance, this paper
proposes a technique for aggregating the outputs of two state-of-the-art (SOTA)
deep FR models, namely ArcFace and AdaFace. In our approach, we leverage the
transformer attention mechanism to exploit the relationship between different
parts of two feature maps. By doing so, we aim to enhance the overall
discriminative power of the FR system. One of the challenges in feature
aggregation is the effective modeling of both local and global dependencies.
Conventional transformers are known for their ability to capture long-range
dependencies, but they often struggle with modeling local dependencies
accurately. To address this limitation, we augment the self-attention mechanism
to capture both local and global dependencies effectively. This allows our
model to take advantage of the overlapping receptive fields present in
corresponding locations of the feature maps. However, fusing two feature maps
from different FR models might introduce redundancies to the face embedding.
Since these models often share identical backbone architectures, the resulting
feature maps may contain overlapping information, which can mislead the
training process. To overcome this problem, we leverage the principle of
Information Bottleneck to obtain a maximally informative facial representation.
This ensures that the aggregated features retain the most relevant and
discriminative information while minimizing redundant or misleading details. To
evaluate the effectiveness of our proposed method, we conducted experiments on
popular benchmarks and compared our results with state-of-the-art algorithms.
The consistent improvement we observed in these benchmarks demonstrates the
efficacy of our approach in enhancing FR performance.Comment: Accepted to 22 IEEE International Conference on Machine
Learning and Applications 2023 (ICMLA
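A loose PyTorch sketch of the general recipe described above, fusing two (batch, tokens, dim) feature maps with cross-attention and discouraging redundant dimensions in the fused embedding, might look as follows. The shapes, head count, and the simple cross-correlation penalty (standing in for the paper's Information Bottleneck objective) are all assumptions for illustration.

```python
import torch
import torch.nn as nn

class AttentionAggregator(nn.Module):
    """Fuses two (B, N, D) feature maps (e.g. from ArcFace and AdaFace
    backbones) with cross-attention, then pools to a single embedding."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, feat_a, feat_b):
        # Let features from model A attend over features from model B, so
        # corresponding locations can exchange local and global context.
        attended, _ = self.attn(query=feat_a, key=feat_b, value=feat_b)
        fused = torch.cat([feat_a, attended], dim=-1)   # (B, N, 2D)
        return self.proj(fused).mean(dim=1)             # (B, D) pooled embedding

def redundancy_penalty(z):
    # Crude stand-in for an Information Bottleneck-style regularizer:
    # penalize off-diagonal correlation between embedding dimensions,
    # discouraging the two backbones' overlapping information.
    z = (z - z.mean(dim=0)) / (z.std(dim=0) + 1e-6)
    c = (z.T @ z) / z.shape[0]
    off_diag = c - torch.diag(torch.diag(c))
    return off_diag.pow(2).sum()
```

In training, the penalty would be added to the usual identity loss so the aggregated embedding stays discriminative while shedding duplicated feature dimensions.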
Prosodic-Enhanced Siamese Convolutional Neural Networks for Cross-Device Text-Independent Speaker Verification
In this paper, a novel cross-device text-independent speaker verification
architecture is proposed. The majority of state-of-the-art deep architectures
used for speaker verification tasks rely on Mel-frequency cepstral
coefficients. In contrast, our proposed Siamese convolutional neural network
architecture uses Mel-frequency spectrogram coefficients to benefit from the
dependencies between adjacent spectro-temporal features. Moreover, although
spectro-temporal features have proved to be highly reliable in speaker
verification models, they represent only some aspects of the short-term,
acoustic-level traits of a speaker's voice. The human voice, however, comprises
several linguistic levels, such as the acoustic, lexical, prosodic, and
phonetic levels, that can be utilized in speaker verification models. To
compensate for these inherent shortcomings of spectro-temporal features, we
enhance the proposed Siamese convolutional neural network architecture with a
multilayer perceptron network that incorporates prosodic, jitter, and shimmer
features. The proposed end-to-end verification architecture performs feature
extraction and verification simultaneously, and it displays significant
improvement over classical signal processing approaches and deep algorithms
for forensic cross-device speaker verification.
Comment: Accepted to the 9th IEEE International Conference on Biometrics: Theory, Applications, and Systems (BTAS 2018).
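A minimal PyTorch sketch of the general idea, a shared spectrogram CNN plus an MLP over prosodic statistics whose embeddings are concatenated per utterance, is shown below. The layer sizes and the prosodic feature dimension are hypothetical, and extraction of the jitter/shimmer values (e.g. with a phonetics toolkit) is assumed to happen upstream.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProsodicSiameseNet(nn.Module):
    """Toy speaker-verification net: a shared CNN encodes mel
    spectrograms; an MLP encodes prosodic statistics (pitch, jitter,
    shimmer); the two embeddings are concatenated per utterance."""
    def __init__(self, n_prosodic=16, embed_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        self.mlp = nn.Sequential(
            nn.Linear(n_prosodic, 64), nn.ReLU(),
            nn.Linear(64, embed_dim),
        )

    def embed(self, mel, prosodic):
        # mel: (B, 1, n_mels, T); prosodic: (B, n_prosodic)
        z = torch.cat([self.cnn(mel), self.mlp(prosodic)], dim=1)
        return F.normalize(z, dim=1)

    def forward(self, mel_a, pros_a, mel_b, pros_b):
        # Shared weights on both sides (Siamese); cosine similarity of the
        # fused embeddings serves as the verification score.
        za = self.embed(mel_a, pros_a)
        zb = self.embed(mel_b, pros_b)
        return (za * zb).sum(dim=1)
```

Because both utterances pass through the same weights, the network can be trained end to end with a pairwise loss on the similarity score, matching the abstract's claim that feature extraction and verification happen simultaneously.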