
    GhostVLAD for set-based face recognition

    The objective of this paper is to learn a compact representation of image sets for template-based face recognition. We make the following contributions: first, we propose a network architecture which aggregates and embeds the face descriptors produced by deep convolutional neural networks into a compact fixed-length representation. This compact representation requires minimal memory storage and enables efficient similarity computation. Second, we propose a novel GhostVLAD layer that includes "ghost clusters", which do not contribute to the aggregation. We show that a quality weighting on the input faces emerges automatically, such that informative images contribute more than those of low quality, and that the ghost clusters enhance the network's ability to deal with poor-quality images. Third, we explore how input feature dimension, the number of clusters, and different training techniques affect recognition performance. Given this analysis, we train a network that far exceeds the state of the art on the IJB-B face recognition dataset. This is currently one of the most challenging public benchmarks, and we surpass the state of the art on both the identification and verification protocols. Comment: Accepted by ACCV 2018
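The core idea can be sketched in a few lines: descriptors are softly assigned to real clusters plus extra "ghost" clusters, and only the real clusters' residuals are kept, so ghosts absorb assignment mass from uninformative inputs. A minimal pure-Python sketch, using distance-based soft assignment as a stand-in for the paper's learned assignment weights and omitting the per-cluster intra-normalisation step:

```python
import math

def ghostvlad(features, centers, num_ghost):
    """Aggregate a set of D-dim feature vectors into a fixed-length
    descriptor. `centers` holds K real + `num_ghost` ghost cluster
    centers; ghost clusters take part in the soft assignment but their
    residuals are discarded from the output."""
    K = len(centers) - num_ghost
    D = len(centers[0])
    V = [[0.0] * D for _ in range(K)]  # K x D residual accumulators
    for x in features:
        # Soft assignment: softmax over negative squared distances to
        # all K + num_ghost centers (the paper learns these weights).
        logits = [-sum((xd - cd) ** 2 for xd, cd in zip(x, c))
                  for c in centers]
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        s = sum(exps)
        assign = [e / s for e in exps]
        # Accumulate residuals for real clusters only; ghost clusters
        # (indices >= K) soak up mass from low-quality inputs.
        for k in range(K):
            for d in range(D):
                V[k][d] += assign[k] * (x[d] - centers[k][d])
    flat = [v for row in V for v in row]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]  # L2-normalised, length K * D
```

The output length is K * D regardless of how many faces are in the input set, which is what makes the representation suitable for template-based matching.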

    Face Selection for Improving Set-to-Set Face Verification

    In this thesis we propose a statistical model which predicts the performance of a pre-trained face verification system by analysing the quality of input images. The core part of the proposed model is a convolutional neural network, named CNN-FQ, which marks an input facial image as low-quality or high-quality. The concept of quality is not defined explicitly; instead, it is learned from the mistakes the verification system makes when ranking triplets of faces. We apply CNN-FQ in set-based face verification to down-weight the negative impact of low-quality faces when aggregating them into a template descriptor. It is shown on the IJB-B 1:1 Face Verification benchmark that using the CNN-FQ quality predictor for template aggregation leads to consistently higher recognition accuracy compared to previously used face quality scores.
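The aggregation step the abstract describes can be sketched as quality-weighted pooling: per-image quality scores become weights, so low-quality faces contribute little to the template. A minimal sketch in pure Python; the softmax weighting is an illustrative choice, and the quality scores themselves would come from a separately trained predictor such as CNN-FQ:

```python
import math

def aggregate_template(descriptors, quality_scores):
    """Pool per-image face descriptors into a single template
    descriptor, weighting each image by a softmax over its quality
    score so that low-quality faces are down-weighted."""
    m = max(quality_scores)
    w = [math.exp(q - m) for q in quality_scores]
    s = sum(w)
    w = [wi / s for wi in w]  # weights sum to 1
    D = len(descriptors[0])
    # Weighted mean of the descriptors, then L2-normalise.
    t = [sum(w[i] * descriptors[i][d] for i in range(len(descriptors)))
         for d in range(D)]
    norm = math.sqrt(sum(v * v for v in t)) or 1.0
    return [v / norm for v in t]
```

With a large quality gap between two images, the template is dominated by the high-quality one, which is the behaviour the benchmark results rely on.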

    Utterance-level Aggregation For Speaker Recognition In The Wild

    The objective of this paper is speaker recognition "in the wild", where utterances may be of variable length and may also contain irrelevant signals. Crucial elements in the design of deep networks for this task are the type of trunk (frame-level) network and the method of temporal aggregation. We propose a powerful speaker recognition deep network, using a "thin-ResNet" trunk architecture and a dictionary-based NetVLAD or GhostVLAD layer to aggregate features across time, which can be trained end-to-end. We show that our network achieves state-of-the-art performance by a significant margin on the VoxCeleb1 test set for speaker recognition, whilst requiring fewer parameters than previous methods. We also investigate the effect of utterance length on performance, and conclude that for "in the wild" data, a longer length is beneficial. Comment: To appear in: International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019. (Oral Presentation)
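The key property of the temporal aggregation layer is length invariance: however many frames the trunk network emits, the utterance embedding has a fixed size. A minimal sketch of that property, with mean pooling standing in for the paper's NetVLAD/GhostVLAD layer:

```python
import math

def utterance_embedding(frames):
    """Map a variable number of frame-level D-dim features to a
    fixed-length, L2-normalised utterance embedding. Mean pooling is
    a simplified stand-in for a NetVLAD/GhostVLAD aggregation layer:
    either way, the output size does not depend on utterance length."""
    T = len(frames)
    D = len(frames[0])
    mean = [sum(f[d] for f in frames) / T for d in range(D)]
    norm = math.sqrt(sum(v * v for v in mean)) or 1.0
    return [v / norm for v in mean]
```

Because the embedding size is independent of T, utterances of any duration can be compared with a single cosine similarity, which is what enables "in the wild" evaluation on variable-length test segments.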