MENTOR: Human Perception-Guided Pretraining for Iris Presentation Attack Detection
Incorporating human salience into the training of CNNs has boosted
performance in difficult tasks such as biometric presentation attack detection.
However, collecting human annotations is laborious, and once annotations are
obtained there remain the questions of how, and where in the model
architecture, to efficiently incorporate this information into a model's
training. In this paper, we introduce MENTOR (huMan pErceptioN-guided
preTraining fOr iris pResentation attack detection), which addresses both of
these issues through two unique rounds of training. First, we train an
autoencoder to learn human saliency maps given an input iris image (both real
and fake examples). Once this representation is learned, we utilize the trained
autoencoder in two different ways: (a) as a pre-trained backbone for an iris
presentation attack detector, and (b) as a human-inspired annotator of salient
features on unknown data. We show that MENTOR's benefits are threefold: (a) a
significant boost in iris PAD performance when using the human
perception-trained encoder's weights compared to general-purpose weights (e.g.,
ImageNet-sourced or random), (b) the capability to generate an unlimited number
of human-like saliency maps for unseen iris PAD samples, to be used in any
human saliency-guided training paradigm, and (c) increased efficiency of iris
PAD model training. Source code and weights are offered along with the paper.
Comment: 8 pages, 3 figures
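As a concrete illustration of the two-round training idea described above, here is a minimal PyTorch sketch: stage one fits an autoencoder to predict human saliency maps from iris images, and stage two reuses the trained encoder as the backbone of a presentation attack detector. The class names, layer sizes, and the MSE objective are illustrative assumptions, not taken from the paper's released code.

```python
# Hedged sketch of the MENTOR two-stage idea; all names are hypothetical.
import torch
import torch.nn as nn

class SaliencyAutoencoder(nn.Module):
    """Stage 1: learn to predict a human saliency map from an iris image."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(               # reused later as PAD backbone
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

class PADClassifier(nn.Module):
    """Stage 2: bona fide vs. attack classifier on the pretrained encoder."""
    def __init__(self, pretrained_encoder):
        super().__init__()
        self.backbone = pretrained_encoder          # human perception-trained weights
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(64, 1))

    def forward(self, x):
        return self.head(self.backbone(x))

# Stage 1: fit the autoencoder to human saliency maps (MSE as one plausible loss).
ae = SaliencyAutoencoder()
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
images = torch.rand(8, 1, 64, 64)                   # stand-in iris crops
human_maps = torch.rand(8, 1, 64, 64)               # stand-in annotation maps
loss = nn.functional.mse_loss(ae(images), human_maps)
opt.zero_grad()
loss.backward()
opt.step()

# Stage 2: transfer the encoder. The same trained autoencoder can also be run
# on unseen samples to produce human-like saliency maps for other schemes.
pad = PADClassifier(ae.encoder)
logits = pad(images)                                # binary PAD logits
```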
Teaching AI to Teach: Leveraging Limited Human Salience Data Into Unlimited Saliency-Based Training
Machine learning models have shown increased accuracy in classification tasks
when the training process incorporates human perceptual information. However, a
challenge in training human-guided models is the cost associated with
collecting image annotations for human salience. Collecting annotation data for
all images in a large training set can be prohibitively expensive. In this
work, we utilize "teacher" models (trained on a small amount of
human-annotated data) to annotate additional data by means of the teacher
models' saliency maps. Then, "student" models are trained using the larger amount of
annotated training data. This approach makes it possible to supplement a
limited number of human-supplied annotations with an arbitrarily large number
of model-generated image annotations. We compare the accuracy achieved by our
teacher-student training paradigm with (1) training using all available human
salience annotations, and (2) using all available training data without human
salience annotations. We use synthetic face detection and fake iris detection
as example challenging problems, and report results across four model
architectures (DenseNet, ResNet, Xception, and Inception), and two saliency
estimation methods (CAM and RISE). Results show that our teacher-student
training paradigm yields models that significantly exceed the performance
of both baselines, demonstrating that our approach can usefully leverage a
small amount of human annotations to generate salience maps for an arbitrary
amount of additional training data.
Comment: 12 pages, 8 figures
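The teacher-student loop described in this abstract can be sketched as follows, assuming PyTorch: a teacher (already trained with human saliency data) produces CAM-style maps for a batch lacking human annotations, and those maps supervise the student alongside the class labels. The model definition, the CAM helper, and the blending weight alpha are illustrative assumptions rather than the paper's actual implementation.

```python
# Hedged sketch of teacher-generated saliency supervision; names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallCNN(nn.Module):
    """Tiny classifier exposing its last conv features for CAM."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):
        f = self.features(x)                       # (B, 64, H, W)
        logits = self.fc(f.mean(dim=(2, 3)))       # global average pooling
        return logits, f

def cam(model, feats, class_idx):
    """Class activation map: conv features weighted by the class's fc weights."""
    w = model.fc.weight[class_idx]                 # (B, 64)
    m = torch.einsum('bc,bchw->bhw', w, feats)
    m = m - m.amin(dim=(1, 2), keepdim=True)       # min-max normalize per image
    return m / (m.amax(dim=(1, 2), keepdim=True) + 1e-8)

teacher, student = SmallCNN(), SmallCNN()          # teacher assumed already trained
images = torch.rand(4, 3, 32, 32)                  # batch without human annotations
labels = torch.randint(0, 2, (4,))

with torch.no_grad():                              # teacher annotates the batch
    _, t_feats = teacher(images)
    t_maps = cam(teacher, t_feats, labels)

# Student loss blends classification with agreement to teacher saliency;
# alpha is an assumed weight, not a value reported in the paper.
s_logits, s_feats = student(images)
s_maps = cam(student, s_feats, labels)
alpha = 0.5
loss = F.cross_entropy(s_logits, labels) + alpha * F.mse_loss(s_maps, t_maps)
loss.backward()
```

Because the teacher's maps are generated on the fly, this scheme can annotate arbitrarily many additional images, which is the point the abstract makes about supplementing a limited pool of human-supplied annotations.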