11 research outputs found
Weakly Supervised Localization using Deep Feature Maps
Object localization is an important computer vision problem with a variety of
applications. The lack of large scale object-level annotations and the relative
abundance of image-level labels makes a compelling case for weak supervision in
the object localization task. Deep Convolutional Neural Networks are a class of
state-of-the-art methods for the related problem of object recognition. In this
paper, we describe a novel object localization algorithm which uses
classification networks trained on only image labels. This weakly supervised
method leverages local spatial and semantic patterns captured in the
convolutional layers of classification networks. We propose an efficient beam
search based approach to detect and localize multiple objects in images. The
proposed method significantly outperforms the state-of-the-art in standard
object localization data-sets with a 8 point increase in mAP scores
Gaze Embeddings for Zero-Shot Image Classification
Zero-shot image classification using auxiliary information, such as
attributes describing discriminative object properties, requires time-consuming
annotation by domain experts. We instead propose a method that relies on human
gaze as auxiliary information, exploiting that even non-expert users have a
natural ability to judge class membership. We present a data collection
paradigm that involves a discrimination task to increase the information
content obtained from gaze data. Our method extracts discriminative descriptors
from the data and learns a compatibility function between image and gaze using
three novel gaze embeddings: Gaze Histograms (GH), Gaze Features with Grid
(GFG) and Gaze Features with Sequence (GFS). We introduce two new
gaze-annotated datasets for fine-grained image classification and show that
human gaze data is indeed class discriminative, provides a competitive
alternative to expert-annotated attributes, and outperforms other baselines for
zero-shot image classification
Attention Allocation Aid for Visual Search
This paper outlines the development and testing of a novel, feedback-enabled
attention allocation aid (AAAD), which uses real-time physiological data to
improve human performance in a realistic sequential visual search task. Indeed,
by optimizing over search duration, the aid improves efficiency, while
preserving decision accuracy, as the operator identifies and classifies targets
within simulated aerial imagery. Specifically, using experimental eye-tracking
data and measurements about target detectability across the human visual field,
we develop functional models of detection accuracy as a function of search
time, number of eye movements, scan path, and image clutter. These models are
then used by the AAAD in conjunction with real time eye position data to make
probabilistic estimations of attained search accuracy and to recommend that the
observer either move on to the next image or continue exploring the present
image. An experimental evaluation in a scenario motivated from human
supervisory control in surveillance missions confirms the benefits of the AAAD.Comment: To be presented at the ACM CHI conference in Denver, Colorado in May
201
Iterative multi-path tracking for video and volume segmentation with sparse point supervision
Recent machine learning strategies for segmentation tasks have shown great
ability when trained on large pixel-wise annotated image datasets. It remains a
major challenge however to aggregate such datasets, as the time and monetary
cost associated with collecting extensive annotations is extremely high. This
is particularly the case for generating precise pixel-wise annotations in video
and volumetric image data. To this end, this work presents a novel framework to
produce pixel-wise segmentations using minimal supervision. Our method relies
on 2D point supervision, whereby a single 2D location within an object of
interest is provided on each image of the data. Our method then estimates the
object appearance in a semi-supervised fashion by learning
object-image-specific features and by using these in a semi-supervised learning
framework. Our object model is then used in a graph-based optimization problem
that takes into account all provided locations and the image data in order to
infer the complete pixel-wise segmentation. In practice, we solve this
optimally as a tracking problem using a K-shortest path approach. Both the
object model and segmentation are then refined iteratively to further improve
the final segmentation. We show that by collecting 2D locations using a gaze
tracker, our approach can provide state-of-the-art segmentations on a range of
objects and image modalities (video and 3D volumes), and that these can then be
used to train supervised machine learning classifiers
Recommended from our members