Towards Making Random Passwords Memorable: Leveraging Users' Cognitive Ability Through Multiple Cues
Given the choice, users produce passwords reflecting common strategies and
patterns that ease recall but offer uncertain and often weak security.
System-assigned passwords provide measurable security but suffer from poor
memorability. To address this usability-security tension, we argue that systems
should assign random passwords but also help with memorization and recall. We
investigate the feasibility of this approach with CuedR, a novel
cued-recognition authentication scheme that provides users with multiple cues
(visual, verbal, and spatial) and lets them choose the cues that best fit their
learning process for later recognition of system-assigned keywords. In our lab
study, all 37 of our participants could log in within three attempts one week
after registration (mean login time: 38.0 seconds). A pilot study on using
multiple CuedR passwords also showed 100% recall within three attempts. Based
on our results, we suggest appropriate applications for CuedR, such as
financial and e-commerce accounts. Comment: Will appear at the CHI 2015 Conference, to be held in Seoul, Korea.
What makes an Image Iconic? A Fine-Grained Case Study
A natural approach to teaching a visual concept, e.g. a bird species, is to
show relevant images. However, not all relevant images represent a concept
equally well. In other words, they are not necessarily iconic. This observation
raises three questions. Is iconicity a subjective property? If not, can we
predict iconicity? And what exactly makes an image iconic? We provide answers
to these questions through an extensive experimental study on a challenging
fine-grained dataset of birds. We first show that iconicity ratings are
consistent across individuals, even when they are not domain experts, thus
demonstrating that iconicity is not purely subjective. We then consider an
exhaustive list of properties that are intuitively related to iconicity and
measure their correlation with these iconicity ratings. We combine them to
predict iconicity of new unseen images. We also propose a direct iconicity
predictor that is discriminatively trained with iconicity ratings. By combining
both systems, we obtain an iconicity prediction that approaches human performance.
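The correlate-then-combine step described above can be sketched in a few lines of NumPy; the property matrix, its hypothetical columns, and the least-squares combiner are illustrative assumptions rather than the paper's exact feature set or regressor:

```python
import numpy as np

def property_correlations(props, ratings):
    """Pearson correlation of each candidate property with the ratings.

    props:   (n_images, n_properties) scores for intuitive properties
             (e.g. object size, aesthetics -- hypothetical columns)
    ratings: (n_images,) mean iconicity rating per image
    """
    p = (props - props.mean(0)) / props.std(0)
    r = (ratings - ratings.mean()) / ratings.std()
    return p.T @ r / len(r)

def fit_combiner(props, ratings):
    """Least-squares weights combining all properties into one predictor."""
    X = np.column_stack([props, np.ones(len(props))])  # append bias column
    w, *_ = np.linalg.lstsq(X, ratings, rcond=None)
    return w

def predict_iconicity(props, w):
    """Predicted iconicity for new, unseen images' property scores."""
    return np.column_stack([props, np.ones(len(props))]) @ w
```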
Modeling Image Virality with Pairwise Spatial Transformer Networks
The study of virality and information diffusion online is a topic gaining
traction rapidly in the computational social sciences. Computer vision and
social network analysis research have also focused on understanding the impact
of content and information diffusion in making content viral, with prior
approaches performing notably worse than those for other traditional
classification tasks. In this paper, we present a novel pairwise reformulation
of the virality prediction problem as an attribute prediction task and develop
a novel algorithm to model image virality on online media using a pairwise
neural network. Our model provides significant insights into the features that
are responsible for promoting virality and surpasses the existing
state-of-the-art by a 12% average improvement in prediction. We also
investigate the effect of external category supervision on relative attribute
prediction and observe improved accuracy across several attribute-learning
datasets. Comment: 9 pages, Accepted as a full paper at the ACM Multimedia Conference
(MM) 201
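The pairwise reformulation can be illustrated with a minimal sketch: a shared scorer trained with a pairwise logistic loss on image pairs, standing in for the paper's pairwise neural network. The linear scorer, synthetic features, and training loop are placeholder assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def score(W, x):
    # shared scoring function, applied identically to both images in a pair
    return x @ W

def pairwise_loss_grad(W, xa, xb, y):
    """Pairwise logistic loss: y = +1 if image a should outrank image b.

    The loss depends only on the score difference, which is the heart of
    casting virality as a relative-attribute prediction task.
    """
    d = score(W, xa) - score(W, xb)
    p = 1.0 / (1.0 + np.exp(-d))           # P(a more viral than b)
    loss = -np.log(p) if y > 0 else -np.log(1.0 - p)
    grad = (p - (y > 0)) * (xa - xb)
    return loss, grad

# toy training loop on synthetic feature pairs labelled by a hidden scorer
dim = 8
W = np.zeros(dim)
w_true = rng.normal(size=dim)
for _ in range(2000):
    xa, xb = rng.normal(size=dim), rng.normal(size=dim)
    y = 1 if xa @ w_true > xb @ w_true else -1
    _, g = pairwise_loss_grad(W, xa, xb, y)
    W -= 0.1 * g
```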
Show and Recall: Learning What Makes Videos Memorable
With the explosion of video content on the Internet, there is a need for
research on methods for video analysis which take human cognition into account.
One such cognitive measure is memorability, or the ability to recall visual
content after watching it. Prior research has looked into image memorability
and shown that it is intrinsic to visual content, but the problem of modeling
video memorability has not been addressed sufficiently. In this work, we
develop a prediction model for video memorability, including complexities of
video content in it. Detailed feature analysis reveals that the proposed method
correlates well with existing findings on memorability. We also describe a
novel experiment of predicting video sub-shot memorability and show that our
approach improves over current memorability methods in this task. Experiments
on standard datasets demonstrate that the proposed metric achieves results
on par with or better than state-of-the-art methods for video summarization. Comment: 10 pages, updated abstract, added a few references, project page link
and acknowledgements. Accepted at ICCV 2017 Workshop on Mutual Benefits of
Cognitive and Computer Vision (MBCCV)
Changing the Image Memorability: From Basic Photo Editing to GANs
Memorability is considered to be an important characteristic of visual
content, and for advertising and educational purposes it is often crucial.
Despite numerous studies on understanding and predicting image
memorability, there are almost no achievements in memorability modification. In
this work, we study two approaches to image editing - GANs and classical image
processing - and show their impact on memorability. The visual features that
directly influence memorability remain unknown, so memorability cannot be
controlled manually. As a solution, we let a GAN learn these features from
labeled data and then use it for conditional generation of new images. By
analogy with algorithms that edit facial attributes, we treat memorability as
yet another attribute and manipulate it in the same way. The resulting data
are also interesting for analysis, since there are no real-world examples of
successfully changing an image's memorability while preserving its other
attributes.
We believe this may give many new answers to the question "what makes an image
memorable?" Apart from that, we also study how conventional photo-editing
tools (Photoshop, Instagram, etc.), used daily by a wide audience, influence
memorability. In this case, we start from real, practical methods and study
their effect using statistics and recent advances in memorability prediction.
Photographers, designers, and advertisers will benefit from the results of this
study directly. Comment: Accepted to CVPR 2019 Workshop (MBCCV).
Design Guidelines for Landmarks to Support Navigation in Virtual Environments
Unfamiliar, large-scale virtual environments are difficult to navigate. This
paper presents design guidelines to ease navigation in such virtual
environments. The guidelines presented here focus on the design and placement
of landmarks in virtual environments. Moreover, the guidelines are based
primarily on the extensive empirical literature on navigation in the real
world. A rationale for this approach is provided by the similarities between
navigational behavior in real and virtual environments. Comment: 9 pages, 1 figure.
Rapid Probabilistic Interest Learning from Domain-Specific Pairwise Image Comparisons
A great deal of work aims to discover large, general-purpose models of image
interest or memorability for visual search and information retrieval. This
paper argues that image interest is often domain and user specific, and that
efficient mechanisms for learning about this domain-specific image interest as
quickly as possible, while limiting the amount of data-labelling required, are
often more useful to end-users. This work uses pairwise image comparisons to
reduce the labelling burden on these users, and introduces an image interest
estimation approach that performs similarly to recent data-hungry deep learning
approaches trained using pairwise ranking losses. Here, we use a Gaussian
process model to interpolate image interest inferred using a Bayesian ranking
approach over image features extracted using a pre-trained convolutional neural
network. Results show that fitting a Gaussian process in high-dimensional image
feature space is not only computationally feasible, but also effective across a
broad range of domains. The proposed probabilistic interest estimation approach
produces image interests paired with uncertainties that can be used to identify
images for which additional labelling is required and measure inference
convergence, allowing for sample efficient active model training. Importantly,
the probabilistic formulation allows for effective visual search and
information retrieval when limited labelling data is available.
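The Gaussian-process step can be sketched directly in NumPy; the RBF kernel, its hyperparameters, and the inputs are illustrative assumptions, with `y_train` standing in for interest scores that the paper infers via Bayesian ranking over pairwise comparisons:

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between two sets of feature vectors."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gp_posterior(X_train, y_train, X_query, noise=1e-2, length_scale=1.0):
    """Posterior mean and variance of image interest at query features.

    X_train: features of labelled images (e.g. from a pre-trained CNN);
    y_train: per-image interest scores (here just given numbers).
    """
    n = len(X_train)
    K = rbf_kernel(X_train, X_train, length_scale) + noise * np.eye(n)
    Ks = rbf_kernel(X_query, X_train, length_scale)
    Kss = rbf_kernel(X_query, X_query, length_scale)
    L = np.linalg.cholesky(K)                      # stable inversion of K
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks @ alpha                              # posterior interest
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(Kss) - (v ** 2).sum(0)           # posterior uncertainty
    return mean, var
```

Query images with the largest posterior variance are natural candidates for the next round of labelling, which is what enables the sample-efficient active training the abstract describes.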
Maps of Visual Importance
The importance of an element in a visual stimulus is commonly associated with
the fixations during a free-viewing task. We argue that fixations are not
always correlated with attention or awareness of visual objects. We propose
filtering the fixations recorded during exploration of the image based on the
fixations recorded while recalling the image against a neutral background.
This idea exploits that eye movements are a spatial index into the memory of a
visual stimulus. We perform an experiment in which we record the eye movements
of 30 observers during the presentation and recollection of 100 images. The
locations of fixations during recall are only qualitatively related to the
fixations during exploration. We develop a deformation mapping technique to
align the fixations from recall with the fixations during exploration. This
allows filtering the fixations based on proximity; a threshold on proximity
provides a convenient slider to control the amount of filtering. Analyzing the
spatial histograms resulting from the filtering procedure as well as the set of
removed fixations shows that certain types of scene elements, which could be
considered irrelevant, are removed. In this sense, they provide a measure of
importance of visual elements for human observers. Comment: 42 pages, 19 figures.
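Once the recall fixations have been deformation-mapped into the exploration image's frame, the proximity filter is straightforward; this sketch assumes simple Euclidean proximity, with the threshold acting as the slider the abstract mentions:

```python
import numpy as np

def filter_fixations(explore, recall, threshold):
    """Keep exploration fixations that lie within `threshold` pixels of
    at least one recall fixation; return the kept and removed sets.

    explore, recall: (n, 2) arrays of fixation coordinates, with recall
    already aligned into the exploration image's coordinate frame.
    """
    # pairwise distances between every exploration and recall fixation
    d = np.linalg.norm(explore[:, None, :] - recall[None, :, :], axis=-1)
    keep = d.min(axis=1) <= threshold
    return explore[keep], explore[~keep]
```

Sweeping `threshold` from small to large reproduces the slider behaviour: a tight threshold keeps only fixations corroborated by recall, while a loose one approaches the unfiltered exploration map.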
Equal But Not The Same: Understanding the Implicit Relationship Between Persuasive Images and Text
Images and text in advertisements interact in complex, non-literal ways. The
two channels are usually complementary, with each channel telling a different
part of the story. Current approaches, such as image captioning methods, only
examine literal, redundant relationships, where image and text show exactly the
same content. To understand more complex relationships, we first collect a
dataset of advertisement interpretations for whether the image and slogan in
the same visual advertisement form a parallel (conveying the same message
without literally saying the same thing) or non-parallel relationship, with the
help of workers recruited on Amazon Mechanical Turk. We develop a variety of
features that capture the creativity of images and the specificity or ambiguity
of text, as well as methods that analyze the semantics within and across
channels. We show that our method outperforms standard image-text alignment
approaches on predicting the parallel/non-parallel relationship between image
and text. Comment: To appear in BMVC201
Real-time Burst Photo Selection Using a Light-Head Adversarial Network
We present an automatic moment capture system that runs in real-time on
mobile cameras. The system is designed to run in the viewfinder mode and
capture a burst sequence of frames before and after the shutter is pressed. For
each frame, the system predicts in real-time a "goodness" score, based on which
the best moment in the burst can be selected immediately after the shutter is
released, without any user intervention. To solve the problem, we develop a
highly efficient deep neural network ranking model, which implicitly learns a
"latent relative attribute" space to capture subtle visual differences within a
sequence of burst images. Then the overall goodness is computed as a linear
aggregation of the goodnesses of all the latent attributes. The latent relative
attributes and the aggregation function can be seamlessly integrated in one
fully convolutional network and trained in an end-to-end fashion. To obtain a
compact model which can run on mobile devices in real-time, we have explored
and evaluated a wide range of network design choices, taking into account the
constraints of model size, computational cost, and accuracy. Extensive studies
show that the best frame predicted by our model hits users' top-1 choice (out
of 11 frames on average) and falls among their top-3 choices in most cases.
Moreover, the model (only 0.47 MB) can run in real time on mobile devices,
e.g. only 13 ms on an iPhone 7 for one frame prediction.
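At inference time, the linear aggregation of latent attribute goodnesses reduces to a dot product; the sketch below assumes the network has already produced per-frame latent scores (the shapes and weights are hypothetical placeholders for the learned model):

```python
import numpy as np

def goodness(latent_scores, weights):
    """Overall frame goodness as a linear aggregation of the goodnesses
    of the latent relative attributes produced by the network."""
    return latent_scores @ weights

def best_frame(burst_latents, weights):
    """Index and scores of the best frame in a burst of candidates,
    selectable immediately after the shutter is released."""
    scores = np.array([goodness(z, weights) for z in burst_latents])
    return int(np.argmax(scores)), scores
```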