41,956 research outputs found
A Knowledge-Grounded Multimodal Search-Based Conversational Agent
Multimodal search-based dialogue is a challenging new task: It extends
visually grounded question answering systems into multi-turn conversations with
access to an external database. We address this new challenge by learning a
neural response generation system from the recently released Multimodal
Dialogue (MMD) dataset (Saha et al., 2017). We introduce a knowledge-grounded
multimodal conversational model where an encoded knowledge base (KB)
representation is appended to the decoder input. Our model substantially
outperforms strong baselines in terms of text-based similarity measures (over 9
BLEU points, 3 of which are solely due to the use of additional information
from the KB
Why People Search for Images using Web Search Engines
What are the intents or goals behind human interactions with image search
engines? Knowing why people search for images is of major concern to Web image
search engines because user satisfaction may vary as intent varies. Previous
analyses of image search behavior have mostly been query-based, focusing on
what images people search for, rather than intent-based, that is, why people
search for images. To date, there is no thorough investigation of how different
image search intents affect users' search behavior.
In this paper, we address the following questions: (1)Why do people search
for images in text-based Web image search systems? (2)How does image search
behavior change with user intent? (3)Can we predict user intent effectively
from interactions during the early stages of a search session? To this end, we
conduct both a lab-based user study and a commercial search log analysis.
We show that user intents in image search can be grouped into three classes:
Explore/Learn, Entertain, and Locate/Acquire. Our lab-based user study reveals
different user behavior patterns under these three intents, such as first click
time, query reformulation, dwell time and mouse movement on the result page.
Based on user interaction features during the early stages of an image search
session, that is, before mouse scroll, we develop an intent classifier that is
able to achieve promising results for classifying intents into our three intent
classes. Given that all features can be obtained online and unobtrusively, the
predicted intents can provide guidance for choosing ranking methods immediately
after scrolling
AMC: Attention guided Multi-modal Correlation Learning for Image Search
Given a user's query, traditional image search systems rank images according
to its relevance to a single modality (e.g., image content or surrounding
text). Nowadays, an increasing number of images on the Internet are available
with associated meta data in rich modalities (e.g., titles, keywords, tags,
etc.), which can be exploited for better similarity measure with queries. In
this paper, we leverage visual and textual modalities for image search by
learning their correlation with input query. According to the intent of query,
attention mechanism can be introduced to adaptively balance the importance of
different modalities. We propose a novel Attention guided Multi-modal
Correlation (AMC) learning method which consists of a jointly learned hierarchy
of intra and inter-attention networks. Conditioned on query's intent,
intra-attention networks (i.e., visual intra-attention network and language
intra-attention network) attend on informative parts within each modality; a
multi-modal inter-attention network promotes the importance of the most
query-relevant modalities. In experiments, we evaluate AMC models on the search
logs from two real world image search engines and show a significant boost on
the ranking of user-clicked images in search results. Additionally, we extend
AMC models to caption ranking task on COCO dataset and achieve competitive
results compared with recent state-of-the-arts.Comment: CVPR 201
LiveSketch: Query Perturbations for Guided Sketch-based Visual Search
LiveSketch is a novel algorithm for searching large image collections using
hand-sketched queries. LiveSketch tackles the inherent ambiguity of sketch
search by creating visual suggestions that augment the query as it is drawn,
making query specification an iterative rather than one-shot process that helps
disambiguate users' search intent. Our technical contributions are: a triplet
convnet architecture that incorporates an RNN based variational autoencoder to
search for images using vector (stroke-based) queries; real-time clustering to
identify likely search intents (and so, targets within the search embedding);
and the use of backpropagation from those targets to perturb the input stroke
sequence, so suggesting alterations to the query in order to guide the search.
We show improvements in accuracy and time-to-task over contemporary baselines
using a 67M image corpus.Comment: Accepted to CVPR 201
Using Variable Dwell Time to Accelerate Gaze-Based Web Browsing with Two-Step Selection
In order to avoid the "Midas Touch" problem, gaze-based interfaces for
selection often introduce a dwell time: a fixed amount of time the user must
fixate upon an object before it is selected. Past interfaces have used a
uniform dwell time across all objects. Here, we propose a gaze-based browser
using a two-step selection policy with variable dwell time. In the first step,
a command, e.g. "back" or "select", is chosen from a menu using a dwell time
that is constant across the different commands. In the second step, if the
"select" command is chosen, the user selects a hyperlink using a dwell time
that varies between different hyperlinks. We assign shorter dwell times to more
likely hyperlinks and longer dwell times to less likely hyperlinks. In order to
infer the likelihood each hyperlink will be selected, we have developed a
probabilistic model of natural gaze behavior while surfing the web. We have
evaluated a number of heuristic and probabilistic methods for varying the dwell
times using both simulation and experiment. Our results demonstrate that
varying dwell time improves the user experience in comparison with fixed dwell
time, resulting in fewer errors and increased speed. While all of the methods
for varying dwell time resulted in improved performance, the probabilistic
models yielded much greater gains than the simple heuristics. The best
performing model reduces error rate by 50% compared to 100ms uniform dwell time
while maintaining a similar response time. It reduces response time by 60%
compared to 300ms uniform dwell time while maintaining a similar error rate.Comment: This is an Accepted Manuscript of an article published by Taylor &
Francis in the International Journal of Human-Computer Interaction on 30
March, 2018, available online:
http://www.tandfonline.com/10.1080/10447318.2018.1452351 . For an eprint of
the final published article, please access:
https://www.tandfonline.com/eprint/T9d4cNwwRUqXPPiZYm8Z/ful
- …