An Attentive Survey of Attention Models
The attention model has become an important concept in neural networks and has
been researched across diverse application domains. This survey provides a
structured and comprehensive overview of the developments in modeling
attention. In particular, we propose a taxonomy which groups existing
techniques into coherent categories. We review salient neural architectures in
which attention has been incorporated, and discuss applications in which
modeling attention has shown a significant impact. We also describe how
attention has been used to improve the interpretability of neural networks.
Finally, we discuss some future research directions in attention. We hope this
survey will provide a succinct introduction to attention models and guide
practitioners while developing approaches for their applications.
Comment: accepted to Transactions on Intelligent Systems and Technology (TIST); 33 pages.
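The survey above covers many variants of this idea; as a minimal, self-contained illustration (our own sketch, not code from the survey), the widely used scaled dot-product form of attention can be written in NumPy:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # score every query against every key, normalize, and mix the values
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))   # 2 queries
K = rng.normal(size=(3, 4))   # 3 keys
V = rng.normal(size=(3, 4))   # 3 values
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.shape)  # (2, 4) (2, 3)
```

Each row of the weight matrix sums to 1, so every output is a convex combination of the value vectors.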
A survey on Information Visualization in light of Vision and Cognitive sciences
Information Visualization techniques are built on a context with many factors
related to both vision and cognition, making it difficult to draw a clear
picture of how data visually turns into comprehension. In the intent of
promoting a better picture, here, we survey concepts on vision, cognition, and
Information Visualization organized in a theorization named Visual Expression
Process. Our theorization organizes the basis of visualization techniques with
a reduced level of complexity; still, it is complete enough to foster
discussions related to design and analytical tasks. Our work introduces the
following contributions: (1) a Theoretical compilation of vision, cognition,
and Information Visualization; (2) Discussions supported by vast literature;
and (3) Reflections on visual-cognitive aspects concerning use and design. We
expect our contributions will provide further clarification about how users and
designers think about InfoVis, leveraging the potential of systems and
techniques.
Comment: 29 pages, Elsevier Journal preprint.
A Hierarchical Attention Model for Social Contextual Image Recommendation
Image based social networks are among the most popular social networking
services in recent years. With tremendous numbers of images uploaded every
day, understanding users' preferences for user-generated images and making
recommendations has become an urgent need. In fact, many hybrid models have
been proposed to fuse various kinds of side information (e.g., image visual
representation, social network) and user-item historical behavior for enhancing
recommendation performance. However, due to the unique characteristics of the
user-generated images in social image platforms, previous studies failed to
capture the complex aspects that influence users' preferences in a unified
framework. Moreover, most of these hybrid models relied on predefined weights
in combining different kinds of information, which usually resulted in
sub-optimal recommendation performance. To this end, in this paper, we develop
a hierarchical attention model for social contextual image recommendation. In
addition to basic latent user interest modeling in the popular matrix
factorization based recommendation, we identify three key aspects (i.e., upload
history, social influence, and owner admiration) that affect each user's latent
preferences, where each aspect summarizes a contextual factor from the complex
relationships between users and images. After that, we design a hierarchical
attention network that naturally mirrors the hierarchical relationship
(elements within each aspect, and the aspects themselves) of users' latent
interests with the identified key aspects. Specifically, by taking embeddings
from state-of-the-art deep learning models that are tailored for each kind of
data, the hierarchical attention network could learn to attend differently to
more or less content. Finally, extensive experimental results on real-world
datasets clearly show the superiority of our proposed model.
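The two-level structure described above can be sketched with a toy NumPy example (the aspect names and shapes are our hypothetical placeholders, not the paper's implementation): element-level attention summarizes each aspect, then aspect-level attention weighs the summaries.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, items):
    # score each item against the query, then take the weighted sum
    weights = softmax(items @ query)
    return weights @ items

rng = np.random.default_rng(1)
d = 8
user = rng.normal(size=d)            # latent user-interest vector
aspects = [
    rng.normal(size=(5, d)),         # e.g. upload-history element embeddings
    rng.normal(size=(3, d)),         # e.g. social-influence element embeddings
    rng.normal(size=(4, d)),         # e.g. owner-admiration element embeddings
]

# bottom level: attention over the elements of each aspect
aspect_vecs = np.stack([attend(user, a) for a in aspects])
# top level: attention over the three aspect summaries
context = attend(user, aspect_vecs)
print(context.shape)  # (8,)
```

The final context vector has a fixed size regardless of how many elements each aspect contains.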
Text-based Question Answering from Information Retrieval and Deep Neural Network Perspectives: A Survey
Text-based Question Answering (QA) is a challenging task which aims at
finding short concrete answers for users' questions. This line of research has
been widely studied with information retrieval techniques and has received
increasing attention in recent years by considering deep neural network
approaches. Deep learning approaches, which are the main focus of this paper,
provide a powerful technique to learn multiple layers of representations and
interaction between questions and texts. In this paper, we provide a
comprehensive overview of different models proposed for the QA task, including
both traditional information retrieval perspective, and more recent deep neural
network perspective. We also introduce well-known datasets for the task and
present available results from the literature to enable a comparison between
different techniques.
Improving Noise Robustness In Speaker Identification Using A Two-Stage Attention Model
While the use of deep neural networks has significantly boosted speaker
recognition performance, it is still challenging to separate speakers in poor
acoustic environments. To improve the robustness of speaker recognition in
noise, a novel two-stage attention mechanism that can be used
in existing architectures such as Time Delay Neural Networks (TDNNs) and
Convolutional Neural Networks (CNNs) is proposed. Noise is known to often mask
important information in both the time and frequency domains. The proposed mechanism
allows the models to concentrate on reliable time/frequency components of the
signal. The proposed approach is evaluated using the Voxceleb1 dataset, which
aims at assessing speaker recognition in real-world situations. In addition,
three types of noise at different signal-to-noise ratios (SNRs) were added for
this work. The proposed mechanism is compared with three strong baselines:
X-vectors, Attentive X-vector, and Resnet-34. Results on both identification
and verification tasks show that the two-stage attention mechanism consistently
improves upon these for all noise conditions.
Comment: submitted to Interspeech 2020.
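The paper's exact architecture is not reproduced here; a heavily simplified NumPy sketch of the two-stage idea (attend over time frames, then over frequency bins of a spectrogram-like feature map; all weights here are random, hypothetical stand-ins):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def two_stage_attention(spec, w_time, w_freq):
    # stage 1: weight the time frames, letting the model down-weight noisy ones
    t_scores = spec @ w_freq                    # one score per frame
    spec = spec * softmax(t_scores)[:, None]
    # stage 2: weight the frequency bins of the re-weighted map
    f_scores = spec.T @ w_time                  # one score per bin
    return spec * softmax(f_scores)[None, :]

rng = np.random.default_rng(2)
T, F = 10, 6                                    # frames x frequency bins
spec = np.abs(rng.normal(size=(T, F)))          # toy magnitude spectrogram
out = two_stage_attention(spec, rng.normal(size=T), rng.normal(size=F))
print(out.shape)  # (10, 6)
```

The output keeps the input's shape: attention rescales time/frequency components rather than discarding them.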
Attentive Recurrent Comparators
Rapid learning requires flexible representations to quickly adapt to new
evidence. We develop a novel class of models called Attentive Recurrent
Comparators (ARCs) that form representations of objects by cycling through them
and making observations. Using the representations extracted by ARCs, we
develop a way of approximating a \textit{dynamic representation space} and use
it for one-shot learning. In the task of one-shot classification on the
Omniglot dataset, we achieve state-of-the-art performance with an error
rate of 1.5%. This represents the first super-human result achieved on this
task with a generic model that uses only pixel information.
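ARCs are trained end to end; as an untrained, heavily simplified sketch (our own names and shapes, treating each "image" as a set of patch vectors), the cycle of glimpses folding into one recurrent comparator state looks like this:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def arc_embed(img_a, img_b, steps=6, d=16, seed=0):
    # alternate attentive glimpses between the two inputs, folding each
    # glimpse into a single recurrent state (random, untrained weights)
    rng = np.random.default_rng(seed)
    p = img_a.shape[1]                          # patch dimensionality
    W_h = rng.normal(scale=0.1, size=(d, d))
    W_x = rng.normal(scale=0.1, size=(d, p))
    W_a = rng.normal(scale=0.1, size=(p, d))
    h = np.zeros(d)
    for t in range(steps):
        img = img_a if t % 2 == 0 else img_b    # cycle between the two objects
        w = softmax(img @ (W_a @ h))            # where to look, given the state
        glimpse = w @ img                       # attention-weighted patch
        h = np.tanh(W_h @ h + W_x @ glimpse)    # update the comparator state
    return h                                    # embedding of the (a, b) pair

a = np.random.default_rng(1).normal(size=(9, 12))  # image A as 9 patches
b = np.random.default_rng(2).normal(size=(9, 12))  # image B as 9 patches
h = arc_embed(a, b)
print(h.shape)  # (16,)
```

In the real model the glimpse window and controller are learned; the point of the sketch is only the alternating attend-then-update loop.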
"The Boating Store Had Its Best Sail Ever": Pronunciation-attentive Contextualized Pun Recognition
Humor plays an important role in human languages, and it is essential to model
humor when building intelligent systems. Among different forms of humor, puns
perform wordplay for humorous effects by employing words with double entendre
and high phonetic similarity. However, identifying and modeling puns is
challenging, as puns usually involve implicit semantic or phonological tricks.
In this paper, we propose Pronunciation-attentive Contextualized Pun
Recognition (PCPR) to perceive human humor, detect if a sentence contains puns
and locate them in the sentence. PCPR derives contextualized representation for
each word in a sentence by capturing the association between the surrounding
context and its corresponding phonetic symbols. Extensive experiments are
conducted on two benchmark datasets. Results demonstrate that the proposed
approach significantly outperforms the state-of-the-art methods in pun
detection and location tasks. In-depth analyses verify the effectiveness and
robustness of PCPR.
Comment: 10 pages, 4 figures, 7 tables, accepted by ACL 2020.
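The core fusion step can be sketched in NumPy (our illustrative names, not PCPR's published code): attend over a word's phoneme embeddings conditioned on its contextual vector, then concatenate the two views.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def pronunciation_attentive(word_vec, phoneme_vecs):
    # weight each phoneme by its relevance to the contextual word vector,
    # then append the pooled phonetic summary to the word representation
    weights = softmax(phoneme_vecs @ word_vec)
    phon_summary = weights @ phoneme_vecs
    return np.concatenate([word_vec, phon_summary])

rng = np.random.default_rng(3)
d = 8
word = rng.normal(size=d)           # contextual vector for a word, e.g. "sail"
phonemes = rng.normal(size=(3, d))  # embeddings for its phonetic symbols
fused = pronunciation_attentive(word, phonemes)
print(fused.shape)  # (16,)
```

The fused vector carries both what the word means in context and how it sounds, which is what a pun detector needs.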
Exploring the Use of Attention within Neural Machine Translation Decoder States to Translate Idioms
Idioms pose problems to almost all Machine Translation systems. This type of
language is very frequent in day-to-day language use and cannot be simply
ignored. The recent interest in memory-augmented models in the field of
Language Modelling has helped systems achieve good results by bridging
long-distance dependencies. In this paper we explore the use of such techniques
in a Neural Machine Translation system to help translate idiomatic language.
Crowd-Robot Interaction: Crowd-aware Robot Navigation with Attention-based Deep Reinforcement Learning
Moving in an effective and socially compliant manner is an essential yet
challenging task for robots operating in crowded spaces. Recent works have
shown the power of deep reinforcement learning techniques to learn socially
cooperative policies. However, their cooperation ability deteriorates as the
crowd grows since they typically relax the problem as a one-way Human-Robot
interaction problem. In this work, we want to go beyond first-order Human-Robot
interaction and more explicitly model Crowd-Robot Interaction (CRI). We propose
to (i) rethink pairwise interactions with a self-attention mechanism, and (ii)
jointly model Human-Robot as well as Human-Human interactions in the deep
reinforcement learning framework. Our model captures the Human-Human
interactions occurring in dense crowds that indirectly affect the robot's
anticipation capability. Our proposed attentive pooling mechanism learns the
collective importance of neighboring humans with respect to their future
states. Various experiments demonstrate that our model can anticipate human
dynamics and navigate in crowds with time efficiency, outperforming
state-of-the-art methods.
Comment: accepted at ICRA 2019. Copyright may be transferred without notice,
after which this version may no longer be accessible.
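The attentive pooling described above can be illustrated with a toy NumPy sketch (the feature layout and scoring vector are our hypothetical stand-ins for the paper's learned networks): pairwise robot-human features are scored and pooled into one fixed-size crowd representation, whatever the crowd size.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def crowd_feature(robot, humans, score_w):
    # build pairwise robot-human features, score each pair, and pool
    # them into a single vector whose size is independent of crowd size
    pairs = np.hstack([np.tile(robot, (len(humans), 1)), humans])
    weights = softmax(pairs @ score_w)       # collective importance of each human
    return weights @ pairs

rng = np.random.default_rng(4)
robot = rng.normal(size=4)                   # robot state (e.g. position, velocity)
score_w = rng.normal(size=8)                 # stand-in for a scoring network
small_crowd = rng.normal(size=(2, 4))        # 2 humans
large_crowd = rng.normal(size=(9, 4))        # 9 humans
f1 = crowd_feature(robot, small_crowd, score_w)
f2 = crowd_feature(robot, large_crowd, score_w)
print(f1.shape, f2.shape)  # (8,) (8,)
```

This size invariance is what lets a single policy network handle crowds of varying density.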
Point-of-Interest Recommendation: Exploiting Self-Attentive Autoencoders with Neighbor-Aware Influence
The rapid growth of Location-based Social Networks (LBSNs) provides a great
opportunity to satisfy the strong demand for personalized Point-of-Interest
(POI) recommendation services. However, with the tremendous increase of users
and POIs, POI recommender systems still face several challenging problems: (1)
the hardness of modeling non-linear user-POI interactions from implicit
feedback; (2) the difficulty of incorporating context information such as POIs'
geographical coordinates. To cope with these challenges, we propose a novel
autoencoder-based model to learn the non-linear user-POI relations, namely
SAE-NAD, which consists of a self-attentive encoder (SAE) and a
neighbor-aware decoder (NAD). In particular, unlike previous works that treat
users' checked-in POIs equally, our self-attentive encoder adaptively
differentiates the degrees of user preference in multiple aspects by adopting a
multi-dimensional attention mechanism. To incorporate the geographical context
information, we propose a neighbor-aware decoder that assigns higher
reachability to the similar and nearby neighbors of checked-in POIs, which is
achieved by the inner product of POI embeddings together with the radial basis
function (RBF) kernel. To evaluate the proposed model, we conduct extensive
experiments on three real-world datasets with many state-of-the-art baseline
methods and evaluation metrics. The experimental results demonstrate the
effectiveness of our model.
Comment: accepted by the 27th ACM International Conference on Information and
Knowledge Management (CIKM 2018).
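The geographical part of the decoder rests on the RBF kernel, which makes influence decay smoothly with distance; a minimal sketch with hypothetical, normalized POI coordinates (not data from the paper):

```python
import numpy as np

def rbf_kernel(x, y, gamma=5.0):
    # influence decays smoothly with squared geographical distance
    return np.exp(-gamma * np.sum((x - y) ** 2))

def geo_score(candidate, checked_in, gamma=5.0):
    # a candidate POI near any checked-in POI is considered more reachable
    return max(rbf_kernel(candidate, poi, gamma) for poi in checked_in)

# hypothetical normalized POI coordinates
checked_in = np.array([[0.10, 0.20], [0.50, 0.55]])
near = np.array([0.12, 0.21])   # close to the first checked-in POI
far = np.array([0.90, 0.95])    # far from both

print(geo_score(near, checked_in) > geo_score(far, checked_in))  # True
```

In SAE-NAD this kernel is combined with the inner product of POI embeddings, so geographical proximity and learned similarity jointly shape reachability.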