Location Embedding and Deep Convolutional Neural Networks for Next Location Prediction
International audience
Cross-Domain Image Retrieval with Attention Modeling
With the proliferation of e-commerce websites and the ubiquity of smartphones,
cross-domain image retrieval, which uses photos taken with smartphones as
queries to search for products on e-commerce websites, is emerging as a popular
application. One challenge of this task is to locate the attention of both the
query and database images. In particular, database images, e.g. of fashion
products, on e-commerce websites are typically displayed with other
accessories, and the images taken by users contain noisy backgrounds and large
variations in orientation and lighting. Consequently, their attention is
difficult to locate. In this paper, we exploit the rich tag information
available on the e-commerce websites to locate the attention of database
images. For query images, we use each candidate image in the database as the
context to locate the query attention. Novel deep convolutional neural network
architectures, namely TagYNet and CtxYNet, are proposed to learn the attention
weights and then extract effective representations of the images. Experimental
results on public datasets confirm that our approaches have significant
improvement over the existing methods in terms of the retrieval accuracy and
efficiency. Comment: 8 pages with an extra reference page
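The abstract does not spell out the internals of TagYNet or CtxYNet. As a rough illustration of the general idea only, the following Python sketch pools a set of region feature vectors into one representation, weighting each region by the softmax of its dot-product similarity to a context vector. For a database image the context might be an embedding of its tags; for a query, the pooled representation of the candidate database image. The helper names (`attend`, `cosine`), the dot-product scoring and the toy vectors are assumptions for illustration, not the authors' method.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(regions: List[Vector], context: Vector) -> Vector:
    """Attention-weighted average of region features (hypothetical helper).

    Each region's weight is the softmax of its dot product with `context`.
    """
    weights = softmax([dot(r, context) for r in regions])
    dim = len(regions[0])
    return [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


if __name__ == "__main__":
    # Toy 2-D region features; all values are made up for illustration.
    db_regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    tag_embedding = [3.0, 0.0]  # hypothetical tag vector for the product
    db_repr = attend(db_regions, tag_embedding)

    query_regions = [[0.9, 0.1], [0.0, 1.0]]
    # Query attention is guided by the candidate database image.
    query_repr = attend(query_regions, db_repr)
    print(round(cosine(query_repr, db_repr), 3))
```

Using the candidate image as context means the query representation is recomputed per candidate, which is the price of letting the database side steer where the query "looks".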
Straight to Shapes: Real-time Detection of Encoded Shapes
Current object detection approaches predict bounding boxes, but these provide
little instance-specific information beyond location, scale and aspect ratio.
In this work, we propose to directly regress to objects' shapes in addition to
their bounding boxes and categories. It is crucial to find an appropriate shape
representation that is compact and decodable, and in which objects can be
compared for higher-order concepts such as view similarity, pose variation and
occlusion. To achieve this, we use a denoising convolutional auto-encoder to
establish an embedding space, and place the decoder after a fast end-to-end
network trained to regress directly to the encoded shape vectors. This yields
what to the best of our knowledge is the first real-time shape prediction
network, running at ~35 FPS on a high-end desktop. With higher-order shape
reasoning well-integrated into the network pipeline, the network shows the
useful practical quality of generalising to unseen categories similar to the
ones in the training set, something that most existing approaches fail to
handle. Comment: 16 pages including appendix; Published at CVPR 201
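To make the encode/decode idea concrete, here is a toy Python sketch of two pieces the abstract mentions: corrupting a binary mask, as is typical when training a denoising auto-encoder, and mapping a regressed shape vector back to a mask through a decoder. The real decoder is convolutional and learned; here it is a hypothetical hand-set linear layer followed by a sigmoid and a 0.5 threshold, and pixel-flip corruption is an assumed noise model. The distance function shows how shapes can be compared directly in the embedding space.

```python
import math
import random
from typing import List


def corrupt(mask: List[int], flip_prob: float, rng: random.Random) -> List[int]:
    """Flip each binary pixel with probability `flip_prob` (assumed noise model)."""
    return [1 - m if rng.random() < flip_prob else m for m in mask]


def decode_shape(code: List[float], weights: List[List[float]],
                 bias: List[float], threshold: float = 0.5) -> List[int]:
    """Toy linear decoder: pixel i is on if sigmoid(w_i . code + b_i) >= threshold."""
    mask = []
    for w_row, b in zip(weights, bias):
        z = sum(w * c for w, c in zip(w_row, code)) + b
        p = 1.0 / (1.0 + math.exp(-z))
        mask.append(1 if p >= threshold else 0)
    return mask


def code_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two shape codes in the embedding space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    # Hypothetical 2-D codes decoding to a flattened 2x2 mask (row-major):
    # dimension 0 switches on the top row, dimension 1 the bottom row.
    W = [[5.0, 0.0], [5.0, 0.0], [0.0, 5.0], [0.0, 5.0]]
    b = [-2.5] * 4
    print(decode_shape([1.0, 0.0], W, b))  # [1, 1, 0, 0]
    print(decode_shape([0.0, 1.0], W, b))  # [0, 0, 1, 1]
    print(corrupt([1, 1, 0, 0], 0.25, random.Random(0)))
    print(code_distance([1.0, 0.0], [0.0, 1.0]))
```

Because the detector only has to regress a short code rather than a full mask, the expensive per-pixel work is deferred to the (fixed) decoder, which is one plausible reason such a pipeline can stay real-time.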
ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering
We propose a novel attention-based deep learning architecture for the visual
question answering (VQA) task. Given an image and a natural language question
related to that image, a VQA system generates a natural language answer to the question.
Generating the correct answers requires the model's attention to focus on the
regions corresponding to the question, because different questions inquire
about the attributes of different image regions. We introduce an attention-based
configurable convolutional neural network (ABC-CNN) to learn such
question-guided attention. ABC-CNN determines an attention map for an
image-question pair by convolving the image feature map with configurable
convolutional kernels derived from the question's semantics. We evaluate the
ABC-CNN architecture on three benchmark VQA datasets: Toronto COCO-QA, DAQUAR,
and the VQA dataset. The ABC-CNN model achieves significant improvements over
state-of-the-art methods on these datasets. The question-guided attention
generated by ABC-CNN is also shown to reflect the regions that are highly
relevant to the questions.
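The core mechanism described above, a convolution whose kernel is produced from the question, can be sketched in plain Python. This is a simplified illustration rather than the ABC-CNN implementation: the question embedding `q` and the projection matrix `proj` that turns it into a C x k x k kernel are hypothetical stand-ins for learned components, the "convolution" is the cross-correlation used by most deep learning frameworks with zero padding to keep the spatial size (odd k assumed), and the attention map is taken as a softmax over all spatial positions.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # indexed [channel][row][col]


def question_to_kernel(q: List[float], proj: List[List[float]],
                       channels: int, k: int) -> FeatureMap:
    """Hypothetical linear map from a question embedding to a C x k x k kernel."""
    flat = [sum(p * x for p, x in zip(row, q)) for row in proj]
    if len(flat) != channels * k * k:
        raise ValueError("proj must have channels * k * k rows")
    return [[[flat[c * k * k + i * k + j] for j in range(k)] for i in range(k)]
            for c in range(channels)]


def conv_same(fmap: FeatureMap, kernel: FeatureMap) -> List[List[float]]:
    """Zero-padded multi-channel cross-correlation producing one H x W score map."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    k = len(kernel[0])
    r = k // 2
    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            s = 0.0
            for c in range(C):
                for i in range(k):
                    for j in range(k):
                        yy, xx = y + i - r, x + j - r
                        if 0 <= yy < H and 0 <= xx < W:
                            s += kernel[c][i][j] * fmap[c][yy][xx]
            out[y][x] = s
    return out


def spatial_softmax(scores: List[List[float]]) -> List[List[float]]:
    """Normalise scores over all positions so the attention map sums to 1."""
    m = max(v for row in scores for v in row)
    exps = [[math.exp(v - m) for v in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


def question_guided_attention(fmap: FeatureMap, q: List[float],
                              proj: List[List[float]], k: int = 1) -> List[List[float]]:
    kernel = question_to_kernel(q, proj, len(fmap), k)
    return spatial_softmax(conv_same(fmap, kernel))


if __name__ == "__main__":
    # Two-channel 2x2 feature map: channel 0 fires top-left, channel 1 bottom-right.
    fmap = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
    proj = [[10.0, 0.0], [0.0, 10.0]]  # hypothetical projection for a 1x1 kernel
    for q in ([1.0, 0.0], [0.0, 1.0]):
        att = question_guided_attention(fmap, q, proj)
        print(q, [[round(v, 3) for v in row] for row in att])
```

In this toy setup, changing only the question vector moves the attention peak from the top-left to the bottom-right cell, which is the sense in which the kernels are "configured" by the question.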
- …