Collaborative Feature Learning from Social Media
Image feature representation plays an essential role in image recognition and
related tasks. The current state-of-the-art feature learning paradigm is
supervised learning from labeled data. However, this paradigm requires
large-scale category labels, which limits its applicability to domains where
labels are hard to obtain. In this paper, we propose a new data-driven feature
learning paradigm which does not rely on category labels. Instead, we learn
from user behavior data collected on social media. Concretely, we use the image
relationship discovered in the latent space from the user behavior data to
guide the image feature learning. We collect a large-scale image and user
behavior dataset from Behance.net. The dataset consists of 1.9 million images
and over 300 million view records from 1.9 million users. We validate our
feature learning paradigm on this dataset and find that the learned feature
significantly outperforms the state-of-the-art image features in learning
better image similarities. We also show that the learned feature performs
competitively on various recognition benchmarks.
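As a rough illustration of how user behavior can expose image relationships without category labels, the sketch below scores image pairs by how much their viewers overlap. It is only a minimal sketch: the abstract describes relationships discovered in a latent space, whereas this example substitutes plain cosine similarity on raw view vectors, and the `views` log and all function names are hypothetical.

```python
import math

# Hypothetical view log: user id -> set of image ids that the user viewed.
views = {
    "u1": {"a", "b"},
    "u2": {"a", "b"},
    "u3": {"c"},
    "u4": {"b", "c"},
}

def image_vectors(view_log):
    """Represent each image as a binary vector over users (1.0 = viewed)."""
    users = sorted(view_log)
    images = sorted({img for seen in view_log.values() for img in seen})
    return {
        img: [1.0 if img in view_log[u] else 0.0 for u in users]
        for img in images
    }

def cosine(x, y):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

vecs = image_vectors(views)
sim_ab = cosine(vecs["a"], vecs["b"])  # images sharing two viewers
sim_ac = cosine(vecs["a"], vecs["c"])  # images sharing no viewers
```

Pairs with high similarity could then serve as positive examples when training an image feature extractor, which is the role the abstract assigns to the behavior-derived relationships.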
Story Ending Generation with Incremental Encoding and Commonsense Knowledge
Generating a reasonable ending for a given story context, i.e., story ending
generation, is a strong indication of story comprehension. This task requires
not only understanding the context clues, which play an important role in
planning the plot, but also handling implicit knowledge to make a reasonable,
coherent story.
In this paper, we devise a novel model for story ending generation. The model
adopts an incremental encoding scheme to represent context clues that span
the story context. In addition, commonsense knowledge is applied
through multi-source attention to facilitate story comprehension, and thus to
help generate coherent and reasonable endings. Through building context clues
and using implicit knowledge, the model is able to produce reasonable story
endings.
Automatic and manual evaluation shows that our model can generate more
reasonable story endings than state-of-the-art baselines.
Comment: Accepted in AAAI201
Learning Disentangled Representations with Reference-Based Variational Autoencoders
Learning disentangled representations from visual data, where different
high-level generative factors are independently encoded, is of importance for
many computer vision tasks. Solving this problem, however, typically requires
explicitly labeling all the factors of interest in training images. To
alleviate the annotation cost, we introduce a learning setting which we refer
to as "reference-based disentangling". Given a pool of unlabeled images, the
goal is to learn a representation where a set of target factors are
disentangled from others. The only supervision comes from an auxiliary
"reference set" containing images where the factors of interest are constant.
In order to address this problem, we propose reference-based variational
autoencoders, a novel deep generative model designed to exploit the
weak supervision provided by the reference set. By addressing tasks such as
feature learning, conditional image generation or attribute transfer, we
validate the ability of the proposed model to learn disentangled
representations from this minimal form of supervision.
Deformable Shape Completion with Graph Convolutional Autoencoders
The availability of affordable and portable depth sensors has made scanning
objects and people simpler than ever. However, dealing with occlusions and
missing parts is still a significant challenge. The problem of reconstructing a
(possibly non-rigidly moving) 3D object from a single or multiple partial scans
has received increasing attention in recent years. In this work, we propose a
novel learning-based method for the completion of partial shapes. Unlike the
majority of existing approaches, our method focuses on objects that can undergo
non-rigid deformations. The core of our method is a variational autoencoder
with graph convolutional operations that learns a latent space for complete
realistic shapes. At inference, we optimize to find the representation in this
latent space that best fits the generated shape to the known partial input. The
completed shape exhibits a realistic appearance on the unknown part. We show
promising results towards the completion of synthetic and real scans of human
body and face meshes exhibiting different styles of articulation and
partiality.
Comment: CVPR 201
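The inference step described in this abstract, searching the latent space for the code whose decoded shape best fits the known partial input, can be illustrated with a toy example. This is a minimal sketch under strong assumptions: a small fixed linear map stands in for the trained graph-convolutional decoder, plain gradient descent stands in for whatever optimizer the authors use, and every name below is hypothetical.

```python
# Toy sketch: fit a latent code to a partial observation by gradient descent.
# The linear "decoder" is a hypothetical stand-in for a trained decoder.

def decode(z, weights, bias):
    """Map a latent code z to a flat list of output coordinates."""
    return [sum(w * zj for w, zj in zip(row, z)) + b
            for row, b in zip(weights, bias)]

def fit_latent(partial, mask, weights, bias, steps=500, lr=0.1):
    """Minimise squared error on the observed entries only (mask[i] is True)."""
    z = [0.0] * len(weights[0])
    for _ in range(steps):
        out = decode(z, weights, bias)
        # The residual is zero wherever the input is missing.
        resid = [(o - p) if m else 0.0 for o, p, m in zip(out, partial, mask)]
        # Gradient of 0.5 * sum(resid^2) with respect to z for a linear decoder.
        grad = [sum(r * row[j] for r, row in zip(resid, weights))
                for j in range(len(z))]
        z = [zj - lr * g for zj, g in zip(z, grad)]
    return z

# Hypothetical decoder: 2-D latent space, 4 output coordinates.
WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
BIAS = [0.0, 0.0, 0.0, 0.0]

true_z = [0.5, -0.25]
full = decode(true_z, WEIGHTS, BIAS)
mask = [True, True, False, False]  # the last two coordinates are unobserved
partial = [v if m else 0.0 for v, m in zip(full, mask)]

z_hat = fit_latent(partial, mask, WEIGHTS, BIAS)
completed = decode(z_hat, WEIGHTS, BIAS)  # decoder fills in the missing part
```

Because the loss only touches observed coordinates, the unobserved part of `completed` is determined entirely by the decoder, which mirrors how the learned latent space supplies a plausible shape for the unknown region.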