30,763 research outputs found
Story Cloze Ending Selection Baselines and Data Examination
This paper describes two supervised baseline systems for the Story Cloze Test
Shared Task (Mostafazadeh et al., 2016a). We first build a classifier using
features based on word embeddings and semantic similarity computation. We
further implement a neural LSTM system with different encoding strategies that
try to model the relation between the story and the provided endings. Our
experiments show that a model using representation features based on average
word embedding vectors over the given story words and the candidate ending
sentences words, joint with similarity features between the story and candidate
ending representations performed better than the neural models. Our best model
achieves an accuracy of 72.42, ranking 3rd in the official evaluation.Comment: Submission for the LSDSem 2017 - Linking Models of Lexical,
Sentential and Discourse-level Semantics - Shared Tas
Teaching Machines to Read and Comprehend
Teaching machines to read natural language documents remains an elusive
challenge. Machine reading systems can be tested on their ability to answer
questions posed on the contents of documents that they have seen, but until now
large scale training and test datasets have been missing for this type of
evaluation. In this work we define a new methodology that resolves this
bottleneck and provides large scale supervised reading comprehension data. This
allows us to develop a class of attention based deep neural networks that learn
to read real documents and answer complex questions with minimal prior
knowledge of language structure.Comment: Appears in: Advances in Neural Information Processing Systems 28
(NIPS 2015). 14 pages, 13 figure
Crowdsourcing in Computer Vision
Computer vision systems require large amounts of manually annotated data to
properly learn challenging visual concepts. Crowdsourcing platforms offer an
inexpensive method to capture human knowledge and understanding, for a vast
number of visual perception tasks. In this survey, we describe the types of
annotations computer vision researchers have collected using crowdsourcing, and
how they have ensured that this data is of high quality while annotation effort
is minimized. We begin by discussing data collection on both classic (e.g.,
object recognition) and recent (e.g., visual story-telling) vision tasks. We
then summarize key design decisions for creating effective data collection
interfaces and workflows, and present strategies for intelligently selecting
the most important data instances to annotate. Finally, we conclude with some
thoughts on the future of crowdsourcing in computer vision.Comment: A 69-page meta review of the field, Foundations and Trends in
Computer Graphics and Vision, 201
Learning to Hash-tag Videos with Tag2Vec
User-given tags or labels are valuable resources for semantic understanding
of visual media such as images and videos. Recently, a new type of labeling
mechanism known as hash-tags have become increasingly popular on social media
sites. In this paper, we study the problem of generating relevant and useful
hash-tags for short video clips. Traditional data-driven approaches for tag
enrichment and recommendation use direct visual similarity for label transfer
and propagation. We attempt to learn a direct low-cost mapping from video to
hash-tags using a two step training process. We first employ a natural language
processing (NLP) technique, skip-gram models with neural network training to
learn a low-dimensional vector representation of hash-tags (Tag2Vec) using a
corpus of 10 million hash-tags. We then train an embedding function to map
video features to the low-dimensional Tag2vec space. We learn this embedding
for 29 categories of short video clips with hash-tags. A query video without
any tag-information can then be directly mapped to the vector space of tags
using the learned embedding and relevant tags can be found by performing a
simple nearest-neighbor retrieval in the Tag2Vec space. We validate the
relevance of the tags suggested by our system qualitatively and quantitatively
with a user study
- …