44,406 research outputs found
Simple and Effective Curriculum Pointer-Generator Networks for Reading Comprehension over Long Narratives
This paper tackles the problem of reading comprehension over long narratives
where documents easily span over thousands of tokens. We propose a curriculum
learning (CL) based Pointer-Generator framework for reading/sampling over large
documents, enabling diverse training of the neural model based on the notion of
alternating contextual difficulty. This can be interpreted as a form of domain
randomization and/or generative pretraining during training. To this end, the
usage of the Pointer-Generator softens the requirement of having the answer
within the context, enabling us to construct diverse training samples for
learning. Additionally, we propose a new Introspective Alignment Layer (IAL),
which reasons over decomposed alignments using block-based self-attention. We
evaluate our proposed method on the NarrativeQA reading comprehension
benchmark, achieving state-of-the-art performance, improving existing baselines
by relative improvement on BLEU-4 and relative improvement on
Rouge-L. Extensive ablations confirm the effectiveness of our proposed IAL and
CL components.Comment: Accepted to ACL 201
Video Storytelling: Textual Summaries for Events
Bridging vision and natural language is a longstanding goal in computer
vision and multimedia research. While earlier works focus on generating a
single-sentence description for visual content, recent works have studied
paragraph generation. In this work, we introduce the problem of video
storytelling, which aims at generating coherent and succinct stories for long
videos. Video storytelling introduces new challenges, mainly due to the
diversity of the story and the length and complexity of the video. We propose
novel methods to address the challenges. First, we propose a context-aware
framework for multimodal embedding learning, where we design a Residual
Bidirectional Recurrent Neural Network to leverage contextual information from
past and future. Second, we propose a Narrator model to discover the underlying
storyline. The Narrator is formulated as a reinforcement learning agent which
is trained by directly optimizing the textual metric of the generated story. We
evaluate our method on the Video Story dataset, a new dataset that we have
collected to enable the study. We compare our method with multiple
state-of-the-art baselines, and show that our method achieves better
performance, in terms of quantitative measures and user study.Comment: Published in IEEE Transactions on Multimedi
Recommended from our members
Understanding analogical reasoning : viewpoints from psychology and related disciplines
Analogy and metaphor have a long history of study in linguistics, education, philosophy and psychology. Consensus over what analogy is or how analogy functions in language and thought, however, has been elusive. This paper, the first in a two part series, examines these various research traditions, attempting to bring out major lines of agreement over the role of analogy in individual human experience. As well as being a general literature review which may be helpful for newcomers to the study of analogy, this paper attempts to extract from these literatures existing theories, models and concepts which may be interesting or useful for computational studies of analogical reasoning
Semantic user profiling techniques for personalised multimedia recommendation
Due to the explosion of news materials available through broadcast and other channels, there is an increasing need for personalised news video retrieval. In this work, we introduce a semantic-based user modelling technique to capture users’ evolving information needs. Our approach exploits implicit user interaction to capture long-term user interests in a profile. The organised interests are used to retrieve and recommend news stories to the users. In this paper, we exploit the Linked Open Data Cloud to identify similar news stories that match the users’ interest. We evaluate various recommendation parameters by introducing a simulation-based evaluation scheme
Variational recurrent sequence-to-sequence retrieval for stepwise illustration
We address and formalise the task of sequence-to-sequence (seq2seq) cross-modal retrieval. Given a sequence of text passages as query, the goal is to retrieve a sequence of images that best describes and aligns with the query. This new task extends the traditional cross-modal retrieval, where each image-text pair is treated independently ignoring broader context. We propose a novel variational recurrent seq2seq (VRSS) retrieval model for this seq2seq task. Unlike most cross-modal methods, we generate an image vector corresponding to the latent topic obtained from combining the text semantics and context. This synthetic image embedding point associated with every text embedding point can then be employed for either image generation or image retrieval as desired. We evaluate the model for the application of stepwise illustration of recipes, where a sequence of relevant images are retrieved to best match the steps described in the text. To this end, we build and release a new Stepwise Recipe dataset for research purposes, containing 10K recipes (sequences of image-text pairs) having a total of 67K image-text pairs. To our knowledge, it is the first publicly available dataset to offer rich semantic descriptions in a focused category such as food or recipes. Our model is shown to outperform several competitive and relevant baselines in the experiments. We also provide qualitative analysis of how semantically meaningful the results produced by our model are through human evaluation and comparison with relevant existing methods
Multimedia search without visual analysis: the value of linguistic and contextual information
This paper addresses the focus of this special issue by analyzing the potential contribution of linguistic content and other non-image aspects to the processing of audiovisual data. It summarizes the various ways in which linguistic content analysis contributes to enhancing the semantic annotation of multimedia content, and, as a consequence, to improving the effectiveness of conceptual media access tools. A number of techniques are presented, including the time-alignment of textual resources, audio and speech processing, content reduction and reasoning tools, and the exploitation of surface features
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
This paper surveys the current state of the art in Natural Language
Generation (NLG), defined as the task of generating text or speech from
non-linguistic input. A survey of NLG is timely in view of the changes that the
field has undergone over the past decade or so, especially in relation to new
(usually data-driven) methods, as well as new applications of NLG technology.
This survey therefore aims to (a) give an up-to-date synthesis of research on
the core tasks in NLG and the architectures adopted in which such tasks are
organised; (b) highlight a number of relatively recent research topics that
have arisen partly as a result of growing synergies between NLG and other areas
of artificial intelligence; (c) draw attention to the challenges in NLG
evaluation, relating them to similar challenges faced in other areas of Natural
Language Processing, with an emphasis on different evaluation methods and the
relationships between them.Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118
pages, 8 figures, 1 tabl
- …