Evaluation of Automatic Video Captioning Using Direct Assessment
We present Direct Assessment, a method for manually assessing the quality of
automatically-generated captions for video. Evaluating the accuracy of video
captions is particularly difficult because for any given video clip there is no
definitive ground truth or correct answer against which to measure. Automatic
metrics such as BLEU and METEOR, drawn from techniques used in evaluating
machine translation, were used in the TRECVid video captioning task in 2016 to
compare automatic video captions against a manual caption, but these are shown
to have weaknesses. The work presented here brings human assessment into the
evaluation by crowdsourcing how well a caption describes a video. We
automatically degrade the quality of some sample captions which are assessed
manually and from this we are able to rate the quality of the human assessors,
a factor we take into account in the evaluation. Using data from the TRECVid
video-to-text task in 2016, we show how our direct assessment method is
replicable and robust and should scale to settings where there are many
caption-generation techniques to be evaluated. Comment: 26 pages, 8 figures
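The quality-control step described above (deliberately degrading some captions to estimate how reliable each crowd assessor is) might be sketched as follows. This is a minimal illustration, not the authors' code: the data layout, the fixed score gap used in place of a proper significance test, and the per-assessor standardisation are all assumptions.

    # Minimal sketch: filter unreliable assessors via degraded captions, then
    # average z-standardised scores per captioning system. The row layout and
    # the min_gap threshold are assumptions for illustration only.
    from collections import defaultdict
    from statistics import mean, stdev

    def reliable_assessors(rows, min_gap=10.0):
        """Keep assessors who score degraded captions clearly lower than originals."""
        orig, degr = defaultdict(list), defaultdict(list)
        for assessor, _system, is_degraded, score in rows:
            (degr if is_degraded else orig)[assessor].append(score)
        return {a for a in orig
                if degr.get(a) and mean(orig[a]) - mean(degr[a]) >= min_gap}

    def system_scores(rows, assessors):
        """Average per-assessor standardised scores for each captioning system."""
        by_assessor = defaultdict(list)
        for assessor, _system, _degraded, score in rows:
            if assessor in assessors:
                by_assessor[assessor].append(score)
        stats = {a: (mean(v), stdev(v) or 1.0)
                 for a, v in by_assessor.items() if len(v) > 1}
        per_system = defaultdict(list)
        for assessor, system, is_degraded, score in rows:
            if assessor in stats and not is_degraded:
                mu, sd = stats[assessor]
                per_system[system].append((score - mu) / sd)
        return {s: mean(v) for s, v in per_system.items()}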
Portable extraction of partially structured facts from the web
A novel fact extraction task is defined to fill a gap between current information retrieval and information extraction technologies. It is shown that it is possible to extract useful partially structured facts about different kinds of entities in a broad domain, i.e. all kinds of places depicted in tourist images. Importantly, the approach does not rely on existing linguistic resources (gazetteers, taggers, parsers, etc.) and it ported easily and cheaply between two very different languages (English and Latvian). Previous fact extraction from the web has focused on the extraction of structured data, e.g. (Building-LocatedIn-Town). In contrast we extract richer and more interesting facts, such as a fact explaining why a building was built. Enough structure is maintained to facilitate subsequent processing of the information. For example, this partial structure enables straightforward template-based text generation. We report positive results for the correctness and interest of English and Latvian facts and for the utility of the extracted facts in enhancing image captions.
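As a small illustration of the template-based generation that this partial structure enables, the sketch below renders a fact as a caption sentence. The field names, templates, and example values are hypothetical, not the paper's actual schema.

    # Illustrative only: field names and templates are assumptions, not the paper's schema.
    FACT_TEMPLATES = {
        "built_because": "{entity} was built {value}.",
        "located_in": "{entity} is located in {value}.",
    }

    def render_fact(fact):
        # Fall back to a generic pattern when no template matches the attribute.
        template = FACT_TEMPLATES.get(fact["attribute"], "{entity}: {value}.")
        return template.format(entity=fact["entity"], value=fact["value"])

    # Hypothetical example fact.
    print(render_fact({"entity": "Riga Cathedral",
                       "attribute": "built_because",
                       "value": "to serve as the city's main church"}))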
A Neural Model for Generating Natural Language Summaries of Program Subroutines
Source code summarization -- creating natural language descriptions of source
code behavior -- is a rapidly-growing research topic with applications to
automatic documentation generation, program comprehension, and software
maintenance. Traditional techniques relied on heuristics and templates built
manually by human experts. Recently, data-driven approaches based on neural
machine translation have largely overtaken template-based systems. But nearly
all of these techniques rely almost entirely on programs having good internal
documentation; without clear identifier names, the models fail to create good
summaries. In this paper, we present a neural model that combines words from
code with code structure from an AST. Unlike previous approaches, our model
processes each data source as a separate input, which allows the model to learn
code structure independent of the text in code. This process helps our approach
provide coherent summaries in many cases even when zero internal documentation
is provided. We evaluate our technique with a dataset we created from 2.1m Java
methods. We find improvement over two baseline techniques from the SE
literature and one from the NLP literature.
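A rough sketch of the two-input encoder-decoder described above is given below, with the code token sequence and the flattened AST handled by separate encoders. Layer sizes, sequence lengths, and the Dense merge used here in place of the attention mechanism are assumptions for brevity, not the authors' exact architecture.

    # Sketch of a dual-encoder summariser: code tokens and AST nodes are separate inputs.
    from tensorflow.keras.layers import Input, Embedding, GRU, Dense, Concatenate
    from tensorflow.keras.models import Model

    code_vocab, ast_vocab, sum_vocab, dim = 10000, 1000, 5000, 256

    code_in = Input(shape=(100,), name="code_tokens")    # identifiers/words from the code
    ast_in = Input(shape=(100,), name="ast_nodes")        # flattened AST node sequence
    dec_in = Input(shape=(12,), name="summary_prefix")    # summary so far (teacher forcing)

    code_state = GRU(dim)(Embedding(code_vocab, dim)(code_in))
    ast_state = GRU(dim)(Embedding(ast_vocab, dim)(ast_in))

    # Merge the two encodings; the published model attends over both instead.
    init = Dense(dim, activation="tanh")(Concatenate()([code_state, ast_state]))

    dec_seq = GRU(dim, return_sequences=True)(Embedding(sum_vocab, dim)(dec_in),
                                              initial_state=init)
    out = Dense(sum_vocab, activation="softmax")(dec_seq)

    model = Model([code_in, ast_in, dec_in], out)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

Keeping the two inputs separate lets the AST encoder learn structural patterns even when identifier names carry little information, which is the property the abstract emphasises.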
Self-critical Sequence Training for Image Captioning
Recently it has been shown that policy-gradient methods for reinforcement
learning can be utilized to train deep end-to-end systems directly on
non-differentiable metrics for the task at hand. In this paper we consider the
problem of optimizing image captioning systems using reinforcement learning,
and show that by carefully optimizing our systems using the test metrics of the
MSCOCO task, significant gains in performance can be realized. Our systems are
built using a new optimization approach that we call self-critical sequence
training (SCST). SCST is a form of the popular REINFORCE algorithm that, rather
than estimating a "baseline" to normalize the rewards and reduce variance,
utilizes the output of its own test-time inference algorithm to normalize the
rewards it experiences. Using this approach, estimating the reward signal (as
actor-critic methods must do) and estimating normalization (as REINFORCE
algorithms typically do) is avoided, while at the same time harmonizing the
model with respect to its test-time inference procedure. Empirically we find
that directly optimizing the CIDEr metric with SCST and greedy decoding at
test-time is highly effective. Our results on the MSCOCO evaluation server
establish a new state-of-the-art on the task, improving the best result in
terms of CIDEr from 104.9 to 114.7. Comment: CVPR 2017 + additional analysis + fixed baseline results, 16 pages
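The self-critical update described above can be written down compactly. The sketch below assumes a generic PyTorch captioning model with sample() and greedy_decode() methods and a cider_reward() scorer; these names are placeholders for illustration, not a specific library API.

    import torch

    def scst_loss(model, images, references, cider_reward):
        # Sample captions and keep their per-caption (summed) log-probabilities.
        sampled_caps, sampled_logprobs = model.sample(images)
        # Greedy test-time decoding supplies the "self-critical" baseline reward.
        with torch.no_grad():
            greedy_caps = model.greedy_decode(images)
            baseline = cider_reward(greedy_caps, references)
        reward = cider_reward(sampled_caps, references)
        # REINFORCE with the greedy reward as baseline: samples that beat the
        # model's own test-time output are pushed up, the rest are pushed down.
        advantage = (reward - baseline).detach()
        return -(advantage * sampled_logprobs).mean()

Because the baseline is the reward of the model's own greedy output, no separate value or critic network is needed, and a sampled caption only contributes gradient when it scores differently from the greedy one.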