52 research outputs found
Evaluation of Automatic Video Captioning Using Direct Assessment
We present Direct Assessment, a method for manually assessing the quality of
automatically-generated captions for video. Evaluating the accuracy of video
captions is particularly difficult because for any given video clip there is no
definitive ground truth or correct answer against which to measure. Automatic
metrics for comparing automatic video captions against a manual caption such as
BLEU and METEOR, drawn from techniques used in evaluating machine translation,
were used in the TRECVid video captioning task in 2016 but these are shown to
have weaknesses. The work presented here brings human assessment into the
evaluation by crowdsourcing how well a caption describes a video. We
automatically degrade the quality of some sample captions which are assessed
manually and from this we are able to rate the quality of the human assessors,
a factor we take into account in the evaluation. Using data from the TRECVid
video-to-text task in 2016, we show how our direct assessment method is
replicable and robust and should scale to where there many caption-generation
techniques to be evaluated.Comment: 26 pages, 8 figure
TRECVID 2018: Benchmarking Video Activity Detection, Video Captioning and Matching, Video Storytelling Linking and Video Search
International audienc
TRECVID 2014 -- An Overview of the Goals, Tasks, Data, Evaluation Mechanisms and Metrics
International audienceThe TREC Video Retrieval Evaluation (TRECVID) 2014 was a TREC-style video analysis and retrieval evaluation, the goal of which remains to promote progress in content-based exploitation of digital video via open, metrics-based evaluation. Over the last dozen years this effort has yielded a better under- standing of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. TRECVID is funded by the NIST with support from other US government agencies. Many organizations and individuals worldwide contribute significant time and effort
Review of Person Re-identification Techniques
Person re-identification across different surveillance cameras with disjoint
fields of view has become one of the most interesting and challenging subjects
in the area of intelligent video surveillance. Although several methods have
been developed and proposed, certain limitations and unresolved issues remain.
In all of the existing re-identification approaches, feature vectors are
extracted from segmented still images or video frames. Different similarity or
dissimilarity measures have been applied to these vectors. Some methods have
used simple constant metrics, whereas others have utilised models to obtain
optimised metrics. Some have created models based on local colour or texture
information, and others have built models based on the gait of people. In
general, the main objective of all these approaches is to achieve a
higher-accuracy rate and lowercomputational costs. This study summarises
several developments in recent literature and discusses the various available
methods used in person re-identification. Specifically, their advantages and
disadvantages are mentioned and compared.Comment: Published 201
An Interactive Lifelog Search Engine for LSC2018
This thesis consists on developing an interactive lifelog search engine for the LSC 2018 search challenge at ACM ICMR 2018. This search engine is created in order to browse for images from a given lifelog dataset and display them along with some written information related to them and four other images providing contextualization about the searched one. First of all, the work makes an introduction to the relevance of this project. It introduces the reader to the main social problems affronted and the aim of our project to deal with them. Thus, go ahead with the scope of the project introducing to the main objectives fixed. Also, the work is gone by the actual state of the same kind of prototypes that already exist to let the reader see the differences that our project presents. After the project approach is done, it begins a travel trough the methodology and creation process, going deep in the main aspects and the explanation of every election and decision, also remarking the limits of the current prototype. Additionally, the project concludes with a result section where the system is tested with six users. They are asked to find three specific images using the search engine. This test is divided in two sections: first, a qualitative section where the user is asked to test the system and fill out a survey to see how comfortable it is for him. And a second section, more quantitative, where they value the speed of our system. Finally, the project concludes going through the actual and future ethics of lifelogging in general and with a final conclusion further investigation and future improvemen
An audio-visual approach to web video categorization
International audienceIn this paper we address the issue of automatic video genre categorization of web media using an audio-visual approach. To this end, we propose content descriptors which exploit audio, temporal structure and color information. The potential of our descriptors is experimentally validated both from the perspective of a classification system and as an information retrieval approach. Validation is carried out on a real scenario, namely on more than 288 hours of video footage and 26 video genres specific to blip.tv media platform. Additionally, to reduce semantic gap, we propose a new relevance feedback technique which is based on hierarchical clustering. Experimental tests prove that retrieval performance can be significantly increased in this case, becoming comparable to the one obtained with high level semantic textual descriptors
- …