38 research outputs found

The AXES submissions at TrecVid 2013

Author: Aly Robin
Arandjelovic Relja
Chatfield Ken
Douze Matthijs
Fernando Basura
Harchaoui Zaid
McGuinness Kevin
O'Connor Noel E.
Oneata Dan
Parkhi Omkar M.
Potapov Danila
Revaud Jérôme
Schmid Cordelia
Schwenninger Jochen
Scott David
Tuytelaars Tinne
Verbeek Jakob
Wang Heng
Zisserman Andrew
Publication date: 01/11/2013

The AXES project participated in the interactive instance search task (INS), the semantic indexing task (SIN), the multimedia event recounting task (MER), and the multimedia event detection task (MED) for TRECVid 2013. Our interactive INS focused this year on using classifiers trained at query time with positive examples collected from external search engines. Our INS experiments were carried out by students and researchers at Dublin City University. Our best INS runs performed on par with the top-ranked INS runs in terms of P@10 and P@30, and around the median in terms of mAP. For SIN, MED, and MER, we use systems based on state-of-the-art local low-level descriptors for motion, image, and sound, as well as high-level features to capture speech and text from the audio and visual streams, respectively. The low-level descriptors were aggregated by means of Fisher vectors into high-dimensional video-level signatures; the high-level features are aggregated into bag-of-words histograms. Using these features, we train linear classifiers and use early and late fusion to combine the different features. Our MED system achieved the best score of all submitted runs in the main track, as well as in the ad-hoc track. This paper describes in detail our INS, MER, and MED systems and the results and findings of our experiments.

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Irish Universities

DCU Online Research Access Service

The INRIA-LIM-VocR and AXES submissions to Trecvid 2014 Multimedia Event Detection

Author: Alahari Karteek
Chesneau Nicolas
Douze Matthijs
Gauvain Jean-Luc
Harchaoui Zaid
Lamel Lori
Leray Clément
Oneata Dan
Paulin Mattis
Potapov Danila
Schmid Cordelia
Schmidt Christoph Andreas
Verbeek Jakob
Publication venue: HAL CCSD
Publication date: 01/01/2014

This paper describes our participation in the 2014 edition of the TrecVid Multimedia Event Detection task. Our system is based on a collection of local visual and audio descriptors, which are aggregated into global descriptors, one for each type of low-level descriptor, using Fisher vectors. Besides these features, we use two features based on convolutional networks: one for the visual channel, and one for the audio channel. Additional high-level features are extracted using ASR and OCR. Finally, we used mid-level attribute features based on object and action detectors trained on external datasets. Our two submissions (INRIA-LIM-VocR and AXES) are identical in terms of all the components, except for the ASR system that is used. We present an overview of the features and the classification techniques, and experimentally evaluate our system on TrecVid MED 2011 data.

CiteSeerX

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

TRECVID 2014 -- An Overview of the Goals, Tasks, Data, Evaluation Mechanisms and Metrics

Author: Awad George
Fiscus Jon
Joy David
Kraaij Wessel
Michel Martial
Over Paul
Quénot Georges
Sanders Greg
Smeaton Alan
Publication venue: HAL CCSD
Publication date: 01/01/2014

The TREC Video Retrieval Evaluation (TRECVID) 2014 was a TREC-style video analysis and retrieval evaluation, the goal of which remains to promote progress in content-based exploitation of digital video via open, metrics-based evaluation. Over the last dozen years this effort has yielded a better understanding of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. TRECVID is funded by NIST with support from other US government agencies. Many organizations and individuals worldwide contribute significant time and effort

Hal - Université Grenoble Alpes

Beat-Event Detection in Action Movie Franchises

Author: Douze Matthijs
Harchaoui Zaid
Potapov Danila
Revaud Jerome
Schmid Cordelia
Publication date: 14/08/2015

While important advances were recently made towards temporally localizing and recognizing specific human actions or activities in videos, efficient detection and classification of long video chunks belonging to semantically defined categories such as "pursuit" or "romance" remains challenging. We introduce a new dataset, Action Movie Franchises, consisting of a collection of Hollywood action movie franchises. We define 11 non-exclusive semantic categories, called beat-categories, that are broad enough to cover most of the movie footage. The corresponding beat-events are annotated as groups of video shots, possibly overlapping. We propose an approach for localizing beat-events based on classifying shots into beat-categories and learning the temporal constraints between shots. We show that temporal constraints significantly improve the classification performance. We set up an evaluation protocol for beat-event localization as well as for shot classification, depending on whether movies from the same franchise are present or not in the training data.

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

CNN Architectures for Large-Scale Audio Classification

Author: Chaudhuri Sourish
Ellis Daniel P. W.
Gemmeke Jort F.
Hershey Shawn
Jansen Aren
Moore R. Channing
Plakal Manoj
Platt Devin
Saurous Rif A.
Seybold Bryan
Slaney Malcolm
Weiss Ron J.
Wilson Kevin
Publication date: 10/01/2017

Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels. We examine fully connected Deep Neural Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We investigate varying the size of both training set and label vocabulary, finding that analogs of the CNNs used in image classification do well on our audio classification task, and larger training and label sets help up to a point. A model using embeddings from these classifiers does much better than raw features on the Audio Set [5] Acoustic Event Detection (AED) classification task.

Comment: Accepted for publication at ICASSP 2017. Changes: Added definitions of mAP, AUC, and d-prime. Updated mAP/AUC/d-prime numbers for Audio Set based on changes of latest Audio Set revision. Changed wording to fit 4 page limit with new addition

arXiv.org e-Print Archive

Crossref

TRECVID 2015 – An Overview of the Goals, Tasks, Data, Evaluation Mechanisms, and Metrics

Author: Aly Robin
Awad George
Fiscus Jon
Joy David
Kraaij Wessel
Michel Martial
Ordelman Roeland
Over Paul
Quénot Georges
Smeaton Alan
Publication venue: HAL CCSD
Publication date: 16/11/2015

TNO at TRECVID 2013 : multimedia event detection and instance search

Author: Antwerpen Gert van
Azzopardi George
Baan Jan
Boer Maaike de
Bouma Henri
Brandt Paul
Broekhuijsen Jeroen
Daniele Laura
Eekeren Adam van
Eendebak Pieter T.
Haar Frank ter
Hollander Richard den
Hove Johan-Martijn ten
Huis Jasper van
Kraaij Wessel
Schavemaker John
Schutte Klamer
Spitters Martijn
Versloot Corne
Wit Joost de
Zon Remco van der
Publication venue: TRECVID
Publication date: 01/01/2013

We describe the TNO system and the evaluation results for the TRECVID 2013 Multimedia Event Detection (MED) and instance search (INS) tasks. The MED system consists of a bag-of-words (BOW) approach with spatial tiling that uses low-level static and dynamic visual features, an audio feature, and high-level concepts. Automatic speech recognition (ASR) and optical character recognition (OCR) are not used in the system. In the MED case with 100 example training videos, support-vector machines (SVMs) are trained and fused to detect an event in the test set. In the case with 0 example videos, positive and negative concepts are extracted as keywords from the textual event description and events are detected with the high-level concepts. The MED results show that the SIFT keypoint descriptor contributes most to the results, that fusion of multiple low-level features helps to improve the performance, and that the textual event-description chain currently performs poorly. The TNO INS system presents a baseline open-source approach using standard SIFT keypoint detection and exhaustive matching. To speed up search times for queries, a basic map-reduce scheme is presented for use on a multi-node cluster. Our INS results show above-median results with acceptable search times. The research for the MED submission was performed in the GOOSE project, which is jointly funded by the enabling technology program Adaptive Multi Sensor Networks (AMSN) and the MIST research program of the Dutch Ministry of Defense. The INS submission was partly supported by the MIME project of the creative industries knowledge and innovation network CLICKNL.

OAR@UM

Pursuing a moving target: iterative use of benchmarking of a task to understand the task

Author: Aly Robin
Eskevich Maria
Huet Benoit
Jones Gareth J.F.
Ordelman Roeland
Publication venue: CEUR-WS
Publication date: 01/01/2016

Individual tasks carried out within benchmarking initiatives, or campaigns, enable direct comparison of alternative approaches to tackling shared research challenges and ideally promote new research ideas and foster communities of researchers interested in common or related scientific topics. When a task has a clear predefined use case, it might straightforwardly adopt a well-established framework and methodology, for example, an ad hoc information retrieval task adopting the standard Cranfield paradigm. On the other hand, in cases of new and emerging tasks which pose more complex challenges in terms of use scenarios or dataset design, the development of a new task is far from a straightforward process. This letter summarises our reflections on our experiences as task organisers of the Search and Hyperlinking task from its origins as a Brave New Task at the MediaEval benchmarking campaign (2011–2014) to its current instantiation as a task at the NIST TRECVid benchmark (since 2015). We highlight the challenges encountered in the development of the task over a number of annual iterations, the solutions found so far, and our process for maintaining a vision for the ongoing advancement of the task’s ambition.

Irish Universities

DCU Online Research Access Service

Radboud Repository

University of Twente Research Information

TRECVID 2018: Benchmarking Video Activity Detection, Video Captioning and Matching, Video Storytelling Linking and Video Search

Author: Awad George
Blasi Saverio
Butt Asad
Curtis Keith
Delgado Andrew
Fiscus Jonathan
Godil Afzad
Graham Yvette
Joy David
Kraaij Wessel
Lee Yooyoung
Magalhaes Joao
Quénot Georges
Semedo David
Smeaton Alan
Publication venue: HAL CCSD
Publication date: 13/11/2018

Hal - Université Grenoble Alpes

A task category space for user-centric comparative multimedia search evaluations

Author: Bailer Werner
Barthel Kai Uwe
Gurrin Cathal
Heller Silvan
Jónsson Björn Þór
Lokoc Jakub
Peska Ladislav
Rossetto Luca
Schoeffmann Klaus
Vadicamo Lucia
Vrochidis Stefanos
Wu Jiaxin
Publication venue: Springer Science and Business Media LLC
Publication date: 15/03/2022

In the last decade, user-centric video search competitions have facilitated the evolution of interactive video search systems. So far, these competitions have focused on a small number of search task categories, with few attempts to change task category configurations. Based on our extensive experience with interactive video search contests, we have analyzed the spectrum of possible task categories and propose a list of individual axes that define a large space of possible task categories. Using this concept of category space, new user-centric video search competitions can be designed to benchmark video search systems from different perspectives. We further analyse the three task categories considered so far at the Video Browser Showdown and discuss possible (but sometimes challenging) shifts within the task category space.

DCU Online Research Access Service