Search CORE

1,337 research outputs found

The AXES submissions at TrecVid 2013

Author: Aly Robin
Arandjelovic Relja
Chatfield Ken
Douze Matthijs
Fernando Basura
Harchaoui Zaid
McGuinness Kevin
O'Connor Noel E.
Oneata Dan
Parkhi Omkar M.
Potapov Danila
Revaud Jérôme
Schmid Cordelia
Schwenninger Jochen
Scott David
Tuytelaars Tinne
Verbeek Jakob
Wang Heng
Zisserman Andrew
Publication venue
Publication date: 01/11/2013
Field of study

The AXES project participated in the interactive instance search task (INS), the semantic indexing task (SIN) the multimedia event recounting task (MER), and the multimedia event detection task (MED) for TRECVid 2013. Our interactive INS focused this year on using classifiers trained at query time with positive examples collected from external search engines. Participants in our INS experiments were carried out by students and researchers at Dublin City University. Our best INS runs performed on par with the top ranked INS runs in terms of P@10 and P@30, and around the median in terms of mAP. For SIN, MED and MER, we use systems based on state- of-the-art local low-level descriptors for motion, image, and sound, as well as high-level features to capture speech and text and the visual and audio stream respectively. The low-level descriptors were aggregated by means of Fisher vectors into high- dimensional video-level signatures, the high-level features are aggregated into bag-of-word histograms. Using these features we train linear classifiers, and use early and late-fusion to combine the different features. Our MED system achieved the best score of all submitted runs in the main track, as well as in the ad-hoc track. This paper describes in detail our INS, MER, and MED systems and the results and findings of our experimen

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Irish Universities

DCU Online Research Access Service

Visual instance search from one example

Author: Tao R.
Publication venue
Publication date: 01/01/2017
Field of study

International Migration, Integration and Social Cohesion online publications

Visual instance search from one example

Author: Tao R.
Publication venue
Publication date: 01/01/2017
Field of study

International Migration, Integration and Social Cohesion online publications

Audio-visual football video analysis, from structure detection to attention analysis

Author: Ren Reede
Publication venue
Publication date: 01/01/2008
Field of study

Sport video is an important video genre. Content-based sports video analysis attracts great interest from both industry and academic ﬁelds. A sports video is characterised by repetitive temporal structures, relatively plain contents, and strong spatio-temporal variations, such as quick camera switches and swift local motions. It is necessary to develop speciﬁc techniques for content-based sports video analysis to utilise these characteristics. For an efﬁcient and effective sports video analysis system, there are three fundamental questions: (1) what are key stories for sports videos; (2) what incurs viewer’s interest; and (3) how to identify game highlights. This thesis is developed around these questions. We approached these questions from two different perspectives and in turn three research contributions are presented, namely, replay detection, attack temporal structure decomposition, and attention-based highlight identiﬁcation. Replay segments convey the most important contents in sports videos. It is an efﬁcient approach to collect game highlights by detecting replay segments. However, replay is an artefact of editing, which improves with advances in video editing tools. The composition of replay is complex, which includes logo transitions, slow motions, viewpoint switches and normal speed video clips. Since logo transition clips are pervasive in game collections of FIFA World Cup 2002, FIFA World Cup 2006 and UEFA Championship 2006, we take logo transition detection as an effective replacement of replay detection. A two-pass system was developed, including a ﬁve-layer adaboost classiﬁer and a logo template matching throughout an entire video. The ﬁve-layer adaboost utilises shot duration, average game pitch ratio, average motion, sequential colour histogram and shot frequency between two neighbouring logo transitions, to ﬁlter out logo transition candidates. Subsequently, a logo template is constructed and employed to ﬁnd all transition logo sequences. The precision and recall of this system in replay detection is 100% in a ﬁve-game evaluation collection. An attack structure is a team competition for a score. Hence, this structure is a conceptually fundamental unit of a football video as well as other sports videos. We review the literature of content-based temporal structures, such as play-break structure, and develop a three-step system for automatic attack structure decomposition. Four content-based shot classes, namely, play, focus, replay and break were identiﬁed by low level visual features. A four-state hidden Markov model was trained to simulate transition processes among these shot classes. Since attack structures are the longest repetitive temporal unit in a sports video, a sufﬁx tree is proposed to ﬁnd the longest repetitive substring in the label sequence of shot class transitions. These occurrences of this substring are regarded as a kernel of an attack hidden Markov process. Therefore, the decomposition of attack structure becomes a boundary likelihood comparison between two Markov chains. Highlights are what attract notice. Attention is a psychological measurement of “notice ”. A brief survey of attention psychological background, attention estimation from vision and auditory, and multiple modality attention fusion is presented. We propose two attention models for sports video analysis, namely, the role-based attention model and the multiresolution autoregressive framework. The role-based attention model is based on the perception structure during watching video. This model removes reﬂection bias among modality salient signals and combines these signals by reﬂectors. The multiresolution autoregressive framework (MAR) treats salient signals as a group of smooth random processes, which follow a similar trend but are ﬁlled with noise. This framework tries to estimate a noise-less signal from these coarse noisy observations by a multiple resolution analysis. Related algorithms are developed, such as event segmentation on a MAR tree and real time event detection. The experiment shows that these attention-based approach can ﬁnd goal events at a high precision. Moreover, results of MAR-based highlight detection on the ﬁnal game of FIFA 2002 and 2006 are highly similar to professionally labelled highlights by BBC and FIFA

Glasgow Theses Service

CiteSeerX

OpenGrey Repository

Boundary Proposal Network for Two-Stage Natural Language Video Localization

Author: Chen Long
Ji Wei
Shao Jian
Xiao Jun
Xiao Shaoning
Ye Lu
Zhang Songyang
Publication venue
Publication date: 18/05/2021
Field of study

We aim to address the problem of Natural Language Video Localization (NLVL)-localizing the video segment corresponding to a natural language description in a long and untrimmed video. State-of-the-art NLVL methods are almost in one-stage fashion, which can be typically grouped into two categories: 1) anchor-based approach: it first pre-defines a series of video segment candidates (e.g., by sliding window), and then does classification for each candidate; 2) anchor-free approach: it directly predicts the probabilities for each video frame as a boundary or intermediate frame inside the positive segment. However, both kinds of one-stage approaches have inherent drawbacks: the anchor-based approach is susceptible to the heuristic rules, further limiting the capability of handling videos with variant length. While the anchor-free approach fails to exploit the segment-level interaction thus achieving inferior results. In this paper, we propose a novel Boundary Proposal Network (BPNet), a universal two-stage framework that gets rid of the issues mentioned above. Specifically, in the first stage, BPNet utilizes an anchor-free model to generate a group of high-quality candidate video segments with their boundaries. In the second stage, a visual-language fusion layer is proposed to jointly model the multi-modal interaction between the candidate and the language query, followed by a matching score rating layer that outputs the alignment score for each candidate. We evaluate our BPNet on three challenging NLVL benchmarks (i.e., Charades-STA, TACoS and ActivityNet-Captions). Extensive experiments and ablative studies on these datasets demonstrate that the BPNet outperforms the state-of-the-art methods.Comment: AAAI 202

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Vehicle logo recognition using histograms of oriented gradient descriptor and sparsity score

Author: Hoang Vinh Truong
Meethongjan Kittikhun
Surinwarangkoon Thongchai
Publication venue: 'Universitas Ahmad Dahlan'
Publication date: 01/12/2020
Field of study

Most of vehicle have the similar structures and designs. It is extremely complicated and difficult to identify and classify vehicle brands based on their structure and shape. As we requirea quick and reliable response, so vehicle logos are an alternative method of determining the type of a vehicle. In this paper, we propose a method for vehicle logo recognition based on feature selection method in a hybrid way. Vehicle logo images are first characterized by histograms of oriented gradient descriptors and the final features vector are then applied feature selection method to reduce the irrelevant information. Moreover, we release a new benchmark dataset for vehicle logo recognition and retrieval task namely, VLR-40. The experimental results are evaluated on this database which show the efficiency of the proposed approach

Journal of Education and Learning (EduLearn)

TELKOMNIKA (Telecommunication Computing Electronics and Control)

UAD Journal Management System